What do people find to be the most useful metrics for monitoring their
cluster, both for performance and for long-term planning?
Presumably requests and disk space are high up the list.
Are there any useful ones that aren't covered by the metrics package?
~Tims
Compaction queue size usually explains a lot. That, along with load and disk
utilization, is what I use the most. I am definitely interested in what
others use, especially metrics that give early warning of problems.
Thanks.
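Metrics like the compaction queue size are exposed over JMX. As a minimal, self-contained sketch of polling such a metric, the snippet below registers a stand-in MBean in the local platform MBean server and reads one attribute; the MBean name, interface, and the "CompactionQueueSize" attribute are hypothetical stand-ins for illustration, not HBase's actual JMX names (those vary by version).

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MetricPoller {

    // Hypothetical stand-in for a region server's metrics MBean.
    // (Standard MBean convention: interface name = class name + "MBean".)
    public interface QueueStatsMBean {
        int getCompactionQueueSize();
    }

    public static class QueueStats implements QueueStatsMBean {
        public int getCompactionQueueSize() { return 3; } // stub value for the sketch
    }

    public static int readCompactionQueueSize() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("example:type=QueueStats");
        // In a real setup you would connect to the region server's JMX port
        // instead of registering a local stub.
        if (!server.isRegistered(name)) {
            server.registerMBean(new QueueStats(), name);
        }
        return (Integer) server.getAttribute(name, "CompactionQueueSize");
    }

    public static void main(String[] args) throws Exception {
        System.out.println("compactionQueueSize=" + readCompactionQueueSize());
    }
}
```

The same `getAttribute` call works against a remote `MBeanServerConnection` obtained via `JMXConnectorFactory`, which is how an external monitoring script would poll a live cluster.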
On Wed, Feb 9, 2011 at 1:42 PM, Tim Sell trs...@gmail.com wrote:
What do people find to be the most useful metrics for monitoring their cluster?
See https://github.com/trendmicro/jgit-hbase
Use branch 'jgit.storage.hbase.v4'
Last night I loaded all of the following repositories into a small HBase
cluster running on my laptop (zk + master + 3 rs):
cascading
cascading.hbase
cascading.jruby
cascalog
flume
gremlins
Well done! Perhaps you can sell this to Google and they can finally
kill the svn googlecode feature!
Or maybe hit up github :-)
-ryan
On Wed, Feb 9, 2011 at 11:06 AM, Andrew Purtell apurt...@apache.org wrote:
See https://github.com/trendmicro/jgit-hbase
Use branch 'jgit.storage.hbase.v4'
Are end rows inclusive or exclusive? The docs say exclusive, but then the
question arises of how to form the last split for getSplits(). The
code below runs fine, but I believe it is omitting some rows, perhaps
because of the exclusive end row. For the final split, should the end row be
null? I tried
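For reference, an HBase scan covers the half-open range [startRow, stopRow), i.e. the stop row is exclusive. The toy filter below mimics that semantic on plain strings to make the off-by-one behavior concrete (real row keys are `byte[]` compared lexicographically); it is an illustration of the range semantics only, not HBase code.

```java
import java.util.ArrayList;
import java.util.List;

public class ScanRange {
    // Returns rows in [start, stop): start inclusive, stop exclusive,
    // mirroring HBase's Scan start/stop row semantics on toy string keys.
    public static List<String> scan(List<String> sortedRows, String start, String stop) {
        List<String> out = new ArrayList<>();
        for (String row : sortedRows) {
            if (row.compareTo(start) >= 0 && row.compareTo(stop) < 0) {
                out.add(row); // the stop row itself is never returned
            }
        }
        return out;
    }
}
```

This is why a naive last split with a concrete end row can drop trailing rows: any row sorting at or after that end row is excluded, so the final split conventionally uses an empty/open-ended stop row meaning "to end of table".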
You shouldn't need to write your own getSplits() method to run a
MapReduce job; I never did, at least...
-ryan
On Wed, Feb 9, 2011 at 11:36 PM, Geoff Hendrey ghend...@decarta.com wrote:
Are end rows inclusive or exclusive? The docs say exclusive, but then the
question arises of how to form the last split for getSplits().
Oh, I definitely don't *need* my own to run MapReduce. However, if I want to
control the number of records handled by each mapper (split size) and the
start row and end row, then I thought I had to write my own getSplits(). Is there
another way to accomplish this? I do need the combination
By default each map gets the contents of one region. A region is by
default a maximum of 256 MB. There is no trivial general way to
bisect a region by row count, because all we know about it is its
start and end keys.
For very large tables that have 100 regions, this algorithm works
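The default strategy described above (one split per region, bounded by region keys) can be sketched as a pure function over the region start keys. The class and method names below are hypothetical stand-ins, not the actual TableInputFormat implementation; an empty string stands in for HBase's empty start/end key meaning "open-ended".

```java
import java.util.ArrayList;
import java.util.List;

public class RegionSplits {
    public static class Split {
        public final String start, end; // "" = open-ended (table start / table end)
        public Split(String start, String end) { this.start = start; this.end = end; }
    }

    // regionStartKeys: sorted start keys of each region; the first is ""
    // (beginning of table). One split is emitted per region, and each
    // split's end key is the next region's start key, exclusive.
    public static List<Split> getSplits(List<String> regionStartKeys) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            String start = regionStartKeys.get(i);
            // Last region gets "" as its end key, meaning "to end of table",
            // which is how the final split avoids dropping trailing rows.
            String end = (i + 1 < regionStartKeys.size()) ? regionStartKeys.get(i + 1) : "";
            splits.add(new Split(start, end));
        }
        return splits;
    }
}
```

Note there is exactly one mapper per region under this scheme; controlling mapper count more finely is what pushes people toward a custom getSplits(), with the row-count bisection caveat noted above.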