Most useful metrics?

2011-02-09 Thread Tim Sell
What do people find to be the most useful metrics for monitoring their cluster? Both for performance and long-term planning. Presumably requests and disk space are high on the list. Are there any useful ones that aren't covered by the metrics package? ~Tims

Re: Most useful metrics?

2011-02-09 Thread Wayne
Compaction queue size usually explains a lot. That, along with load and disk utilization, is what I use the most. I am definitely interested in what others use, especially metrics that give early warning of problems. Thanks. On Wed, Feb 9, 2011 at 1:42 PM, Tim Sell trs...@gmail.com wrote:
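As a rough illustration of watching the compaction queue metric Wayne mentions: regionservers expose their metrics as JMX beans, and many deployments dump them as JSON. The bean and attribute names below are assumptions for illustration (they vary across HBase versions), so this sketch just parses a sample payload of that general shape rather than a live server.

```python
# Hedged sketch: extract compactionQueueSize from a JMX-style JSON
# metrics dump. Bean/attribute names here are illustrative assumptions,
# not a documented HBase contract.
import json

# A made-up sample payload shaped like a JSON JMX dump.
SAMPLE = json.dumps({
    "beans": [
        {"name": "hadoop:service=RegionServer,name=RegionServerStatistics",
         "compactionQueueSize": 7,
         "requests": 1234}
    ]
})

def compaction_queue_size(jmx_json):
    """Return the first compactionQueueSize attribute found, else None."""
    for bean in json.loads(jmx_json).get("beans", []):
        if "compactionQueueSize" in bean:
            return bean["compactionQueueSize"]
    return None

print(compaction_queue_size(SAMPLE))
```

In practice you would feed this the output of whatever metrics endpoint or Ganglia/JMX collector your cluster already runs, and alert when the queue stays high.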

working jgit+hbase and reasonable test result

2011-02-09 Thread Andrew Purtell
See https://github.com/trendmicro/jgit-hbase Use branch 'jgit.storage.hbase.v4' Last night I loaded all of the following repositories into a small HBase cluster running on my laptop (zk + master + 3 rs): cascading cascading.hbase cascading.jruby cascalog flume gremlins

Re: working jgit+hbase and reasonable test result

2011-02-09 Thread Ryan Rawson
Well done! Perhaps you can sell this to Google and they can finally kill the svn googlecode feature! Or maybe hit up github :-) -ryan On Wed, Feb 9, 2011 at 11:06 AM, Andrew Purtell apurt...@apache.org wrote:

getSplits question

2011-02-09 Thread Geoff Hendrey
Are end rows inclusive or exclusive? The docs say exclusive, but then the question arises of how to form the last split for getSplits(). The code below runs fine, but I believe it is omitting some rows, perhaps because of the exclusive end row. For the final split, should the end row be null? I tried
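The pitfall Geoff describes can be shown with a small sketch (illustrative Python, not HBase code): when end rows are exclusive, giving the final split a concrete end row silently drops the tail of the table; the last split needs an open end (null/empty in HBase terms, None here).

```python
# Sketch of exclusive end-row semantics: start is inclusive, end is
# exclusive, and end=None means "scan to the end of the table".

def rows_in_range(rows, start, end):
    """Return sorted rows r with start <= r < end (end=None -> unbounded)."""
    return [r for r in sorted(rows) if r >= start and (end is None or r < end)]

rows = ["a", "b", "c", "d", "e"]

# Both splits have concrete end rows: "e" is never scanned.
bad = rows_in_range(rows, "a", "c") + rows_in_range(rows, "c", "e")

# Final split gets an open end row: every row is covered exactly once.
good = rows_in_range(rows, "a", "c") + rows_in_range(rows, "c", None)

print(bad)   # the tail row "e" is missing
print(good)  # full coverage
```

Note that adjacent splits share a boundary key (one split's exclusive end is the next split's inclusive start), which is what makes the ranges tile without overlap.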

Re: getSplits question

2011-02-09 Thread Ryan Rawson
You shouldn't need to write your own getSplits() method to run a MapReduce; I never did, at least... -ryan On Wed, Feb 9, 2011 at 11:36 PM, Geoff Hendrey ghend...@decarta.com wrote:

RE: getSplits question

2011-02-09 Thread Geoff Hendrey
Oh, I definitely don't *need* my own to run MapReduce. However, if I want to control the number of records handled by each mapper (split size) as well as the start row and end row, then I thought I had to write my own getSplits(). Is there another way to accomplish this, because I do need the combination

Re: getSplits question

2011-02-09 Thread Ryan Rawson
By default each map gets the contents of one region. A region is by default a maximum of 256MB. There is no trivial way to bisect a region in half by row count knowing only what we know (start and end key). For very large tables that have 100 regions, this algorithm works
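The default one-split-per-region behavior Ryan describes can be sketched like this (illustrative Python, not the TableInputFormat source): each region's key range becomes one map task's input, and the last region's end key is empty, meaning "to the end of the table". The boundary keys below are made up for the example.

```python
# Sketch: turn a sorted list of region start keys into (start, end)
# splits, one per region. "" plays the role of HBase's empty key
# (table start as a start key, table end as an end key).

def splits_from_regions(region_start_keys):
    """One (start, end) pair per region; end of the last split is ""."""
    splits = []
    for i, start in enumerate(region_start_keys):
        if i + 1 < len(region_start_keys):
            end = region_start_keys[i + 1]  # next region's start key
        else:
            end = ""                        # last region: open-ended
        splits.append((start, end))
    return splits

# A hypothetical table with three regions: [-inf,"g"), ["g","p"), ["p",+inf)
print(splits_from_regions(["", "g", "p"]))
```

Splitting below region granularity by row count would require knowing the key distribution inside each region, which is exactly the information the (start key, end key) pair does not give you.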