Re: anyway to turn off per-region metrics?

2013-07-30 Thread Oliver Meyn (GBIF)
/SchemaMetrics.java#L212 That will remove all of the per table metrics. For 0.95 and above this will be controlled by filters in the metrics properties file. On Mon, Jul 29, 2013 at 4:06 AM, Oliver Meyn (GBIF) om...@gbif.org wrote: Hi All, My ganglia server is being overwhelmed and I need

Re: Scan performance on compressed column families

2012-11-09 Thread Oliver Meyn (GBIF)
Hi David, I wrote that blog post and I know that Lars George has much more experience than me with tuning HBase, especially in different environments, so weight our opinions accordingly. As he says, it will usually help, and the unusual cases of lower spec'd hardware (that I did those tests

Re: resource usage of ResultScanner's IteratorResult

2012-11-02 Thread Oliver Meyn (GBIF)
On 2012-10-26, at 9:59 PM, Stack wrote: On Thu, Oct 25, 2012 at 1:24 AM, Oliver Meyn (GBIF) om...@gbif.org wrote: Hi all, I'm on cdh3u3 (hbase 0.90.4) and I need to provide a bunch of row keys based on a column value (e.g. give me all keys where column dataset = 1234). That's

resource usage of ResultScanner's IteratorResult

2012-10-25 Thread Oliver Meyn (GBIF)
Hi all, I'm on cdh3u3 (hbase 0.90.4) and I need to provide a bunch of row keys based on a column value (e.g. give me all keys where column dataset = 1234). That's straightforward using a scan and filter. The trick is that I want to return an Iterator over my key type (Integer) rather than

Optimizing writes/compactions/storefiles

2012-07-11 Thread Oliver Meyn (GBIF)
Hi all, We just spent some time figuring out how to get writes to work properly in our cluster on cdh3, and I wrote it up in a blog post. Might be of interest to some of you: http://gbif.blogspot.dk/2012/07/optimizing-writes-in-hbase.html Cheers, Oliver -- Oliver Meyn Software Developer

Re: Pre-split table using shell

2012-06-12 Thread Oliver Meyn (GBIF)
Hi Simon, I might be wrong but I'm pretty sure the splits file you specify is assumed to be full of strings. So even though they look like bytes they're being interpreted as the string value (like '\x00') instead of the actual byte \x00. The only way I could get the byte representation of
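The string-versus-byte distinction described in this message can be illustrated with a minimal sketch. It is plain Python, not HBase shell or HBase API code, and the variable names are hypothetical. It only shows why the four characters `\x00` read from a text file differ from the single zero byte; it makes no claim about how the HBase shell actually parses a splits file.

```python
# Illustration only: plain Python, not HBase shell or HBase API code.
# All names here are hypothetical and chosen for this sketch.

# What a text file containing the characters \x00 really holds:
# a backslash, 'x', '0', '0' (four separate characters).
escaped_text = r"\x00"
text_as_bytes = escaped_text.encode("ascii")

# The single zero byte that was presumably intended as the split key.
zero_byte = b"\x00"

print(len(text_as_bytes))          # four bytes: 0x5c 0x78 0x30 0x30
print(len(zero_byte))              # one byte: 0x00
print(text_as_bytes == zero_byte)  # the two values are not equal

# One way (in Python) to turn the escaped text into the real byte:
# interpret the escape sequence, then map each code point to one byte.
decoded = text_as_bytes.decode("unicode_escape").encode("latin-1")
print(decoded == zero_byte)
```

If split keys are read as literal strings, a table pre-split on the first form would get four-byte boundaries rather than the intended one-byte boundary.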

Re: PerformanceEvaluation results

2012-06-08 Thread Oliver Meyn (GBIF)
:53 AM, Oliver Meyn (GBIF) om...@gbif.org wrote: Apologies for responding to myself, but after some more testing I've concluded that we had a minor network bottleneck that was partially masking the real problem: not enough disks. Deductions based on ganglia metrics in a follow-up blog post

Re: HBase Performance Improvements?

2012-05-10 Thread Oliver Meyn (GBIF)
for the response -:) I will take any code you can provide even if it's a hack! I will even send you an Amazon gift card - not that you care or need it -:) Can you share some performance statistics? Thanks again. On Wed, May 9, 2012 at 8:02 AM, Oliver Meyn (GBIF) om...@gbif.org wrote: Heya

Re: HBase Performance Improvements?

2012-05-09 Thread Oliver Meyn (GBIF)
Heya Something, I had a similar task recently and by far the best way to go about this is with bulk loading after pre-splitting your target table. As you know ImportTsv doesn't understand Avro files so I hacked together my own ImportAvro class to create the Hfiles that I eventually moved into

Re: Doumentation broken

2012-04-13 Thread Oliver Meyn (GBIF)
Looks like /book got moved under another /book, so something is definitely wrong. You can try an unstyled version at: http://hbase.apache.org/book/book/book.html Cheers, Oliver On 2012-04-13, at 9:59 AM, Nitin Pawar wrote: Hello, Is there any maintenance going on with hbase.apache.org?

Re: PerformanceEvaluation results

2012-03-20 Thread Oliver Meyn (GBIF)
-evaluation-continued.html Cheers, Oliver On 2012-02-28, at 5:10 PM, Oliver Meyn (GBIF) wrote: Hi all, I've spent the last couple of weeks working with PerformanceEvaluation, trying to understand scan performance in our little cluster. I've written a blog post with the results and would

ethernet channel bonding experiences

2012-03-19 Thread Oliver Meyn (GBIF)
Hi all, I've been experimenting with PerformanceEvaluation in the last weeks and on a whim thought I'd give channel bonding a try to see if it was networking bandwidth that was acting as the bottleneck. It would seem that it's not quite as trivial as it sounds, so I'm looking for other

PerformanceEvaluation results

2012-02-28 Thread Oliver Meyn (GBIF)
Hi all, I've spent the last couple of weeks working with PerformanceEvaluation, trying to understand scan performance in our little cluster. I've written a blog post with the results and would really welcome any input you may have.

Re: strange PerformanceEvaluation behaviour

2012-02-16 Thread Oliver Meyn (GBIF)
On 2012-02-15, at 5:39 PM, Stack wrote: On Wed, Feb 15, 2012 at 1:53 AM, Oliver Meyn (GBIF) om...@gbif.org wrote: So hacking around reveals that key collision is indeed the problem. I thought the modulo part of the getRandomRow method was suspect but while removing it improved

Re: strange PerformanceEvaluation behaviour

2012-02-15 Thread Oliver Meyn (GBIF)
On 2012-02-15, at 7:32 AM, Stack wrote: On Tue, Feb 14, 2012 at 8:14 AM, Stack st...@duboce.net wrote: 2) With that same randomWrite command line above, I would expect a resulting table with 10 * (1024 * 1024) rows (so 10485700 = roughly 10M rows). Instead what I'm seeing is that the

Re: strange PerformanceEvaluation behaviour

2012-02-15 Thread Oliver Meyn (GBIF)
On 2012-02-15, at 9:09 AM, Oliver Meyn (GBIF) wrote: On 2012-02-15, at 7:32 AM, Stack wrote: On Tue, Feb 14, 2012 at 8:14 AM, Stack st...@duboce.net wrote: 2) With that same randomWrite command line above, I would expect a resulting table with 10 * (1024 * 1024) rows (so 10485700 = roughly

Re: strange PerformanceEvaluation behaviour

2012-02-15 Thread Oliver Meyn (GBIF)
, 2012, at 1:53 AM, Oliver Meyn (GBIF) om...@gbif.org wrote: On 2012-02-15, at 9:09 AM, Oliver Meyn (GBIF) wrote: On 2012-02-15, at 7:32 AM, Stack wrote: On Tue, Feb 14, 2012 at 8:14 AM, Stack st...@duboce.net wrote: 2) With that same randomWrite command line above, I would expect a resulting

strange PerformanceEvaluation behaviour

2012-02-14 Thread Oliver Meyn (GBIF)
Hi all, I've been trying to run a battery of tests to really understand our cluster's performance, and I'm employing PerformanceEvaluation to do that (picking up where Tim Robertson left off, elsewhere on the list). I'm seeing two strange things that I hope someone can help with: 1) With a

Re: snappy error during completebulkload

2012-01-10 Thread Oliver Meyn (GBIF)
Lipcon wrote: On Mon, Jan 9, 2012 at 2:42 AM, Oliver Meyn (GBIF) om...@gbif.org wrote: It seems really weird that compression (native compression even moreso) should be required by a command that is in theory moving files from one place on a remote filesystem to another. Any light shed would