Sure - I can create a minimal testcase and send it along.

Gurjeet

On Thu, Aug 16, 2012 at 11:36 AM, lars hofhansl <[email protected]> wrote:
> That's interesting.
> Could you share your old and new schema. I would like to track down the
> performance problems you saw.
> (If you had a demo program that populates your rows with 200.000 columns in a
> way where you saw the performance issues, that'd be even better, but not
> necessary).
>
>
> -- Lars
>
>
>
> ________________________________
> From: Gurjeet Singh <[email protected]>
> To: [email protected]; lars hofhansl <[email protected]>
> Sent: Thursday, August 16, 2012 11:26 AM
> Subject: Re: Slow full-table scans
>
> Sorry for the delay guys.
>
> Here are a few results:
>
> 1. Regions in the table = 11
> 2. The region servers don't appear to be very busy with the query ~5%
> CPU (but with parallelization, they are all busy)
>
> Finally, I changed the format of my data, such that each cell in HBase
> contains a chunk of a row instead of the single value it had. So,
> stuffing each Hbase cell with 500 columns of a row, gave me a
> performance boost of 1000x. It seems that the underlying issue was IO
> overhead per byte of actual data stored.
>
>
> On Wed, Aug 15, 2012 at 5:16 PM, lars hofhansl <[email protected]> wrote:
>> Yeah... It looks OK.
>> Maybe 2G of heap is a bit low when dealing with 200.000 column rows.
>>
>>
>> If you can I'd like to know how busy your regionservers are during these
>> operations. That would be an indication on whether the parallelization is
>> good or not.
>>
>> -- Lars
>>
>>
>> ----- Original Message -----
>> From: Stack <[email protected]>
>> To: [email protected]
>> Cc:
>> Sent: Wednesday, August 15, 2012 3:13 PM
>> Subject: Re: Slow full-table scans
>>
>> On Mon, Aug 13, 2012 at 6:10 PM, Gurjeet Singh <[email protected]> wrote:
>>> I am beginning to think that this is a configuration issue on my
>>> cluster. Do the following configuration files seem sane ?
>>>
>>> hbase-env.sh https://gist.github.com/3345338
>>>
>>
>> Nothing wrong w/ this (Remove the -ea, you don't want asserts in
>> production, and the -XX:+CMSIncrementalMode flag if >= 2 cores).
>>
>>
>>> hbase-site.xml https://gist.github.com/3345356
>>>
>>
>> This is all defaults effectively. I don't see any of the configs.
>> recommended by the performance section of the reference guide and/or
>> those suggested by the GBIF blog.
>>
>> You don't answer LarsH's query about where you see the 4% difference.
>>
>> How many regions in your table? Whats the HBase Master UI look like
>> when this scan is running?
>> St.Ack
>>
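
For readers following the thread, the chunked layout described above (500 columns of a row packed into each HBase cell instead of one value per cell) could look roughly like the sketch below. This is only an illustration under stated assumptions: it does not talk to HBase at all, and the function names, the "chunk_NNNNNN" qualifier naming and the tab delimiter are all hypothetical, not the schema actually used in the thread. It also assumes values never contain a tab character.

```python
from typing import Dict, List

# 500 is the chunk size quoted in the thread; everything else here is assumed.
CHUNK_SIZE = 500
DELIMITER = "\t"  # hypothetical separator; values must not contain it


def pack_row(values: List[str], chunk_size: int = CHUNK_SIZE) -> Dict[str, str]:
    """Group a wide row's values into one cell per `chunk_size` columns.

    Returns a mapping of (hypothetical) column qualifier -> packed cell value.
    """
    cells = {}
    for start in range(0, len(values), chunk_size):
        # Zero-padded index so qualifiers sort in column order.
        qualifier = "chunk_%06d" % (start // chunk_size)
        cells[qualifier] = DELIMITER.join(values[start:start + chunk_size])
    return cells


def unpack_row(cells: Dict[str, str]) -> List[str]:
    """Rebuild the original list of values from the packed cells."""
    values = []
    for qualifier in sorted(cells):
        values.extend(cells[qualifier].split(DELIMITER))
    return values


if __name__ == "__main__":
    # A row with 200,000 columns, as discussed in the thread.
    row = [str(i) for i in range(200000)]
    packed = pack_row(row)
    print(len(row), "values stored in", len(packed), "cells")
```

With this layout a 200,000-column row is stored as 400 cells rather than 200,000, so whatever fixed cost each cell carries is paid far less often per byte of actual data, which is consistent with the explanation offered in the thread.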
