That's interesting. Could you share your old and new schema? I would like to track down the performance problems you saw. (If you have a demo program that populates your rows with 200,000 columns in a way that reproduces the performance issues, that would be even better, but it is not necessary.)
-- Lars

________________________________
From: Gurjeet Singh <[email protected]>
To: [email protected]; lars hofhansl <[email protected]>
Sent: Thursday, August 16, 2012 11:26 AM
Subject: Re: Slow full-table scans

Sorry for the delay guys.

Here are a few results:

1. Regions in the table = 11
2. The region servers don't appear to be very busy with the query ~5% CPU (but with parallelization, they are all busy)

Finally, I changed the format of my data, such that each cell in HBase contains a chunk of a row instead of the single value it had. So, stuffing each Hbase cell with 500 columns of a row, gave me a performance boost of 1000x. It seems that the underlying issue was IO overhead per byte of actual data stored.

On Wed, Aug 15, 2012 at 5:16 PM, lars hofhansl <[email protected]> wrote:
> Yeah... It looks OK.
> Maybe 2G of heap is a bit low when dealing with 200.000 column rows.
>
>
> If you can I'd like to know how busy your regionservers are during these
> operations. That would be an indication on whether the parallelization is
> good or not.
>
> -- Lars
>
>
> ----- Original Message -----
> From: Stack <[email protected]>
> To: [email protected]
> Cc:
> Sent: Wednesday, August 15, 2012 3:13 PM
> Subject: Re: Slow full-table scans
>
> On Mon, Aug 13, 2012 at 6:10 PM, Gurjeet Singh <[email protected]> wrote:
>> I am beginning to think that this is a configuration issue on my
>> cluster. Do the following configuration files seem sane ?
>>
>> hbase-env.sh    https://gist.github.com/3345338
>>
>
> Nothing wrong w/ this (Remove the -ea, you don't want asserts in
> production, and the -XX:+CMSIncrementalMode flag if >= 2 cores).
>
>
>> hbase-site.xml    https://gist.github.com/3345356
>>
>
> This is all defaults effectively. I don't see any of the configs.
> recommended by the performance section of the reference guide and/or
> those suggested by the GBIF blog.
>
> You don't answer LarsH's query about where you see the 4% difference.
>
> How many regions in your table? Whats the HBase Master UI look like
> when this scan is running?
> St.Ack
>
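
For readers following the thread, here is a minimal sketch of the kind of re-layout Gurjeet describes: grouping a wide row's values into cells of 500 values each, so that a 200,000-column row becomes 400 cells. The actual schema was not posted, so everything here is an assumption made for illustration: the function names `chunk_row` and `unchunk_cell`, the `chunk_NNNNNN` qualifier naming, and the length-prefixed encoding are all hypothetical. It is plain Python with no HBase client calls; it only shows the packing step. Since HBase stores the full key (row, family, qualifier, timestamp, type) alongside every cell value, fewer and larger cells mean less per-cell overhead, which would be consistent with the "IO overhead per byte of actual data" observation above.

```python
from typing import Dict, List

CHUNK_SIZE = 500  # values per cell; the figure quoted in the thread


def chunk_row(values: List[bytes], chunk_size: int = CHUNK_SIZE) -> Dict[bytes, bytes]:
    """Pack a wide row's values into cells holding `chunk_size` values each.

    Returns a mapping of (hypothetical) column qualifier -> cell payload.
    """
    cells: Dict[bytes, bytes] = {}
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        # Zero-padded index so qualifiers sort in chunk order.
        qualifier = b"chunk_%06d" % (start // chunk_size)
        # Prefix each value with its 4-byte length so the cell can be split again.
        cells[qualifier] = b"".join(len(v).to_bytes(4, "big") + v for v in chunk)
    return cells


def unchunk_cell(payload: bytes) -> List[bytes]:
    """Split one cell payload back into the individual values it holds."""
    values: List[bytes] = []
    pos = 0
    while pos < len(payload):
        length = int.from_bytes(payload[pos:pos + 4], "big")
        pos += 4
        values.append(payload[pos:pos + length])
        pos += length
    return values
```

Each entry returned by `chunk_row` would be written as one cell in place of 500 single-value cells; a reader would fetch the cell and call `unchunk_cell` to recover the individual values. The trade-off is that reading or updating a single value now means handling its whole chunk.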
