Try a quick TestDFSIO to see if things are okay.

./zahoor

On Wed, Aug 22, 2012 at 6:26 AM, Mohit Anchlia <[email protected]> wrote:
> It's possible that there is a bad or slower disk on Gurjeet's machine. I
> think details of iostat and cpu would clear things up.
>
> On Tue, Aug 21, 2012 at 4:33 PM, lars hofhansl <[email protected]> wrote:
>
> > I get roughly the same (~1.8s) - 100 rows, 200.000 columns, segment size 100
> >
> > ________________________________
> > From: Gurjeet Singh <[email protected]>
> > To: [email protected]; lars hofhansl <[email protected]>
> > Sent: Tuesday, August 21, 2012 11:31 AM
> > Subject: Re: Slow full-table scans
> >
> > How does that compare with the newScanTable on your build ?
> >
> > Gurjeet
> >
> > On Tue, Aug 21, 2012 at 11:18 AM, lars hofhansl <[email protected]> wrote:
> > > Hmm... So I tried in HBase (current trunk).
> > > I created 100 rows with 200.000 columns each (using your oldMakeTable).
> > > The creation took a bit, but scanning finished in 1.8s. (HBase in pseudo
> > > distributed mode - with your oldScanTable).
> > >
> > > -- Lars
> > >
> > > ----- Original Message -----
> > > From: lars hofhansl <[email protected]>
> > > To: "[email protected]" <[email protected]>
> > > Cc:
> > > Sent: Monday, August 20, 2012 7:50 PM
> > > Subject: Re: Slow full-table scans
> > >
> > > Thanks Gurjeet,
> > >
> > > I'll (hopefully) have a look tomorrow.
> > >
> > > -- Lars
> > >
> > > ----- Original Message -----
> > > From: Gurjeet Singh <[email protected]>
> > > To: [email protected]; lars hofhansl <[email protected]>
> > > Cc:
> > > Sent: Monday, August 20, 2012 7:42 PM
> > > Subject: Re: Slow full-table scans
> > >
> > > Hi Lars,
> > >
> > > Here is a testcase:
> > >
> > > https://gist.github.com/3410948
> > >
> > > Benchmarking code:
> > >
> > > https://gist.github.com/3410952
> > >
> > > Try running it with numRows = 100, numCols = 200000, segmentSize = 1000
> > >
> > > Gurjeet
> > >
> > > On Thu, Aug 16, 2012 at 11:40 AM, Gurjeet Singh <[email protected]> wrote:
> > >> Sure - I can create a minimal testcase and send it along.
> > >>
> > >> Gurjeet
> > >>
> > >> On Thu, Aug 16, 2012 at 11:36 AM, lars hofhansl <[email protected]> wrote:
> > >>> That's interesting.
> > >>> Could you share your old and new schema. I would like to track down
> > >>> the performance problems you saw.
> > >>> (If you had a demo program that populates your rows with 200.000
> > >>> columns in a way where you saw the performance issues, that'd be even
> > >>> better, but not necessary).
> > >>>
> > >>> -- Lars
> > >>>
> > >>> ________________________________
> > >>> From: Gurjeet Singh <[email protected]>
> > >>> To: [email protected]; lars hofhansl <[email protected]>
> > >>> Sent: Thursday, August 16, 2012 11:26 AM
> > >>> Subject: Re: Slow full-table scans
> > >>>
> > >>> Sorry for the delay guys.
> > >>>
> > >>> Here are a few results:
> > >>>
> > >>> 1. Regions in the table = 11
> > >>> 2. The region servers don't appear to be very busy with the query ~5%
> > >>> CPU (but with parallelization, they are all busy)
> > >>>
> > >>> Finally, I changed the format of my data, such that each cell in HBase
> > >>> contains a chunk of a row instead of the single value it had. So,
> > >>> stuffing each Hbase cell with 500 columns of a row, gave me a
> > >>> performance boost of 1000x. It seems that the underlying issue was IO
> > >>> overhead per byte of actual data stored.
> > >>>
> > >>> On Wed, Aug 15, 2012 at 5:16 PM, lars hofhansl <[email protected]> wrote:
> > >>>> Yeah... It looks OK.
> > >>>> Maybe 2G of heap is a bit low when dealing with 200.000 column rows.
> > >>>>
> > >>>> If you can I'd like to know how busy your regionservers are during
> > >>>> these operations. That would be an indication on whether the
> > >>>> parallelization is good or not.
> > >>>>
> > >>>> -- Lars
> > >>>>
> > >>>> ----- Original Message -----
> > >>>> From: Stack <[email protected]>
> > >>>> To: [email protected]
> > >>>> Cc:
> > >>>> Sent: Wednesday, August 15, 2012 3:13 PM
> > >>>> Subject: Re: Slow full-table scans
> > >>>>
> > >>>> On Mon, Aug 13, 2012 at 6:10 PM, Gurjeet Singh <[email protected]> wrote:
> > >>>>> I am beginning to think that this is a configuration issue on my
> > >>>>> cluster. Do the following configuration files seem sane ?
> > >>>>>
> > >>>>> hbase-env.sh https://gist.github.com/3345338
> > >>>>>
> > >>>>
> > >>>> Nothing wrong w/ this (Remove the -ea, you don't want asserts in
> > >>>> production, and the -XX:+CMSIncrementalMode flag if >= 2 cores).
> > >>>>
> > >>>>> hbase-site.xml https://gist.github.com/3345356
> > >>>>>
> > >>>>
> > >>>> This is all defaults effectively. I don't see any of the configs.
> > >>>> recommended by the performance section of the reference guide and/or
> > >>>> those suggested by the GBIF blog.
> > >>>>
> > >>>> You don't answer LarsH's query about where you see the 4% difference.
> > >>>>
> > >>>> How many regions in your table? Whats the HBase Master UI look like
> > >>>> when this scan is running?
> > >>>> St.Ack
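
For readers following the quoted thread: the change Gurjeet describes (storing a chunk of 500 column values in each cell instead of one value per cell) can be illustrated with a small sketch. This is not the code from the linked gists and does not reproduce the benchmark. The helper names `pack_row` and `key_overhead_bytes`, and the 8-byte row key, 1-byte family and 8-byte qualifier lengths, are assumptions picked for illustration. The 20 fixed bytes per cell follow HBase's KeyValue layout (key length, value length, row length, family length, timestamp, key type). The sketch only counts cells and key bytes per row; it says nothing about how much of the reported speedup comes from this.

```python
# Fixed bytes the KeyValue layout stores per cell in addition to the row key,
# family, qualifier and value: key length (4) + value length (4)
# + row length (2) + family length (1) + timestamp (8) + key type (1).
KEYVALUE_FIXED_BYTES = 20


def pack_row(values, segment_size):
    """Split one logical row into chunks; each chunk would be stored as one cell."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [values[i:i + segment_size] for i in range(0, len(values), segment_size)]


def key_overhead_bytes(num_cells, row_key_len, family_len, qualifier_len):
    """Approximate bytes spent on keys (not values) for the cells of one row."""
    per_cell = KEYVALUE_FIXED_BYTES + row_key_len + family_len + qualifier_len
    return num_cells * per_cell


if __name__ == "__main__":
    num_cols = 200_000  # column count used in the thread
    row = list(range(num_cols))
    for segment_size in (1, 500):
        cells = pack_row(row, segment_size)
        # Key lengths below are hypothetical, chosen only for the comparison.
        overhead = key_overhead_bytes(len(cells), row_key_len=8, family_len=1, qualifier_len=8)
        print(f"segment_size={segment_size}: {len(cells)} cells, ~{overhead} key bytes per row")
```

With one value per cell, every value carries its own copy of the key, so the number of cells (and key bytes) a scan has to move grows with the column count; chunking divides both by the segment size.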
