Re: Slow full-table scans

J Mohamed Zahoor Tue, 21 Aug 2012 22:01:02 -0700

Try a quick TestDFSIO to see if things are okay.

./zahoor


On Wed, Aug 22, 2012 at 6:26 AM, Mohit Anchlia <[email protected]> wrote:

> It's possible that there is a bad or slower disk on Gurjeet's machine. I
> think details of iostat and cpu would clear things up.
>
> On Tue, Aug 21, 2012 at 4:33 PM, lars hofhansl <[email protected]>
> wrote:
>
> > I get roughly the same (~1.8s) - 100 rows, 200.000 columns, segment size
> > 100
> >
> >
> >
> > ________________________________
> >  From: Gurjeet Singh <[email protected]>
> > To: [email protected]; lars hofhansl <[email protected]>
> > Sent: Tuesday, August 21, 2012 11:31 AM
> >  Subject: Re: Slow full-table scans
> >
> > How does that compare with the newScanTable on your build ?
> >
> > Gurjeet
> >
> > On Tue, Aug 21, 2012 at 11:18 AM, lars hofhansl <[email protected]>
> > wrote:
> > > Hmm... So I tried in HBase (current trunk).
> > > I created 100 rows with 200.000 columns each (using your oldMakeTable).
> > > The creation took a bit, but scanning finished in 1.8s. (HBase in pseudo
> > > distributed mode - with your oldScanTable).
> > >
> > > -- Lars
> > >
> > >
> > >
> > > ----- Original Message -----
> > > From: lars hofhansl <[email protected]>
> > > To: "[email protected]" <[email protected]>
> > > Cc:
> > > Sent: Monday, August 20, 2012 7:50 PM
> > > Subject: Re: Slow full-table scans
> > >
> > > Thanks Gurjeet,
> > >
> > > I'll (hopefully) have a look tomorrow.
> > >
> > > -- Lars
> > >
> > >
> > >
> > > ----- Original Message -----
> > > From: Gurjeet Singh <[email protected]>
> > > To: [email protected]; lars hofhansl <[email protected]>
> > > Cc:
> > > Sent: Monday, August 20, 2012 7:42 PM
> > > Subject: Re: Slow full-table scans
> > >
> > > Hi Lars,
> > >
> > > Here is a testcase:
> > >
> > > https://gist.github.com/3410948
> > >
> > > Benchmarking code:
> > >
> > > https://gist.github.com/3410952
> > >
> > > Try running it with numRows = 100, numCols = 200000, segmentSize = 1000
> > >
> > > Gurjeet
> > >
> > >
> > > On Thu, Aug 16, 2012 at 11:40 AM, Gurjeet Singh <[email protected]>
> > > wrote:
> > >> Sure - I can create a minimal testcase and send it along.
> > >>
> > >> Gurjeet
> > >>
> > >> On Thu, Aug 16, 2012 at 11:36 AM, lars hofhansl <[email protected]>
> > > >> wrote:
> > >>> That's interesting.
> > >>> Could you share your old and new schemas? I would like to track down
> > >>> the performance problems you saw.
> > >>> (If you had a demo program that populates your rows with 200.000
> > >>> columns in a way where you saw the performance issues, that'd be even
> > >>> better, but not necessary.)
> > >>>
> > >>>
> > >>> -- Lars
> > >>>
> > >>>
> > >>>
> > >>> ________________________________
> > >>>  From: Gurjeet Singh <[email protected]>
> > >>> To: [email protected]; lars hofhansl <[email protected]>
> > >>> Sent: Thursday, August 16, 2012 11:26 AM
> > >>> Subject: Re: Slow full-table scans
> > >>>
> > >>> Sorry for the delay guys.
> > >>>
> > >>> Here are a few results:
> > >>>
> > >>> 1. Regions in the table = 11
> > >>> 2. The region servers don't appear to be very busy with the query ~5%
> > >>> CPU (but with parallelization, they are all busy)
> > >>>
> > >>> Finally, I changed the format of my data, such that each cell in HBase
> > >>> contains a chunk of a row instead of the single value it had. So,
> > >>> stuffing each HBase cell with 500 columns of a row gave me a
> > >>> performance boost of 1000x. It seems that the underlying issue was IO
> > >>> overhead per byte of actual data stored.
> > >>>
> > >>>
> > >>> On Wed, Aug 15, 2012 at 5:16 PM, lars hofhansl <[email protected]>
> > >>> wrote:
> > >>>> Yeah... It looks OK.
> > >>>> Maybe 2G of heap is a bit low when dealing with 200.000 column rows.
> > >>>>
> > >>>>
> > >>>> If you can, I'd like to know how busy your regionservers are during
> > >>>> these operations. That would be an indication of whether the
> > >>>> parallelization is good or not.
> > >>>>
> > >>>> -- Lars
> > >>>>
> > >>>>
> > >>>> ----- Original Message -----
> > >>>> From: Stack <[email protected]>
> > >>>> To: [email protected]
> > >>>> Cc:
> > >>>> Sent: Wednesday, August 15, 2012 3:13 PM
> > >>>> Subject: Re: Slow full-table scans
> > >>>>
> > >>>> On Mon, Aug 13, 2012 at 6:10 PM, Gurjeet Singh <[email protected]>
> > >>>> wrote:
> > >>>>> I am beginning to think that this is a configuration issue on my
> > >>>>> cluster. Do the following configuration files seem sane ?
> > >>>>>
> > >>>>> hbase-env.sh    https://gist.github.com/3345338
> > >>>>>
> > >>>>
> > >>>> Nothing wrong w/ this (Remove the -ea, you don't want asserts in
> > >>>> production, and the -XX:+CMSIncrementalMode flag if >= 2 cores).
> > >>>>
> > >>>>
> > >>>>> hbase-site.xml    https://gist.github.com/3345356
> > >>>>>
> > >>>>
> > >>>> This is all defaults effectively.  I don't see any of the configs
> > >>>> recommended by the performance section of the reference guide and/or
> > >>>> those suggested by the GBIF blog.
> > >>>>
> > >>>> You don't answer LarsH's query about where you see the 4% difference.
> > >>>>
> > >>>> How many regions in your table?  What's the HBase Master UI look like
> > >>>> when this scan is running?
> > >>>> St.Ack
> > >>>>
> >
>
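
For readers of the archive: Gurjeet's remark further down the thread, that packing 500 columns into each HBase cell removed most of the "IO overhead per byte of actual data stored", can be illustrated with a rough size estimate. The sketch below is only that, a back-of-the-envelope calculation and not a benchmark. It relies on the fact that every HBase cell (KeyValue) is stored with its own copy of the row key, column family, qualifier, timestamp and type. The 16-byte row key, 1-byte family name, 8-byte qualifier and 8-byte value are assumptions picked for illustration; they are not taken from Gurjeet's schema or gists, and the function names are hypothetical.

```python
# Back-of-the-envelope estimate of per-cell framing overhead in HBase.
# All sizes passed in below are illustrative assumptions, not measured values.

def keyvalue_size(row_len, family_len, qualifier_len, value_len):
    """Serialized size in bytes of one KeyValue (no tags, no block encoding)."""
    key_len = (
        2 + row_len        # 2-byte row length + row key
        + 1 + family_len   # 1-byte family length + family name
        + qualifier_len    # column qualifier
        + 8                # timestamp
        + 1                # key type
    )
    return 4 + 4 + key_len + value_len  # key-length and value-length prefixes


def row_bytes(num_cols, cols_per_cell, row_len=16, family_len=1,
              qualifier_len=8, value_len=8):
    """Bytes needed to store one logical row of num_cols fixed-width values
    when cols_per_cell of them are packed into each HBase cell."""
    full_cells, leftover = divmod(num_cols, cols_per_cell)
    total = full_cells * keyvalue_size(
        row_len, family_len, qualifier_len, cols_per_cell * value_len)
    if leftover:  # a final, smaller cell holds the remaining values
        total += keyvalue_size(
            row_len, family_len, qualifier_len, leftover * value_len)
    return total


if __name__ == "__main__":
    payload = 200_000 * 8  # bytes of actual data in one 200,000-column row
    for cols_per_cell in (1, 500):
        total = row_bytes(200_000, cols_per_cell)
        print(f"{cols_per_cell:>3} value(s) per cell: {total:,} bytes, "
              f"{total / payload:.2f}x the payload")
```

Under these assumed sizes, one value per cell stores about 6.6 times the payload, while 500 values per cell stores about 1.01 times the payload. Byte overhead alone does not account for the 1000x speedup reported in the thread; the per-cell work done by the scanner and the RPC layer is presumably reduced by the same factor of 500, which this sketch does not model.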
