Kevin,

Sorry, I am fairly new to HBase. Can you be specific about what settings I can change, and also where they are specified?
Pretty sure I am not hotspotting, and increasing the memstore does not seem to have any effect. I do not see any messages in my regionserver logs concerning blocking. I suspect that I am hitting some limit in our grid, but would like to know where that limit is being imposed.

Jon

On Fri, Oct 12, 2012 at 6:44 AM, Kevin O'dell <[email protected]> wrote:

> Jonathan,
>
> Let's take a deeper look here.
>
> What is your memstore set at for the table/CF in question? Let's compare
> that value with the flush size you are seeing for your regions. If they
> are really small flushes, is it all to the same region? If so, that is
> going to be a schema issue. If they are full flushes, you can raise your
> memstore, assuming you have the heap to cover it. If they are smaller
> flushes but to different regions, you are most likely suffering from
> global limit pressure and flushing too soon.
>
> Are you flushing prematurely due to HLogs rolling? Take a look for "too
> many hlogs" messages and look at the flushes. It may benefit you to raise
> that value.
>
> Are you blocking? As Suraj was saying, you may be blocking in 90-second
> blocks. Check the RS logs for those messages as well and then follow
> Suraj's advice.
>
> This is where I would start to optimize your write path. I hope the above
> helps.
>
> On Fri, Oct 12, 2012 at 3:34 AM, Suraj Varma <[email protected]> wrote:
>
> > What have you configured your hbase.hstore.blockingStoreFiles and
> > hbase.hregion.memstore.block.multiplier to? Both of these block updates
> > when the limit is hit. Try increasing them to, say, 20 and 4 from the
> > defaults of 7 and 2 and see if it helps.
> >
> > If this still doesn't help, see if you can set up Ganglia to get better
> > insight into what is bottlenecking.
> > --Suraj
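For reference, the settings Kevin and Suraj mention above are normally specified in hbase-site.xml on each region server and picked up after a restart. A minimal sketch using the values Suraj suggests; the flush-size and maxlogs entries are illustrative defaults of that era, not values taken from this thread:

<!-- hbase-site.xml on the region servers: a sketch, not a tuned configuration -->
<configuration>
  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>20</value>  <!-- default 7; updates block when a store reaches this many files -->
  </property>
  <property>
    <name>hbase.hregion.memstore.block.multiplier</name>
    <value>4</value>   <!-- default 2; updates block at multiplier x flush size per region -->
  </property>
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>134217728</value>  <!-- illustrative: 128 MB per-region memstore flush threshold -->
  </property>
  <property>
    <name>hbase.regionserver.maxlogs</name>
    <value>32</value>  <!-- illustrative; exceeding it forces flushes ("too many hlogs") -->
  </property>
</configuration>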
> >
> > On Thu, Oct 11, 2012 at 11:47 PM, Pankaj Misra
> > <[email protected]> wrote:
> > > OK, looks like I missed reading that part in your original mail. Did
> > > you try some of the compaction tweaks and configurations explained in
> > > the following link for your data?
> > > http://hbase.apache.org/book/regions.arch.html#compaction
> > >
> > > Also, how much data are you putting into the regions, and how big is
> > > one region at the end of data ingestion?
> > >
> > > Thanks and Regards
> > > Pankaj Misra
> > >
> > > -----Original Message-----
> > > From: Jonathan Bishop [mailto:[email protected]]
> > > Sent: Friday, October 12, 2012 12:04 PM
> > > To: [email protected]
> > > Subject: RE: more regionservers does not improve performance
> > >
> > > Pankaj,
> > >
> > > Thanks for the reply.
> > >
> > > Actually, I am using MD5 hashing to evenly spread the keys among the
> > > splits, so I don't believe there is any hotspot. In fact, when I monitor
> > > the web UI for HBase I see a very even load on all the regionservers.
> > >
> > > Jon
> > >
> > > *From:* Pankaj Misra <[email protected]>
> > > *Sent:* Thursday, October 11, 2012 8:24:32 PM
> > > *To:* [email protected]
> > > *Subject:* RE: more regionservers does not improve performance
> > >
> > > Hi Jonathan,
> > >
> > > What seems to me is that, while doing the split across all 40 mappers,
> > > the keys are not randomized enough to leverage multiple regions and the
> > > pre-split strategy. This may be happening because all 40 mappers may be
> > > trying to write to a single region for some time, making it a HOT region,
> > > until the key falls into another region, and then that region becomes a
> > > HOT region, hence you may be seeing a high impact of compaction cycles
> > > reducing your throughput.
> > >
> > > Are the keys incremental? Are the keys randomized enough across the
> > > splits?
> > >
> > > Ideally, when all 40 mappers are running you should see all the regions
> > > being filled up in parallel for maximum throughput. Hope it helps.
> > >
> > > Thanks and Regards
> > > Pankaj Misra
> > >
> > > ________________________________________
> > > From: Jonathan Bishop [[email protected]]
> > > Sent: Friday, October 12, 2012 5:38 AM
> > > To: [email protected]
> > > Subject: more regionservers does not improve performance
> > >
> > > Hi,
> > >
> > > I am running an MR job with 40 simultaneous mappers, each of which does
> > > puts to HBase. I have batched the puts into groups of 1000 (this seems
> > > to help quite a bit) and also made sure that the table is pre-split into
> > > 100 regions, and that the row keys are randomized using MD5 hashing.
> > >
> > > My cluster size is 10, and I am allowing 4 mappers per tasktracker.
> > >
> > > In my MR job I know that the mappers are able to generate puts much
> > > faster than the puts can be handled in HBase. In other words, if I let
> > > the mappers run without doing HBase puts, then everything scales as you
> > > would expect with the number of mappers created. It is the HBase puts
> > > which seem to be the bottleneck.
> > >
> > > What is strange is that I do not get much run-time improvement by
> > > increasing the number of regionservers beyond about 4. Indeed, it seems
> > > that the system runs slower with 8 regionservers than with 4.
> > >
> > > I have added the following to hbase-env.sh hoping this would help
> > > (from the book HBase in Action):
> > >
> > > export HBASE_OPTS="-Xmx8g"
> > > export HBASE_REGIONSERVER_OPTS="-Xmx8g -Xms8g -Xmn128m -XX:+UseParNewGC
> > > -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"
> > >
> > > # Uncomment below to enable java garbage collection logging in the .out file.
> > > export HBASE_OPTS="${HBASE_OPTS} -verbose:gc -XX:+PrintGCDetails
> > > -XX:+PrintGCDateStamps -Xloggc:${HBASE_HOME}/logs/gc-hbase.log"
> > >
> > > Monitoring HBase through the web UI, I see that there are pauses for
> > > flushing, which seems to run pretty quickly, and for compacting, which
> > > seems to take somewhat longer.
> > >
> > > Any advice for making this run faster would be greatly appreciated.
> > > Currently I am looking into installing Ganglia to better monitor my
> > > cluster, but I have yet to get that running.
> > >
> > > I suspect an I/O issue, as the regionservers do not seem terribly loaded.
> > >
> > > Thanks,
> > >
> > > Jon
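As a rough illustration of the write path Jonathan describes (puts batched in groups of 1000, MD5-hashed row keys, pre-split table), here is a sketch using the 0.92/0.94-era client API. The table name, column family, qualifier, and key layout are hypothetical; the thread does not show the actual schema, and hashing the key as a hex prefix is just one way the "MD5 hashing" could be done:

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.MD5Hash;

public class BatchedPutSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");   // hypothetical table name
    table.setAutoFlush(false);                    // buffer writes on the client side

    List<Put> batch = new ArrayList<Put>(1000);
    for (long i = 0; i < 1000000L; i++) {
      // Prefix the natural key with its MD5 hash (as hex) so consecutive keys
      // scatter across the pre-split regions instead of hitting one region.
      String row = MD5Hash.getMD5AsHex(Bytes.toBytes(i)) + "-" + i;

      Put put = new Put(Bytes.toBytes(row));
      put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes(i)); // hypothetical CF/qualifier
      batch.add(put);

      if (batch.size() == 1000) {                 // the group size from the thread
        table.put(batch);
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      table.put(batch);
    }
    table.flushCommits();
    table.close();
  }
}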
>
> --
> Kevin O'Dell
> Customer Operations Engineer, Cloudera
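Since the original message also mentions pre-splitting the table into 100 regions, a companion sketch of how that could be set up with the same era's API. Again, the table and family names are hypothetical, and RegionSplitter's HexStringSplit is only one way to pick split points (it suits hex-string keys like the MD5-prefixed keys sketched above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.RegionSplitter;

public class PreSplitTableSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor desc = new HTableDescriptor("mytable"); // hypothetical name
    desc.addFamily(new HColumnDescriptor("f"));              // hypothetical column family

    // Split keys spread evenly over the hex-string key space: 100 regions at creation.
    RegionSplitter.HexStringSplit algo = new RegionSplitter.HexStringSplit();
    byte[][] splits = algo.split(100);

    admin.createTable(desc, splits);
    admin.close();
  }
}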
