Re: more regionservers does not improve performance

Suraj Varma Fri, 12 Oct 2012 00:35:09 -0700

What values have you set for hbase.hstore.blockingStoreFiles and
hbase.hregion.memstore.block.multiplier? Both of these block updates
when their limit is hit. Try increasing them to, say, 20 and 4 from the
defaults of 7 and 2, and see if it helps.
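
As a rough sketch of that change, assuming the properties are set in
hbase-site.xml on each regionserver (the values are just the ones
suggested above, not tuned recommendations, and the regionservers would
need a restart to pick them up):

```xml
<!-- hbase-site.xml fragment; place inside the <configuration> element. -->
<!-- Values are illustrative starting points, not tuned recommendations. -->
<property>
  <!-- Updates to a region are blocked once a store has this many store files. -->
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>20</value>
</property>
<property>
  <!-- Updates are blocked once a memstore reaches multiplier x flush size. -->
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>4</value>
</property>
```

Raising either limit trades fewer blocked writes for more store files
per store and more memstore heap per region, so keep an eye on
compaction activity and heap usage afterwards.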


If this still doesn't help, see if you can set up Ganglia to get
better insight into where the bottleneck is.
--Suraj



On Thu, Oct 11, 2012 at 11:47 PM, Pankaj Misra
<[email protected]> wrote:
> OK, looks like I missed that part of your original mail. Did you try some 
> of the compaction tweaks and configurations explained at the following link 
> for your data?
> http://hbase.apache.org/book/regions.arch.html#compaction
>
>
> Also, how much data are you putting into the regions, and how big is one 
> region at the end of data ingestion?
>
> Thanks and Regards
> Pankaj Misra
>
> -----Original Message-----
> From: Jonathan Bishop [mailto:[email protected]]
> Sent: Friday, October 12, 2012 12:04 PM
> To: [email protected]
> Subject: RE: more regionservers does not improve performance
>
> Pankaj,
>
> Thanks for the reply.
>
> Actually, I am using MD5 hashing to spread the keys evenly among the splits, 
> so I don’t believe there is any hotspot. In fact, when I monitor the web UI 
> for HBase I see a very even load on all the regionservers.
>
> Jon
>
> From: Pankaj Misra <[email protected]>
> Sent: Thursday, October 11, 2012 8:24:32 PM
> To: [email protected]
> Subject: RE: more regionservers does not improve performance
>
> Hi Jonathan,
>
> It seems to me that, while the work is split across all 40 mappers, the 
> keys are not randomized enough to take advantage of multiple regions and the 
> pre-split strategy. All 40 mappers may be writing to a single region for 
> some time, making it a HOT region, until the keys fall into another region, 
> which then becomes the HOT region in turn. As a result, you may be seeing 
> compaction cycles having a high impact on your throughput.
>
> Are the keys incremental? Are the keys randomized enough across the splits?
>
> Ideally when all 40 mappers are running you should see all the regions being 
> filled up in parallel for maximum throughput. Hope it helps.
>
> Thanks and Regards
> Pankaj Misra
>
>
> ________________________________________
> From: Jonathan Bishop [[email protected]]
> Sent: Friday, October 12, 2012 5:38 AM
> To: [email protected]
> Subject: more regionservers does not improve performance
>
> Hi,
>
> I am running a MR job with 40 simultaneous mappers, each of which does puts 
> to HBase. I have ganged up the puts into groups of 1000 (this seems to help 
> quite a bit) and also made sure that the table is pre-split into 100 regions, 
> and that the row keys are randomized using MD5 hashing.
>
> My cluster size is 10, and I am allowing 4 mappers per tasktracker.
>
> In my MR job I know that the mappers are able to generate puts much faster 
> than the puts can be handled in hbase. In other words if I let the mappers 
> run without doing hbase puts then everything scales as you would expect with 
> the number of mappers created. It is the hbase puts which seem to be the 
> bottleneck.
>
> What is strange is that I do not get much run time improvement by increasing 
> the number of regionservers beyond about 4. Indeed, it seems that the system 
> runs slower with 8 regionservers than with 4.
>
> I have added the following in hbase-env.sh hoping this would help... (from 
> the book HBase in Action)
>
> export HBASE_OPTS="-Xmx8g"
> export HBASE_REGIONSERVER_OPTS="-Xmx8g -Xms8g -Xmn128m -XX:+UseParNewGC 
> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"
>
> # Uncomment below to enable java garbage collection logging in the .out file.
> export HBASE_OPTS="${HBASE_OPTS} -verbose:gc -XX:+PrintGCDetails 
> -XX:+PrintGCDateStamps -Xloggc:${HBASE_HOME}/logs/gc-hbase.log"
>
> Monitoring hbase through the web ui I see that there are pauses for flushing, 
> which seems to run pretty quickly, and for compacting, which seems to take 
> somewhat longer.
>
> Any advice for making this run faster would be greatly appreciated.
> Currently I am looking into installing Ganglia to better monitor my cluster, 
> but I have yet to get that running.
>
> I suspect an I/O issue as the regionservers do not seem terribly loaded.
>
> Thanks,
>
> Jon
>
> ________________________________
>
> Impetus Ranked in the Top 50 India’s Best Companies to Work For 2012.
>
> Impetus webcast ‘Designing a Test Automation Framework for Multi-vendor 
> Interoperable Systems’ available at http://lf1.me/0E/.
>
>
> NOTE: This message may contain information that is confidential, proprietary, 
> privileged or otherwise protected by law. The message is intended solely for 
> the named addressee. If received in error, please destroy and notify the 
> sender. Any use of this email is prohibited when received in error. Impetus 
> does not represent, warrant and/or guarantee, that the integrity of this 
> communication has been maintained nor that the communication is free of 
> errors, virus, interception or interference.
