Kevin,

Sorry, I am fairly new to HBase. Can you be specific about what settings I can change, and also where they are specified?
Pretty sure I am not hotspotting, and increasing the memstore does not seem to have any effect. I do not see any messages in my regionserver logs concerning blocking. I suspect that I am hitting some limit in our grid, but would like to know where that limit is being imposed.

Jon

On Fri, Oct 12, 2012 at 6:44 AM, Kevin O'dell <[email protected]> wrote:

> Jonathan,
>
> Let's take a deeper look here.
>
> What is your memstore set at for the table/CF in question? Let's compare
> that value with the flush size you are seeing for your regions. If they
> are really small flushes, is it all to the same region? If so, that is
> going to be a schema issue. If they are full flushes, you can raise your
> memstore, assuming you have the heap to cover it. If they are smaller
> flushes but to different regions, you are most likely suffering from
> global limit pressure and flushing too soon.
>
> Are you flushing prematurely due to HLogs rolling? Take a look for "too
> many hlogs" messages and look at the flushes. It may benefit you to raise
> that value.
>
> Are you blocking? As Suraj was saying, you may be blocking in 90-second
> blocks. Check the RS logs for those messages as well and then follow
> Suraj's advice.
>
> This is where I would start to optimize your write path. I hope the above
> helps.
>
> On Fri, Oct 12, 2012 at 3:34 AM, Suraj Varma <[email protected]> wrote:
>
> > What have you configured your hbase.hstore.blockingStoreFiles and
> > hbase.hregion.memstore.block.multiplier to? Both of these block updates
> > when the limit is hit. Try increasing them to, say, 20 and 4 from the
> > defaults of 7 and 2 and see if it helps.
> >
> > If this still doesn't help, see if you can set up Ganglia to get better
> > insight into what is bottlenecking.
> > --Suraj
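For reference, the settings Kevin and Suraj mention above are normally specified in hbase-site.xml on each region server and picked up after a restart. A minimal sketch using the values Suraj suggests; the flush-size and maxlogs entries are illustrative defaults of that era, not values taken from this thread:

<!-- hbase-site.xml on the region servers: a sketch, not a tuned configuration -->
<configuration>
  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>20</value>  <!-- default 7; updates block when a store reaches this many files -->
  </property>
  <property>
    <name>hbase.hregion.memstore.block.multiplier</name>
    <value>4</value>   <!-- default 2; updates block at multiplier x flush size per region -->
  </property>
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>134217728</value>  <!-- illustrative: 128 MB per-region memstore flush threshold -->
  </property>
  <property>
    <name>hbase.regionserver.maxlogs</name>
    <value>32</value>  <!-- illustrative; exceeding it forces flushes ("too many hlogs") -->
  </property>
</configuration>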
> >
> > On Thu, Oct 11, 2012 at 11:47 PM, Pankaj Misra
> > <[email protected]> wrote:
> > > OK, looks like I missed reading that part in your original mail. Did
> > > you try some of the compaction tweaks and configurations explained in
> > > the following link for your data?
> > > http://hbase.apache.org/book/regions.arch.html#compaction
> > >
> > > Also, how much data are you putting into the regions, and how big is
> > > one region at the end of data ingestion?
> > >
> > > Thanks and Regards
> > > Pankaj Misra
> > >
> > > -----Original Message-----
> > > From: Jonathan Bishop [mailto:[email protected]]
> > > Sent: Friday, October 12, 2012 12:04 PM
> > > To: [email protected]
> > > Subject: RE: more regionservers does not improve performance
> > >
> > > Pankaj,
> > >
> > > Thanks for the reply.
> > >
> > > Actually, I am using MD5 hashing to evenly spread the keys among the
> > > splits, so I don't believe there is any hotspot. In fact, when I monitor
> > > the web UI for HBase I see a very even load on all the regionservers.
> > >
> > > Jon
> > >
> > > *From:* Pankaj Misra <[email protected]>
> > > *Sent:* Thursday, October 11, 2012 8:24:32 PM
> > > *To:* [email protected]
> > > *Subject:* RE: more regionservers does not improve performance
> > >
> > > Hi Jonathan,
> > >
> > > What seems to me is that, while doing the split across all 40 mappers,
> > > the keys are not randomized enough to leverage multiple regions and the
> > > pre-split strategy. This may be happening because all 40 mappers may be
> > > trying to write to a single region for some time, making it a HOT region,
> > > until the key falls into another region, and then that region becomes a
> > > HOT region, hence you may be seeing a high impact of compaction cycles
> > > reducing your throughput.
> > >
> > > Are the keys incremental? Are the keys randomized enough across the
> > > splits?
> > >
> > > Ideally, when all 40 mappers are running you should see all the regions
> > > being filled up in parallel for maximum throughput. Hope it helps.
> > >
> > > Thanks and Regards
> > > Pankaj Misra
> > >
> > > ________________________________________
> > > From: Jonathan Bishop [[email protected]]
> > > Sent: Friday, October 12, 2012 5:38 AM
> > > To: [email protected]
> > > Subject: more regionservers does not improve performance
> > >
> > > Hi,
> > >
> > > I am running an MR job with 40 simultaneous mappers, each of which does
> > > puts to HBase. I have batched the puts into groups of 1000 (this seems
> > > to help quite a bit) and also made sure that the table is pre-split into
> > > 100 regions, and that the row keys are randomized using MD5 hashing.
> > >
> > > My cluster size is 10, and I am allowing 4 mappers per tasktracker.
> > >
> > > In my MR job I know that the mappers are able to generate puts much
> > > faster than the puts can be handled in HBase. In other words, if I let
> > > the mappers run without doing HBase puts, then everything scales as you
> > > would expect with the number of mappers created. It is the HBase puts
> > > which seem to be the bottleneck.
> > >
> > > What is strange is that I do not get much run-time improvement by
> > > increasing the number of regionservers beyond about 4. Indeed, it seems
> > > that the system runs slower with 8 regionservers than with 4.
> > >
> > > I have added the following to hbase-env.sh hoping this would help
> > > (from the book HBase in Action):
> > >
> > > export HBASE_OPTS="-Xmx8g"
> > > export HBASE_REGIONSERVER_OPTS="-Xmx8g -Xms8g -Xmn128m -XX:+UseParNewGC
> > > -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"
> > >
> > > # Uncomment below to enable java garbage collection logging in the .out file.
> > > export HBASE_OPTS="${HBASE_OPTS} -verbose:gc -XX:+PrintGCDetails
> > > -XX:+PrintGCDateStamps -Xloggc:${HBASE_HOME}/logs/gc-hbase.log"
> > >
> > > Monitoring HBase through the web UI, I see that there are pauses for
> > > flushing, which seems to run pretty quickly, and for compacting, which
> > > seems to take somewhat longer.
> > >
> > > Any advice for making this run faster would be greatly appreciated.
> > > Currently I am looking into installing Ganglia to better monitor my
> > > cluster, but I have yet to get that running.
> > >
> > > I suspect an I/O issue, as the regionservers do not seem terribly loaded.
> > >
> > > Thanks,
> > >
> > > Jon
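As a rough illustration of the write path Jonathan describes (puts batched in groups of 1000, MD5-hashed row keys, pre-split table), here is a sketch using the 0.92/0.94-era client API. The table name, column family, qualifier, and key layout are hypothetical; the thread does not show the actual schema, and hashing the key as a hex prefix is just one way the "MD5 hashing" could be done:

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.MD5Hash;

public class BatchedPutSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");   // hypothetical table name
    table.setAutoFlush(false);                    // buffer writes on the client side

    List<Put> batch = new ArrayList<Put>(1000);
    for (long i = 0; i < 1000000L; i++) {
      // Prefix the natural key with its MD5 hash (as hex) so consecutive keys
      // scatter across the pre-split regions instead of hitting one region.
      String row = MD5Hash.getMD5AsHex(Bytes.toBytes(i)) + "-" + i;

      Put put = new Put(Bytes.toBytes(row));
      put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes(i)); // hypothetical CF/qualifier
      batch.add(put);

      if (batch.size() == 1000) {                 // the group size from the thread
        table.put(batch);
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      table.put(batch);
    }
    table.flushCommits();
    table.close();
  }
}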
>
> --
> Kevin O'Dell
> Customer Operations Engineer, Cloudera
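Since the original message also mentions pre-splitting the table into 100 regions, a companion sketch of how that could be set up with the same era's API. Again, the table and family names are hypothetical, and RegionSplitter's HexStringSplit is only one way to pick split points (it suits hex-string keys like the MD5-prefixed keys sketched above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.RegionSplitter;

public class PreSplitTableSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor desc = new HTableDescriptor("mytable"); // hypothetical name
    desc.addFamily(new HColumnDescriptor("f"));              // hypothetical column family

    // Split keys spread evenly over the hex-string key space: 100 regions at creation.
    RegionSplitter.HexStringSplit algo = new RegionSplitter.HexStringSplit();
    byte[][] splits = algo.split(100);

    admin.createTable(desc, splits);
    admin.close();
  }
}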
