Can you pastebin the region server log corresponding to the 34GB region? Thanks
On Jan 26, 2014, at 3:35 AM, Rohit Dev <[email protected]> wrote:

> Hi Vladimir,
>
> Here is my cluster status:
>
> Cluster size: 26
> Server memory: 128GB
> Total write rate (data): 450 Mbps
> Write rate (count) per server: avg ~800 writes/sec (some spikes up to
> 3000 writes/sec)
> Max region size: 16GB
> Regions per server: ~140 (not sure if I would be able to merge some
> empty regions while the table is online)
> We are running CDH 4.3
>
> Recently I changed the settings to:
> Java heap size for region server: 32GB
> hbase.hregion.memstore.flush.size: 536870912
> hbase.hstore.blockingStoreFiles: 30
> hbase.hstore.compaction.max: 15
> hbase.hregion.memstore.block.multiplier: 3
> hbase.regionserver.maxlogs: 90 (is this too high for a 512MB memstore
> flush size?)
>
> I'm seeing weird stuff, like one region that has grown to 34GB and has
> 21 store files, while MAX_FILESIZE for this table is only 16GB.
> Could this be a problem?
>
>
> On Sat, Jan 25, 2014 at 9:49 PM, Vladimir Rodionov
> <[email protected]> wrote:
>> What is the load (ingestion) rate per server in your cluster?
>>
>> Best regards,
>> Vladimir Rodionov
>> Principal Platform Engineer
>> Carrier IQ, www.carrieriq.com
>> e-mail: [email protected]
>>
>> ________________________________________
>> From: Rohit Dev [[email protected]]
>> Sent: Saturday, January 25, 2014 6:09 PM
>> To: [email protected]
>> Subject: Re: Hbase tuning for heavy write cluster
>>
>> The compaction queue is ~600 on one of the region servers, while it is
>> less than 5 on the others (26 nodes in total).
>> The compaction queue started going up after I increased the settings [1].
>> In general, one major compaction takes about 18 minutes.
>>
>> On the same region server I'm seeing these two log messages frequently:
>>
>> 2014-01-25 17:56:27,312 INFO
>> org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs:
>> logs=167, maxlogs=32; forcing flush of 1 regions(s):
>> 3788648752d1c53c1ec80fad72d3e1cc
>>
>> 2014-01-25 17:57:48,733 INFO
>> org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for
>> 'IPC Server handler 53 on 60020' on region
>> tsdb,\x008WR\xE2+\x90\x00\x00\x02Qu\xF1\x00\x00(\x00\x97A\x00\x008M(7\x00\x00Bl\xE85,1390623438462.e6692a1f23b84494015d111954bf00db.:
>> memstore size 1.5 G is >= than blocking 1.5 G size
>>
>> Any suggestions on what else I can do, or is it OK to ignore these
>> messages?
>>
>>
>> [1]
>> The new settings are:
>> - hbase.hregion.memstore.flush.size - 536870912
>> - hbase.hstore.blockingStoreFiles - 30
>> - hbase.hstore.compaction.max - 15
>> - hbase.hregion.memstore.block.multiplier - 3
>>
>> On Sat, Jan 25, 2014 at 3:00 AM, Ted Yu <[email protected]> wrote:
>>> Yes, it is normal.
>>>
>>> On Jan 25, 2014, at 2:12 AM, Rohit Dev <[email protected]> wrote:
>>>
>>>> I changed these settings:
>>>> - hbase.hregion.memstore.flush.size - 536870912
>>>> - hbase.hstore.blockingStoreFiles - 30
>>>> - hbase.hstore.compaction.max - 15
>>>> - hbase.hregion.memstore.block.multiplier - 3
>>>>
>>>> Things seem to be getting better now; I'm not seeing any of those
>>>> annoying 'Blocking updates' messages. The one exception is that I'm
>>>> seeing an increase in 'compaction queue' size on some servers.
>>>>
>>>> I noticed memstores are getting flushed, but some with 'compaction
>>>> requested=true' [1]. Is this normal?
>>>>
>>>>
>>>> [1]
>>>> INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore
>>>> flush of ~512.0 M/536921056, currentsize=3.0 M/3194800 for region
>>>> tsdb,\x008ZR\xE1t\xC0\x00\x00\x02\x01\xB0\xF9\x00\x00(\x00\x0B]\x00\x008M((\x00\x00Bk\x9F\x0B,1390598160292.7fb65e5fd5c4cfe08121e85b7354bae9.
>>>> in 3422ms, sequenceid=18522872289, compaction requested=true
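For context on the maxlogs question above: one rule of thumb that circulates on this list is to size hbase.regionserver.maxlogs so that the retained WALs can cover the global memstore lower watermark; that way flushes are triggered by memstore size rather than by WAL count (the WAL-count path is exactly the "Too many hlogs" message quoted above). A back-of-the-envelope sketch, assuming 0.94-era defaults and a 128MB HDFS block size (both assumptions, not values confirmed in this thread):

    // Rough sizing check for hbase.regionserver.maxlogs. Rule of thumb only;
    // the lower-limit and block-size constants are assumptions for a
    // CDH 4.3 / HBase 0.94-era cluster.
    public class MaxLogsEstimate {
        public static void main(String[] args) {
            long heapBytes       = 32L * 1024 * 1024 * 1024; // 32GB RS heap, per this thread
            double memstoreLower = 0.35; // default hbase.regionserver.global.memstore.lowerLimit
            long hdfsBlockBytes  = 128L * 1024 * 1024;       // assumed HDFS block size
            double walFillRatio  = 0.95; // a WAL rolls at ~95% of one block by default

            double walBytes = hdfsBlockBytes * walFillRatio;
            double maxlogs  = (heapBytes * memstoreLower) / walBytes;
            System.out.printf("suggested hbase.regionserver.maxlogs ~= %.0f%n", maxlogs); // ~94
        }
    }

By that estimate ~94 WALs cover a 32GB heap, so 90 is in the right ballpark rather than too high; the "logs=167, maxlogs=32" message above was produced while the old default of 32 was still in effect.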
>>>> On Fri, Jan 24, 2014 at 6:51 PM, Bryan Beaudreault
>>>> <[email protected]> wrote:
>>>>> Also, I think you can up hbase.hstore.blockingStoreFiles quite a bit
>>>>> higher. You could try something like 50. It will reduce read
>>>>> performance a bit, but it shouldn't be too bad, especially for
>>>>> something like OpenTSDB, I think. If you are going to up
>>>>> blockingStoreFiles, you're probably also going to want to up
>>>>> hbase.hstore.compaction.max.
>>>>>
>>>>> For my tsdb cluster, which is 8 i2.4xlarges in EC2, we have 90 regions
>>>>> for tsdb. We were also having issues with blocking, and I upped
>>>>> blockingStoreFiles to 35, compaction.max to 15, and
>>>>> memstore.block.multiplier to 3. We haven't had problems since. The
>>>>> memstore flush size for the tsdb table is 512MB.
>>>>>
>>>>> Finally, a 64GB heap may prove problematic, but it's worth a shot. I'd
>>>>> definitely recommend Java 7 with the G1 garbage collector, though. In
>>>>> general, Java has a hard time with heap sizes greater than 20-25GB
>>>>> without some careful tuning.
>>>>>
>>>>>
>>>>> On Fri, Jan 24, 2014 at 9:44 PM, Bryan Beaudreault
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> It seems from your ingestion rate that you are still blowing through
>>>>>> HFiles too fast. You're going to want to up the MEMSTORE_FLUSHSIZE
>>>>>> for the table from the default of 128MB. If OpenTSDB is the only
>>>>>> thing on this cluster, you can do the math pretty easily to find the
>>>>>> maximum allowable, based on your heap size and accounting for the 40%
>>>>>> (default) used for the block cache.
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 24, 2014 at 9:38 PM, Rohit Dev <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Kevin,
>>>>>>>
>>>>>>> We have about 160 regions per server, with a 16GB region size and 10
>>>>>>> drives for HBase. I've looked at disk IO and that doesn't seem to be
>>>>>>> a problem (% utilization is < 2 across all disks).
>>>>>>>
>>>>>>> Any suggestion on what heap size I should allocate? Normally I
>>>>>>> allocate 16GB.
>>>>>>>
>>>>>>> Also, I read that increasing hbase.hstore.blockingStoreFiles and
>>>>>>> hbase.hregion.memstore.block.multiplier is a good idea for a
>>>>>>> write-heavy cluster, but in my case it seems to be heading in the
>>>>>>> wrong direction.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Fri, Jan 24, 2014 at 6:31 PM, Kevin O'dell
>>>>>>> <[email protected]> wrote:
>>>>>>>> Rohit,
>>>>>>>>
>>>>>>>> A 64GB heap is not ideal; you will run into some weird issues. How
>>>>>>>> many regions are you running per server, how many drives are in
>>>>>>>> each node, and did you change any other settings from the defaults?
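A note on Bryan's suggestion: MEMSTORE_FLUSHSIZE is a per-table attribute, separate from the site-wide hbase.hregion.memstore.flush.size. A minimal sketch of applying it with the 0.94-era Java client, using the 512MB value discussed in this thread; the disable/enable step is the conservative path when online schema change (off by default on 0.94) is not enabled:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    // Sketch: raise the tsdb table's memstore flush size to 512MB.
    public class RaiseTsdbFlushSize {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            byte[] table = Bytes.toBytes("tsdb");

            HTableDescriptor desc = admin.getTableDescriptor(table);
            desc.setMemStoreFlushSize(512L * 1024 * 1024); // up from the 128MB default

            admin.disableTable(table);      // conservative: take the table offline first
            admin.modifyTable(table, desc);
            admin.enableTable(table);
            admin.close();
        }
    }

From the HBase shell, the equivalent is along the lines of: alter 'tsdb', METHOD => 'table_att', MEMSTORE_FLUSHSIZE => '536870912'.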
>>>>>>>> On Jan 24, 2014 6:22 PM, "Rohit Dev" <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> We are running OpenTSDB on a CDH 4.3 HBase cluster, with most of
>>>>>>>>> the default settings. The cluster is write-heavy and I'm trying to
>>>>>>>>> see which parameters I can tune to optimize write performance.
>>>>>>>>>
>>>>>>>>> # I get messages related to the memstore [1] and slow responses [2]
>>>>>>>>> very often. Is this an indication of any issue?
>>>>>>>>>
>>>>>>>>> I tried increasing some parameters on one node:
>>>>>>>>> - hbase.hstore.blockingStoreFiles - from the default 7 to 15
>>>>>>>>> - hbase.hregion.memstore.block.multiplier - from the default 2 to 8
>>>>>>>>> - and the heap size from 16GB to 64GB
>>>>>>>>>
>>>>>>>>> * The 'compaction queue' went up to ~200 within 60 minutes after
>>>>>>>>> restarting the region server with the new parameters, and the log
>>>>>>>>> started to get even noisier.
>>>>>>>>>
>>>>>>>>> Can anyone please suggest whether I'm going in the right direction
>>>>>>>>> with these new settings, or whether there are other things I could
>>>>>>>>> monitor or change to make this better?
>>>>>>>>>
>>>>>>>>> Thank you!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates
>>>>>>>>> for 'IPC Server handler 19 on 60020' on region
>>>>>>>>> tsdb,\x008XR\xE0i\x90\x00\x00\x02Q\x7F\x1D\x00\x00(\x00\x0B]\x00\x008M(r\x00\x00Bl\xA7\x8C,1390556781703.0771bf90cab25c503d3400206417f6bf.:
>>>>>>>>> memstore size 256.3 M is >= than blocking 256 M size
>>>>>>>>>
>>>>>>>>> [2]
>>>>>>>>> WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow):
>>>>>>>>> {"processingtimems":17887,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@586940ea),
>>>>>>>>> rpc version=1, client version=29,
>>>>>>>>> methodsFingerPrint=0","client":"192.168.10.10:54132",
>>>>>>>>> "starttimems":1390587959182,"queuetimems":1498,"class":"HRegionServer","responsesize":0,"method":"multi"}
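One more piece of arithmetic worth making explicit, since it connects two log excerpts in this thread: a region blocks updates once its memstore reaches hbase.hregion.memstore.flush.size multiplied by hbase.hregion.memstore.block.multiplier. The sketch below (the helper is purely illustrative) evaluates that product for both configurations seen here, and it reproduces both blocking sizes from the logs: 256MB with the 128MB/2x defaults, and 1.5GB after the change to 512MB/3x:

    // Illustrative arithmetic: the memstore size at which a region blocks
    // updates is flush.size * block.multiplier.
    public class MemstoreBlockingMath {
        static long blockingThresholdBytes(long flushSizeBytes, long multiplier) {
            return flushSizeBytes * multiplier;
        }

        public static void main(String[] args) {
            long mb = 1024L * 1024;
            // Defaults: 128MB flush size, multiplier 2 -> the "256 M" blocking message.
            System.out.println(blockingThresholdBytes(128 * mb, 2) / mb + " MB");
            // This thread's change: 512MB flush size, multiplier 3 -> the "1.5 G" message.
            System.out.println(blockingThresholdBytes(512 * mb, 3) / (mb * 1024.0) + " GB");
        }
    }

Raising the multiplier postpones blocking, but it also lets each region pin proportionally more heap, so the global memstore limit and heap size have to leave room for it.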
