Hi Lars, I changed the Java heap to 31GB and also reduced the memstore flush size to 256MB (down from 512MB). All of the servers are running quietly, except for one:

- This one server is doing ~100 memstore flushes every 5 minutes, which is about 55% of the total memstore flushes in the cluster.
- CPU on this server is running at ~100% and the system load is also very high (50). This is a 24-core machine.
- A jstack dump from this region server is available at http://pastebin.com/an0XvZRc ; most of the threads appear to be in a blocked state.
- I/O %utilization is under 15%.
- The compaction queue size has been building up on this server, going from 50 to 280 in the last 4 hours.
- I noticed requestsPerSecond (from the HBase Master web UI) goes up to 350k on this particular server, whereas the other servers are doing < 30k.

Any suggestions on what could be causing the high load on this one server?

Also, I'm seeing messages like this on multiple servers (about 25% of them on the server that has the high load):

INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=91, maxlogs=90; forcing flush of 7 regions(s): 30d495d3fb5cdfcdac8073d02a05df90, 3ccf33b6da357f0e2d76588895c9f2ab, 499b5ea7c51493995ab942dc5f00a8b5, 7b98e852476ee8432f3d795cd0b4b92b, 7baaf5a2bd916e12a69f390971dd5bb8, 81716547748c93a90767eff50cd2e6bf, 99f4e9d306a5570622ab18ac6d142db9

Could this be an issue?

Thank you!
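
One thing worth checking, given how skewed requestsPerSecond is: whether a single region is absorbing most of that traffic. Below is a minimal sketch of pulling per-region request counters from the master, assuming the 0.94-era ClusterStatus client API that ships with CDH 4.3; the class name is made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HServerLoad;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

/** Print per-region request counters so a single hot region stands out (sketch). */
public class HotRegionCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      ClusterStatus status = admin.getClusterStatus();
      for (ServerName sn : status.getServers()) {
        HServerLoad load = status.getLoad(sn);
        for (HServerLoad.RegionLoad rl : load.getRegionsLoad().values()) {
          // A region whose counter dwarfs its peers points at key hotspotting
          // (with opentsdb, often a single hot metric taking most of the writes).
          System.out.println(sn.getHostname() + " " + rl.getNameAsString()
              + " requests=" + rl.getRequestsCount());
        }
      }
    } finally {
      admin.close();
    }
  }
}

The same counters should also be visible in each region server's web UI; the sketch is only a convenience for scanning all 26 nodes at once.
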
On Sun, Jan 26, 2014 at 9:32 AM, lars hofhansl <[email protected]> wrote:
> First, you want the RegionServer to use the available memory for caching, etc. Every byte of unused RAM is wasted.
>
> I would make the heap slightly smaller than 32GB, so that the JVM can still use compressed OOPs. So I'd set it to 31GB.
>
> Lastly, 800 writes/s is still a bit low. How does the CPU usage look across the RegionServers?
> If CPU is high, you might want to make the memstores *smaller* (it is expensive to read/write from/to a SkipList).
> If you see bad IO and many store files (as might be the case following the discussion below), maybe you want to increase the memstores.
>
> -- Lars
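
An aside on the "Too many hlogs" messages quoted above: forced flushes kick in once the number of WALs exceeds hbase.regionserver.maxlogs, so total WAL capacity effectively caps how much un-flushed memstore data a server can hold. A rough back-of-the-envelope sketch, assuming the usual 0.94 defaults of a 128MB HDFS block size and a 0.95 log-roll multiplier (check hbase.regionserver.hlog.blocksize and hbase.regionserver.logroll.multiplier on your cluster, since these values are assumptions):

/** Back-of-the-envelope: WAL capacity vs. memstore pressure (values assumed, not measured). */
public class WalCapacitySketch {
  public static void main(String[] args) {
    double hlogSizeMB = 128 * 0.95;          // assumed: HDFS block size * log-roll multiplier
    int maxLogs = 90;                        // hbase.regionserver.maxlogs from this thread
    double walCapacityGB = maxLogs * hlogSizeMB / 1024;

    double heapGB = 31;                      // heap size from this thread
    double globalMemstoreGB = heapGB * 0.4;  // hbase.regionserver.global.memstore.upperLimit default

    // If WAL capacity is smaller than what the memstores are allowed to hold,
    // regions get flushed by WAL pressure before they reach the flush size.
    System.out.printf("WAL capacity: %.1f GB, global memstore limit: %.1f GB%n",
        walCapacityGB, globalMemstoreGB);
  }
}

Under those assumptions, 90 logs cap out around 10.7GB of un-flushed edits, a bit below the ~12.4GB global memstore limit, so WAL pressure would trigger flushes before the memstores fill, which matches the "forcing flush of 7 regions(s)" message.
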
> ________________________________
> From: Rohit Dev <[email protected]>
> To: [email protected]
> Sent: Sunday, January 26, 2014 3:35 AM
> Subject: Re: Hbase tuning for heavy write cluster
>
> Hi Vladimir,
>
> Here is my cluster status:
>
> Cluster size: 26
> Server memory: 128GB
> Total writes per sec (data): 450 Mbps
> Writes per sec (count) per server: avg ~800 writes/sec (some spikes up to 3000 writes/sec)
> Max region size: 16GB
> Regions per server: ~140 (not sure if I would be able to merge some empty regions while the table is online)
> We are running CDH 4.3
>
> Recently I changed the settings to:
> Java heap size for Region Server: 32GB
> hbase.hregion.memstore.flush.size: 536870912
> hbase.hstore.blockingStoreFiles: 30
> hbase.hstore.compaction.max: 15
> hbase.hregion.memstore.block.multiplier: 3
> hbase.regionserver.maxlogs: 90 (is this too high for a 512MB memstore flush size?)
>
> I'm seeing weird stuff, like one region that has grown to 34GB(!) and has 21 store files, while MAX_FILESIZE for this table is only 16GB.
> Could this be a problem?
>
> On Sat, Jan 25, 2014 at 9:49 PM, Vladimir Rodionov <[email protected]> wrote:
> > What is the load (ingestion) rate per server in your cluster?
> >
> > Best regards,
> > Vladimir Rodionov
> > Principal Platform Engineer
> > Carrier IQ, www.carrieriq.com
> > e-mail: [email protected]
> >
> > ________________________________________
> > From: Rohit Dev [[email protected]]
> > Sent: Saturday, January 25, 2014 6:09 PM
> > To: [email protected]
> > Subject: Re: Hbase tuning for heavy write cluster
> >
> > The compaction queue is ~600 on one of the region servers, while it is less than 5 on the others (26 nodes total).
> > The compaction queue started going up after I increased the settings[1].
> > In general, one major compaction takes about 18 minutes.
> >
> > On the same region server I'm seeing these two log messages frequently:
> >
> > 2014-01-25 17:56:27,312 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=167, maxlogs=32; forcing flush of 1 regions(s): 3788648752d1c53c1ec80fad72d3e1cc
> >
> > 2014-01-25 17:57:48,733 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 53 on 60020' on region tsdb,\x008WR\xE2+\x90\x00\x00\x02Qu\xF1\x00\x00(\x00\x97A\x00\x008M(7\x00\x00Bl\xE85,1390623438462.e6692a1f23b84494015d111954bf00db.: memstore size 1.5 G is >= than blocking 1.5 G size
> >
> > Any suggestions on what else I can do, or is it OK to ignore these messages?
> >
> > [1]
> > The new settings are:
> > - hbase.hregion.memstore.flush.size - 536870912
> > - hbase.hstore.blockingStoreFiles - 30
> > - hbase.hstore.compaction.max - 15
> > - hbase.hregion.memstore.block.multiplier - 3
> >
> > On Sat, Jan 25, 2014 at 3:00 AM, Ted Yu <[email protected]> wrote:
> >> Yes, it is normal.
> >>
> >> On Jan 25, 2014, at 2:12 AM, Rohit Dev <[email protected]> wrote:
> >>
> >>> I changed these settings:
> >>> - hbase.hregion.memstore.flush.size - 536870912
> >>> - hbase.hstore.blockingStoreFiles - 30
> >>> - hbase.hstore.compaction.max - 15
> >>> - hbase.hregion.memstore.block.multiplier - 3
> >>>
> >>> Things seem to be getting better now; I'm not seeing any of those annoying 'Blocking updates' messages. However, I am seeing an increase in the 'Compaction Queue' size on some servers.
> >>>
> >>> I noticed memstores are getting flushed, but some with 'compaction requested=true'[1]. Is this normal?
> >>>
> >>> [1]
> >>> INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~512.0 M/536921056, currentsize=3.0 M/3194800 for region tsdb,\x008ZR\xE1t\xC0\x00\x00\x02\x01\xB0\xF9\x00\x00(\x00\x0B]\x00\x008M((\x00\x00Bk\x9F\x0B,1390598160292.7fb65e5fd5c4cfe08121e85b7354bae9. in 3422ms, sequenceid=18522872289, compaction requested=true
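
A side note on the two "Blocking updates" messages in this thread: the blocking threshold is simply hbase.hregion.memstore.flush.size multiplied by hbase.hregion.memstore.block.multiplier, so both figures seen here can be sanity-checked in a few lines. A minimal sketch (the class name is made up; the values are the ones from this thread and the stock 0.94 defaults):

/** Why updates block at 1.5G here: flush size * block multiplier. */
public class BlockingThresholdSketch {
  public static void main(String[] args) {
    long flushSize = 536870912L;    // hbase.hregion.memstore.flush.size = 512MB (this thread)
    int multiplier = 3;             // hbase.hregion.memstore.block.multiplier (this thread)
    System.out.println("Blocking at: "
        + (flushSize * multiplier / (1024 * 1024)) + " MB"); // 1536 MB, i.e. the 1.5 G message

    long defaultFlush = 134217728L; // 128MB default flush size
    int defaultMult = 2;            // default block multiplier
    System.out.println("Default blocking at: "
        + (defaultFlush * defaultMult / (1024 * 1024)) + " MB"); // the 256 M message
  }
}

So 512MB x 3 gives the 1.5G threshold above, and the stock 128MB x 2 gives the 256M threshold seen in the first message of the thread; updates block whenever a region's memstore outruns its flushes by that margin.
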
> >>> On Fri, Jan 24, 2014 at 6:51 PM, Bryan Beaudreault <[email protected]> wrote:
> >>>> Also, I think you can up hbase.hstore.blockingStoreFiles quite a bit higher. You could try something like 50. It will reduce read performance a bit, but it shouldn't be too bad, especially for something like opentsdb, I think. If you are going to up blockingStoreFiles, you're probably also going to want to up hbase.hstore.compaction.max.
> >>>>
> >>>> For my tsdb cluster, which is 8 i2.4xlarges in EC2, we have 90 regions for tsdb. We were also having issues with blocking, and I upped blockingStoreFiles to 35, compaction.max to 15, and memstore.block.multiplier to 3. We haven't had problems since. The memstore flush size for the tsdb table is 512MB.
> >>>>
> >>>> Finally, a 64GB heap may prove problematic, but it's worth a shot. I'd definitely recommend Java 7 with the G1 garbage collector, though. In general, Java would have a hard time with heap sizes greater than 20-25GB without some careful tuning.
> >>>>
> >>>> On Fri, Jan 24, 2014 at 9:44 PM, Bryan Beaudreault <[email protected]> wrote:
> >>>>
> >>>>> It seems from your ingestion rate that you are still blowing through HFiles too fast. You're going to want to up MEMSTORE_FLUSHSIZE for the table from the default of 128MB. If opentsdb is the only thing on this cluster, you can do the math pretty easily to find the maximum allowable value, based on your heap size and accounting for the 40% (default) used for the block cache.
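
To make the MEMSTORE_FLUSHSIZE suggestion above concrete, here is a minimal sketch of bumping it for the tsdb table with the 0.94 Java client (the same alter can be done from the hbase shell); the 512MB value is just the figure discussed in this thread, and with hbase.online.schema.update.enable left at its 0.94 default of false the table has to be disabled first:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

/** Raise the per-table memstore flush size for 'tsdb' (sketch, 0.94-era API). */
public class SetTsdbFlushSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      byte[] table = Bytes.toBytes("tsdb");
      HTableDescriptor desc = admin.getTableDescriptor(table);
      desc.setMemStoreFlushSize(512L * 1024 * 1024); // 512MB, as discussed in this thread

      admin.disableTable(table);      // required while online schema update is off
      admin.modifyTable(table, desc);
      admin.enableTable(table);
    } finally {
      admin.close();
    }
  }
}

For the "math" mentioned above: with a 31GB heap and the default 40% global memstore limit, there is roughly 12.4GB of memstore capacity to spread across whatever subset of the ~140 regions per server is actively taking writes.
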
> >>>>> On Fri, Jan 24, 2014 at 9:38 PM, Rohit Dev <[email protected]> wrote:
> >>>>>
> >>>>>> Hi Kevin,
> >>>>>>
> >>>>>> We have about 160 regions per server, with a 16GB region size and 10 drives for HBase. I've looked at disk I/O and that doesn't seem to be a problem (%utilization is < 2 across all disks).
> >>>>>>
> >>>>>> Any suggestion on what heap size I should allocate? Normally I allocate 16GB.
> >>>>>>
> >>>>>> Also, I read that increasing hbase.hstore.blockingStoreFiles and hbase.hregion.memstore.block.multiplier is a good idea for a write-heavy cluster, but in my case it seems to be heading in the wrong direction.
> >>>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>> On Fri, Jan 24, 2014 at 6:31 PM, Kevin O'dell <[email protected]> wrote:
> >>>>>>> Rohit,
> >>>>>>>
> >>>>>>> A 64GB heap is not ideal; you will run into some weird issues. How many regions are you running per server, how many drives are in each node, and what other settings have you changed from the defaults?
> >>>>>>> On Jan 24, 2014 6:22 PM, "Rohit Dev" <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> We are running Opentsdb on a CDH 4.3 HBase cluster, with most of the default settings. The cluster is write-heavy and I'm trying to see what parameters I can tune to optimize write performance.
> >>>>>>>>
> >>>>>>>> # I get messages related to the memstore[1] and slow responses[2] very often; are these an indication of any issue?
> >>>>>>>>
> >>>>>>>> I tried increasing some parameters on one node:
> >>>>>>>> - hbase.hstore.blockingStoreFiles - from the default 7 to 15
> >>>>>>>> - hbase.hregion.memstore.block.multiplier - from the default 2 to 8
> >>>>>>>> - and the heap size from 16GB to 64GB
> >>>>>>>>
> >>>>>>>> * The 'compaction queue' went up to ~200 within 60 minutes after restarting the region server with the new parameters, and the log started to get even more noisy.
> >>>>>>>>
> >>>>>>>> Can anyone please suggest whether I'm going in the right direction with these new settings, or whether there are other things I could monitor or change to make it better?
> >>>>>>>>
> >>>>>>>> Thank you!
> >>>>>>>>
> >>>>>>>> [1]
> >>>>>>>> INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 19 on 60020' on region tsdb,\x008XR\xE0i\x90\x00\x00\x02Q\x7F\x1D\x00\x00(\x00\x0B]\x00\x008M(r\x00\x00Bl\xA7\x8C,1390556781703.0771bf90cab25c503d3400206417f6bf.: memstore size 256.3 M is >= than blocking 256 M size
> >>>>>>>>
> >>>>>>>> [2]
> >>>>>>>> WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":17887,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@586940ea), rpc version=1, client version=29, methodsFingerPrint=0","client":"192.168.10.10:54132","starttimems":1390587959182,"queuetimems":1498,"class":"HRegionServer","responsesize":0,"method":"multi"}
