Can you pastebin the region server log corresponding to the 34GB region?

Thanks

On Jan 26, 2014, at 3:35 AM, Rohit Dev <[email protected]> wrote:

> Hi Vladimir,
> 
> Here is my cluster status:
> 
> Cluster Size: 26
> Server memory: 128GB
> Total Writes per sec (data): 450 Mbps
> Writes per sec (count) per server: avg ~800 writes/sec (some spikes up to 3000 writes/sec)
> Max Region Size: 16GB
> Regions per server: ~140 (not sure whether I would be able to merge some
> empty regions while the table is online; see the note after this list)
> We are running CDH 4.3
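> 
> Note on the region-merge question above: the 0.94-based HBase in CDH 4.3
> generally cannot merge regions while the table is online. The usual route
> is the offline org.apache.hadoop.hbase.util.Merge utility, run with HBase
> shut down (the online merge_region shell command only arrived in later
> releases). A sketch, with placeholder region names:
> 
>   $ hbase org.apache.hadoop.hbase.util.Merge tsdb <region-name-1> <region-name-2>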
> 
> Recently I changed settings to:
> Java heap size for Region Server: 32GB
> hbase.hregion.memstore.flush.size: 536870912
> hbase.hstore.blockingStoreFiles: 30
> hbase.hstore.compaction.max: 15
> hbase.hregion.memstore.block.multiplier: 3
> hbase.regionserver.maxlogs: 90 (is this too high for a 512MB memstore flush size?)
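> 
> Rough sizing sketch for maxlogs (assuming a 128MB HDFS block size and the
> default global memstore lower limit of 0.35; a common rule of thumb is
> maxlogs >= heap * memstore lower limit / WAL roll size):
> 
>   WAL roll size: 0.95 * 128MB ~= 121MB
>   memstore to cover on a 32GB heap: 0.35 * 32GB ~= 11.2GB
>   11.2GB / 121MB ~= 92 logs
> 
> So 90 looks roughly in line for a 32GB heap, whereas the old default of 32
> would force flushes ("Too many hlogs") long before the memstores fill up.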
> 
> I'm seeing weird stuff, like one region that has grown up to 34GB and has
> 21 store files, while MAX_FILESIZE for this table is only 16GB.
> Could this be a problem?
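> 
> For reference, a quick way to look at this from the HBase shell (the region
> name below is a placeholder):
> 
>   hbase> describe 'tsdb'     # confirm MAX_FILESIZE on the table
>   hbase> split 'tsdb,<start-key>,<timestamp>.<encoded-region-name>.'
> 
> Regions can overshoot MAX_FILESIZE when split requests are being skipped,
> for example while a store still holds reference files left over from an
> earlier split (those are only cleaned up by compaction), so the region
> server log for that region should show why.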
> 
> 
> On Sat, Jan 25, 2014 at 9:49 PM, Vladimir Rodionov
> <[email protected]> wrote:
>> What is the load (ingestion) rate per server in your cluster?
>> 
>> Best regards,
>> Vladimir Rodionov
>> Principal Platform Engineer
>> Carrier IQ, www.carrieriq.com
>> e-mail: [email protected]
>> 
>> ________________________________________
>> From: Rohit Dev [[email protected]]
>> Sent: Saturday, January 25, 2014 6:09 PM
>> To: [email protected]
>> Subject: Re: Hbase tuning for heavy write cluster
>> 
>> The compaction queue is ~600 on one of the region servers, while it is less
>> than 5 on the others (26 nodes total).
>> The compaction queue started going up after I increased the settings[1].
>> In general, one major compaction takes about 18 minutes.
>> 
>> On the same region server I'm seeing these two log messages frequently:
>> 
>> 2014-01-25 17:56:27,312 INFO
>> org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs:
>> logs=167, maxlogs=32; forcing flush of 1 regions(s):
>> 3788648752d1c53c1ec80fad72d3e1cc
>> 
>> 2014-01-25 17:57:48,733 INFO
>> org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for
>> 'IPC Server handler 53 on 60020' on region
>> tsdb,\x008WR\xE2+\x90\x00\x00\x02Qu\xF1\x00\x00(\x00\x97A\x00\x008M(7\x00\x00Bl\xE85,1390623438462.e6692a1f23b84494015d111954bf00db.:
>> memstore size 1.5 G is >= than blocking 1.5 G size
>> 
>> Any suggestion on what else I can do, or is it OK to ignore these messages?
>> 
>> 
>> [1]
>> New settings are:
>> - hbase.hregion.memstore.flush.size - 536870912
>> - hbase.hstore.blockingStoreFiles - 30
>> - hbase.hstore.compaction.max - 15
>> - hbase.hregion.memstore.block.multiplier - 3
>> 
>> On Sat, Jan 25, 2014 at 3:00 AM, Ted Yu <[email protected]> wrote:
>>> Yes, it is normal.
>>> 
>>> On Jan 25, 2014, at 2:12 AM, Rohit Dev <[email protected]> wrote:
>>> 
>>>> I changed these settings:
>>>> - hbase.hregion.memstore.flush.size - 536870912
>>>> - hbase.hstore.blockingStoreFiles - 30
>>>> - hbase.hstore.compaction.max - 15
>>>> - hbase.hregion.memstore.block.multiplier - 3
>>>> 
>>>> Things seem to be getting better now; I'm not seeing any of those
>>>> annoying 'Blocking updates' messages. Except that I'm seeing an
>>>> increase in 'Compaction Queue' size on some servers.
>>>> 
>>>> I noticed memstores are getting flushed, but some with 'compaction
>>>> requested=true'[1]. Is this normal?
>>>> 
>>>> 
>>>> [1]
>>>> INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore
>>>> flush of ~512.0 M/536921056, currentsize=3.0 M/3194800 for region
>>>> tsdb,\x008ZR\xE1t\xC0\x00\x00\x02\x01\xB0\xF9\x00\x00(\x00\x0B]\x00\x008M((\x00\x00Bk\x9F\x0B,1390598160292.7fb65e5fd5c4cfe08121e85b7354bae9.
>>>> in 3422ms, sequenceid=18522872289, compaction requested=true
>>>> 
>>>> On Fri, Jan 24, 2014 at 6:51 PM, Kevin O'dell wrote... correction: On Fri, Jan 24, 2014 at 6:51 PM, Bryan Beaudreault <[email protected]> wrote:
>>>>> Also, I think you can up the hbase.hstore.blockingStoreFiles quite a bit
>>>>> higher.  You could try something like 50.  It will reduce read performance
>>>>> a bit, but shouldn't be too bad especially for something like opentsdb I
>>>>> think.  If you are going to up the blockingStoreFiles you're probably also
>>>>> going to want to up hbase.hstore.compaction.max.
>>>>> 
>>>>> For my tsdb cluster, which is 8 i2.4xlarges in EC2, we have 90 regions for
>>>>> tsdb.  We were also having issues with blocking, and I upped
>>>>> blockingStoreFiles to 35, compaction.max to 15, and
>>>>> memstore.block.multiplier to 3.  We haven't had problems since.  Memstore
>>>>> flushsize for the tsdb table is 512MB.
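>>>>> 
>>>>> If it helps, a sketch of setting that per table from the HBase shell (on
>>>>> 0.94 the table generally needs to be disabled first, unless online schema
>>>>> update is enabled):
>>>>> 
>>>>>   hbase> disable 'tsdb'
>>>>>   hbase> alter 'tsdb', MEMSTORE_FLUSHSIZE => '536870912'
>>>>>   hbase> enable 'tsdb'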
>>>>> 
>>>>> Finally, 64GB heap may prove problematic, but it's worth a shot.  I'd
>>>>> definitely recommend java7 with the G1 garbage collector though.  In
>>>>> general, Java would have a hard time with heap sizes greater than 20-25GB
>>>>> without some careful tuning.
>>>>> 
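>>>>> A possible starting point, as a sketch in hbase-env.sh (the pause target
>>>>> is just an illustrative value, not a recommendation):
>>>>> 
>>>>>   export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
>>>>>     -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+ParallelRefProcEnabled"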
>>>>> 
>>>>> On Fri, Jan 24, 2014 at 9:44 PM, Bryan Beaudreault <[email protected]> wrote:
>>>>> 
>>>>>> It seems from your ingestion rate you are still blowing through HFiles
>>>>>> too fast.  You're going to want to up the MEMSTORE_FLUSHSIZE for the
>>>>>> table from the default of 128MB.  If opentsdb is the only thing on this
>>>>>> cluster, you can do the math pretty easily to find the maximum allowable,
>>>>>> based on your heap size and accounting for 40% (default) used for the
>>>>>> block cache.
>>>>>> 
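>>>>>> A rough worked example of that math, assuming a 32GB region server heap
>>>>>> and the default global memstore limit of 40% of heap:
>>>>>> 
>>>>>>   32GB * 0.40 ~= 12.8GB total memstore budget
>>>>>>   12.8GB / 512MB flush size ~= 25 regions actively taking writes
>>>>>> 
>>>>>> So a 512MB flush size fits if writes concentrate on a few dozen hot
>>>>>> regions at a time; with hundreds of actively written regions the global
>>>>>> limit will force smaller flushes regardless.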
>>>>>> 
>>>>>> On Fri, Jan 24, 2014 at 9:38 PM, Rohit Dev <[email protected]> wrote:
>>>>>> 
>>>>>>> Hi Kevin,
>>>>>>> 
>>>>>>> We have about 160 regions per server with a 16GB region size and 10
>>>>>>> drives for HBase. I've looked at disk IO and that doesn't seem to be
>>>>>>> a problem (% utilization is < 2 across all disks).
>>>>>>> 
>>>>>>> Any suggestion on what heap size I should allocate? Normally I allocate
>>>>>>> 16GB.
>>>>>>> 
>>>>>>> Also, I read that increasing hbase.hstore.blockingStoreFiles and
>>>>>>> hbase.hregion.memstore.block.multiplier is a good idea for a write-heavy
>>>>>>> cluster, but in my case it seems to be heading in the wrong direction.
>>>>>>> 
>>>>>>> Thanks
>>>>>>> 
>>>>>>> On Fri, Jan 24, 2014 at 6:31 PM, Kevin O'dell <[email protected]> wrote:
>>>>>>>> Rohit,
>>>>>>>> 
>>>>>>>> 64GB heap is not ideal, you will run into some weird issues. How many
>>>>>>>> regions are you running per server, how many drives in each node, any
>>>>>>>> other settings you changed from default?
>>>>>>>> On Jan 24, 2014 6:22 PM, "Rohit Dev" <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> We are running OpenTSDB on a CDH 4.3 HBase cluster, with mostly
>>>>>>>>> default settings. The cluster is write-heavy and I'm trying to see
>>>>>>>>> which parameters I can tune to optimize write performance.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> # I get messages related to Memstore[1] and Slow Response[2] very
>>>>>>>>> often. Is this an indication of an issue?
>>>>>>>>> 
>>>>>>>>> I tried increasing some parameters on one node:
>>>>>>>>> - hbase.hstore.blockingStoreFiles - from default 7 to 15
>>>>>>>>> - hbase.hregion.memstore.block.multiplier - from default 2 to 8
>>>>>>>>> - and heap size from 16GB to 64GB
>>>>>>>>> 
>>>>>>>>> * 'Compaction queue' went up to ~200 within 60 minutes after restarting
>>>>>>>>> the region server with the new parameters, and the log started to get
>>>>>>>>> even noisier.
>>>>>>>>> 
>>>>>>>>> Can anyone please suggest whether I'm going in the right direction with
>>>>>>>>> these new settings, or if there are other things I could monitor or
>>>>>>>>> change to make it better?
>>>>>>>>> 
>>>>>>>>> Thank you!
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> [1]
>>>>>>>>> INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates
>>>>>>>>> for 'IPC Server handler 19 on 60020' on region
>>>>>>> tsdb,\x008XR\xE0i\x90\x00\x00\x02Q\x7F\x1D\x00\x00(\x00\x0B]\x00\x008M(r\x00\x00Bl\xA7\x8C,1390556781703.0771bf90cab25c503d3400206417f6bf.:
>>>>>>>>> memstore size 256.3 M is >= than blocking 256 M size
>>>>>>>>> 
>>>>>>>>> [2]
>>>>>>>>> WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow):
>>>>>>> {"processingtimems":17887,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@586940ea
>>>>>>>>> ),
>>>>>>>>> rpc version=1, client version=29,
>>>>>>>>> methodsFingerPrint=0","client":"192.168.10.10:54132
>>>>>>> ","starttimems":1390587959182,"queuetimems":1498,"class":"HRegionServer","responsesize":0,"method":"multi"}
>> 
