I think I found something: response time degrades as parallelism grows.
What I did:

1. Wrapped the business-logic controller in a plain Java main class. The controller does
some logic and puts/gets to HBase, sometimes with checkAndPut.
2. Created an HConnection.
3. Passed the HConnection to the controller.
4. Wrapped controller execution in Codahale metrics.
5. Executed the controller in several threads simultaneously. The same thing happens
in the servlet environment. (A rough sketch of the harness is below.)
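
A minimal sketch of that harness; table/family/qualifier names, the random row key
and the 1 KB payload are placeholders, not the real controller logic:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.Meter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ControllerLoadTest {

    private static final MetricRegistry REGISTRY = new MetricRegistry();
    private static final Timer PUT_TIMER = REGISTRY.timer("putTimer");
    private static final Meter PUT_METER = REGISTRY.meter("putMeter");

    public static void main(String[] args) throws Exception {
        final int threads = 10;          // 10 in the first run, 100 in the second
        final int iterations = 100_000;

        Configuration conf = HBaseConfiguration.create();
        ConsoleReporter.forRegistry(REGISTRY).build().start(30, TimeUnit.SECONDS);

        // one HConnection shared by every worker thread, same as in the servlet
        try (HConnection connection = HConnectionManager.createConnection(conf)) {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            for (int t = 0; t < threads; t++) {
                pool.submit(() -> {
                    for (int i = 0; i < iterations; i++) {
                        PUT_METER.mark();
                        // time only the controller call (reduced here to a single put)
                        try (Timer.Context ctx = PUT_TIMER.time();
                             HTableInterface table = connection.getTable(TableName.valueOf("table_01"))) {
                            byte[] row = Bytes.toBytes(ThreadLocalRandom.current().nextLong());
                            Put put = new Put(row);
                            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), new byte[1024]);
                            table.put(put);   // the real controller also does gets and checkAndPut sometimes
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }
    }
}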

I can't explain the result.
First run: 10 threads, 100,000 iterations each.

*RESULT: 99% <= 28.81 milliseconds, which sounds GOOD!*
-- Meters
----------------------------------------------------------------------
putMeter
             count = 414914
         mean rate = 885.58 events/second
     1-minute rate = 911.56 events/second
     5-minute rate = 778.16 events/second
    15-minute rate = 549.72 events/second

-- Timers
----------------------------------------------------------------------
putTimer
             count = 414914
         mean rate = 884.66 calls/second
     1-minute rate = 911.53 calls/second
     5-minute rate = 765.60 calls/second
    15-minute rate = 515.06 calls/second
               min = 4.87 milliseconds
               max = 211.77 milliseconds
              mean = 10.81 milliseconds
            stddev = 5.43 milliseconds
            median = 10.34 milliseconds
              75% <= 11.59 milliseconds
              95% <= 14.41 milliseconds
              98% <= 19.59 milliseconds
              99% <= 28.81 milliseconds
            99.9% <= 60.67 milliseconds

Then I increased the thread count to 100:
*RESULT: 99% <= 112.09 milliseconds*
-- Meters
----------------------------------------------------------------------
putMeter
             count = 1433056
         mean rate = 2476.46 events/second
     1-minute rate = 2471.18 events/second
     5-minute rate = 2483.28 events/second
    15-minute rate = 2512.52 events/second

-- Timers
----------------------------------------------------------------------
putTimer
             count = 1433058
         mean rate = 2474.61 calls/second
     1-minute rate = 2468.45 calls/second
     5-minute rate = 2446.45 calls/second
    15-minute rate = 2383.23 calls/second
               min = 10.03 milliseconds
               max = 853.05 milliseconds
              mean = 40.71 milliseconds
            stddev = 39.04 milliseconds
            median = 35.60 milliseconds
              75% <= 47.69 milliseconds
              95% <= 71.79 milliseconds
              98% <= 85.83 milliseconds
              99% <= 112.09 milliseconds
            99.9% <= 853.05 milliseconds

Can anyone explain this? Could it be a problem with the
pooling/threading inside HConnection?
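
If it is the client-side pools, these are the knobs I plan to experiment with. A minimal
sketch; the key names are from my reading of the 1.0 client configuration, and the values
are just something to try, not a recommendation:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;

public class TunedConnectionFactory {
    public static HConnection create() throws java.io.IOException {
        Configuration conf = HBaseConfiguration.create();
        // worker pool the HConnection uses internally for batch/multi operations
        conf.setInt("hbase.hconnection.threads.core", 32);
        conf.setInt("hbase.hconnection.threads.max", 256);
        // number of RPC connections kept per region server
        // (I believe the default is a single connection, which 100 client threads may saturate)
        conf.setInt("hbase.client.ipc.pool.size", 10);
        return HConnectionManager.createConnection(conf);
    }
}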

Please see what happened to compactions during the test:
http://www.bigdatapath.com/wp-content/uploads/2015/08/compations.png

get/put ops:
http://www.bigdatapath.com/wp-content/uploads/2015/08/get_ops.png

slow ops:
http://www.bigdatapath.com/wp-content/uploads/2015/08/slow_ops.png

2015-08-11 23:43 GMT+02:00 Serega Sheypak <serega.shey...@gmail.com>:

> >How about GC activity? ApplicationStopTime? Do you track that?
> Yes, JVisualVM says it's OK, and New Relic also doesn't show anything strange.
> HBase also says it's OK.
>
> The profiler says the thread spends most of its time waiting for a response from the HBase side. My
> assumptions are:
> 1. I have a weird bug in my HBase configuration
> 2. I have undiscovered networking problems (BUT the same Tomcats
> write data to Flume at a higher rate, with no data loss at all)
> 3. I have a weird problem with HConnection/HConnectionManager in a
> multithreaded env, when the same servlet instance is shared across many threads
> 4. some mystic process somewhere in the cluster....
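
Regarding assumption 3: the pattern I use (and believe is the intended one) is a single
HConnection per webapp plus a short-lived HTableInterface per request. A stripped-down
sketch; the listener, attribute and table names are made up:

import java.io.IOException;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;

public class HBaseLifecycleListener implements ServletContextListener {

    private HConnection connection;

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        try {
            // one thread-safe HConnection for the whole webapp; it owns the pools
            connection = HConnectionManager.createConnection(HBaseConfiguration.create());
            sce.getServletContext().setAttribute("hbase.connection", connection);
        } catch (IOException e) {
            throw new IllegalStateException("cannot connect to HBase", e);
        }
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        try {
            connection.close();
        } catch (IOException ignored) {
            // shutting down anyway
        }
    }

    // In the controller, per request: table handles are cheap and NOT shared across threads.
    static void putSomething(HConnection conn, Put put) throws IOException {
        try (HTableInterface table = conn.getTable(TableName.valueOf("table_01"))) {
            table.put(put);
        }
    }
}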
>
> >Is the issue reproducible, or did it happen only once?
> Always. The spikes disappear during the night, but RPM doesn't change much.
>
> I will run my controller code outside of Tomcat and see how it goes. I'm going
> to isolate the components...
>
>
> 2015-08-11 23:36 GMT+02:00 Vladimir Rodionov <vladrodio...@gmail.com>:
>
>> How about GC activity? ApplicationStopTime? Do you track that?
>>
>> Is the issue reproducible, or did it happen only once?
>>
>> Start with the RS logs and try to find anything suspicious in the periods of
>> very high latency. A 1.5 sec HBase write latency does not look right.
>>
>> -Vlad
>>
>> On Tue, Aug 11, 2015 at 2:08 PM, Serega Sheypak <serega.shey...@gmail.com
>> >
>> wrote:
>>
>> > Hi Vladimir!
>> >
>> > Here are the graphs. Servlet (3 Tomcats on 3 different hosts write to HBase):
>> > http://www.bigdatapath.com/wp-content/uploads/2015/08/01_apps1.png
>> > See how the response time jumps. I can't explain it. The write load is
>> > really, really low.
>> >
>> > All RSs have an even load; I see the request metrics in the HBase master web UI.
>> > Tables are pre-split. I have 10 RSs and tables pre-split into 50
>> > regions.
>> >
>> >   >1. How large is your single write?
>> > 1-2KB
>> >
>> >    >2. Do you see any RegionTooBusyException in the client log files
>> > No HBase-related exceptions.
>> >
>> >  >  3. How large is your table ( # of regions, # of column families)
>> > 1 column family; table_01 is 150 GB, table_02 is 130 GB
>> >
>> > I have two "major tables", here are stats for them:
>> > http://www.bigdatapath.com/wp-content/uploads/2015/08/table_02.png
>> > http://www.bigdatapath.com/wp-content/uploads/2015/08/table_01.png
>> >    >4. RS memory related config: Max heap
>> > RS heap size = 8GB
>> >
>> >    >5. memstore size (if not default - 0.4)
>> > hbase.regionserver.global.memstore.upperLimit=0.4
>> > hbase.regionserver.global.memstore.lowerLimit=0.38
>> >
>> > >*Do you see any region splits?  *
>> > No, that never happened, since the tables are pre-split.
>> >
>> > 2015-08-11 18:54 GMT+02:00 Vladimir Rodionov <vladrodio...@gmail.com>:
>> >
>> > > *Common questions:*
>> > >
>> > >
>> > >    1. How large is your single write?
>> > >    2. Do you see any RegionTooBusyException in the client log files
>> > >    3. How large is your table ( # of regions, # of column families)
>> > >    4. RS memory related config: Max heap
>> > >    5. memstore size (if not default - 0.4)
>> > >
>> > >
>> > > Memstore flush
>> > >
>> > > hbase.hregion.memstore.flush.size = 256M
>> > > hbase.hregion.memstore.block.multiplier = N (do not block writes). N * 256M
>> > > MUST be greater than overall memstore size (HBASE_HEAPSIZE *
>> > > hbase.regionserver.global.memstore.size)
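
(Plugging in my numbers, if I read that right: 8 GB heap * 0.4 = 3.2 GB overall memstore,
so with flush.size = 256M the multiplier N would have to be at least 13, since 13 * 256M ≈ 3.25 GB.)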
>> > >
>> > > WAL files.
>> > >
>> > > Set HDFS block size to 256MB. hbase.regionserver.hlog.blocksize = 0.95 * HDFS
>> > > block size (256MB * 0.95). Keep hbase.regionserver.hlog.blocksize *
>> > > hbase.regionserver.maxlogs just a bit above
>> > > hbase.regionserver.global.memstore.lowerLimit
>> > > (0.35-0.45) * HBASE_HEAPSIZE to avoid premature memstore flushing.
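
(Again with my numbers: hlog.blocksize = 0.95 * 256 MB ≈ 243 MB and lowerLimit * heap =
0.38 * 8 GB ≈ 3.04 GB, so hbase.regionserver.maxlogs would need to be around
3.04 GB / 243 MB ≈ 13 to sit just above that, if I follow correctly.)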
>> > >
>> > > *Do you see any region splits?  *
>> > >
>> > > Region splits block writes. Try to presplit the table and avoid splitting
>> > > after that. To disable splitting completely:
>> > >
>> > > hbase.regionserver.region.split.policy
>> > > =org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy
>> > >
>> > > -Vlad
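
Note to self: the split-policy suggestion above can also be applied per table at creation
time. A rough sketch; the table/family names and the hex-key split algorithm are
placeholders for my real schema:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.RegionSplitter;

public class CreatePresplitTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("table_01"));
        desc.addFamily(new HColumnDescriptor("cf"));
        // turn off automatic splitting for this table only
        desc.setRegionSplitPolicyClassName(
            "org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy");
        // pre-split into 50 regions up front (assumes hex-string row keys)
        byte[][] splits = new RegionSplitter.HexStringSplit().split(50);
        try (HBaseAdmin admin = new HBaseAdmin(conf)) {
            admin.createTable(desc, splits);
        }
    }
}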
>> > >
>> > >
>> > >
>> > >
>> > > On Tue, Aug 11, 2015 at 3:22 AM, Serega Sheypak <
>> > serega.shey...@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi, we are using version 1.0.0+cdh5.4.4+160.
>> > > > We have a heavy write load, ~10K writes per second.
>> > > > We have 10 nodes with 7 disks each. I read some perf notes; they state
>> > > > that HBase can handle 1K writes per second per node without any problems.
>> > > >
>> > > >
>> > > > I see some spikes on the "writers". Write operation timing "jumps" from
>> > > > 40-50ms to 200-500ms. Probably I hit the memstore limit: the RegionServer
>> > > > starts to flush the memstore and stops accepting updates.
>> > > >
>> > > > I have several questions:
>> > > > 1. Can a node with 4 CPUs (8 with hyperthreading) + 7 HDDs absorb 1K
>> > > > writes per second?
>> > > > 2. What is the right way to fight blocked writes?
>> > > > 2.1. What I did:
>> > > > hbase.hregion.memstore.flush.size to 256M, to produce larger HFiles
>> > > > when flushing the memstore
>> > > > hbase.hregion.memstore.block.multiplier to 4, since I have only one
>> > > > write-intensive table. Let it grow.
>> > > > hbase.regionserver.optionallogflushinterval to 10s; I CAN lose some
>> > > > data, no problem here. The idea is to reduce I/O pressure on the disks.
>> > > > ===
>> > > > Not sure if I can correctly play with these parameters:
>> > > > hbase.hstore.blockingStoreFiles=10
>> > > > hbase.hstore.compactionThreshold=3
>> > > >
>> > >
>> >
>>
>
>
