Hi Allan,
I didn't see any improvement either after decreasing the number of compaction
threads or increasing the memstore flush size. :(
How much write tps can your cluster handle per region server?

Thanks
Hef
On Wed, Mar 22, 2017 at 10:07 AM, Allan Yang <[email protected]> wrote:

> hbase.regionserver.thread.compaction.small = 30
> Am I seeing it right? You used 30 threads for small compactions. That's
> too much. In a heavy-write scenario, that devotes too many resources to
> compactions.
> We also have OpenTSDB running on HBase in our company. IMHO, the conf
> should look like this:
> hbase.regionserver.thread.compaction.small = 1 or 2
> hbase.regionserver.thread.compaction.large = 1
> hbase.hstore.compaction.max = 20
> hbase.hstore.compaction.min (hbase.hstore.compactionThreshold in your
> config) = 8 or 10
> hbase.hregion.memstore.flush.size = 256MB or bigger, depending on the
> memory size; for writers like OpenTSDB, the data after encoding and
> compression is very small (by the way, have you set any encoding or
> compression algorithm on your table? If not, better do it now)
> hbase.regionserver.thread.compaction.throttle = 512MB
> These configs should decrease the frequency of compactions, and also
> decrease the resources (threads) compactions use.
> Maybe you can give it a try.
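On Allan's encoding/compression question above: for anyone following the
thread, enabling both on an existing table looks roughly like this with the
HBase 1.x Java Admin API. A sketch only: the table name "tsdb" and family
"t" are placeholders, and SNAPPY requires the native libraries on every
RegionServer.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.util.Bytes;

public class EnableEncodingAndCompression {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      TableName table = TableName.valueOf("tsdb");                   // placeholder
      HTableDescriptor desc = admin.getTableDescriptor(table);
      HColumnDescriptor family = desc.getFamily(Bytes.toBytes("t")); // placeholder
      // FAST_DIFF suits wide rows whose qualifiers share long prefixes.
      family.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);
      family.setCompressionType(Compression.Algorithm.SNAPPY);
      admin.modifyColumn(table, family); // applied as regions reopen/compact
    }
  }
}

Existing HFiles only pick the new settings up as they are rewritten, so
running a major compaction afterwards makes the change effective everywhere.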
>
> 2017-03-21 23:48 GMT+08:00 Dejan Menges <[email protected]>:
>
> > Regarding du -sk, take a look here:
> > https://issues.apache.org/jira/browse/HADOOP-9884
> >
> > Also hardly waiting for this one to be fixed.
> >
> > On Tue, Mar 21, 2017 at 4:09 PM Hef <[email protected]> wrote:
> >
> > > There were several curious things we observed:
> > > On the region servers, there were abnormally many more reads than
> > > writes:
> > > Device:  tps      kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
> > > sda      608.00   6552.00    0.00       6552     0
> > > sdb      345.00   2692.00    78868.00   2692     78868
> > > sdc      406.00   14548.00   63960.00   14548    63960
> > > sdd      2.00     0.00       32.00      0        32
> > > sde      62.00    8764.00    0.00       8764     0
> > > sdf      498.00   11100.00   32.00      11100    32
> > > sdg      2080.00  11712.00   0.00       11712    0
> > > sdh      109.00   5072.00    0.00       5072     0
> > > sdi      158.00   4.00       32228.00   4        32228
> > > sdj      43.00    5648.00    32.00      5648     32
> > > sdk      255.00   3784.00    0.00       3784     0
> > > sdl      86.00    1412.00    9176.00    1412     9176
> > >
> > > In the CDH region server dashboard, the average disk IOPS for writes
> > > held steady at 735/s, while reads spiked from 900/s to 5000/s every
> > > 5 minutes.
> > >
> > > iotop showed the following processes eating the most IO:
> > > 6447 be/4 hdfs 2.70 M/s    0.00 B/s 0.00 % 94.54 % du -sk
> > > /data/12/dfs/dn/curre~632-10.1.1.100-1457937043486
> > > 6023 be/4 hdfs 2.54 M/s    0.00 B/s 0.00 % 92.14 % du -sk
> > > /data/9/dfs/dn/curren~632-10.1.1.100-1457937043486
> > > 6186 be/4 hdfs 1379.58 K/s 0.00 B/s 0.00 % 90.78 % du -sk
> > > /data/11/dfs/dn/curre~632-10.1.1.100-1457937043486
> > >
> > > What was all this reading for? And what are those du -sk processes?
> > > Could they be a reason the write throughput slowed down?
> > >
> > > On Tue, Mar 21, 2017 at 7:48 PM, Hef <[email protected]> wrote:
> > >
> > > > Hi guys,
> > > > Thanks for all your hints.
> > > > Let me summarize the tuning I have done these days.
> > > > Initially, before tuning, the HBase cluster handled writes at an
> > > > average of 400k tps (600k tps at max). The total network TX
> > > > throughput from the clients (aggregated over multiple servers) to
> > > > the RegionServers was about 300Mb/s on average.
> > > >
> > > > I took the following tuning steps:
> > > > 1. Optimized the HBase schema for our table, cutting the cell size
> > > > by 40%.
> > > > Result: failed, tps not noticeably increased.
> > > >
> > > > 2. Recreated the table with a more even distribution of the
> > > > pre-split keyspace.
> > > > Result: failed, tps not noticeably increased.
> > > >
> > > > 3. Adjusted the RS GC strategy.
> > > > Before:
> > > > -XX:+UseParNewGC
> > > > -XX:+UseConcMarkSweepGC
> > > > -XX:CMSInitiatingOccupancyFraction=70
> > > > -XX:+CMSParallelRemarkEnabled
> > > > -Xmx100g
> > > > -Xms100g
> > > > -Xmn20g
> > > >
> > > > After:
> > > > -XX:+UseG1GC
> > > > -XX:+UnlockExperimentalVMOptions
> > > > -XX:MaxGCPauseMillis=50
> > > > -XX:-OmitStackTraceInFastThrow
> > > > -XX:ParallelGCThreads=18
> > > > -XX:+ParallelRefProcEnabled
> > > > -XX:+PerfDisableSharedMem
> > > > -XX:-ResizePLAB
> > > > -XX:G1NewSizePercent=8
> > > > -Xms100G -Xmx100G
> > > > -XX:MaxTenuringThreshold=1
> > > > -XX:G1HeapWastePercent=10
> > > > -XX:G1MixedGCCountTarget=16
> > > > -XX:G1HeapRegionSize=32M
> > > >
> > > > Result: success. GC pause times dropped and tps increased by at
> > > > least 10%.
> > > >
> > > > 4. Upgraded to CDH 5.9.1 (HBase 1.2), and updated the client lib
> > > > to HBase 1.2 as well.
> > > > Result: success.
> > > > 1. Total client TX throughput rose to 700Mb/s.
> > > > 2. HBase write tps rose to 600k/s on average, 800k/s at max.
> > > >
> > > > 5. Other configurations:
> > > > hbase.hstore.compactionThreshold = 10
> > > > hbase.hstore.blockingStoreFiles = 300
> > > > hbase.hstore.compaction.max = 20
> > > > hbase.regionserver.thread.compaction.small = 30
> > > >
> > > > hbase.hregion.memstore.flush.size = 128MB
> > > > hbase.regionserver.global.memstore.lowerLimit = 0.3
> > > > hbase.regionserver.global.memstore.upperLimit = 0.7
> > > >
> > > > hbase.regionserver.maxlogs = 100
> > > > hbase.wal.regiongrouping.numgroups = 5
> > > > hbase.wal.provider = Multiple HDFS WAL (see the note just below
> > > > this message for the plain hbase-site.xml equivalents)
> > > >
> > > > Summary:
> > > > 1. HBase 1.2 does perform better than 1.0.
> > > > 2. 300k/s tps per RegionServer still looks unsatisfying, as the
> > > > CPU/network/IO/memory all still have plenty of idle headroom.
> > > > Per RS:
> > > > 1. CPU 50% used (not sure why CPU is that high for only 300k
> > > > write requests)
> > > > 2. JVM heap 40% used
> > > > 3. Total disk throughput over 12 HDDs: 91MB/s on writes, 40MB/s
> > > > on reads
> > > > 4. Network in/out 560Mb/s on a 1G NIC
> > > >
> > > > Further questions:
> > > > Has anyone dealt with a similar heavy-write scenario?
> > > > How many concurrent writes can a RegionServer handle? Can anyone
> > > > share the max tps your RS can reach?
> > > >
> > > > Thanks
> > > > Hef
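A side note on step 5 above: "Multiple HDFS WAL" is Cloudera Manager's label
for the multiwal provider. If I remember the plain hbase-site.xml keys
correctly (please double-check against the HBase ref guide for your
version), the equivalent would be roughly:

hbase.wal.provider = multiwal
hbase.wal.regiongrouping.strategy = bounded
hbase.wal.regiongrouping.numgroups = 5

The numgroups value here is just the one from the thread; each group gets
its own WAL, so more groups spread WAL writes over more DataNode pipelines
at the cost of more open writers.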
> > > >
> > > > On Sat, Mar 18, 2017 at 1:11 PM, Yu Li <[email protected]> wrote:
> > > >
> > > >> First please try out Stack's suggestions, all good ones.
> > > >>
> > > >> And some supplements: since all disks in use are HDDs with
> > > >> ordinary IO capability, it's important to rein in big-IO work
> > > >> like flushes and compactions. Try the features below:
> > > >> 1. HBASE-8329 <https://issues.apache.org/jira/browse/HBASE-8329>:
> > > >> Limit compaction speed (available in 1.1.0+)
> > > >> 2. HBASE-14969 <https://issues.apache.org/jira/browse/HBASE-14969>:
> > > >> Add throughput controller for flush (available in 1.3.0)
> > > >> 3. HBASE-10201 <https://issues.apache.org/jira/browse/HBASE-10201>:
> > > >> Per column family flush (available in 1.1.0+)
> > > >>    * HBASE-14906 <https://issues.apache.org/jira/browse/HBASE-14906>:
> > > >> Improvements on FlushLargeStoresPolicy (only available in 2.0,
> > > >> not released yet)
> > > >>
> > > >> Also try out multiple WAL; we observed a ~20% write perf boost in
> > > >> prod. See more details in the doc attached to the JIRA below:
> > > >> - HBASE-14457 <https://issues.apache.org/jira/browse/HBASE-14457>:
> > > >> Umbrella: Improve Multiple WAL for production usage
> > > >>
> > > >> And please note that if you decide to pick up a branch-1.1
> > > >> release, make sure to use 1.1.3+, or you may hit a perf
> > > >> regression on writes; see HBASE-14460
> > > >> <https://issues.apache.org/jira/browse/HBASE-14460> for more
> > > >> details.
> > > >>
> > > >> Hope this information helps.
> > > >>
> > > >> Best Regards,
> > > >> Yu
> > > >>
> > > >> On 18 March 2017 at 05:51, Vladimir Rodionov
> > > >> <[email protected]> wrote:
> > > >>
> > > >> > >> In my opinion, 1M/s input data will result in only 70MByte/s
> > > >> > >> write throughput
> > > >> >
> > > >> > Times 3 (the default HDFS replication factor). Plus...
> > > >> >
> > > >> > Do not forget about compaction read/write amplification. If you
> > > >> > flush 10 MB and your max region size is 10 GB, with the default
> > > >> > min files to compact (3) your amplification is 6-7. That gives
> > > >> > us 70 x 3 x 6 = 1260 MB/s of read/write traffic cluster-wide,
> > > >> > i.e. about 210 MB/s per RS of reads plus another 210 MB/s of
> > > >> > writes.
> > > >> >
> > > >> > This IO load is way above sustainable.
> > > >> >
> > > >> > -Vlad
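One way to sanity-check Vlad's 6-7x figure (my reading of his estimate, not
an exact model): each flushed byte gets rewritten roughly once per
compaction "generation", and with 3 files per minor compaction the number of
generations is about log base 3 of (region size / flush size):

public class CompactionAmplification {
  public static void main(String[] args) {
    double flushMb = 10;          // one flushed HFile, as in Vlad's example
    double regionMb = 10 * 1024;  // 10 GB max region size
    int minFilesToCompact = 3;    // default hbase.hstore.compaction.min
    // Each compaction merges ~3 files into one ~3x larger file, so a byte
    // is rewritten once per generation until the region reaches full size.
    double rewrites = Math.log(regionMb / flushMb) / Math.log(minFilesToCompact);
    System.out.printf("approx. write amplification: %.1f%n", rewrites); // ~6.3
  }
}

which is where the 70 MB/s x 3 replicas x ~6 rewrites = ~1260 MB/s figure
comes from.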
> > > >> >
> > > >> > On Fri, Mar 17, 2017 at 2:14 PM, Kevin O'Dell
> > > >> > <[email protected]> wrote:
> > > >> >
> > > >> > > Hey Hef,
> > > >> > >
> > > >> > > What is the memstore size setting (how much heap is it
> > > >> > > allowed) on that cluster? What is your region count per node?
> > > >> > > Are you writing evenly across all those regions, or are only
> > > >> > > a few regions active per region server at a time? Can you
> > > >> > > paste the GC settings you are currently using?
> > > >> > >
> > > >> > > On Fri, Mar 17, 2017 at 3:30 PM, Stack <[email protected]> wrote:
> > > >> > >
> > > >> > > > On Fri, Mar 17, 2017 at 9:31 AM, Hef <[email protected]>
> > > >> > > > wrote:
> > > >> > > >
> > > >> > > > > Hi group,
> > > >> > > > > I'm using HBase to store a large amount of time series
> > > >> > > > > data; the usage is heavy on writes rather than reads. My
> > > >> > > > > application tops out at 600k write requests per second
> > > >> > > > > and I can't tune it up for better tps.
> > > >> > > > >
> > > >> > > > > Hardware:
> > > >> > > > > I have 6 Region Servers, each with 128G memory, 12 HDDs,
> > > >> > > > > and 2 CPUs with 24 threads.
> > > >> > > > >
> > > >> > > > > Schema:
> > > >> > > > > The schema for the time series data is similar to
> > > >> > > > > OpenTSDB's: the data points of the same metric within an
> > > >> > > > > hour are stored in one row, so there can be at most 3600
> > > >> > > > > columns per row.
> > > >> > > > > A cell is about 70 bytes in size, including the rowkey,
> > > >> > > > > column qualifier, column family and value.
> > > >> > > > >
> > > >> > > > > HBase config:
> > > >> > > > > CDH 5.6, HBase 1.0.0
> > > >> > > >
> > > >> > > > Can you upgrade? There's a big diff between 1.2 and 1.0.
> > > >> > > >
> > > >> > > > > 100G memory for each RegionServer
> > > >> > > > > hbase.hstore.compactionThreshold = 50
> > > >> > > > > hbase.hstore.blockingStoreFiles = 100
> > > >> > > > > hbase.hregion.majorcompaction disabled
> > > >> > > > > hbase.client.write.buffer = 20MB
> > > >> > > > > hbase.regionserver.handler.count = 100
> > > >> > > >
> > > >> > > > Could try halving the handler count.
> > > >> > > >
> > > >> > > > > hbase.hregion.memstore.flush.size = 128MB
> > > >> > > >
> > > >> > > > Why are you flushing? If it is because you are hitting this
> > > >> > > > flush limit, can you try upping it?
> > > >> > > >
> > > >> > > > > HBase Client:
> > > >> > > > > writes via BufferedMutator, 100000 per batch (see the
> > > >> > > > > sketch at the bottom of this thread)
> > > >> > > > >
> > > >> > > > > Input volumes:
> > > >> > > > > The input data throughput is more than 2 million/sec from
> > > >> > > > > Kafka.
> > > >> > > >
> > > >> > > > How is the distribution? Even over the keyspace?
> > > >> > > >
> > > >> > > > > My writer applications are distributed; however I scale
> > > >> > > > > them up, the total write throughput won't exceed
> > > >> > > > > 600k/sec.
> > > >> > > >
> > > >> > > > Tell us more about this scaling up? How many writers?
> > > >> > > >
> > > >> > > > > The servers have 20% CPU usage and 5.6 wa (iowait).
> > > >> > > >
> > > >> > > > 5.6 is high enough. Is the I/O spread over the disks?
> > > >> > > >
> > > >> > > > > GC doesn't look good though; it shows a lot of 10s+
> > > >> > > > > pauses.
> > > >> > > >
> > > >> > > > What settings do you have?
> > > >> > > >
> > > >> > > > > In my opinion, 1M/s input data will result in only
> > > >> > > > > 70MByte/s of write throughput to the cluster, which is
> > > >> > > > > quite a small amount for 6 region servers. The
> > > >> > > > > performance should not be this bad.
> > > >> > > > >
> > > >> > > > > Does anybody have an idea why the performance stops at
> > > >> > > > > 600k/s?
> > > >> > > > > Is there anything I have to tune to increase the HBase
> > > >> > > > > write throughput?
> > > >> > > >
> > > >> > > > If you double the number of clients writing, do you see the
> > > >> > > > throughput go up?
> > > >> > > >
> > > >> > > > If you thread dump the servers, can you tell where they are
> > > >> > > > held up? Or whether they are doing much work at all?
> > > >> > > >
> > > >> > > > St.Ack
> > > >> > >
> > > >> > > --
> > > >> > > Kevin O'Dell
> > > >> > > Field Engineer
> > > >> > > 850-496-1298 | [email protected]
> > > >> > > @kevinrodell
> > > >> > > <http://www.rocana.com>
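As referenced in the original post above, here is a minimal sketch of such a
writer: a BufferedMutator fed large batches against an OpenTSDB-like
one-row-per-metric-per-hour layout. The table name, column family, metric
name and value below are placeholders, not the real schema, and error
handling is omitted:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class TsdbStyleWriter {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf)) {
      BufferedMutatorParams params =
          new BufferedMutatorParams(TableName.valueOf("tsdb")) // placeholder table
              .writeBufferSize(20L * 1024 * 1024);             // 20MB client buffer
      try (BufferedMutator mutator = conn.getBufferedMutator(params)) {
        List<Put> batch = new ArrayList<>(100000);
        for (int i = 0; i < 100000; i++) {
          long ts = System.currentTimeMillis() / 1000;
          long hourBase = ts - (ts % 3600);
          // One row per metric per hour; the qualifier is the offset within the hour.
          byte[] row = Bytes.add(Bytes.toBytes("metric.foo"), Bytes.toBytes(hourBase));
          Put put = new Put(row);
          put.addColumn(Bytes.toBytes("t"),                // placeholder family
              Bytes.toBytes((short) (ts - hourBase)),      // seconds within the hour
              Bytes.toBytes(42.0d));                       // placeholder data point
          batch.add(put);
        }
        mutator.mutate(batch); // buffered client-side; flushed as the buffer fills
        mutator.flush();       // push out whatever remains
      }
    }
  }
}

The real OpenTSDB key also encodes metric/tag UIDs and a salt; this sketch
only mirrors the row shape described in the thread.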
