Hi, thanks!

> hbase.hstore.blockingStoreFiles
I don't understand the idea of this setting; where can I find an explanation for "dummies"?

> hbase.hregion.majorcompaction
done already

> DATA_BLOCK_ENCODING, SNAPPY
I always use it by default, CPU is OK

> memstore flush size
done

> I assume only the 300g partitions are mirrored, right? (not the entire 2t drive)
Aha

> Can you add more machines?
Will do it when we earn money. Thank you :)

2015-05-24 21:42 GMT+03:00 lars hofhansl <la...@apache.org>:

> Yeah, all you can do is drive your write amplification down.
>
> As Stack said:
>
> - Increase hbase.hstore.compactionThreshold and hbase.hstore.blockingStoreFiles. It'll hurt read, but in your case read is already significantly hurt when compactions happen.
>
> - Absolutely set hbase.hregion.majorcompaction to 1 week (with a jitter of 1/2 week; that's the default in 0.98 and later). Minor compactions will still happen, based on the compactionThreshold setting. Right now you're rewriting _all_ your data _every_ day.
>
> - Turning off WAL writing will save you IO, but I doubt it'll help much. I do not expect async WAL to help a lot, as the aggregate IO is still the same.
>
> - See if you can enable DATA_BLOCK_ENCODING on your column families (FAST_DIFF or PREFIX are good). You can also try SNAPPY compression. That would reduce your overall IO. (Since your CPUs are also weak, you'd have to test the CPU/IO tradeoff.)
>
> - If you have RAM to spare, increase the memstore flush size (it will lead to initially larger and fewer files).
>
> - Or (again, if you have spare RAM) make your regions smaller, to curb write amplification.
>
> - I assume only the 300g partitions are mirrored, right? (not the entire 2t drive)
>
> I have some suggestions compiled here (if you don't mind the plug):
> http://hadoop-hbase.blogspot.com/2015/05/my-hbasecon-talk-about-hbase.html
>
> Other than that, I'll repeat what others said: you have 14 extremely weak machines, you can't expect the world from this.
> Your aggregate IOPS are less than 3000, your aggregate IO bandwidth ~3GB/s. Can you add more machines?
>
> -- Lars
>
> ________________________________
> From: Serega Sheypak <serega.shey...@gmail.com>
> To: user <user@hbase.apache.org>
> Sent: Friday, May 22, 2015 3:45 AM
> Subject: Re: Optimizing compactions on super-low-cost HW
>
> We don't have money, these nodes are the cheapest. I totally agree that we need 4-6 HDDs, but unfortunately there is no chance to get them.
> Okay, I'll try to apply Stack's suggestions.
>
> 2015-05-22 13:00 GMT+03:00 Michael Segel <michael_se...@hotmail.com>:
>
> > Look, to be blunt, you’re screwed.
> >
> > If I read your cluster spec… it sounds like you have a single i7 (quad core) CPU. That’s 4 cores or 8 threads.
> >
> > Mirroring the OS is common practice. Using the same drives for Hadoop… not so good, but once the server boots up… not so much I/O. It’s not good, but you could live with it….
> >
> > Your best bet is to add a couple more spindles. Ideally you’d want to have 6 drives: the 2 OS drives mirrored and separate (use the extra space to stash / write logs), then 4 drives / spindles in JBOD for Hadoop. This brings you to 1:1 on physical cores. If your box can handle more spindles, then going to a total of 10 drives would improve performance further.
> >
> > However, you need to level-set your expectations… you can only go so far. If you have 4 drives spinning, you could start to saturate a 1GbE network, so that will hurt performance.
> >
> > That’s pretty much your only option in terms of fixing the hardware, and then you have to start tuning.
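As a concrete illustration of the software-side tuning above (the column-family and table-level changes from Lars's list: FAST_DIFF encoding, SNAPPY compression, bigger memstore flushes, smaller regions), here is a minimal sketch against the 0.98-era Java client API. The table name, family name, and sizes are made up, and depending on the version the table may need to be disabled (or online schema update enabled) before altering it:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.io.compress.Compression;
    import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
    import org.apache.hadoop.hbase.util.Bytes;

    public class TuneFamilyAndTable {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        TableName table = TableName.valueOf("big_table");          // made-up table name

        // Start from the current schema so existing settings are preserved.
        HTableDescriptor desc = admin.getTableDescriptor(table);
        HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("d")); // made-up family name

        // Block encoding + compression: trade some CPU for less IO.
        cf.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);
        cf.setCompressionType(Compression.Algorithm.SNAPPY);

        // Bigger flushes (initially fewer, larger files) and a smaller max region size.
        desc.setMemStoreFlushSize(256L * 1024 * 1024);              // illustrative: 256 MB
        desc.setMaxFileSize(5L * 1024 * 1024 * 1024);               // illustrative: 5 GB regions

        admin.modifyTable(table, desc);
        admin.close();
      }
    }

Existing storefiles only pick up the new encoding and compression as they are rewritten by compactions, so the IO benefit arrives gradually.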
> > > On May 21, 2015, at 4:04 PM, Stack <st...@duboce.net> wrote:
> > >
> > > On Thu, May 21, 2015 at 1:04 AM, Serega Sheypak <serega.shey...@gmail.com> wrote:
> > >
> > >>> Do you have the system sharing
> > >> There are 2 HDDs, 7200 rpm, 2TB each. There is a 300GB OS partition on each drive with mirroring enabled. I can't persuade devops that mirroring could cause IO issues. What arguments can I bring? They use OS partition mirroring so that when a disk fails we can boot the OS from the other partition and continue to work...
> > >>
> > > You are already compromised i/o-wise having two disks only. I have not the experience to say for sure, but basic physics would seem to dictate that having your two disks (partially) mirrored compromises your i/o even more.
> > >
> > > You are in a bit of a hard place. Your operators want the machine to boot even after it loses 50% of its disk.
> > >
> > >>> Do you have to compact? In other words, do you have read SLAs?
> > >> Unfortunately, I have a mixed workload from web applications. I need to write and read, and the SLA is < 50ms.
> > >>
> > > Ok. You get the bit that seeks are about 10ms each, so with two disks you can do 2x100 seeks a second, presuming no one else is using the disk.
> > >
> > >>> How are your read times currently?
> > >> Cloudera Manager says it's 4K reads per second and 500 writes per second.
> > >>
> > >>> Does your working dataset fit in RAM or do reads have to go to disk?
> > >> I have several tables of 500GB each and many small tables of 10-20 GB. Small tables are loaded hourly/daily using bulkload (we prepare HFiles using MR and move them to HBase using the utility). Big tables are used by webapps; they read and write them.
> > >>
> > > These hfiles are created on the same cluster with MR? (i.e. they are using up i/os)
> > >
> > >>> It looks like you are running at about three storefiles per column family
> > >> is it hbase.hstore.compactionThreshold=3?
> > >
> > >>> What if you upped the threshold at which minors run?
> > >> you mean bump hbase.hstore.compactionThreshold to 8 or 10?
> > >>
> > > Yes.
> > >
> > > Downside is that your reads may require more seeks to find a keyvalue.
> > >
> > > Can you cache more?
> > >
> > > Can you make it so files are bigger before you flush?
> > >
> > >>> Do you have a downtime during which you could schedule compactions?
> > >> Unfortunately no. It should work 24/7, and sometimes it doesn't.
> > >>
> > > So, it is running at full bore 24/7? There is no 'downtime'... a time when the traffic is not so heavy?
> > >
> > >>> Are you managing the major compactions yourself or are you having hbase do it for you?
> > >> HBase, once a day: hbase.hregion.majorcompaction=1day
> > >>
> > > Have you studied your compactions? You realize that a major compaction will do a full rewrite of your dataset? When they run, how many storefiles are there?
> > >
> > > Do you have to run once a day? Can you not run once a week? Can you manage the compactions yourself... and run them a region at a time in a rolling manner across the cluster rather than have them just run whenever it suits them once a day?
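To make that last suggestion concrete, here is a minimal sketch of a region-at-a-time rolling major compaction driver against the 0.98-era Java admin API. The table name and the pause between regions are made up, and a real driver would poll compaction state per region rather than just sleeping:

    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class RollingMajorCompaction {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        TableName table = TableName.valueOf("big_table");   // made-up table name

        // One region at a time: request a major compaction, then pause so only
        // a small slice of the cluster is rewriting data at any moment.
        List<HRegionInfo> regions = admin.getTableRegions(table);
        for (HRegionInfo region : regions) {
          admin.majorCompact(region.getRegionName());       // the request is asynchronous
          Thread.sleep(10L * 60 * 1000);                    // crude pacing; tune for your cluster
        }
        admin.close();
      }
    }

Run from cron during the quietest hours, with hbase.hregion.majorcompaction set very large (or to 0 to disable time-based majors), this gives the rolling behaviour described above instead of every region compacting on the same day.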
> > >
> > >> I can disable WAL. It's ok to lose some data in case of RS failure; I'm not doing banking transactions.
> > >> If I disable WAL, could it help?
> > >>
> > > It could, but don't. Enable deferred sync'ing first if you can 'lose' some data.
> > >
> > > Work on your flushing and compactions before you mess w/ WAL.
> > >
> > > What version of hbase are you on? You say CDH, but the newer your hbase, the better it does generally.
> > >
> > > St.Ack
> > >
> > >> 2015-05-20 18:04 GMT+03:00 Stack <st...@duboce.net>:
> > >>
> > >>> On Mon, May 18, 2015 at 4:26 PM, Serega Sheypak <serega.shey...@gmail.com> wrote:
> > >>>
> > >>>> Hi, we are using extremely cheap HW:
> > >>>> 2 HDD 7200 rpm
> > >>>> 4*2 cores (Hyperthreading)
> > >>>> 32GB RAM
> > >>>>
> > >>>> We met serious IO performance issues. We have a more or less even distribution of read/write requests, and the same for data size.
> > >>>>
> > >>>> ServerName | Requests Per Second | Read Request Count | Write Request Count
> > >>>> node01.domain.com,60020,1430172017193 | 195 | 171871826 | 16761699
> > >>>> node02.domain.com,60020,1426925053570 | 24 | 34314930 | 16006603
> > >>>> node03.domain.com,60020,1430860939797 | 22 | 32054801 | 16913299
> > >>>> node04.domain.com,60020,1431975656065 | 33 | 1765121 | 253405
> > >>>> node05.domain.com,60020,1430484646409 | 27 | 42248883 | 16406280
> > >>>> node07.domain.com,60020,1426776403757 | 27 | 36324492 | 16299432
> > >>>> node08.domain.com,60020,1426775898757 | 26 | 38507165 | 13582109
> > >>>> node09.domain.com,60020,1430440612531 | 27 | 34360873 | 15080194
> > >>>> node11.domain.com,60020,1431989669340 | 28 | 44307 | 13466
> > >>>> node12.domain.com,60020,1431927604238 | 30 | 5318096 | 2020855
> > >>>> node13.domain.com,60020,1431372874221 | 29 | 31764957 | 15843688
> > >>>> node14.domain.com,60020,1429640630771 | 41 | 36300097 | 13049801
> > >>>>
> > >>>> ServerName | Num. Stores | Num. Storefiles | Storefile Size Uncompressed | Storefile Size | Index Size | Bloom Size
> > >>>> node01.domain.com,60020,1430172017193 | 82 | 186 | 1052080m | 76496mb | 641849k | 310111k
> > >>>> node02.domain.com,60020,1426925053570 | 82 | 179 | 1062730m | 79713mb | 649610k | 318854k
> > >>>> node03.domain.com,60020,1430860939797 | 82 | 179 | 1036597m | 76199mb | 627346k | 307136k
> > >>>> node04.domain.com,60020,1431975656065 | 82 | 400 | 1034624m | 76405mb | 655954k | 289316k
> > >>>> node05.domain.com,60020,1430484646409 | 82 | 185 | 1111807m | 81474mb | 688136k | 334127k
> > >>>> node07.domain.com,60020,1426776403757 | 82 | 164 | 1023217m | 74830mb | 631774k | 296169k
> > >>>> node08.domain.com,60020,1426775898757 | 81 | 171 | 1086446m | 79933mb | 681486k | 312325k
> > >>>> node09.domain.com,60020,1430440612531 | 81 | 160 | 1073852m | 77874mb | 658924k | 309734k
> > >>>> node11.domain.com,60020,1431989669340 | 81 | 166 | 1006322m | 75652mb | 664753k | 264081k
> > >>>> node12.domain.com,60020,1431927604238 | 82 | 188 | 1050229m | 75140mb | 652970k | 304137k
> > >>>> node13.domain.com,60020,1431372874221 | 82 | 178 | 937557m | 70042mb | 601684k | 257607k
> > >>>> node14.domain.com,60020,1429640630771 | 82 | 145 | 949090m | 69749mb | 592812k | 266677k
> > >>>>
> > >>>> When compaction starts, a random node gets 100% I/O and iowait for seconds, even tens of seconds.
> > >>>>
> > >>>> What are the approaches to optimize minor and major compactions when you are I/O bound..?
> > >>>>
> > >>> Yeah, with two disks, you will be crimped.
> > >>> Do you have the system sharing with hbase/hdfs or is hdfs running on one disk only?
> > >>>
> > >>> Do you have to compact? In other words, do you have read SLAs? How are your read times currently? Does your working dataset fit in RAM or do reads have to go to disk? It looks like you are running at about three storefiles per column family. What if you upped the threshold at which minors run? Do you have a downtime during which you could schedule compactions? Are you managing the major compactions yourself or are you having hbase do it for you?
> > >>>
> > >>> St.Ack
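For reference, the three RegionServer settings this thread keeps coming back to, including the hbase.hstore.blockingStoreFiles question from the top of the thread: minor compactions start once a store reaches hbase.hstore.compactionThreshold files; once a store exceeds hbase.hstore.blockingStoreFiles, flushes for that region are blocked (so writes stall) until compaction catches up or the blocking wait time expires; and hbase.hregion.majorcompaction is the automatic major-compaction period, spread out by hbase.hregion.majorcompaction.jitter in 0.98 and later. These are set in hbase-site.xml on the RegionServers and need a restart; the sketch below only reads whatever values are visible on the client classpath and exists just to name the knobs, with the thread's suggested targets in the comments:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ShowCompactionSettings {
      public static void main(String[] args) {
        // Reads hbase-default.xml plus whatever hbase-site.xml is on the classpath.
        Configuration conf = HBaseConfiguration.create();

        // Minor compaction kicks in at this many files per store.
        // Suggested in the thread: try 8-10 instead of the default 3.
        System.out.println("hbase.hstore.compactionThreshold = "
            + conf.getInt("hbase.hstore.compactionThreshold", 3));

        // Above this many files per store, flushes for the region are blocked
        // until compaction catches up (or the blocking wait time passes).
        // Raise it together with the threshold above.
        System.out.println("hbase.hstore.blockingStoreFiles = "
            + conf.getInt("hbase.hstore.blockingStoreFiles", 10));

        // Automatic major compaction period, in milliseconds.
        // Suggested in the thread: ~1 week instead of 1 day, with jitter.
        System.out.println("hbase.hregion.majorcompaction = "
            + conf.getLong("hbase.hregion.majorcompaction", 0L));
        System.out.println("hbase.hregion.majorcompaction.jitter = "
            + conf.getFloat("hbase.hregion.majorcompaction.jitter", 0.5f));
      }
    }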
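And on the WAL question: Stack's "deferred sync'ing" advice corresponds, in 0.96/0.98-era clients, to ASYNC_WAL durability rather than skipping the WAL entirely. It can be set per table (HTableDescriptor.setDurability) or per mutation, as in this sketch; the table, family, and values are made up:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Durability;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DeferredSyncPut {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, TableName.valueOf("big_table")); // made-up name

        Put put = new Put(Bytes.toBytes("row-1"));
        put.add(Bytes.toBytes("d"), Bytes.toBytes("q"), Bytes.toBytes("v"));

        // ASYNC_WAL: the edit still goes to the WAL, but the sync happens in the
        // background, so a RegionServer crash can lose the last few edits.
        // SKIP_WAL would be the "disable WAL" option the thread advises against.
        put.setDurability(Durability.ASYNC_WAL);

        table.put(put);
        table.close();
      }
    }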