Hi, thanks!

> hbase.hstore.blockingStoreFiles
I don't understand the idea of this setting; where can I find an explanation for "dummies"?

> hbase.hregion.majorcompaction
done already

> DATA_BLOCK_ENCODING, SNAPPY
I always use it by default, CPU is OK

> memstore flush size
done

> I assume only the 300g partitions are mirrored, right? (not the entire 2t drive)
Aha

> Can you add more machines?
Will do it when we earn money. Thank you :)

2015-05-24 21:42 GMT+03:00 lars hofhansl <la...@apache.org>:

> Yeah, all you can do is drive your write amplification down.
>
> As Stack said:
>
> - Increase hbase.hstore.compactionThreshold and hbase.hstore.blockingStoreFiles. It'll hurt read, but in your case read is already significantly hurt when compactions happen.
>
> - Absolutely set hbase.hregion.majorcompaction to 1 week (with a jitter of 1/2 week; that's the default in 0.98 and later). Minor compactions will still happen, based on the compactionThreshold setting. Right now you're rewriting _all_ your data _every_ day.
>
> - Turning off WAL writing will save you IO, but I doubt it'll help much. I do not expect async WAL to help a lot, as the aggregate IO is still the same.
>
> - See if you can enable DATA_BLOCK_ENCODING on your column families (FAST_DIFF or PREFIX are good). You can also try SNAPPY compression. That would reduce your overall IO. (Since your CPUs are also weak, you'd have to test the CPU/IO tradeoff.)
>
> - If you have RAM to spare, increase the memstore flush size (it will lead to initially larger and fewer files).
>
> - Or (again, if you have spare RAM) make your regions smaller, to curb write amplification.
>
> - I assume only the 300g partitions are mirrored, right? (not the entire 2t drive)
>
> I have some suggestions compiled here (if you don't mind the plug):
> http://hadoop-hbase.blogspot.com/2015/05/my-hbasecon-talk-about-hbase.html
>
> Other than that, I'll repeat what others said: you have 14 extremely weak machines, you can't expect the world from this.
> Your aggregate IOPS are less than 3000, your aggregate IO bandwidth ~3GB/s. Can you add more machines?
>
> -- Lars
>
> ________________________________
> From: Serega Sheypak <serega.shey...@gmail.com>
> To: user <user@hbase.apache.org>
> Sent: Friday, May 22, 2015 3:45 AM
> Subject: Re: Optimizing compactions on super-low-cost HW
>
> We don't have money, these nodes are the cheapest. I totally agree that we need 4-6 HDDs, but unfortunately there is no chance to get them.
> Okay, I'll try to apply Stack's suggestions.
>
> 2015-05-22 13:00 GMT+03:00 Michael Segel <michael_se...@hotmail.com>:
>
> > Look, to be blunt, you’re screwed.
> >
> > If I read your cluster spec… it sounds like you have a single i7 (quad core) CPU. That’s 4 cores or 8 threads.
> >
> > Mirroring the OS is common practice. Using the same drives for Hadoop… not so good, but once the server boots up… not so much I/O. It’s not good, but you could live with it….
> >
> > Your best bet is to add a couple more spindles. Ideally you’d want to have 6 drives: the 2 OS drives mirrored and separate (use the extra space to stash / write logs), then 4 drives / spindles in JBOD for Hadoop. This brings you to 1:1 on physical cores. If your box can handle more spindles, then going to a total of 10 drives would improve performance further.
> >
> > However, you need to level-set your expectations… you can only go so far. If you have 4 drives spinning, you could start to saturate a 1GbE network, so that will hurt performance.
> >
> > That’s pretty much your only option in terms of fixing the hardware, and then you have to start tuning.
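As a concrete illustration of the software-side tuning above (the column-family and table-level changes from Lars's list: FAST_DIFF encoding, SNAPPY compression, bigger memstore flushes, smaller regions), here is a minimal sketch against the 0.98-era Java client API. The table name, family name, and sizes are made up, and depending on the version the table may need to be disabled (or online schema update enabled) before altering it:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.io.compress.Compression;
    import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
    import org.apache.hadoop.hbase.util.Bytes;

    public class TuneFamilyAndTable {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        TableName table = TableName.valueOf("big_table");          // made-up table name

        // Start from the current schema so existing settings are preserved.
        HTableDescriptor desc = admin.getTableDescriptor(table);
        HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("d")); // made-up family name

        // Block encoding + compression: trade some CPU for less IO.
        cf.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);
        cf.setCompressionType(Compression.Algorithm.SNAPPY);

        // Bigger flushes (initially fewer, larger files) and a smaller max region size.
        desc.setMemStoreFlushSize(256L * 1024 * 1024);              // illustrative: 256 MB
        desc.setMaxFileSize(5L * 1024 * 1024 * 1024);               // illustrative: 5 GB regions

        admin.modifyTable(table, desc);
        admin.close();
      }
    }

Existing storefiles only pick up the new encoding and compression as they are rewritten by compactions, so the IO benefit arrives gradually.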
> > > On May 21, 2015, at 4:04 PM, Stack <st...@duboce.net> wrote:
> > >
> > > On Thu, May 21, 2015 at 1:04 AM, Serega Sheypak <serega.shey...@gmail.com> wrote:
> > >
> > >>> Do you have the system sharing
> > >> There are 2 HDDs, 7200 rpm, 2TB each. There is a 300GB OS partition on each drive with mirroring enabled. I can't persuade devops that mirroring could cause IO issues. What arguments can I bring? They use OS partition mirroring so that when a disk fails we can boot the OS from the other partition and continue to work...
> > >>
> > > You are already compromised i/o-wise having two disks only. I have not the experience to say for sure, but basic physics would seem to dictate that having your two disks (partially) mirrored compromises your i/o even more.
> > >
> > > You are in a bit of a hard place. Your operators want the machine to boot even after it loses 50% of its disk.
> > >
> > >>> Do you have to compact? In other words, do you have read SLAs?
> > >> Unfortunately, I have a mixed workload from web applications. I need to write and read, and the SLA is < 50ms.
> > >>
> > > Ok. You get the bit that seeks are about 10ms each, so with two disks you can do 2x100 seeks a second, presuming no one else is using the disk.
> > >
> > >>> How are your read times currently?
> > >> Cloudera Manager says it's 4K reads per second and 500 writes per second.
> > >>
> > >>> Does your working dataset fit in RAM or do reads have to go to disk?
> > >> I have several tables of 500GB each and many small tables of 10-20 GB. Small tables are loaded hourly/daily using bulkload (we prepare HFiles using MR and move them to HBase using the utility). Big tables are used by webapps; they read and write them.
> > >>
> > > These hfiles are created on the same cluster with MR? (i.e. they are using up i/os)
> > >
> > >>> It looks like you are running at about three storefiles per column family
> > >> is it hbase.hstore.compactionThreshold=3?
> > >
> > >>> What if you upped the threshold at which minors run?
> > >> you mean bump hbase.hstore.compactionThreshold to 8 or 10?
> > >>
> > > Yes.
> > >
> > > Downside is that your reads may require more seeks to find a keyvalue.
> > >
> > > Can you cache more?
> > >
> > > Can you make it so files are bigger before you flush?
> > >
> > >>> Do you have a downtime during which you could schedule compactions?
> > >> Unfortunately no. It should work 24/7, and sometimes it doesn't.
> > >>
> > > So, it is running at full bore 24/7? There is no 'downtime'... a time when the traffic is not so heavy?
> > >
> > >>> Are you managing the major compactions yourself or are you having hbase do it for you?
> > >> HBase, once a day: hbase.hregion.majorcompaction=1day
> > >>
> > > Have you studied your compactions? You realize that a major compaction will do a full rewrite of your dataset? When they run, how many storefiles are there?
> > >
> > > Do you have to run once a day? Can you not run once a week? Can you manage the compactions yourself... and run them a region at a time in a rolling manner across the cluster rather than have them just run whenever it suits them once a day?
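To make that last suggestion concrete, here is a minimal sketch of a region-at-a-time rolling major compaction driver against the 0.98-era Java admin API. The table name and the pause between regions are made up, and a real driver would poll compaction state per region rather than just sleeping:

    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class RollingMajorCompaction {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        TableName table = TableName.valueOf("big_table");   // made-up table name

        // One region at a time: request a major compaction, then pause so only
        // a small slice of the cluster is rewriting data at any moment.
        List<HRegionInfo> regions = admin.getTableRegions(table);
        for (HRegionInfo region : regions) {
          admin.majorCompact(region.getRegionName());       // the request is asynchronous
          Thread.sleep(10L * 60 * 1000);                    // crude pacing; tune for your cluster
        }
        admin.close();
      }
    }

Run from cron during the quietest hours, with hbase.hregion.majorcompaction set very large (or to 0 to disable time-based majors), this gives the rolling behaviour described above instead of every region compacting on the same day.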
> > >
> > >> I can disable WAL. It's ok to lose some data in case of RS failure; I'm not doing banking transactions.
> > >> If I disable WAL, could it help?
> > >>
> > > It could, but don't. Enable deferred sync'ing first if you can 'lose' some data.
> > >
> > > Work on your flushing and compactions before you mess w/ WAL.
> > >
> > > What version of hbase are you on? You say CDH, but the newer your hbase, the better it does generally.
> > >
> > > St.Ack
> > >
> > >> 2015-05-20 18:04 GMT+03:00 Stack <st...@duboce.net>:
> > >>
> > >>> On Mon, May 18, 2015 at 4:26 PM, Serega Sheypak <serega.shey...@gmail.com> wrote:
> > >>>
> > >>>> Hi, we are using extremely cheap HW:
> > >>>> 2 HDD 7200 rpm
> > >>>> 4*2 cores (Hyperthreading)
> > >>>> 32GB RAM
> > >>>>
> > >>>> We met serious IO performance issues. We have a more or less even distribution of read/write requests, and the same for data size.
> > >>>>
> > >>>> ServerName | Requests Per Second | Read Request Count | Write Request Count
> > >>>> node01.domain.com,60020,1430172017193 | 195 | 171871826 | 16761699
> > >>>> node02.domain.com,60020,1426925053570 | 24 | 34314930 | 16006603
> > >>>> node03.domain.com,60020,1430860939797 | 22 | 32054801 | 16913299
> > >>>> node04.domain.com,60020,1431975656065 | 33 | 1765121 | 253405
> > >>>> node05.domain.com,60020,1430484646409 | 27 | 42248883 | 16406280
> > >>>> node07.domain.com,60020,1426776403757 | 27 | 36324492 | 16299432
> > >>>> node08.domain.com,60020,1426775898757 | 26 | 38507165 | 13582109
> > >>>> node09.domain.com,60020,1430440612531 | 27 | 34360873 | 15080194
> > >>>> node11.domain.com,60020,1431989669340 | 28 | 44307 | 13466
> > >>>> node12.domain.com,60020,1431927604238 | 30 | 5318096 | 2020855
> > >>>> node13.domain.com,60020,1431372874221 | 29 | 31764957 | 15843688
> > >>>> node14.domain.com,60020,1429640630771 | 41 | 36300097 | 13049801
> > >>>>
> > >>>> ServerName | Num. Stores | Num. Storefiles | Storefile Size Uncompressed | Storefile Size | Index Size | Bloom Size
> > >>>> node01.domain.com,60020,1430172017193 | 82 | 186 | 1052080m | 76496mb | 641849k | 310111k
> > >>>> node02.domain.com,60020,1426925053570 | 82 | 179 | 1062730m | 79713mb | 649610k | 318854k
> > >>>> node03.domain.com,60020,1430860939797 | 82 | 179 | 1036597m | 76199mb | 627346k | 307136k
> > >>>> node04.domain.com,60020,1431975656065 | 82 | 400 | 1034624m | 76405mb | 655954k | 289316k
> > >>>> node05.domain.com,60020,1430484646409 | 82 | 185 | 1111807m | 81474mb | 688136k | 334127k
> > >>>> node07.domain.com,60020,1426776403757 | 82 | 164 | 1023217m | 74830mb | 631774k | 296169k
> > >>>> node08.domain.com,60020,1426775898757 | 81 | 171 | 1086446m | 79933mb | 681486k | 312325k
> > >>>> node09.domain.com,60020,1430440612531 | 81 | 160 | 1073852m | 77874mb | 658924k | 309734k
> > >>>> node11.domain.com,60020,1431989669340 | 81 | 166 | 1006322m | 75652mb | 664753k | 264081k
> > >>>> node12.domain.com,60020,1431927604238 | 82 | 188 | 1050229m | 75140mb | 652970k | 304137k
> > >>>> node13.domain.com,60020,1431372874221 | 82 | 178 | 937557m | 70042mb | 601684k | 257607k
> > >>>> node14.domain.com,60020,1429640630771 | 82 | 145 | 949090m | 69749mb | 592812k | 266677k
> > >>>>
> > >>>> When compaction starts, a random node gets 100% I/O and iowait for seconds, even tens of seconds.
> > >>>>
> > >>>> What are the approaches to optimize minor and major compactions when you are I/O bound..?
> > >>>>
> > >>> Yeah, with two disks, you will be crimped.
> > >>> Do you have the system sharing with hbase/hdfs or is hdfs running on one disk only?
> > >>>
> > >>> Do you have to compact? In other words, do you have read SLAs? How are your read times currently? Does your working dataset fit in RAM or do reads have to go to disk? It looks like you are running at about three storefiles per column family. What if you upped the threshold at which minors run? Do you have a downtime during which you could schedule compactions? Are you managing the major compactions yourself or are you having hbase do it for you?
> > >>>
> > >>> St.Ack
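For reference, the three RegionServer settings this thread keeps coming back to, including the hbase.hstore.blockingStoreFiles question from the top of the thread: minor compactions start once a store reaches hbase.hstore.compactionThreshold files; once a store exceeds hbase.hstore.blockingStoreFiles, flushes for that region are blocked (so writes stall) until compaction catches up or the blocking wait time expires; and hbase.hregion.majorcompaction is the automatic major-compaction period, spread out by hbase.hregion.majorcompaction.jitter in 0.98 and later. These are set in hbase-site.xml on the RegionServers and need a restart; the sketch below only reads whatever values are visible on the client classpath and exists just to name the knobs, with the thread's suggested targets in the comments:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ShowCompactionSettings {
      public static void main(String[] args) {
        // Reads hbase-default.xml plus whatever hbase-site.xml is on the classpath.
        Configuration conf = HBaseConfiguration.create();

        // Minor compaction kicks in at this many files per store.
        // Suggested in the thread: try 8-10 instead of the default 3.
        System.out.println("hbase.hstore.compactionThreshold = "
            + conf.getInt("hbase.hstore.compactionThreshold", 3));

        // Above this many files per store, flushes for the region are blocked
        // until compaction catches up (or the blocking wait time passes).
        // Raise it together with the threshold above.
        System.out.println("hbase.hstore.blockingStoreFiles = "
            + conf.getInt("hbase.hstore.blockingStoreFiles", 10));

        // Automatic major compaction period, in milliseconds.
        // Suggested in the thread: ~1 week instead of 1 day, with jitter.
        System.out.println("hbase.hregion.majorcompaction = "
            + conf.getLong("hbase.hregion.majorcompaction", 0L));
        System.out.println("hbase.hregion.majorcompaction.jitter = "
            + conf.getFloat("hbase.hregion.majorcompaction.jitter", 0.5f));
      }
    }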
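And on the WAL question: Stack's "deferred sync'ing" advice corresponds, in 0.96/0.98-era clients, to ASYNC_WAL durability rather than skipping the WAL entirely. It can be set per table (HTableDescriptor.setDurability) or per mutation, as in this sketch; the table, family, and values are made up:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Durability;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DeferredSyncPut {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, TableName.valueOf("big_table")); // made-up name

        Put put = new Put(Bytes.toBytes("row-1"));
        put.add(Bytes.toBytes("d"), Bytes.toBytes("q"), Bytes.toBytes("v"));

        // ASYNC_WAL: the edit still goes to the WAL, but the sync happens in the
        // background, so a RegionServer crash can lose the last few edits.
        // SKIP_WAL would be the "disable WAL" option the thread advises against.
        put.setDurability(Durability.ASYNC_WAL);

        table.put(put);
        table.close();
      }
    }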