The major issue with HBase compactions is not excessive CPU or IO usage but the excessive creation of temporary (garbage) objects, which results in more frequent GC failures and, in some cases, RS shutdowns due to long GC pauses.
That is why it is so important to keep compactions under control: for example, disable automatic major compactions and splits, and perform them manually during off-peak hours.

-Vlad

On Fri, May 5, 2017 at 7:56 AM, Alexander Ilyin <[email protected]> wrote:

> Kevin,
>
> Thanks for your answer. We're using Ambari to manage our cluster.
>
> I see an increase of CPU usage and IO but it's not a big one. And this
> increase tends to be at the beginning of off-peak window although it's
> difficult to tell for sure since our workload comes in bursts and the
> picture is not clear. That's why I was asking if there are some metrics
> related specifically to compaction. But probably I can shorten the window.
>
> As for region sizes, I will experiment, as you suggest.
>
>
> On Fri, May 5, 2017 at 4:07 PM, Kevin O'Dell <[email protected]> wrote:
>
> > Alexander,
> >
> > That is a great series of questions. What are you using for
> > instrumentation of your HBase cluster? Cloudera Manager, Ambari, Ganglia,
> > Cacti, etc? You are really asking a lot of performance based metric
> > questions. I don't think you will be able to answer your questions without
> > first being able to answer these questions:
> >
> > Do you see the Major Compaction I/O/CPU/Memory spikes throughout the whole
> > "off-peak" window?
> >
> > Do you have the host resources overhead to add additional compaction
> > threads to shorten it if so?
> >
> > What do your responses times look like during your "off-peak hours" are you
> > still within your SLAs?
> >
> > Answering these questions should quickly allow you to answer your first two
> > questions. Your last question is very interesting:
> >
> > *how much degrades my performance if region size is becoming too large? <--
> > This is 100% depends, it depends on your environment, I/O usage, SLAs etc,
> > I am not sure if anyone has done documented compaction times based on
> > Region sizes. You may have to do some trial and error here.
> >
> > I hope this helps!
> >
> >
> >
> > On Fri, May 5, 2017 at 8:47 AM, Alexander Ilyin <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > Tuning HBase performance I've found a lot of settings which affect
> > > compaction process (off-peak hours, time between compactions, compaction
> > > ratio, region sizes, etc.). They all seem to be useful and there are
> > > recommendations in the doc saying which values to set. But I found no way
> > > to assess how they actually affect my cluster performance, i.e. how much
> > > resources is taken by compaction and when. I would like to figure out which
> > > settings work best for my dataset and my specific workload but with only
> > > general recommendations in hand it seems difficult to do.
> > >
> > > For example, I have difficulties answering the following questions:
> > > * can I shorten my off-peak hours range?
> > > * can I afford to do compactions more often? or more aggressively?
> > > * how much degrades my performance if region size is becoming too large?
> > >
> > > HBase version I'm using is 1.1.2
> > >
> > >
> > > Alexander
> > >
> >
> >
> >
> > --
> > Kevin O'Dell
> > Field Engineer
> > 850-496-1298 | [email protected]
> > @kevinrodell
> > <http://www.rocana.com>
> >
>
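
For reference, the suggestion at the top of this message (turn off automatic major compactions and trigger them manually off-peak) could be sketched roughly as below. This is only an illustration, not a tested procedure: it assumes time-based major compactions have already been disabled by setting hbase.hregion.majorcompaction to 0 in hbase-site.xml, and the 22:00 to 06:00 window, the table name and the helper names are all hypothetical.

```python
from datetime import datetime

# Hypothetical off-peak window (22:00 to 06:00) and table list; adjust to your cluster.
OFF_PEAK_START_HOUR = 22
OFF_PEAK_END_HOUR = 6
TABLES = ["my_table"]


def in_off_peak(hour, start_hour, end_hour):
    """Return True if hour lies in the half-open window [start_hour, end_hour),
    including windows that wrap past midnight."""
    if start_hour <= end_hour:
        return start_hour <= hour < end_hour
    return hour >= start_hour or hour < end_hour


def major_compact_script(tables):
    """Build HBase shell input: one major_compact command per table."""
    return "".join("major_compact '%s'\n" % name for name in tables)


if __name__ == "__main__":
    # Emit commands only inside the off-peak window; outside it, print nothing.
    if in_off_peak(datetime.now().hour, OFF_PEAK_START_HOUR, OFF_PEAK_END_HOUR):
        print(major_compact_script(TABLES), end="")
```

The output is meant to be piped into the HBase shell in non-interactive mode (for example from cron, something like `python compact.py | hbase shell -n`), so the decision of when to compact stays in one small script instead of being left to the RegionServers.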
