This is not an answer to your question, but just an anecdote on cluster pauses/slowdowns. We had horrible problems with cluster wide pauses. I think there were several keys to getting this resolved:
1) We used the default settings recommended for bulk inserts: http://people.apache.org/~jdcryans/HUG8/HUG8-rawson.pdf

2) We upgraded to HBase 0.20.6 because there was a deadlock bug in prior versions that basically caused the entire cluster to "go to sleep".

Finally, we had a very strange problem which took 3 weeks of debugging to get to the bottom of. I don't expect that this is your problem, but I'll just throw it out there.

Most bulk HBase data-producing M/R jobs do some processing, then write the data from the reducer into HBase (using autoflush=false and disabling the WAL). Since the reducers all receive keys in the same order, all the reducers load the same HBase region simultaneously. We had this "great idea" that if we reversed the keys that we wrote out of our mapper, then un-reversed them in the reducer, our reducers would be writing randomly to different region servers instead of hitting a single region in lock step.

Now, I have some theories on why this seemingly innocuous approach repeatedly destroyed our entire HBase database. I won't wax philosophical here, but one thing is certain: any table created via batch inserts of randomized keys got totally hosed. Scans became dirt slow and compactions ran constantly, even *days* after the table was created. None of these problems made a whole lot of sense, which is why it took 3-4 weeks of debugging for us to back this "key randomizing" out of our code. The hosed tables actually had to be dropped for the problem, and the ensuing chaos, to fully abate. Until we dropped the tables, the region server logs showed constant compaction.

Like I said, it sounds crazy, but this definitely was the cause of our problem. I'm fully expecting a lot of "you're crazy" responses to this email, but we repeatedly reproduced the issue, and the fix was to stop the "key reversing".
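To make the key-reversal idea concrete, here is a minimal Java sketch. It is not the actual job code: the class name KeyReversal, the method reverseKey, and the sample row keys are all hypothetical, and it uses plain strings rather than Hadoop or HBase types. It only shows that reversing is its own inverse, and that sequential keys stop sharing a common prefix once reversed.

```java
public class KeyReversal {
    // Hypothetical helper: reverse the characters of a row key.
    // Applying it twice returns the original key.
    public static String reverseKey(String key) {
        return new StringBuilder(key).reverse().toString();
    }

    public static void main(String[] args) {
        // Made-up sequential row keys that would normally sort next to
        // each other and therefore land in the same region.
        String[] rowKeys = {"user0001", "user0002", "user0003"};
        for (String key : rowKeys) {
            String shuffleKey = reverseKey(key);      // what the mapper would emit
            String restored = reverseKey(shuffleKey); // what the reducer would write
            System.out.println(key + " -> " + shuffleKey + " -> " + restored);
        }
    }
}
```

With these sample keys the mapper-side keys become "1000resu", "2000resu" and "3000resu", so the shuffle no longer hands reducers keys in HBase row order. That scattering of writes is exactly what was backed out.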
We just had to live with all the reducers loading individual regions in lock step, as this was really not a big deal (at least not as big a deal as hosing the entire installation).

-g

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Christopher Tarnas
Sent: Friday, January 14, 2011 9:54 AM
To: [email protected]
Subject: Re: Cluster Wide Pauses

Thanks - I was not sure and had not received a response from the list on my related question earlier this week.

It does seem like compactions are related to my problem, and if I understand correctly does raising hbase.hregion.memstore.block.multiplier give it more of a buffer for that before writes are blocked while compactions happen? I'm writing via thrift (about 30 clients) to a 5 node cluster when I see this problem. There is no io wait so I don't think it is disk bound, and it is not CPU starved. I'm waiting on IT to get me access to ganglia for the network info.

-chris

On Fri, Jan 14, 2011 at 11:29 AM, Jonathan Gray <[email protected]> wrote:

> These are a different kind of pause (those caused by blockingStoreFiles).
>
> This is HBase stepping in and actually blocking updates to a region because
> compactions have not been able to keep up with the write load. It could
> manifest itself in the same way but this is different than shorter pauses
> caused by periodic offlining of regions during balancing and splits.
>
> Wayne, have you confirmed in your RegionServer logs that the pauses are
> associated with splits or region movement, and that you are not seeing the
> blocking store files issue?
>
> JG
>
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]] On Behalf Of Christopher Tarnas
> > Sent: Friday, January 14, 2011 7:29 AM
> > To: [email protected]
> > Subject: Re: Cluster Wide Pauses
> >
> > I have been seeing similar problems and found by raising the
> > hbase.hregion.memstore.block.multiplier to above 12 (default is two) and
> > the hbase.hstore.blockingStoreFiles to 16 I managed to reduce the
> > frequency of the pauses during loads. My nodes are pretty beefy (48 GB of
> > ram) so I had room to experiment.
> >
> > From what I understand that gave the regionservers more buffer before
> > they had to halt the world to catch up. The pauses still happen but their
> > impact is less now.
> >
> > -chris
> >
> > On Fri, Jan 14, 2011 at 8:34 AM, Wayne <[email protected]> wrote:
> >
> > > We have not found any smoking gun here. Most likely these are region
> > > splits on a quickly growing/hot region that all clients get caught
> > > waiting for.
> > >
> > > On Thu, Jan 13, 2011 at 7:49 AM, Wayne <[email protected]> wrote:
> > >
> > > > Thank you for the lead! We will definitely look closer at the OS logs.
> > > >
> > > > On Thu, Jan 13, 2011 at 6:59 AM, Tatsuya Kawano <[email protected]> wrote:
> > > >
> > > >> Hi Wayne,
> > > >>
> > > >> > We are seeing some TCP Resets on all nodes at the same time, and
> > > >> > sometimes quite a lot of them.
> > > >>
> > > >> Have you checked this article from Andrei and Cosmin? They had a
> > > >> busy firewall to cause network blackout.
> > > >>
> > > >> http://hstack.org/hbase-performance-testing/
> > > >>
> > > >> Maybe it's not your case but just for sure.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> --
> > > >> Tatsuya Kawano (Mr.)
> > > >> Tokyo, Japan
> > > >>
> > > >> On Jan 13, 2011, at 4:52 AM, Wayne <[email protected]> wrote:
> > > >>
> > > >> > We are seeing some TCP Resets on all nodes at the same time, and
> > > >> > sometimes quite a lot of them. We have yet to correlate the pauses
> > > >> > to the TCP resets but I am starting to wonder if this is partly a
> > > >> > network problem. Does Gigabit Ethernet break down on high volume
> > > >> > nodes? Do high volume nodes use 10G or Infiniband?
> > > >> >
> > > >> > On Wed, Jan 12, 2011 at 1:52 PM, Stack <[email protected]> wrote:
> > > >> >
> > > >> >> Jon asks that you describe your loading in the issue. Would you
> > > >> >> mind doing so. Ted, stick up in the issue the workload and
> > > >> >> configs. you are running if you don't mind. I'd like to try it
> > > >> >> over here.
> > > >> >> Thanks lads,
> > > >> >> St.Ack
> > > >> >>
> > > >> >> On Wed, Jan 12, 2011 at 9:03 AM, Wayne <[email protected]> wrote:
> > > >> >>> Added: https://issues.apache.org/jira/browse/HBASE-3438.
> > > >> >>>
> > > >> >>> On Wed, Jan 12, 2011 at 11:40 AM, Wayne <[email protected]> wrote:
> > > >> >>>
> > > >> >>>> We are using 0.89.20100924, r1001068
> > > >> >>>>
> > > >> >>>> We are seeing see it during heavy write load (which is all the
> > > >> >>>> time), but yesterday we had read load as well as write load and
> > > >> >>>> saw both reads and writes stop for 10+ seconds. The region size
> > > >> >>>> is the biggest clue we have found from our tests as setting up
> > > >> >>>> a new cluster with a 1GB max region size and starting to load
> > > >> >>>> heavily we will see this a lot for long long time frames. Maybe
> > > >> >>>> the bigger file gets hung up more easily with a split? Your
> > > >> >>>> description below also fits in that early on the load is not
> > > >> >>>> balanced so it is easier to stop everything on one node as the
> > > >> >>>> balance is not great early on. I will file a JIRA. I will also
> > > >> >>>> try to dig deeper into the logs during the pauses to find a
> > > >> >>>> node that might be stuck in a split.
> > > >> >>>>
> > > >> >>>> On Wed, Jan 12, 2011 at 11:17 AM, Stack <[email protected]> wrote:
> > > >> >>>>
> > > >> >>>>> On Tue, Jan 11, 2011 at 2:34 PM, Wayne <[email protected]> wrote:
> > > >> >>>>>> We have very frequent cluster wide pauses that stop all
> > > >> >>>>>> reads and writes for seconds.
> > > >> >>>>>
> > > >> >>>>> All reads and all writes?
> > > >> >>>>>
> > > >> >>>>> I've seen the pause too for writes. Its something I've always
> > > >> >>>>> meant to look into. Friso postulates one cause. Another that
> > > >> >>>>> we've talked of is a region taking a while to come back on
> > > >> >>>>> line after a split or a rebalance for whatever reason. Client
> > > >> >>>>> loading might be 'random' spraying over lots of random regions
> > > >> >>>>> but they all get stuck waiting on one particular region to
> > > >> >>>>> come back online.
> > > >> >>>>>
> > > >> >>>>> I suppose reads could be blocked for same reason if all are
> > > >> >>>>> trying to read from the offlined region.
> > > >> >>>>>
> > > >> >>>>> What version of hbase are you using? Splits should be faster
> > > >> >>>>> in 0.90 now that the split daughters come up on the same
> > > >> >>>>> region.
> > > >> >>>>>
> > > >> >>>>> Sorry I don't have a better answer for you. Need to dig in.
> > > >> >>>>>
> > > >> >>>>> File a JIRA. If you want to help out some, stick some data up
> > > >> >>>>> in it. Some suggestions would be to enable logging of when we
> > > >> >>>>> lookup region locations in client and then note when requests
> > > >> >>>>> go to zero. Can you figure what region the clients are waiting
> > > >> >>>>> on (if they are waiting on any). If you can pull out a
> > > >> >>>>> particular one, try and elicit its history at time of
> > > >> >>>>> blockage. Is it being moved or mid-split? I suppose it makes
> > > >> >>>>> sense that bigger regions would make the situation 'worse'. I
> > > >> >>>>> can take a look at it too.
> > > >> >>>>>
> > > >> >>>>> St.Ack
> > > >> >>>>>
> > > >> >>>>>> We are constantly loading data to this cluster of 10 nodes.
> > > >> >>>>>> These pauses can happen as frequently as every minute but
> > > >> >>>>>> sometimes are not seen for 15+ minutes. Basically watching
> > > >> >>>>>> the Region server list with request counts is the only
> > > >> >>>>>> evidence of what is going on. All reads and writes totally
> > > >> >>>>>> stop and if there is ever any activity it is on the node
> > > >> >>>>>> hosting the .META. table with a request count of region
> > > >> >>>>>> count + 1. This problem seems to be worse with a larger
> > > >> >>>>>> region size. We tried a 1GB region size and saw this more
> > > >> >>>>>> than we saw actual activity (and stopped using a larger
> > > >> >>>>>> region size because of it). We went back to the default
> > > >> >>>>>> region size and it was better, but we had too many regions
> > > >> >>>>>> so now we are up to 512M for a region size and we are seeing
> > > >> >>>>>> it more again.
> > > >> >>>>>>
> > > >> >>>>>> Does anyone know what this is? We have dug into all of the
> > > >> >>>>>> logs to find some sort of pause but are not able to find
> > > >> >>>>>> anything. Is this an wal hlog roll? Is this a region split
> > > >> >>>>>> or compaction? Of course our biggest fear is a GC pause on
> > > >> >>>>>> the master but we do not have java logging turned on with
> > > >> >>>>>> the master to tell. What could possibly stop the entire
> > > >> >>>>>> cluster from working for seconds at a time very frequently?
> > > >> >>>>>>
> > > >> >>>>>> Thanks in advance for any ideas of what could be causing
> > > >> >>>>>> this.
