Thanks Yu! It was certainly helpful.

> Regarding the issue you met, what's the setting of
> hbase.regionserver.maxlogs in your env? By default it's 32, which means
> for each RS the un-archived wal number shouldn't exceed 32. However,
> when multiwal is enabled, it allows 32 logs for each group, thus
> becoming 64 wals allowed for a single RS.

I used the default configuration for this. By multiWal, I understand there
is a different wal per region. Can you please explain how you got 64 wals
for a single Region Server? My own guess is sketched below.

> when multiwal is enabled, it allows 32 logs for each group, thus
> becoming 64 wals allowed for a single RS.

I thought one of the side effects of having multiwal enabled is that there
will be a *large amount of data waiting in unarchived wals*. So if a
region server fails, it would take more time to replay the wal files and
hence it could *compromise availability*. Wdyt?
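For reference, here is how I am reading the numbers, as a minimal
hbase-site.xml sketch. The property names are from the HBase book, and the
numgroups value is my assumption about where the factor of 2 comes from,
so please correct me if I have it wrong:

<property>
  <name>hbase.wal.provider</name>
  <value>multiwal</value> <!-- enables multiple wals per RegionServer -->
</property>
<property>
  <name>hbase.wal.regiongrouping.numgroups</name>
  <value>2</value> <!-- assumed default: 2 wal groups per RS -->
</property>
<property>
  <name>hbase.regionserver.maxlogs</name>
  <value>32</value> <!-- if applied per group: 2 x 32 = 64 wals per RS -->
</property>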
Thanks
-Sachin

On Tue, Jun 6, 2017 at 2:04 PM, Yu Li <[email protected]> wrote:

> Hi Sachin,
>
> We have been using multiwal in production here in Alibaba for over 2
> years and have seen no problems. Facebook is also running multiwal
> online. Please refer to HBASE-14457
> <https://issues.apache.org/jira/browse/HBASE-14457> for more details.
>
> There's also a JIRA, HBASE-15131
> <https://issues.apache.org/jira/browse/HBASE-15131>, proposing to turn
> on multiwal by default, but it is still under discussion; please feel
> free to leave your voice there.
>
> Regarding the issue you met, what's the setting of
> hbase.regionserver.maxlogs in your env? By default it's 32, which means
> for each RS the un-archived wal number shouldn't exceed 32. However,
> when multiwal is enabled, it allows 32 logs for each group, thus
> becoming 64 wals allowed for a single RS.
>
> Let me further explain how it leads to RegionTooBusyException:
> 1. if the number of un-archived wals exceeds the setting, HBase will
> check the oldest WAL and flush all regions involved in it
> 2. if the data ingestion speed is high and the wal keeps rolling,
> there'll be many small hfiles flushed out, and compaction cannot catch
> up
> 3. when the hfile number of one store exceeds the setting of
> hbase.hstore.blockingStoreFiles (10 by default), it will delay the
> flush for hbase.hstore.blockingWaitTime (90s by default)
> 4. when data ingestion continues but the flush is delayed, the memstore
> size might exceed the upper limit and thus throw RegionTooBusyException
>
> Hope this information helps.
>
> Best Regards,
> Yu
>
> On 6 June 2017 at 13:39, Sachin Jain <[email protected]> wrote:
>
> > Hi,
> >
> > I was in the middle of a situation where I was getting
> > *RegionTooBusyException* with a log something like:
> >
> > *Above Memstore limit, regionName = X ... memstore size = Y and
> > blockingMemstoreSize = Z*
> >
> > This potentially hinted me towards *hotspotting* of a particular
> > region, so I fixed my keyspace partitioning to have a more uniform
> > distribution per region. It did not completely fix the problem but
> > definitely delayed it a bit.
> >
> > Next, I enabled *multiWal*. As I remember, there is a configuration
> > which leads to flushing of memstores when the threshold of the wal is
> > reached. Upon doing this, the problem seems to go away.
> >
> > But this raises a couple of questions:
> >
> > 1. Are there any repercussions of using *multiWal* in a production
> > environment?
> > 2. If there are no repercussions and only benefits of using
> > *multiWal*, why is this not turned on by default? Let other consumers
> > turn it off in certain (whatever) scenarios.
> >
> > PS: *Hbase Configuration*
> > Single Node (Local Setup) v1.3.1 Ubuntu 16 Core machine.
> >
> > Thanks
> > -Sachin
> >
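PS: For anyone else who hits this, these appear to be the settings behind
the cascade Yu describes above. This is only a sketch with what I believe
are the 1.x defaults; in particular, blockingMemstoreSize = flush size x
block multiplier is my reading, not something stated in this thread:

<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>10</value> <!-- step 3: more hfiles than this delays flushes -->
</property>
<property>
  <name>hbase.hstore.blockingWaitTime</name>
  <value>90000</value> <!-- step 3: flush delay in ms (90s) -->
</property>
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value> <!-- 128MB, normal per-region flush trigger -->
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>4</value> <!-- step 4: writes block at 4 x flush size, which is
                        the blockingMemstoreSize in my log line -->
</property>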
