From the region server log, it looks like region 78c531911731765fcf5a9d4ded5336b0 had trouble flushing.
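If you need to unstick it again while we keep digging, a manual flush through the 0.96 client API (the same call the shell's flush command ends up making) usually buys a little time, though as noted later in the thread it only helps for a short while. Rough sketch only; it assumes the stuck region belongs to BT_D_BF001_201406 as mentioned further down:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ForceFlush {
  public static void main(String[] args) throws Exception {
    // Picks up hbase-site.xml from the classpath.
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // flush() accepts a table name (flushes every region of that table) or a
      // full region name from hbase:meta (flushes just that one region).
      // BT_D_BF001_201406 is assumed here from the rest of this thread.
      admin.flush("BT_D_BF001_201406");
    } finally {
      admin.close();
    }
  }
}

More on the flush-queue suspicion at the bottom, below the quoted thread.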
Still going through the log.

Cheers

On Fri, Jun 6, 2014 at 6:33 PM, sunweiwei <[email protected]> wrote:

> Hi
>
> The symptom reproduced again.
> I pasted the logs at http://paste2.org/D2N6ZDvk and http://paste2.org/a64LXD0X
> One is the regionserver jstack log.
> The other is the regionserver log, grepped to include only the unflushed region.
>
> Thanks
>
> -----Original Message-----
> From: sunweiwei [mailto:[email protected]]
> Sent: June 5, 2014 14:51
> To: [email protected]
> Subject: Re: Re: Re: forcing flush not works
>
> I'm sorry, but the regionserver log has been deleted...
>
> To Stack:
> Yes, it is always the same two regions of table BT_D_BF001_201406 that
> can't be flushed.
>
> Previously I had saved only a little log, from when table BT_D_BF001_201405
> had lots of regions:
>
> 2014-05-27 22:40:52,025 DEBUG [regionserver60020.logRoller] regionserver.LogRoller: HLog roll requested
> 2014-05-27 22:40:52,039 DEBUG [regionserver60020.logRoller] wal.FSHLog: cleanupCurrentWriter waiting for transactions to get synced total 450500823 synced till here 450500779
> 2014-05-27 22:40:52,049 INFO [regionserver60020.logRoller] wal.FSHLog: Rolled WAL /apps/hbase/data/WALs/hadoop03,60020,1401173211108/hadoop03%2C60020%2C1401173211108.1401201646659 with entries=94581, filesize=122.2 M; new WAL /apps/hbase/data/WALs/hadoop03,60020,1401173211108/hadoop03%2C60020%2C1401173211108.1401201652025
> 2014-05-27 22:40:52,049 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=156, maxlogs=32; forcing flush of 2 regions(s): a5b94272f0fdd477bf320e428059fe87, f1a60d3ea5820cb672832c59531de89d
> 2014-05-27 22:40:52,073 DEBUG [Thread-17] regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=6.1 G
> 2014-05-27 22:40:52,074 DEBUG [Thread-17] regionserver.MemStoreFlusher: Under global heap pressure: Region BT_D_BF001_201405,8618989870918460036571102456550000000012320000002014050815160220140508151602174000000000462000001094000080000000000000000000548090000000,1401009042160.47633a80bd6fede708c05c9fcc9e2b39. has too many store files, but is 27.6 M vs best flushable region's 0. Choosing the bigger.
> 2014-05-27 22:40:52,075 INFO [Thread-17] regionserver.MemStoreFlusher: Flush of region BT_D_BF001_201405,8618989870918460036571102456550000000012320000002014050815160220140508151602174000000000462000001094000080000000000000000000548090000000,1401009042160.47633a80bd6fede708c05c9fcc9e2b39. due to global heap pressure
> 2014-05-27 22:40:52,075 DEBUG [Thread-17] regionserver.HRegion: Started memstore flush for BT_D_BF001_201405,8618989870918460036571102456550000000012320000002014050815160220140508151602174000000000462000001094000080000000000000000000548090000000,1401009042160.47633a80bd6fede708c05c9fcc9e2b39., current region memstore size 27.6 M
> 2014-05-27 22:40:52,599 INFO [Thread-17] regionserver.DefaultStoreFlusher: Flushed, sequenceid=10069900941, memsize=27.6 M, hasBloomFilter=true, into tmp file hdfs://hdpcluster/apps/hbase/data/data/default/BT_D_BF001_201405/47633a80bd6fede708c05c9fcc9e2b39/.tmp/a89428808e1a4be4a1bf7bd9ec8ece88
> 2014-05-27 22:40:52,608 DEBUG [Thread-17] regionserver.HRegionFileSystem: Committing store file hdfs://hdpcluster/apps/hbase/data/data/default/BT_D_BF001_201405/47633a80bd6fede708c05c9fcc9e2b39/.tmp/a89428808e1a4be4a1bf7bd9ec8ece88 as hdfs://hdpcluster/apps/hbase/data/data/default/BT_D_BF001_201405/47633a80bd6fede708c05c9fcc9e2b39/cf/a89428808e1a4be4a1bf7bd9ec8ece88
> 2014-05-27 22:40:52,617 INFO [Thread-17] regionserver.HStore: Added hdfs://hdpcluster/apps/hbase/data/data/default/BT_D_BF001_201405/47633a80bd6fede708c05c9fcc9e2b39/cf/a89428808e1a4be4a1bf7bd9ec8ece88, entries=44962, sequenceid=10069900941, filesize=5.5 M
> 2014-05-27 22:40:52,618 INFO [Thread-17] regionserver.HRegion: Finished memstore flush of ~27.6 M/28933240, currentsize=43.6 K/44664 for region BT_D_BF001_201405,8618989870918460036571102456550000000012320000002014050815160220140508151602174000000000462000001094000080000000000000000000548090000000,1401009042160.47633a80bd6fede708c05c9fcc9e2b39. in 542ms, sequenceid=10069900941, compaction requested=true
> 2014-05-27 22:40:52,618 DEBUG [Thread-17] regionserver.CompactSplitThread: Small Compaction requested: system; Because: Thread-17; compaction_queue=(4896:19152), split_queue=0, merge_queue=0
>
> Thanks
>
> -----Original Message-----
> From: ramkrishna vasudevan [mailto:[email protected]]
> Sent: June 5, 2014 13:43
> To: [email protected]
> Subject: Re: Re: Re: forcing flush not works
>
> >> I still (highly) suspect that there is something wrong with the flush
> >> queue (some entry pushed into it can't be polled out).
> Ya, I have that suspicion too. Maybe new logs will help to uncover the issue.
>
>
> On Thu, Jun 5, 2014 at 11:06 AM, Stack <[email protected]> wrote:
>
> > Always the same two regions that get stuck, or does it vary? Another set
> > of example logs may help uncover the sequence of trouble-causing events.
> >
> > Thanks,
> > St.Ack
> >
> >
> > On Wed, Jun 4, 2014 at 7:31 PM, sunweiwei <[email protected]> wrote:
> >
> > > My log is similar to HBASE-10499.
> > >
> > > Thanks
> > >
> > > -----Original Message-----
> > > From: [email protected] [mailto:[email protected]] on behalf of Stack
> > > Sent: June 3, 2014 23:10
> > > To: Hbase-User
> > > Subject: Re: Re: forcing flush not works
> > >
> > > Mind posting a link to your log? Sounds like HBASE-10499, as Honghua says.
> > > St.Ack
> > >
> > >
> > > On Tue, Jun 3, 2014 at 2:34 AM, sunweiwei <[email protected]> wrote:
> > >
> > > > Thanks. Maybe the same as HBASE-10499.
> > > > I stopped the regionserver and then started it, and HBase went back to normal.
> > > > This is the jstack log from when the 2 regions could not be flushed.
> > > > "Thread-17" prio=10 tid=0x00007f6210383800 nid=0x6540 waiting on condition [0x00007f61e0a26000]
> > > >    java.lang.Thread.State: TIMED_WAITING (parking)
> > > >         at sun.misc.Unsafe.park(Native Method)
> > > >         - parking to wait for <0x000000041ae0e6b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > > >         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
> > > >         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
> > > >         at java.util.concurrent.DelayQueue.poll(DelayQueue.java:201)
> > > >         at java.util.concurrent.DelayQueue.poll(DelayQueue.java:39)
> > > >         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:228)
> > > >         at java.lang.Thread.run(Thread.java:662)
> > > >
> > > > -----Original Message-----
> > > > From: 冯宏华 [mailto:[email protected]]
> > > > Sent: June 3, 2014 16:34
> > > > To: [email protected]
> > > > Subject: Re: forcing flush not works
> > > >
> > > > The same symptom as HBASE-10499?
> > > >
> > > > I still (highly) suspect that there is something wrong with the flush
> > > > queue (some entry pushed into it can't be polled out).
> > > > ________________________________________
> > > > From: sunweiwei [[email protected]]
> > > > Sent: June 3, 2014 15:43
> > > > To: [email protected]
> > > > Subject: forcing flush not works
> > > >
> > > > Hi
> > > >
> > > > I'm using a heavy-write HBase 0.96. I find this in the regionserver log:
> > > >
> > > > 2014-06-03 15:13:19,445 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=33, maxlogs=32; forcing flush of 3 regions(s): 1a7dda3c3815c19970ace39fd99abfe8, aff81bc46aa7d3ed51a01f11f23c8320, d5666e003f598147b4dda509f173a779
> > > > 2014-06-03 15:13:23,869 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=34, maxlogs=32; forcing flush of 2 regions(s): aff81bc46aa7d3ed51a01f11f23c8320, d5666e003f598147b4dda509f173a779
> > > > ┇
> > > > ┇
> > > > 2014-06-03 15:18:14,778 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=93, maxlogs=32; forcing flush of 2 regions(s): aff81bc46aa7d3ed51a01f11f23c8320, d5666e003f598147b4dda509f173a779
> > > >
> > > > It seems like the 2 regions can't be flushed and the WALs dir keeps growing. Then I find this in the client log:
> > > >
> > > > INFO | AsyncProcess-waitForMaximumCurrentTasks [2014-06-03 15:30:53] - : Waiting for the global number of running tasks to be equals or less than 0, tasksSent=15819, tasksDone=15818, currentTasksDone=15818, tableName=BT_D_BF001_201406
> > > >
> > > > Then the write speed becomes very slow.
> > > >
> > > > After I flush the 2 regions manually, the write speed goes back to normal only for a short while.
> > > >
> > > > Any suggestion will be appreciated. Thanks.
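Coming back to the flush-queue theory: the Thread-17 stack above is just the flush handler timing out on DelayQueue.poll, so if an entry sits in the queue that never becomes ready, or the "already queued" bookkeeping and the queue get out of sync, the handler keeps parking while every new flush request for that region is dropped. Below is a tiny standalone sketch of that failure mode. It is not the actual MemStoreFlusher code and all names are made up; it only mirrors the DelayQueue plus regionsInQueue pattern to show how the parked thread and the endless "Too many hlogs" lines could hang together:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Illustrative only: a toy "flush queue" that wedges.
public class StuckFlushQueueDemo {

  // Minimal stand-in for a queued flush request for one region.
  static class FlushEntry implements Delayed {
    final String region;
    final long readyAtNanos;

    FlushEntry(String region, long delayMs) {
      this.region = region;
      this.readyAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMs);
    }

    @Override
    public long getDelay(TimeUnit unit) {
      return unit.convert(readyAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
      return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
    }
  }

  static final DelayQueue<FlushEntry> flushQueue = new DelayQueue<FlushEntry>();
  // Regions believed to be queued already; duplicate requests are dropped.
  static final Map<String, FlushEntry> regionsInQueue = new HashMap<String, FlushEntry>();

  static void requestFlush(String region) {
    synchronized (regionsInQueue) {
      if (!regionsInQueue.containsKey(region)) {
        FlushEntry entry = new FlushEntry(region, 0);
        regionsInQueue.put(region, entry);
        flushQueue.add(entry);
        System.out.println("queued flush for " + region);
      } else {
        // The region looks queued already, so this request is silently dropped.
        System.out.println("dropped flush request for " + region);
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    String region = "aff81bc46aa7d3ed51a01f11f23c8320";

    // Simulate a wedged entry: for whatever reason its delay never expires.
    FlushEntry stuck = new FlushEntry(region, TimeUnit.DAYS.toMillis(3650));
    synchronized (regionsInQueue) {
      regionsInQueue.put(region, stuck);
      flushQueue.add(stuck);
    }

    // Every later request, including the log roller's "forcing flush of 2
    // regions(s)", is dropped because the region is already marked as queued.
    requestFlush(region);

    // The handler thread just keeps timing out on poll(), which is exactly
    // the parked DelayQueue.poll frame in the Thread-17 stack above.
    FlushEntry polled = flushQueue.poll(2, TimeUnit.SECONDS);
    System.out.println("poll returned: " + polled + " -> the region never gets flushed");
  }
}

If something like this is what is happening, it would also explain why a manual flush only helps for a short while: it empties the memstore once, but the wedged bookkeeping is still there, so the next forced-flush request is dropped again and the hlog count climbs right back up.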
