We ran into this situation today on our production instance, and I'm wondering if anyone has seen it or knows a way around it.
We were repeatedly updating the same 60,000-column-wide row, and with each update we rewrote nearly every column of the row. Each update generated nearly 9 MB in the WAL.

This worked OK most of the time. We frequently saw messages like this in the region server log, indicating a 2.6 second flush time:

2012-08-06 11:42:52,316 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~155.4m/162931400, currentsize=2.7k/2760 for region Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in 2610ms, sequenceid=1792263953, compaction requested=false

However, sometimes we would see flush times of around 8-10 minutes:

2012-08-06 11:51:28,139 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~155.4m/162905016, currentsize=116.5m/122172880 for region Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in 501337ms, sequenceid=1792264092, compaction requested=true

During these long flushes, we would end up using all of the IPC handler slots we had configured for that region server (50), so the region server looked unavailable to our application. After the long delay, the region server would continue running fine, only to do the same thing again a few minutes later.

I thought it was most likely something to do with garbage collection, so I took a stack trace during one of the events. The trace doesn't look like GC was running.

http://pastebin.com/ANBiUzX4 - Stack trace taken while the region server was unreachable
http://pastebin.com/naTYQxZR - Region server log covering the same interval

Versions:
HBase 0.92.1, r1298924
Hadoop 0.20.2-cdh3u3

----------------------------------------------------------------------
Ron Buckley x6365
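
P.S. For anyone who wants to reproduce the write pattern, it looks roughly like the sketch below (HBase 0.92 client API; the row key, column family, and qualifier names here are placeholders, not our actual schema):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WideRowUpdate {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "Works");

        // One row with ~60,000 qualifiers; each pass rewrites nearly all of
        // them, so a single Put is on the order of 9 MB in the WAL/memstore.
        Put put = new Put(Bytes.toBytes("someRowKey"));
        for (int i = 0; i < 60000; i++) {
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("col" + i),
                  Bytes.toBytes("value-" + i));
        }
        table.put(put);
        table.close();
      }
    }

This loop runs repeatedly against the same row, which is what drives the large memstore flushes described above.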
