Here is the region server log distilled down to events pertinent to the region
move. Not much being logged while the Store is being closed.
This region was about 1.4G compressed, which I consider small.
2015-10-15 08:34:28,510 INFO
org.apache.hadoop.hbase.regionserver.RSRpcServices: Close
dad6c71ed395df19220ef1056a110086, moving to
hb17.prod1.connexity.net,60020,1444425924021
2015-10-15 08:34:28,510 DEBUG
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing
close of
Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086.
2015-10-15 08:34:28,511 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Closing
Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086.:
disabling compactions & flushes
2015-10-15 08:34:28,511 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Running close preflush of
Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086.
2015-10-15 08:34:28,511 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Started memstore flush for
Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086.,
current region memstore size 44.89 MB, and 1/1 column families' memstores are
being flushed.
2015-10-15 08:34:28,511 WARN org.apache.hadoop.hbase.regionserver.wal.FSHLog:
Couldn't find oldest seqNum for the region we are about to flush:
[dad6c71ed395df19220ef1056a110086]
2015-10-15 08:34:29,137 INFO
org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed,
sequenceid=70839856845, memsize=44.9 M, hasBloomFilter=false, into tmp file
hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/.tmp/3600b47839a945de9733cf17581458e0
2015-10-15 08:34:29,144 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionFileSystem: Committing store file
hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/.tmp/3600b47839a945de9733cf17581458e0
as
hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/L/3600b47839a945de9733cf17581458e0
2015-10-15 08:34:29,150 INFO org.apache.hadoop.hbase.regionserver.HStore: Added
hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/L/3600b47839a945de9733cf17581458e0,
entries=190131, sequenceid=70839856845, filesize=3.2 M
2015-10-15 08:34:29,151 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Finished memstore flush of ~44.89 MB/47066832, currentsize=22.39 KB/22928 for
region
Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086.
in 640ms, sequenceid=70839856845, compaction requested=false
2015-10-15 08:34:29,152 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Updates disabled for region
Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086.
2015-10-15 08:34:29,152 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Started memstore flush for
Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086.,
current region memstore size 22.39 KB, and 1/1 column families' memstores are
being flushed.
2015-10-15 08:34:29,152 WARN org.apache.hadoop.hbase.regionserver.wal.FSHLog:
Couldn't find oldest seqNum for the region we are about to flush:
[dad6c71ed395df19220ef1056a110086]
2015-10-15 08:34:29,225 INFO
org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed,
sequenceid=70839856876, memsize=22.4 K, hasBloomFilter=false, into tmp file
hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/.tmp/a8a74549c7a44174b105cbed23cfaea1
2015-10-15 08:34:29,231 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionFileSystem: Committing store file
hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/.tmp/a8a74549c7a44174b105cbed23cfaea1
as
hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/L/a8a74549c7a44174b105cbed23cfaea1
2015-10-15 08:34:29,279 INFO org.apache.hadoop.hbase.regionserver.HStore: Added
hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/L/a8a74549c7a44174b105cbed23cfaea1,
entries=128, sequenceid=70839856876, filesize=2.6 K
2015-10-15 08:34:29,280 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Finished memstore flush of ~22.39 KB/22928, currentsize=0 B/0 for region
Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086.
in 128ms, sequenceid=70839856876, compaction requested=true
2015-10-15 08:37:02,945 INFO org.apache.hadoop.hbase.regionserver.HStore:
Closed L
2015-10-15 08:37:02,954 DEBUG org.apache.hadoop.hbase.wal.WALSplitter: Wrote
region
seqId=hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/recovered.edits/70839856879.seqid
to file, newSeqId=70839856879, maxSeqId=70838935131
2015-10-15 08:37:02,955 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed
Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086.
2015-10-15 08:37:02,955 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Adding moved region record:
dad6c71ed395df19220ef1056a110086 to
hb17.prod1.connexity.net,60020,1444425924021 as of 70838935131
2015-10-15 08:37:02,956 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x45049a3c1b80a8b,
quorum=ccm2.prod1.connexity.net:2181,ccm1.prod1.connexity.net:2181,db2.prod1.connexity.net:2181,sql2.prod1.connexity.net:2181,ccm3.prod1.connexity.net:2181,
baseZNode=/hbase Transitioning dad6c71ed395df19220ef1056a110086 from
M_ZK_REGION_CLOSING to RS_ZK_REGION_CLOSED
2015-10-15 08:37:02,967 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x45049a3c1b80a8b,
quorum=ccm2.prod1.connexity.net:2181,ccm1.prod1.connexity.net:2181,db2.prod1.connexity.net:2181,sql2.prod1.connexity.net:2181,ccm3.prod1.connexity.net:2181,
baseZNode=/hbase Transitioned node dad6c71ed395df19220ef1056a110086 from
M_ZK_REGION_CLOSING to RS_ZK_REGION_CLOSED
2015-10-15 08:37:02,967 DEBUG
org.apache.hadoop.hbase.coordination.ZkCloseRegionCoordination: Set closed
state in zk for
Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086.
on hb18.prod1.connexity.net,60020,1444427168636
2015-10-15 08:37:02,967 DEBUG
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed
Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086.
~
On 10/15/15, 11:05 AM, "Esteban Gutierrez" <[email protected]> wrote:
>Hi Randy,
>
>As Ted suggested a couple of stack traces would be helpful but if you could
>attach the HMaster and a couple of RS logs where you see this problem that
>would be great. Also, do you have a large regions that might need some time
>to flush? have you looked into any potential problem in HDFS?
>
>thanks,
>esteban.
>
>--
>Cloudera, Inc.
>
>
>On Thu, Oct 15, 2015 at 10:39 AM, Ted Yu <[email protected]> wrote:
>
>> Can you give a bit more detail on why block eviction was cause for the
>> slow region movement?
>>
>> Did you happen to take stack traces ?
>>
>> Thanks
>>
>> > On Oct 15, 2015, at 10:32 AM, Randy Fox <[email protected]> wrote:
>> >
>> > Hi,
>> >
>> > We just upgraded from 0.94 to 1.0.0 and have noticed that region moves
>> are super slow (order of minutes) whereas previously they where in the
>> seconds range. After looking at the code, I think the time is spent
>> waiting for the blocks to be evicted from block cache.
>> >
>> > I wanted to verify that this theory is correct and see if there is
>> anything that can be done to speed up the moves.
>> >
>> > This is particular painful as we are trying to get our configs tuned to
>> the new SW and need to do rolling restarts which is taking almost 24 hours
>> on our cluster. We also do our own manual rebalancing of regions across
>> RS’s and that task is also now painful.
>> >
>> >
>> > Thanks,
>> >
>> > Randy
>>