That is not a 0.94 vs 1.0 issue - it's a BucketCache vs LRUCache issue.

It seems that BucketCache.freeBlock is expensive (and a source of
contention). That is the issue. Please open a JIRA.
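
Roughly, the pattern is something like the toy sketch below. This is only an
illustration of the contention, not the real BucketCache code; the class name
and the lock granularity are made up for the example. Closing a store evicts
its cached blocks one at a time, and each free has to go through shared
free-space bookkeeping:

// Simplified illustration only; not the HBase BucketCache implementation.
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

class ToyBucketCache {
    // blockKey -> offset of the block inside the shared backing area
    private final Map<String, Long> backingMap = new ConcurrentHashMap<>();
    // single lock guarding the free-space accounting, shared by all callers
    private final ReentrantLock allocatorLock = new ReentrantLock();

    void cacheBlock(String blockKey, long offset) {
        backingMap.put(blockKey, offset);
    }

    // Freeing one block: the map removal is cheap, but returning the slot
    // goes through a lock that every other caller also contends on.
    void freeBlock(String blockKey) {
        Long offset = backingMap.remove(blockKey);
        if (offset == null) {
            return;
        }
        allocatorLock.lock();
        try {
            // return the slot at 'offset' to the free list (bookkeeping elided)
        } finally {
            allocatorLock.unlock();
        }
    }

    // Evicting all blocks of an HFile when its store closes: one freeBlock
    // call per cached block, each taking the lock in turn. With a large,
    // warm cache this is where the minutes go.
    int evictBlocksByHfileName(String hfileName) {
        int evicted = 0;
        for (Iterator<String> it = backingMap.keySet().iterator(); it.hasNext(); ) {
            String key = it.next();
            if (key.startsWith(hfileName)) {
                freeBlock(key);
                evicted++;
            }
        }
        return evicted;
    }
}

With LRUCache the per-block free is much cheaper, which would explain why the
same close took seconds before the upgrade.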

-Vlad

On Thu, Oct 15, 2015 at 11:29 AM, Randy Fox <[email protected]> wrote:

> Here is the region server log, distilled down to the events pertinent to the
> region move.  Not much is logged while the Store is being closed.
> This region was about 1.4G compressed, which I consider small.
>
> 2015-10-15 08:34:28,510 INFO
> org.apache.hadoop.hbase.regionserver.RSRpcServices: Close
> dad6c71ed395df19220ef1056a110086, moving to hb17.prod1.connexity.net
> ,60020,1444425924021
> 2015-10-15 08:34:28,510 DEBUG
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing
> close of
> Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086.
> 2015-10-15 08:34:28,511 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion: Closing
> Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086.:
> disabling compactions & flushes
> 2015-10-15 08:34:28,511 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Running close preflush of
> Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086.
> 2015-10-15 08:34:28,511 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Started memstore flush for
> Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086.,
> current region memstore size 44.89 MB, and 1/1 column families' memstores
> are being flushed.
> 2015-10-15 08:34:28,511 WARN
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Couldn't find oldest
> seqNum for the region we are about to flush:
> [dad6c71ed395df19220ef1056a110086]
> 2015-10-15 08:34:29,137 INFO
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed,
> sequenceid=70839856845, memsize=44.9 M, hasBloomFilter=false, into tmp file
> hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/.tmp/3600b47839a945de9733cf17581458e0
> 2015-10-15 08:34:29,144 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegionFileSystem: Committing store
> file
> hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/.tmp/3600b47839a945de9733cf17581458e0
> as
> hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/L/3600b47839a945de9733cf17581458e0
> 2015-10-15 08:34:29,150 INFO org.apache.hadoop.hbase.regionserver.HStore:
> Added
> hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/L/3600b47839a945de9733cf17581458e0,
> entries=190131, sequenceid=70839856845, filesize=3.2 M
> 2015-10-15 08:34:29,151 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Finished memstore flush of ~44.89 MB/47066832, currentsize=22.39 KB/22928
> for region
> Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086.
> in 640ms, sequenceid=70839856845, compaction requested=false
> 2015-10-15 08:34:29,152 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region
> Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086.
> 2015-10-15 08:34:29,152 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Started memstore flush for
> Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086.,
> current region memstore size 22.39 KB, and 1/1 column families' memstores
> are being flushed.
> 2015-10-15 08:34:29,152 WARN
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Couldn't find oldest
> seqNum for the region we are about to flush:
> [dad6c71ed395df19220ef1056a110086]
> 2015-10-15 08:34:29,225 INFO
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed,
> sequenceid=70839856876, memsize=22.4 K, hasBloomFilter=false, into tmp file
> hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/.tmp/a8a74549c7a44174b105cbed23cfaea1
> 2015-10-15 08:34:29,231 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegionFileSystem: Committing store
> file
> hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/.tmp/a8a74549c7a44174b105cbed23cfaea1
> as
> hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/L/a8a74549c7a44174b105cbed23cfaea1
> 2015-10-15 08:34:29,279 INFO org.apache.hadoop.hbase.regionserver.HStore:
> Added
> hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/L/a8a74549c7a44174b105cbed23cfaea1,
> entries=128, sequenceid=70839856876, filesize=2.6 K
> 2015-10-15 08:34:29,280 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Finished memstore flush of ~22.39 KB/22928, currentsize=0 B/0 for region
> Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086.
> in 128ms, sequenceid=70839856876, compaction requested=true
>
>
> 2015-10-15 08:37:02,945 INFO org.apache.hadoop.hbase.regionserver.HStore:
> Closed L
> 2015-10-15 08:37:02,954 DEBUG org.apache.hadoop.hbase.wal.WALSplitter:
> Wrote region
> seqId=hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/recovered.edits/70839856879.seqid
> to file, newSeqId=70839856879, maxSeqId=70838935131
> 2015-10-15 08:37:02,955 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Closed
> Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086.
> 2015-10-15 08:37:02,955 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Adding moved region
> record: dad6c71ed395df19220ef1056a110086 to 
> hb17.prod1.connexity.net,60020,1444425924021
> as of 70838935131
> 2015-10-15 08:37:02,956 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> regionserver:60020-0x45049a3c1b80a8b, quorum=ccm2.prod1.connexity.net:2181
> ,ccm1.prod1.connexity.net:2181,db2.prod1.connexity.net:2181,
> sql2.prod1.connexity.net:2181,ccm3.prod1.connexity.net:2181,
> baseZNode=/hbase Transitioning dad6c71ed395df19220ef1056a110086 from
> M_ZK_REGION_CLOSING to RS_ZK_REGION_CLOSED
> 2015-10-15 08:37:02,967 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> regionserver:60020-0x45049a3c1b80a8b, quorum=ccm2.prod1.connexity.net:2181
> ,ccm1.prod1.connexity.net:2181,db2.prod1.connexity.net:2181,
> sql2.prod1.connexity.net:2181,ccm3.prod1.connexity.net:2181,
> baseZNode=/hbase Transitioned node dad6c71ed395df19220ef1056a110086 from
> M_ZK_REGION_CLOSING to RS_ZK_REGION_CLOSED
> 2015-10-15 08:37:02,967 DEBUG
> org.apache.hadoop.hbase.coordination.ZkCloseRegionCoordination: Set closed
> state in zk for
> Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086.
> on hb18.prod1.connexity.net,60020,1444427168636
> 2015-10-15 08:37:02,967 DEBUG
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed
> Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086.
>
>
>
>
>
>
> On 10/15/15, 11:05 AM, "Esteban Gutierrez" <[email protected]> wrote:
>
> >Hi Randy,
> >
> >As Ted suggested, a couple of stack traces would be helpful, but if you
> >could attach the HMaster log and a couple of RS logs where you see this
> >problem, that would be great. Also, do you have any large regions that
> >might need some time to flush? Have you looked into any potential
> >problems in HDFS?
> >
> >thanks,
> >esteban.
> >
> >--
> >Cloudera, Inc.
> >
> >
> >On Thu, Oct 15, 2015 at 10:39 AM, Ted Yu <[email protected]> wrote:
> >
> >> Can you give a bit more detail on why block eviction was the cause of
> >> the slow region movement?
> >>
> >> Did you happen to take stack traces ?
> >>
> >> Thanks
> >>
> >> > On Oct 15, 2015, at 10:32 AM, Randy Fox <[email protected]> wrote:
> >> >
> >> > Hi,
> >> >
> >> > We just upgraded from 0.94 to 1.0.0 and have noticed that region moves
> >> > are super slow (on the order of minutes), whereas previously they were
> >> > in the seconds range.  After looking at the code, I think the time is
> >> > spent waiting for the blocks to be evicted from the block cache.
> >> >
> >> > I wanted to verify that this theory is correct and see if there is
> >> > anything that can be done to speed up the moves.
> >> >
> >> > This is particularly painful as we are trying to get our configs tuned
> >> > to the new SW and need to do rolling restarts, which are taking almost
> >> > 24 hours on our cluster.  We also do our own manual rebalancing of
> >> > regions across RS’s, and that task is also now painful.
> >> >
> >> >
> >> > Thanks,
> >> >
> >> > Randy
> >>
>
