[jira] [Updated] (HBASE-14932) bulkload fails because file not found
[ https://issues.apache.org/jira/browse/HBASE-14932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaifeng Zhou updated HBASE-14932: --- Status: Patch Available (was: Open)

> bulkload fails because file not found
> ----
> Key: HBASE-14932
> URL: https://issues.apache.org/jira/browse/HBASE-14932
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.98.10
> Reporter: Shuaifeng Zhou
> Assignee: Alicia Ying Shu
> Fix For: 0.98.18
>
> Attachments: HBASE-14932-0.98.patch
>
> A single doBulkLoad call may contain several hfiles to load, but the call may time out while the regionserver is loading the files, and the client will retry the load.
> While the client is retrying, the regionserver may still be carrying out the original load operation. If some of the files have already been loaded, the retry call will throw a FileNotFoundException, which makes the client retry again and again until its retries are exhausted and the bulk load fails.
> When this happens, some files have actually been loaded successfully, which is an inconsistent state.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14735) Region may grow too big and can not be split
[ https://issues.apache.org/jira/browse/HBASE-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158469#comment-15158469 ] Shuaifeng Zhou commented on HBASE-14735: Our clusters run with this fix, and the problem has not happened again during the past few months.

> Region may grow too big and can not be split
> ----
> Key: HBASE-14735
> URL: https://issues.apache.org/jira/browse/HBASE-14735
> Project: HBase
> Issue Type: Bug
> Components: Compaction, regionserver
> Affects Versions: 1.1.2, 0.98.15
> Reporter: Shuaifeng Zhou
> Assignee: Shuaifeng Zhou
> Attachments: 14735-0.98.patch, 14735-branch-1.1.patch, 14735-branch-1.2.patch, 14735-branch-1.2.patch, 14735-master (2).patch, 14735-master.patch, 14735-master.patch
>
> When a compaction completes, there may still be many storefiles in the store. If CompactPriority <= 0, compactSplitThread issues a "Recursive enqueue" compaction request instead of requesting a split:
> {code:title=CompactSplitThread.java|borderStyle=solid}
> if (completed) {
>   // degenerate case: blocked regions require recursive enqueues
>   if (store.getCompactPriority() <= 0) {
>     requestSystemCompaction(region, store, "Recursive enqueue");
>   } else {
>     // see if the compaction has caused us to exceed max region size
>     requestSplit(region);
>   }
> {code}
> But in some situations the "recursive enqueue" request may return null and not build a new compaction runner. For example, if another compaction of the same region is running, compaction selection will exclude all files older than the newest file currently being compacted, which may leave too few files for the "recursive enqueue" request to select. When this happens, a split will not be triggered. If the input load is high enough, compactions are always running on the region, and a split will never be triggered.
> In our cluster this situation happened, and a huge region of more than 400GB with 100+ storefiles appeared. The version is 0.98.10, and trunk also has the problem.
[jira] [Updated] (HBASE-14932) bulkload fails because file not found
[ https://issues.apache.org/jira/browse/HBASE-14932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaifeng Zhou updated HBASE-14932: --- Attachment: HBASE-14932-0.98.patch One solution is for HRegion, when it bulk-loads hfiles, to ignore the FileNotFoundException and continue. A patch for 0.98 is attached; please review it.
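The proposed behaviour can be illustrated with a minimal, self-contained sketch (all names here are hypothetical, not the real HRegion API): on a retry, a file that a previous timed-out attempt already moved into the region raises FileNotFoundException, and tolerating it lets the retry succeed instead of exhausting the client's retries.

```java
import java.io.FileNotFoundException;
import java.util.List;
import java.util.Set;

public class BulkLoadRetrySketch {
  /** Returns the number of files newly loaded; already-loaded files are skipped. */
  static int loadHFiles(List<String> paths, Set<String> alreadyLoaded) {
    int loaded = 0;
    for (String p : paths) {
      try {
        if (alreadyLoaded.contains(p)) {
          // Simulates the regionserver failing to find a file that a prior
          // (timed-out) attempt already moved into place.
          throw new FileNotFoundException(p);
        }
        alreadyLoaded.add(p); // stand-in for the real HFile move
        loaded++;
      } catch (FileNotFoundException e) {
        // Proposed behaviour: log and continue instead of failing the call.
        continue;
      }
    }
    return loaded;
  }
}
```

With this tolerance, a retry that re-sends a partially loaded batch only loads the remaining files and the call completes.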
[jira] [Created] (HBASE-14932) bulkload fails because file not found
Shuaifeng Zhou created HBASE-14932: -- Summary: bulkload fails because file not found Key: HBASE-14932 URL: https://issues.apache.org/jira/browse/HBASE-14932 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.10 Reporter: Shuaifeng Zhou Fix For: 0.98.17

A single doBulkLoad call may contain several hfiles to load, but the call may time out while the regionserver is loading the files, and the client will retry the load. While the client is retrying, the regionserver may still be carrying out the original load operation. If some of the files have already been loaded, the retry call will throw a FileNotFoundException, which makes the client retry again and again until its retries are exhausted and the bulk load fails. When this happens, some files have actually been loaded successfully, which is an inconsistent state.
[jira] [Created] (HBASE-14931) Active master switches may cause region close forever
Shuaifeng Zhou created HBASE-14931: -- Summary: Active master switches may cause region close forever Key: HBASE-14931 URL: https://issues.apache.org/jira/browse/HBASE-14931 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.98.10 Reporter: Shuaifeng Zhou Priority: Critical Fix For: 0.98.17

The 60010 web page shows a region as online on one RS, but accessing data in that region throws NotServingRegionException. After examining the source code and logs, I found that this happens because the active master switches while the region is opening:

1. master1 opens region 'region1': it sends an open-region request to the RS and creates the node in ZK.
2. master1 stops.
3. master2 becomes the active master.
4. master2 obtains all region statuses; the status of 'region1' is offline.
5. The RS opens 'region1', changes its node in ZK to opened, and sends a message to master2.
6. master2 receives RS_ZK_REGION_OPENED, but the status is neither pending-open nor opening, so it sends an unassign to the RS and 'region1' is closed:

{code:title=AssignmentManager.java|borderStyle=solid}
case RS_ZK_REGION_OPENED:
  // Should see OPENED after OPENING but possible after PENDING_OPEN.
  if (regionState == null || !regionState.isPendingOpenOrOpeningOnServer(sn)) {
    LOG.warn("Received OPENED for " + prettyPrintedRegionName + " from " + sn
        + " but the region isn't PENDING_OPEN/OPENING here: "
        + regionStates.getRegionState(encodedName));
    if (regionState != null) {
      // Close it without updating the internal region states,
      // so as not to create double assignments in unlucky scenarios
      // mentioned in OpenRegionHandler#process
      unassign(regionState.getRegion(), null, -1, null, false, sn);
    }
    return;
  }
{code}

7. master2 continues processing the region info left over when master1 stopped, finds that the status of 'region1' in ZK is opened, and updates its in-memory status to opened.
8. From then on, 'region1' shows as opened on the master status web page, but it is not open on any regionserver.
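The race in step 6 can be reduced to a tiny state-machine sketch (the enum and method names are illustrative, not the real AssignmentManager API): an OPENED event arriving while the new master's in-memory state is OFFLINE triggers the unassign path, so the region ends up closed even though the RS opened it successfully.

```java
// Illustrative region states; package-private so it can share a file with the class.
enum RegionStateSketch { OFFLINE, PENDING_OPEN, OPENING, OPEN, CLOSED }

public class FailoverRaceSketch {
  /** Returns the state after the master handles an RS_ZK_REGION_OPENED event. */
  static RegionStateSketch onOpenedEvent(RegionStateSketch current) {
    if (current != RegionStateSketch.PENDING_OPEN
        && current != RegionStateSketch.OPENING) {
      // Mirrors the guard quoted above: an unexpected OPENED leads to an
      // unassign, leaving the region closed on every RS.
      return RegionStateSketch.CLOSED;
    }
    return RegionStateSketch.OPEN;
  }
}
```

A failed-over master that initialises the region as OFFLINE therefore closes it, while a master that saw the PENDING_OPEN/OPENING transition accepts the open.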
[jira] [Commented] (HBASE-14735) Region may grow too big and can not be split
[ https://issues.apache.org/jira/browse/HBASE-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031504#comment-15031504 ] Shuaifeng Zhou commented on HBASE-14735: Hi, [~stack]. We run with this patch applied on our clusters. We have many clusters, some on 0.94 and some on 0.98; we are upgrading them but have not finished. The patch really works. In 0.98 there is no reference-file problem, but 0.94 has one: in 0.98, if there is any reference file, the compaction is forced to a major compaction, while in 0.94 it is not. Both versions have the huge-region problem. Because 0.94 is old and about to be upgraded, I haven't provided a patch for 0.94. Below are parts of the du and lsr results of one example on 0.94 (after splitting once it still had a 200GB+ huge region and a file of more than 100GB that was always being selected during compaction, and it still had 2 reference files after several compactions); the configured region size is 40GB.

du:
{noformat}
32796614610 hdfs://hm101:9000/hbase/TAB_INTERESTING/effa8658177d023f4001b5d169bca149
24719467342 hdfs://hm101:9000/hbase/TAB_INTERESTING/f0819cb446cbdf785fb85638553605c5
210031594622 hdfs://hm101:9000/hbase/TAB_INTERESTING/f0c180d817bd74e1743c56f6478ac019
40210595441 hdfs://hm101:9000/hbase/TAB_INTERESTING/f0e08b1a4b1169a7b1f537c068a577bb
50824015435 hdfs://hm101:9000/hbase/TAB_INTERESTING/f0e710bb05dbc394d11524fa6dc34016
21566277612 hdfs://hm101:9000/hbase/TAB_INTERESTING/f11affc0f157e8f4cacce13c6faefe52
{noformat}

lsr:
{noformat}
-rw-r--r-- 2 root supergroup 4181396311 2015-11-23 09:48 /hbase/TAB_INTERESTING/f0c180d817bd74e1743c56f6478ac019/F/1f5b326bbbe64b178ce98783fe8223af
-rw-r--r-- 2 root supergroup 4128995550 2015-11-23 10:03 /hbase/TAB_INTERESTING/f0c180d817bd74e1743c56f6478ac019/F/29289c0ea8a746a284d585a928611d65
-rw-r--r-- 2 root supergroup 4137771163 2015-11-22 08:05 /hbase/TAB_INTERESTING/f0c180d817bd74e1743c56f6478ac019/F/3458f12dd5d842fa8629ade59fbc5443
-rw-r--r-- 2 root supergroup 4122308215 2015-11-23 10:08 /hbase/TAB_INTERESTING/f0c180d817bd74e1743c56f6478ac019/F/4ecdd970f2d845d680d5273b13a4d463
-rw-r--r-- 2 root supergroup 74 2015-11-22 01:34 /hbase/TAB_INTERESTING/f0c180d817bd74e1743c56f6478ac019/F/5b4fa0edcd37427cadc50602b0a0758a.78b89f6a03d5e5f61e7e49b2cb1bb0a8
-rw-r--r-- 2 root supergroup 122997494766 2015-11-22 22:22 /hbase/TAB_INTERESTING/f0c180d817bd74e1743c56f6478ac019/F/6517dbf39bd449c1ae97cdcc0f341100
-rw-r--r-- 2 root supergroup 4121185787 2015-11-22 07:57 /hbase/TAB_INTERESTING/f0c180d817bd74e1743c56f6478ac019/F/72c864a6b36148b98a26f6e9fd52e89c
-rw-r--r-- 2 root supergroup 4131467137 2015-11-23 09:58 /hbase/TAB_INTERESTING/f0c180d817bd74e1743c56f6478ac019/F/74ff9ced889f43839fa520dcaba1744a
-rw-r--r-- 2 root supergroup 1963236714 2015-11-23 10:34 /hbase/TAB_INTERESTING/f0c180d817bd74e1743c56f6478ac019/F/75e5ab25679e4bc6bc7490577e90b166
-rw-r--r-- 2 root supergroup 4141563183 2015-11-23 09:54 /hbase/TAB_INTERESTING/f0c180d817bd74e1743c56f6478ac019/F/7c5d0db31f92424fa06e6070dc4d0817
{noformat}
[jira] [Commented] (HBASE-14735) Region may grow too big and can not be split
[ https://issues.apache.org/jira/browse/HBASE-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15029342#comment-15029342 ] Shuaifeng Zhou commented on HBASE-14735: Thanks a lot for the explanation, [~stack]. We met this problem. The huge region cannot be compacted down to a few files because of the high input load, and if it cannot be split, the input load stays on that region and the situation gets worse and worse. If the region is split in two, the input load is split and balanced across the two children. Your wariness about the patch is reasonable; we also met the reference-file problem. After we applied the patch on our cluster, one huge region still could not be split because there was a reference file that, for some reason, could never be selected for compaction, and we sent a major-compaction request to resolve it. The patch may not solve the huge-region problem, but it can prevent it. In the patch we respect the rule that compaction comes first, but give split a chance if the region is too big. If a region splits before it grows too big, compaction on the children is easier and can clean up the reference files in time, before the children themselves grow too big.
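The patch's idea described above ("compaction comes first, but give split a chance if the region is too big") can be sketched as a small decision function. This is a hedged illustration under assumed names, not the actual patch code:

```java
public class SplitChanceSketch {
  /**
   * Decides what CompactSplitThread should do after a completed compaction.
   * Original behaviour: a blocked store (priority <= 0) always re-enqueues a
   * compaction. Patched idea: an oversized region still gets a split check.
   */
  static String nextAction(int compactPriority, long regionSize, long maxRegionSize) {
    if (compactPriority <= 0 && regionSize <= maxRegionSize) {
      return "recursive-enqueue"; // blocked store, region still small: compact again
    }
    return "split-check"; // store is healthy, or the region has outgrown max size
  }
}
```

With this ordering, a region that keeps failing its recursive-enqueue selection can no longer grow without bound, because crossing the max-size threshold flips the decision to the split path.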
[jira] [Commented] (HBASE-14735) Region may grow too big and can not be split
[ https://issues.apache.org/jira/browse/HBASE-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15020332#comment-15020332 ] Shuaifeng Zhou commented on HBASE-14735: priority <= 0 means there are more storefiles in the store than blockingFileCount (default = 7), and memstore flushes on this region will be blocked for a while. So I think the check on priority <= 0 is all right, isn't it? But the issue is that if we never split the region, it may grow too big. So the recursive enqueue is all right, but if the region size is too big, it should be split.
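The relationship described above can be made concrete with a one-line sketch (illustrative names; in HBase the priority is roughly the blocking file count minus the current storefile count, so a store holding more files than blockingFileCount has priority <= 0):

```java
public class CompactPrioritySketch {
  /**
   * Roughly: priority = blockingFileCount - storefileCount. Once the store
   * holds at least blockingFileCount files the priority drops to <= 0 and
   * flushes on the region start to be blocked.
   */
  static int compactPriority(int blockingFileCount, int storefileCount) {
    return blockingFileCount - storefileCount;
  }
}
```

So with the default of 7, a store with 9 files yields a non-positive priority and takes the "recursive enqueue" branch in the code quoted above.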
[jira] [Commented] (HBASE-14735) Region may grow too big and can not be split
[ https://issues.apache.org/jira/browse/HBASE-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006181#comment-15006181 ] Shuaifeng Zhou commented on HBASE-14735: lgtm, no zombie tests this time.
[jira] [Updated] (HBASE-14735) Region may grow too big and can not be split
[ https://issues.apache.org/jira/browse/HBASE-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaifeng Zhou updated HBASE-14735: --- Attachment: 14735-branch-1.2.patch Reattached the patch for branch-1.2.
[jira] [Commented] (HBASE-14735) Region may grow too big and can not be split
[ https://issues.apache.org/jira/browse/HBASE-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999831#comment-14999831 ] Shuaifeng Zhou commented on HBASE-14735: Should that problem block this jira issue? I haven't seen it before; what can I do to continue? Thanks [~stack] [~tedyu]
[jira] [Commented] (HBASE-14735) Region may grow too big and can not be split
[ https://issues.apache.org/jira/browse/HBASE-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990880#comment-14990880 ] Shuaifeng Zhou commented on HBASE-14735: Yes, my mistake.
[jira] [Updated] (HBASE-14735) Region may grow too big and can not be split
[ https://issues.apache.org/jira/browse/HBASE-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaifeng Zhou updated HBASE-14735: --- Attachment: 14735-branch-1.2.patch 14735-branch-1.1.patch 14735-0.98.patch Patches for branches 0.98, 1.1, and 1.2 are attached.
[jira] [Commented] (HBASE-14735) Region may grow too big and can not be split
[ https://issues.apache.org/jira/browse/HBASE-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989259#comment-14989259 ] Shuaifeng Zhou commented on HBASE-14735:

I think it's correct. A lower value means higher priority, because it means there are more storefiles in the store. If the number of storefiles >= blockingFileCount, flushes will be blocked, so that store should have the higher priority.

> Region may grow too big and can not be split
> Key: HBASE-14735
> URL: https://issues.apache.org/jira/browse/HBASE-14735
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
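The priority rule described in the comment above can be sketched as follows. This is a simplified model based on that comment, not HBase's actual `Store` implementation; the class, method, and parameter names are illustrative.

```java
// Simplified model of the compaction priority discussed above: the more
// storefiles a store has, the lower (more urgent) its priority. When the
// storefile count reaches blockingFileCount, the priority drops to <= 0,
// which is the point where flushes would be blocked.
public class CompactPrioritySketch {

    static int compactPriority(int blockingFileCount, int storefileCount) {
        return blockingFileCount - storefileCount;
    }

    public static void main(String[] args) {
        System.out.println(compactPriority(10, 4));  // prints 6: plenty of headroom
        System.out.println(compactPriority(10, 12)); // prints -2: store is blocking, most urgent
    }
}
```

Read this way, the `store.getCompactPriority() <= 0` check in the snippet above simply asks whether the store has reached the blocking storefile count.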
[jira] [Updated] (HBASE-14735) Region may grow too big and can not be split
[ https://issues.apache.org/jira/browse/HBASE-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaifeng Zhou updated HBASE-14735:
---
Attachment: 14735-master.patch

Attached is a patch on master; all regionserver test cases passed, please review it. Patches for the other branches will be attached later, after I run the test cases on them.

> Region may grow too big and can not be split
> Key: HBASE-14735
> URL: https://issues.apache.org/jira/browse/HBASE-14735
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14735) Region may grow too big and can not be split
[ https://issues.apache.org/jira/browse/HBASE-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983921#comment-14983921 ] Shuaifeng Zhou commented on HBASE-14735:

Edit the requestSplit function, removing the condition {{&& r.getCompactPriority() >= Store.PRIORITY_USER}}:
{code:title=CompactSplitThread.java|borderStyle=solid}
public synchronized boolean requestSplit(final HRegion r) {
  // don't split regions that are blocking
  if (shouldSplitRegion()) {
    byte[] midKey = r.checkSplit();
    if (midKey != null) {
      requestSplit(r, midKey);
      return true;
    }
  }
  return false;
}
{code}

> Region may grow too big and can not be split
> Key: HBASE-14735
> URL: https://issues.apache.org/jira/browse/HBASE-14735
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14735) Region may grow too big and can not be split
Shuaifeng Zhou created HBASE-14735:
---
Summary: Region may grow too big and can not be split
Key: HBASE-14735
URL: https://issues.apache.org/jira/browse/HBASE-14735
Project: HBase
Issue Type: Bug
Components: Compaction, regionserver
Affects Versions: 0.98.15, 1.1.2
Reporter: Shuaifeng Zhou
Assignee: Shuaifeng Zhou

When a compaction completes, there may still be many storefiles in the store, and the CompactPriority may be <= 0; in that case CompactSplitThread makes a "Recursive enqueue" compaction request instead of requesting a split:
{code:title=CompactSplitThread.java|borderStyle=solid}
if (completed) {
  // degenerate case: blocked regions require recursive enqueues
  if (store.getCompactPriority() <= 0) {
    requestSystemCompaction(region, store, "Recursive enqueue");
  } else {
    // see if the compaction has caused us to exceed max region size
    requestSplit(region);
  }
}
{code}
But in some situations the "Recursive enqueue" request may return null without building a new compaction runner. For example, when another compaction of the same region is running, compaction selection excludes all files older than the newest file currently compacting, which may leave too few files for the "Recursive enqueue" request to select. When this happens, the split is not triggered. If the input load is high enough, compactions are always running on the region, and a split is never triggered. This situation happened in our cluster, and a huge region of more than 400 GB with 100+ storefiles appeared. Our version is 0.98.10, but trunk also has the problem.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14735) Region may grow too big and can not be split
[ https://issues.apache.org/jira/browse/HBASE-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983918#comment-14983918 ] Shuaifeng Zhou commented on HBASE-14735:

A solution is to remove the "else" branch, giving split a chance after each completed compaction:
{code:title=CompactSplitThread.java|borderStyle=solid}
if (completed) {
  // degenerate case: blocked regions require recursive enqueues
  if (store.getCompactPriority() <= 0) {
    requestSystemCompaction(region, store, "Recursive enqueue");
  }
  // see if the compaction has caused us to exceed max region size
  requestSplit(region);
{code}

> Region may grow too big and can not be split
> Key: HBASE-14735
> URL: https://issues.apache.org/jira/browse/HBASE-14735
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
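To see why dropping the "else" branch matters, here is a toy model of the two control flows. The action names mirror the snippet above, but the class itself is illustrative, not HBase code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the post-compaction control flow discussed above. With the
// "else" (original code), a store whose priority is <= 0 never reaches
// requestSplit; without it (proposed fix), a split is considered after
// every completed compaction.
public class SplitFlowSketch {

    static List<String> afterCompaction(int compactPriority, boolean withElse) {
        List<String> actions = new ArrayList<>();
        if (compactPriority <= 0) {
            actions.add("requestSystemCompaction"); // may silently select no files
            if (withElse) {
                return actions; // original code: split is never considered
            }
        }
        actions.add("requestSplit"); // proposed fix: split always gets a chance
        return actions;
    }

    public static void main(String[] args) {
        System.out.println(afterCompaction(-1, true));  // prints [requestSystemCompaction]
        System.out.println(afterCompaction(-1, false)); // prints [requestSystemCompaction, requestSplit]
    }
}
```

Under sustained load the priority stays <= 0, so with the "else" in place the first branch is taken every time and `requestSplit` is never reached, matching the 400 GB region observed in the cluster.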
[jira] [Updated] (HBASE-14407) NotServingRegion: hbase region closed forever
[ https://issues.apache.org/jira/browse/HBASE-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaifeng Zhou updated HBASE-14407:
---
Attachment: 14407-branch-1.1.patch

Reattached the patch on branch-1.1. Is it ok? [~apurtell]

> NotServingRegion: hbase region closed forever
> Key: HBASE-14407
> URL: https://issues.apache.org/jira/browse/HBASE-14407
> Components: Region Assignment
> Affects Versions: 0.98.10, 1.2.0, 1.1.2, 1.3.0
> Priority: Critical
> Attachments: 14407-0.98.patch, 14407-branch-1.1.patch, 14407-branch-1.2.patch, hbase-14407-0.98.patch, hbase-14407-1.1.patch, hbase-14407-1.2.patch, hs4.log, master.log
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14407) NotServingRegion: hbase region closed forever
[ https://issues.apache.org/jira/browse/HBASE-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906064#comment-14906064 ] Shuaifeng Zhou commented on HBASE-14407:

lgtm

> NotServingRegion: hbase region closed forever
> Key: HBASE-14407
> URL: https://issues.apache.org/jira/browse/HBASE-14407
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14407) NotServingRegion: hbase region closed forever
[ https://issues.apache.org/jira/browse/HBASE-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907328#comment-14907328 ] Shuaifeng Zhou commented on HBASE-14407:

lgtm

> NotServingRegion: hbase region closed forever
> Key: HBASE-14407
> URL: https://issues.apache.org/jira/browse/HBASE-14407
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14407) NotServingRegion: hbase region closed forever
[ https://issues.apache.org/jira/browse/HBASE-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901759#comment-14901759 ] Shuaifeng Zhou commented on HBASE-14407:

[~apurtell]

> NotServingRegion: hbase region closed forever
> Key: HBASE-14407
> URL: https://issues.apache.org/jira/browse/HBASE-14407
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14407) NotServingRegion: hbase region closed forever
[ https://issues.apache.org/jira/browse/HBASE-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877385#comment-14877385 ] Shuaifeng Zhou commented on HBASE-14407:

lgtm. Should the patch go to 0.98 and branch-1.1?

> NotServingRegion: hbase region closed forever
> Key: HBASE-14407
> URL: https://issues.apache.org/jira/browse/HBASE-14407
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14407) NotServingRegion: hbase region closed forever
[ https://issues.apache.org/jira/browse/HBASE-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaifeng Zhou updated HBASE-14407:
---
Attachment: hbase-14407-1.2.patch
            hbase-14407-1.1.patch
            hbase-14407-0.98.patch

A possible solution is to check the zk state in processAlreadyOpenedRegion before modifying master memory. Patches for branches 0.98, 1.1, and 1.2 are attached. I tested 0.98.10 modified this way with more than 10,000 regions, and it is ok (before the fix, the problem happened every time hbase was restarted). On the master branch, assignment does not use zk, so it does not have this problem. Please review it; a smarter solution is welcome.

> NotServingRegion: hbase region closed forever
> Key: HBASE-14407
> URL: https://issues.apache.org/jira/browse/HBASE-14407
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14407) NotServingRegion: hbase region closed forever
[ https://issues.apache.org/jira/browse/HBASE-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745283#comment-14745283 ] Shuaifeng Zhou commented on HBASE-14407:

Thanks, stack. I have extracted the master log analysis and attached a possible patch.

> NotServingRegion: hbase region closed forever
> Key: HBASE-14407
> URL: https://issues.apache.org/jira/browse/HBASE-14407
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14407) NotServingRegion: hbase region closed forever
[ https://issues.apache.org/jira/browse/HBASE-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744866#comment-14744866 ] Shuaifeng Zhou commented on HBASE-14407:

In the master log, the error happens like this (0.98.10):

1. The master's open-region call times out:
{noformat}
2015-09-06 01:35:59,521 DEBUG [hm,6,1438368907764-GeneralBulkAssigner-19] master.AssignmentManager(1768): Bulk assigner openRegion() to hs4,60020,1441213185092 has timed out, but the regions might already be opened on it. java.net.SocketTimeoutException: Call to hs4/15.173.0.115:60020 failed because java.net.SocketTimeoutException: 6 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/15.173.0.110:33771 remote=hs4/15.173.0.115:60020]
{noformat}

2. The master retries the open, finds the region already opened, and deletes the node:
{noformat}
2015-09-06 01:36:07,063 DEBUG [hm,6,1438368907764-GeneralBulkAssigner-19] master.AssignmentManager(2293): ALREADY_OPENED NB_APP_BEHAVIOR_LABEL_00,F\x180100\x18,1441015264846.2070d83bfd7c3fa6950c859ce842039e. to hs4,60020,1441213185092
{noformat}

3. processAlreadyOpenedRegion deletes the zk node, but the region state does not match, because this is a retry and the region was opened previously:
{noformat}
2015-09-06 01:36:07,073 WARN [hm,6,1438368907764-GeneralBulkAssigner-19] zookeeper.ZKAssign(458): master:6-0x24ee432542d01eb, quorum=hs5:2181,hs4:2181,hm:2181, baseZNode=/hbase Attempting to delete unassigned node 2070d83bfd7c3fa6950c859ce842039e in M_ZK_REGION_OFFLINE state but node is in RS_ZK_REGION_OPENED state
2015-09-06 01:36:07,073 INFO [hm,6,1438368907764-GeneralBulkAssigner-19] master.AssignmentManager(3614): Failed to delete the offline node for 2070d83bfd7c3fa6950c859ce842039e. The node type may not match
{noformat}
At the same time, it modifies regionStates in master memory:
{code:title=AssignmentManager.java|borderStyle=solid}
private void processAlreadyOpenedRegion(HRegionInfo region, ServerName sn) {
  // Remove region from in-memory transition and unassigned node from ZK
  // While trying to enable the table the regions of the table were
  // already enabled.
  LOG.debug("ALREADY_OPENED " + region.getRegionNameAsString() + " to " + sn);
  String encodedName = region.getEncodedName();
  deleteNodeInStates(encodedName, "offline", sn, EventType.M_ZK_REGION_OFFLINE);
  regionStates.regionOnline(region, sn);
}
{code}

4. The master handles the zk event for the earlier successful region open (delayed):
{noformat}
2015-09-06 01:36:07,073 INFO [hm,6,1438368907764-GeneralBulkAssigner-19] master.RegionStates(826): Transition {2070d83bfd7c3fa6950c859ce842039e state=PENDING_OPEN, ts=1441474499424, server=hs4,60020,1441213185092} to {2070d83bfd7c3fa6950c859ce842039e state=OPEN, ts=1441474567073, server=hs4,60020,1441213185092}
2015-09-06 01:36:07,073 INFO [hm,6,1438368907764-GeneralBulkAssigner-19] master.RegionStates(371): Onlined 2070d83bfd7c3fa6950c859ce842039e on hs4,60020,1441213185092
2015-09-06 01:36:33,960 DEBUG [AM.ZK.Worker-pool2-t5251] master.AssignmentManager(926): Handling RS_ZK_REGION_OPENED, server=hs4,60020,1441213185092, region=2070d83bfd7c3fa6950c859ce842039e, which is more than 15 seconds late, current_state={2070d83bfd7c3fa6950c859ce842039e state=OPEN, ts=1441474567073, server=hs4,60020,1441213185092}
{noformat}

5. The master tries to update regionStates again, finds the region already open, treats this as an error, and closes the region:
{noformat}
2015-09-06 01:36:33,961 WARN [AM.ZK.Worker-pool2-t5251] master.AssignmentManager(1061): Received OPENED for 2070d83bfd7c3fa6950c859ce842039e from hs4,60020,1441213185092 but the region isn't PENDING_OPEN/OPENING here: {2070d83bfd7c3fa6950c859ce842039e state=OPEN, ts=1441474567073, server=hs4,60020,1441213185092}
2015-09-06 01:36:33,965 DEBUG [AM.ZK.Worker-pool2-t5251] master.AssignmentManager(1849): Sent CLOSE to hs4,60020,1441213185092 for region NB_APP_BEHAVIOR_LABEL_00,F\x180100\x18,1441015264846.2070d83bfd7c3fa6950c859ce842039e.
{noformat}

> NotServingRegion: hbase region closed forever
> Key: HBASE-14407
> URL: https://issues.apache.org/jira/browse/HBASE-14407
[jira] [Updated] (HBASE-14407) NotServingRegion: hbase region closed forever
[ https://issues.apache.org/jira/browse/HBASE-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaifeng Zhou updated HBASE-14407:
---
Attachment: hs4.log
            master.log

Attached are the logs from the master and the regionserver.

> NotServingRegion: hbase region closed forever
> Key: HBASE-14407
> URL: https://issues.apache.org/jira/browse/HBASE-14407
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14407) NotServingRegion: hbase region closed forever
Shuaifeng Zhou created HBASE-14407:
---
Summary: NotServingRegion: hbase region closed forever
Key: HBASE-14407
URL: https://issues.apache.org/jira/browse/HBASE-14407
Project: HBase
Issue Type: Bug
Reporter: Shuaifeng Zhou
Assignee: Shuaifeng Zhou
Fix For: 1.1.2, 0.98.10

I found a situation that may cause a region to be closed forever, and it happens frequently on my cluster. Our version is 0.98.10, but 1.1.2 also has the problem:
1. master sends region open to the regionserver
2. rs starts a handler to do openRegion
3. rs returns the response to master
4. master does not receive the response, or times out, and sends open region again
5. rs has already opened the region
6. master runs processAlreadyOpenedRegion and updates the region state to open in master memory
7. master receives the zk message that the region opened (delayed for some reason, e.g. the network), which triggers another region-state update to open, but finds the region already opened: ERROR!
8. master sends close region, and the region is closed forever.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
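The final steps of the scenario above can be reproduced with a deliberately simplified state machine. This is only an illustration of the described race, not HBase's AssignmentManager; all names are made up for the sketch.

```java
// Toy state machine reproducing the race described above: the retry path
// (processAlreadyOpenedRegion) marks the region OPEN in master memory, so
// when the delayed ZK "opened" event finally arrives, the state no longer
// matches PENDING_OPEN and the master reacts by closing the region.
public class AssignRaceSketch {
    enum State { PENDING_OPEN, OPEN, CLOSED }

    State state = State.PENDING_OPEN;

    // retry path: master finds the region already opened on the regionserver
    void processAlreadyOpenedRegion() {
        state = State.OPEN;
    }

    // delayed ZK "region opened" event
    void onZkOpened() {
        if (state == State.PENDING_OPEN) {
            state = State.OPEN;   // normal, timely case
        } else {
            state = State.CLOSED; // mismatch: master sends CLOSE, region closed forever
        }
    }

    public static void main(String[] args) {
        AssignRaceSketch master = new AssignRaceSketch();
        master.processAlreadyOpenedRegion(); // retry marks region OPEN in memory
        master.onZkOpened();                 // late ZK event now mismatches
        System.out.println(master.state);    // prints CLOSED
    }
}
```

The proposed fix corresponds to making the retry path consult the zk node state before mutating the in-memory state, so the delayed event no longer sees a mismatch.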
[jira] [Updated] (HBASE-14407) NotServingRegion: hbase region closed forever
[ https://issues.apache.org/jira/browse/HBASE-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaifeng Zhou updated HBASE-14407:
---
Affects Version/s: 0.98.10
                   1.1.2

> NotServingRegion: hbase region closed forever
> Key: HBASE-14407
> URL: https://issues.apache.org/jira/browse/HBASE-14407
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14407) NotServingRegion: hbase region closed forever
[ https://issues.apache.org/jira/browse/HBASE-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaifeng Zhou updated HBASE-14407:
---
Fix Version/s: (was: 1.1.2)
               (was: 0.98.10)

> NotServingRegion: hbase region closed forever
> Key: HBASE-14407
> URL: https://issues.apache.org/jira/browse/HBASE-14407
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13528) A bug on selecting compaction pool
[ https://issues.apache.org/jira/browse/HBASE-13528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaifeng Zhou updated HBASE-13528:
---
Attachment: HBASE-13528-master.patch
            HBASE-13528-1.0.patch
            HBASE-13528-0.98.patch

A bug on selecting compaction pool
Key: HBASE-13528
URL: https://issues.apache.org/jira/browse/HBASE-13528
Project: HBase
Issue Type: Bug
Components: Compaction
Affects Versions: 0.98.12
Reporter: Shuaifeng Zhou
Assignee: Shuaifeng Zhou
Priority: Minor
Fix For: 1.0.1, 0.98.13
Attachments: HBASE-13528-0.98.patch, HBASE-13528-1.0.patch, HBASE-13528-master.patch

When selectNow == true, the compaction pool selection in requestCompactionInternal is incorrect, as discussed in: http://mail-archives.apache.org/mod_mbox/hbase-dev/201504.mbox/%3CCAAAYAnNC06E-pUG_Fhu9-7x5z--tm_apnFqpuqfn%3DLdNESE3mA%40mail.gmail.com%3E
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13528) A bug on selecting compaction pool
[ https://issues.apache.org/jira/browse/HBASE-13528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506298#comment-14506298 ] Shuaifeng Zhou commented on HBASE-13528: Yes, it's redundant. Is something like this OK? {noformat} long size = compaction.getRequest().getSize(); ThreadPoolExecutor pool = (selectNow && s.throttleCompaction(size)) ? largeCompactions : smallCompactions; {noformat} A bug on selecting compaction pool -- Key: HBASE-13528 URL: https://issues.apache.org/jira/browse/HBASE-13528 Project: HBase Issue Type: Bug Components: Compaction Affects Versions: 0.98.12 Reporter: Shuaifeng Zhou Assignee: Shuaifeng Zhou Priority: Minor Fix For: 1.0.1, 0.98.13 Attachments: HBASE-13528-0.98.patch, HBASE-13528-1.0.patch, HBASE-13528-master.patch When selectNow == true in requestCompactionInternal, the compaction pool selection is incorrect, as discussed in: http://mail-archives.apache.org/mod_mbox/hbase-dev/201504.mbox/%3CCAAAYAnNC06E-pUG_Fhu9-7x5z--tm_apnFqpuqfn%3DLdNESE3mA%40mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
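The corrected selection in the snippet above can be illustrated with a standalone toy; this is not HBase's actual CompactSplitThread, and the 2 GB throttle point is an assumed stand-in for the hbase.regionserver.thread.compaction.throttle setting:

```java
// Hypothetical, self-contained sketch of the fixed pool selection: the
// store's throttle is consulted only when a compaction was selected now
// (selectNow == true); otherwise the size is unknown and the small pool wins.
public class PoolSelectionSketch {
    // Assumed throttle point (stand-in for the real configurable value).
    static final long THROTTLE_POINT = 2L * 1024 * 1024 * 1024; // 2 GB

    static boolean throttleCompaction(long compactionSize) {
        // A compaction larger than the throttle point goes to the large pool.
        return compactionSize > THROTTLE_POINT;
    }

    static String selectPool(boolean selectNow, long size) {
        // The fix discussed above: selectNow && throttle picks the pool.
        return (selectNow && throttleCompaction(size)) ? "large" : "small";
    }

    public static void main(String[] args) {
        System.out.println(selectPool(true, 3L * 1024 * 1024 * 1024));  // large
        System.out.println(selectPool(true, 10L * 1024 * 1024));        // small
        System.out.println(selectPool(false, 3L * 1024 * 1024 * 1024)); // small
    }
}
```

When selectNow is false no CompactionContext exists yet, so there is no size to throttle on, which is why the small pool is the only sensible default in that branch.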
[jira] [Commented] (HBASE-13528) A bug on selecting compaction pool
[ https://issues.apache.org/jira/browse/HBASE-13528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506341#comment-14506341 ] Shuaifeng Zhou commented on HBASE-13528: OK, will attach a patch soon. A bug on selecting compaction pool -- Key: HBASE-13528 URL: https://issues.apache.org/jira/browse/HBASE-13528 Project: HBase Issue Type: Bug Components: Compaction Affects Versions: 0.98.12 Reporter: Shuaifeng Zhou Assignee: Shuaifeng Zhou Priority: Minor Fix For: 1.0.1, 0.98.13 Attachments: HBASE-13528-0.98.patch, HBASE-13528-1.0.patch, HBASE-13528-master.patch When selectNow == true in requestCompactionInternal, the compaction pool selection is incorrect, as discussed in: http://mail-archives.apache.org/mod_mbox/hbase-dev/201504.mbox/%3CCAAAYAnNC06E-pUG_Fhu9-7x5z--tm_apnFqpuqfn%3DLdNESE3mA%40mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13528) A bug on selecting compaction pool
[ https://issues.apache.org/jira/browse/HBASE-13528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaifeng Zhou updated HBASE-13528: --- Status: Patch Available (was: Open) A bug on selecting compaction pool -- Key: HBASE-13528 URL: https://issues.apache.org/jira/browse/HBASE-13528 Project: HBase Issue Type: Bug Components: Compaction Affects Versions: 0.98.12 Reporter: Shuaifeng Zhou Assignee: Shuaifeng Zhou Priority: Minor Fix For: 1.0.1, 0.98.13 Attachments: HBASE-13528-0.98.patch, HBASE-13528-1.0.patch, HBASE-13528-master.patch When selectNow == true in requestCompactionInternal, the compaction pool selection is incorrect, as discussed in: http://mail-archives.apache.org/mod_mbox/hbase-dev/201504.mbox/%3CCAAAYAnNC06E-pUG_Fhu9-7x5z--tm_apnFqpuqfn%3DLdNESE3mA%40mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13528) A bug on selecting compaction pool
Shuaifeng Zhou created HBASE-13528: -- Summary: A bug on selecting compaction pool Key: HBASE-13528 URL: https://issues.apache.org/jira/browse/HBASE-13528 Project: HBase Issue Type: Bug Components: Compaction Affects Versions: 0.98.12 Reporter: Shuaifeng Zhou Assignee: Shuaifeng Zhou Priority: Minor Fix For: 1.0.1, 0.98.13 When selectNow == true in requestCompactionInternal, the compaction pool selection is incorrect, as discussed in: http://mail-archives.apache.org/mod_mbox/hbase-dev/201504.mbox/%3CCAAAYAnNC06E-pUG_Fhu9-7x5z--tm_apnFqpuqfn%3DLdNESE3mA%40mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13528) A bug on selecting compaction pool
[ https://issues.apache.org/jira/browse/HBASE-13528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaifeng Zhou updated HBASE-13528: --- Status: Open (was: Patch Available) A bug on selecting compaction pool -- Key: HBASE-13528 URL: https://issues.apache.org/jira/browse/HBASE-13528 Project: HBase Issue Type: Bug Components: Compaction Affects Versions: 0.98.12 Reporter: Shuaifeng Zhou Assignee: Shuaifeng Zhou Priority: Minor Fix For: 1.0.1, 0.98.13 Attachments: HBASE-13528-0.98.patch, HBASE-13528-1.0.patch, HBASE-13528-master.patch When selectNow == true in requestCompactionInternal, the compaction pool selection is incorrect, as discussed in: http://mail-archives.apache.org/mod_mbox/hbase-dev/201504.mbox/%3CCAAAYAnNC06E-pUG_Fhu9-7x5z--tm_apnFqpuqfn%3DLdNESE3mA%40mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13528) A bug on selecting compaction pool
[ https://issues.apache.org/jira/browse/HBASE-13528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaifeng Zhou updated HBASE-13528: --- Attachment: HBASE-13528-master-1.patch HBASE-13528-1.0-1.patch HBASE-13528-0.98-1.patch Refined the patch per comments from zhangduo A bug on selecting compaction pool -- Key: HBASE-13528 URL: https://issues.apache.org/jira/browse/HBASE-13528 Project: HBase Issue Type: Bug Components: Compaction Affects Versions: 0.98.12 Reporter: Shuaifeng Zhou Assignee: Shuaifeng Zhou Priority: Minor Fix For: 1.0.1, 0.98.13 Attachments: HBASE-13528-0.98-1.patch, HBASE-13528-0.98.patch, HBASE-13528-1.0-1.patch, HBASE-13528-1.0.patch, HBASE-13528-master-1.patch, HBASE-13528-master.patch When selectNow == true in requestCompactionInternal, the compaction pool selection is incorrect, as discussed in: http://mail-archives.apache.org/mod_mbox/hbase-dev/201504.mbox/%3CCAAAYAnNC06E-pUG_Fhu9-7x5z--tm_apnFqpuqfn%3DLdNESE3mA%40mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13528) A bug on selecting compaction pool
[ https://issues.apache.org/jira/browse/HBASE-13528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaifeng Zhou updated HBASE-13528: --- Status: Patch Available (was: Open) A bug on selecting compaction pool -- Key: HBASE-13528 URL: https://issues.apache.org/jira/browse/HBASE-13528 Project: HBase Issue Type: Bug Components: Compaction Affects Versions: 0.98.12 Reporter: Shuaifeng Zhou Assignee: Shuaifeng Zhou Priority: Minor Fix For: 1.0.1, 0.98.13 Attachments: HBASE-13528-0.98-1.patch, HBASE-13528-0.98.patch, HBASE-13528-1.0-1.patch, HBASE-13528-1.0.patch, HBASE-13528-master-1.patch, HBASE-13528-master.patch When selectNow == true in requestCompactionInternal, the compaction pool selection is incorrect, as discussed in: http://mail-archives.apache.org/mod_mbox/hbase-dev/201504.mbox/%3CCAAAYAnNC06E-pUG_Fhu9-7x5z--tm_apnFqpuqfn%3DLdNESE3mA%40mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13122) Improve efficiency for return codes of some filters
[ https://issues.apache.org/jira/browse/HBASE-13122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaifeng Zhou updated HBASE-13122: --- Status: Patch Available (was: Reopened) Improve efficiency for return codes of some filters --- Key: HBASE-13122 URL: https://issues.apache.org/jira/browse/HBASE-13122 Project: HBase Issue Type: Improvement Components: Filters Affects Versions: 0.98.10.1, 0.94.24, 1.0.1 Reporter: Shuaifeng Zhou Fix For: 2.0.0, 1.1.0 Attachments: 13122-master.patch, 13122-master.patch, 13122.patch ColumnRangeFilter: when minColumnInclusive is false, it means none of the cells at the current row/column fit the condition, so the filter should skip to the next column; the return code should be NEXT_COL, not SKIP. FamilyFilter is in a similar situation. Currently, SKIP does not cause an error, but it is not efficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13122) Improve efficiency for return codes of some filters
[ https://issues.apache.org/jira/browse/HBASE-13122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaifeng Zhou updated HBASE-13122: --- Attachment: 13122-master.patch Improve efficiency for return codes of some filters --- Key: HBASE-13122 URL: https://issues.apache.org/jira/browse/HBASE-13122 Project: HBase Issue Type: Improvement Components: Filters Affects Versions: 0.94.24, 1.0.1, 0.98.10.1 Reporter: Shuaifeng Zhou Fix For: 2.0.0, 1.1.0 Attachments: 13122-master.patch, 13122-master.patch, 13122.patch ColumnRangeFilter: when minColumnInclusive is false, it means none of the cells at the current row/column fit the condition, so the filter should skip to the next column; the return code should be NEXT_COL, not SKIP. FamilyFilter is in a similar situation. Currently, SKIP does not cause an error, but it is not efficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13122) Improve efficiency for return codes of some filters
[ https://issues.apache.org/jira/browse/HBASE-13122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347847#comment-14347847 ] Shuaifeng Zhou commented on HBASE-13122: The failure error is below: Failed to read test report file /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-shell/target/surefire-reports/TEST-org.apache.hadoop.hbase.client.TestShell.xml org.dom4j.DocumentException: Error on line 706 of document file:///home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-shell/target/surefire-reports/TEST-org.apache.hadoop.hbase.client.TestShell.xml : XML document structures must start and end within the same entity. Nested exception: XML document structures must start and end within the same entity. The whole test run cost only 0 ms and did not run any test case. The error info shows a TestShell.xml error, nothing related to the patch, I think. Improve efficiency for return codes of some filters --- Key: HBASE-13122 URL: https://issues.apache.org/jira/browse/HBASE-13122 Project: HBase Issue Type: Improvement Components: Filters Affects Versions: 0.94.24, 1.0.1, 0.98.10.1 Reporter: Shuaifeng Zhou Fix For: 2.0.0, 1.1.0 Attachments: 13122-master.patch, 13122.patch ColumnRangeFilter: when minColumnInclusive is false, it means none of the cells at the current row/column fit the condition, so the filter should skip to the next column; the return code should be NEXT_COL, not SKIP. FamilyFilter is in a similar situation. Currently, SKIP does not cause an error, but it is not efficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13122) Improve efficiency for return codes of some filters
[ https://issues.apache.org/jira/browse/HBASE-13122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346352#comment-14346352 ] Shuaifeng Zhou commented on HBASE-13122: Thanks for the review, Ram. It's the same if the data is fetched from the first family: cells from both families go through the filter. When fetching data from the second family, the scan skips the first family after checking one cell; similarly, when fetching data from the first family, it skips the second family after checking one cell in it (checked once per row). Improve efficiency for return codes of some filters --- Key: HBASE-13122 URL: https://issues.apache.org/jira/browse/HBASE-13122 Project: HBase Issue Type: Improvement Components: Filters Affects Versions: 0.94.24, 1.0.1, 0.98.10.1 Reporter: Shuaifeng Zhou Fix For: 2.0.0, 1.1.0 Attachments: 13122-master.patch, 13122.patch ColumnRangeFilter: when minColumnInclusive is false, it means none of the cells at the current row/column fit the condition, so the filter should skip to the next column; the return code should be NEXT_COL, not SKIP. FamilyFilter is in a similar situation. Currently, SKIP does not cause an error, but it is not efficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13122) Improve efficiency for return codes of some filters
[ https://issues.apache.org/jira/browse/HBASE-13122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346313#comment-14346313 ] Shuaifeng Zhou commented on HBASE-13122: That's a good point. Strictly speaking, there should be a NEXT_FAMILY return code, and that is what should be returned here. But currently we have no such code. If there are multiple families, NEXT_ROW does the work: the scan jumps to the next family when the return code is NEXT_ROW. Maybe later we should add a NEXT_FAMILY return code, but it's a big change ... Improve efficiency for return codes of some filters --- Key: HBASE-13122 URL: https://issues.apache.org/jira/browse/HBASE-13122 Project: HBase Issue Type: Improvement Components: Filters Affects Versions: 0.94.24, 1.0.1, 0.98.10.1 Reporter: Shuaifeng Zhou Fix For: 2.0.0, 1.1.0 Attachments: 13122-master.patch, 13122.patch ColumnRangeFilter: when minColumnInclusive is false, it means none of the cells at the current row/column fit the condition, so the filter should skip to the next column; the return code should be NEXT_COL, not SKIP. FamilyFilter is in a similar situation. Currently, SKIP does not cause an error, but it is not efficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13122) Improve efficiency for return codes of some filters
[ https://issues.apache.org/jira/browse/HBASE-13122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346367#comment-14346367 ] Shuaifeng Zhou commented on HBASE-13122: NEXT_ROW works because there are a region scanner and one store scanner per family, and NEXT_ROW affects only the store scanner. When one StoreScanner switches to the next row, the RegionScanner switches to the next StoreScanner, and that second StoreScanner continues checking the current row. That's why the change is more efficient. Improve efficiency for return codes of some filters --- Key: HBASE-13122 URL: https://issues.apache.org/jira/browse/HBASE-13122 Project: HBase Issue Type: Improvement Components: Filters Affects Versions: 0.94.24, 1.0.1, 0.98.10.1 Reporter: Shuaifeng Zhou Fix For: 2.0.0, 1.1.0 Attachments: 13122-master.patch, 13122.patch ColumnRangeFilter: when minColumnInclusive is false, it means none of the cells at the current row/column fit the condition, so the filter should skip to the next column; the return code should be NEXT_COL, not SKIP. FamilyFilter is in a similar situation. Currently, SKIP does not cause an error, but it is not efficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
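The per-store behavior of NEXT_ROW can be modeled with a toy scan loop; these are hypothetical classes for illustration, not HBase's real RegionScanner/StoreScanner:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: the region-level scan consults one scanner per store (column
// family), so a NEXT_ROW returned by FamilyFilter only fast-forwards the
// store it came from; other stores still emit their cells for the same row.
public class NextRowPerStoreSketch {
    static List<String> scanRegion(List<String> rows, List<String> families,
                                   String wantedFamily) {
        List<String> results = new ArrayList<>();
        for (String row : rows) {                 // region scanner: row by row
            for (String family : families) {      // one store scanner per family
                if (!family.equals(wantedFamily)) {
                    // FamilyFilter said NEXT_ROW for this store: skip the rest
                    // of this row in THIS store only.
                    continue;
                }
                results.add(row + "/" + family);  // cell passes the filter
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // f2 cells survive for every row even though f1 is skipped per row.
        System.out.println(scanRegion(List.of("r1", "r2"),
                List.of("f1", "f2"), "f2")); // [r1/f2, r2/f2]
    }
}
```

The point of the sketch is that NEXT_ROW is scoped to one store, which is why it can safely stand in for the missing NEXT_FAMILY code in multi-family tables.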
[jira] [Commented] (HBASE-13122) Improve efficiency for return codes of some filters
[ https://issues.apache.org/jira/browse/HBASE-13122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343269#comment-14343269 ] Shuaifeng Zhou commented on HBASE-13122: We have done a performance test; here are the results: FamilyFilter: the test table has two families, each with 3 qualifiers, and we put 1 rows into the table; each row/qualifier has 1000 versions. A scan using FamilyFilter to get values from the second family scanned 2000 rows and 100 versions of each row/qualifier. With the original FamilyFilter it cost 309 seconds on average, but with the improved FamilyFilter the cost is 38 seconds on average, an improvement of about 700%. ColumnRangeFilter: the same data but only one family; scanning 1 rows and 1000 versions, the original cost 68 s on average and the improved version cost 64 s, a small improvement. The improvement in FamilyFilter reduces file reads, so it improves significantly, but ColumnRangeFilter cannot reduce file reads, so it improves only a little. Improve efficiency for return codes of some filters --- Key: HBASE-13122 URL: https://issues.apache.org/jira/browse/HBASE-13122 Project: HBase Issue Type: Improvement Components: Filters Affects Versions: 0.94.24, 1.0.1, 0.98.10.1 Reporter: Shuaifeng Zhou Attachments: 13122-master.patch, 13122.patch ColumnRangeFilter: when minColumnInclusive is false, it means none of the cells at the current row/column fit the condition, so the filter should skip to the next column; the return code should be NEXT_COL, not SKIP. FamilyFilter is in a similar situation. Currently, SKIP does not cause an error, but it is not efficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13122) return codes of some filters not efficient
[ https://issues.apache.org/jira/browse/HBASE-13122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaifeng Zhou updated HBASE-13122: --- Attachment: 13122-master.patch Patch for master branch attached. return codes of some filters not efficient - Key: HBASE-13122 URL: https://issues.apache.org/jira/browse/HBASE-13122 Project: HBase Issue Type: Improvement Components: Filters Affects Versions: 0.94.24, 1.0.1, 0.98.10.1 Reporter: Shuaifeng Zhou Attachments: 13122-master.patch, 13122.patch ColumnRangeFilter: when minColumnInclusive is false, it means none of the cells at the current row/column fit the condition, so the filter should skip to the next column; the return code should be NEXT_COL, not SKIP. FamilyFilter is in a similar situation. Currently, SKIP does not cause an error, but it is not efficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13122) return codes of some filters not efficient
Shuaifeng Zhou created HBASE-13122: -- Summary: return codes of some filters not efficient Key: HBASE-13122 URL: https://issues.apache.org/jira/browse/HBASE-13122 Project: HBase Issue Type: Improvement Components: Filters Affects Versions: 0.98.10.1, 0.94.24, 1.0.1 Reporter: Shuaifeng Zhou ColumnRangeFilter: when minColumnInclusive is false, it means none of the cells at the current row/column fit the condition, so the filter should skip to the next column; the return code should be NEXT_COL, not SKIP. FamilyFilter is in a similar situation. Currently, SKIP does not cause an error, but it is not efficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13122) return codes of some filters not efficient
[ https://issues.apache.org/jira/browse/HBASE-13122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaifeng Zhou updated HBASE-13122: --- Attachment: 13122.patch Just changes the return code to NEXT_COL in ColumnRangeFilter and to NEXT_ROW in FamilyFilter. return codes of some filters not efficient - Key: HBASE-13122 URL: https://issues.apache.org/jira/browse/HBASE-13122 Project: HBase Issue Type: Improvement Components: Filters Affects Versions: 0.94.24, 1.0.1, 0.98.10.1 Reporter: Shuaifeng Zhou Attachments: 13122.patch ColumnRangeFilter: when minColumnInclusive is false, it means none of the cells at the current row/column fit the condition, so the filter should skip to the next column; the return code should be NEXT_COL, not SKIP. FamilyFilter is in a similar situation. Currently, SKIP does not cause an error, but it is not efficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
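The ColumnRangeFilter change can be sketched as a self-contained toy; the enum and comparison below are simplified illustrations, not the real org.apache.hadoop.hbase.filter classes (the real filter also uses seek hints):

```java
// Hypothetical sketch of the patched return-code logic: when a column sorts
// before the (possibly exclusive) minimum column, every remaining version of
// that column fails too, so the filter jumps to the next column (NEXT_COL)
// instead of re-evaluating each cell one by one (the old SKIP behavior).
public class FilterReturnCodeSketch {
    enum ReturnCode { INCLUDE, SKIP, NEXT_COL, NEXT_ROW }

    static ReturnCode filterColumn(String column, String minColumn,
                                   boolean minColumnInclusive) {
        int cmp = column.compareTo(minColumn);
        if (cmp < 0 || (cmp == 0 && !minColumnInclusive)) {
            return ReturnCode.NEXT_COL; // was SKIP before the patch
        }
        return ReturnCode.INCLUDE;
    }

    public static void main(String[] args) {
        System.out.println(filterColumn("a", "b", true));  // NEXT_COL
        System.out.println(filterColumn("b", "b", false)); // NEXT_COL (exclusive min)
        System.out.println(filterColumn("c", "b", false)); // INCLUDE
    }
}
```

SKIP and NEXT_COL select the same cells; the difference is purely how many cells the scanner must visit to reject a column with many versions.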
[jira] [Commented] (HBASE-12976) Set default value for hbase.client.scanner.max.result.size
[ https://issues.apache.org/jira/browse/HBASE-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308474#comment-14308474 ] Shuaifeng Zhou commented on HBASE-12976: Lars, sorry for the confusion. I mean that the KVs returned to the client by one RPC are determined by caching and batching, but the byte size is not controlled; maybe this parameter can help? Just an idea. Set default value for hbase.client.scanner.max.result.size -- Key: HBASE-12976 URL: https://issues.apache.org/jira/browse/HBASE-12976 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 Attachments: 12976-v2.txt, 12976.txt Setting scanner caching is somewhat of a black art. It's hard to estimate ahead of time how large the result set will be. I propose we set hbase.client.scanner.max.result.size to 2mb. That is a good compromise between performance and buffer usage on typical networks (avoiding OOMs when the caching was chosen too high). To an HTable client this is completely transparent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
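The interplay between caching (a row count) and max.result.size (a byte cap) can be modeled with a toy calculation; the fixed per-row byte size below is an assumption for illustration, and the real client accounting is more involved:

```java
// Toy model: one scanner RPC returns rows until either the caching count is
// reached or the accumulated bytes meet the max-result-size cap, whichever
// comes first. This shows why a byte cap protects against large rows even
// when caching was set too high.
public class ScannerRpcSizeSketch {
    static int rowsPerRpc(int caching, long maxResultSize, long bytesPerRow) {
        long bytes = 0;
        int rows = 0;
        while (rows < caching) {       // caching bounds the row count
            bytes += bytesPerRow;
            rows++;
            if (bytes >= maxResultSize) {
                break;                 // byte cap reached: stop this RPC early
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // 2 MB cap, 100 KB rows: the size cap wins over caching=100.
        System.out.println(rowsPerRpc(100, 2L * 1024 * 1024, 100L * 1024)); // 21
        // 2 MB cap, 1 KB rows: caching wins.
        System.out.println(rowsPerRpc(100, 2L * 1024 * 1024, 1024L));       // 100
    }
}
```

With a 2 MB default cap, a too-aggressive caching value degrades into a few extra RPCs instead of an oversized response buffer.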
[jira] [Commented] (HBASE-12976) Default hbase.client.scanner.max.result.size
[ https://issues.apache.org/jira/browse/HBASE-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306787#comment-14306787 ] Shuaifeng Zhou commented on HBASE-12976: This setting should work together with batching and caching to control the result size. Default hbase.client.scanner.max.result.size Key: HBASE-12976 URL: https://issues.apache.org/jira/browse/HBASE-12976 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 2.0.0, 1.0.1, 0.94.27, 0.98.11 Attachments: 12976.txt Setting scanner caching is somewhat of a black art. It's hard to estimate ahead of time how large the result set will be. I propose we set hbase.client.scanner.max.result.size to 2mb. That is a good compromise between performance and buffer usage on typical networks (avoiding OOMs when the caching was chosen too high). To an HTable client this is completely transparent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)