[jira] [Commented] (HBASE-5618) SplitLogManager - prevent unnecessary attempts to resubmits
[ https://issues.apache.org/jira/browse/HBASE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246874#comment-13246874 ]

Prakash Khemani commented on HBASE-5618:

TestColumnSeeking isn't failing for me.

SplitLogManager - prevent unnecessary attempts to resubmits
Key: HBASE-5618
URL: https://issues.apache.org/jira/browse/HBASE-5618
Project: HBase
Issue Type: Improvement
Components: wal, zookeeper
Reporter: Prakash Khemani
Assignee: Prakash Khemani
Attachments: 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch, 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch, 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch

Currently, once a watch fires indicating that the task node has been updated (heartbeated) by the worker, the SplitLogManager still takes quite some time before it updates the last-heard-from time. This is because the manager currently schedules another getDataSetWatch(), and only after that finishes will it update the task's last-heard-from time. This leads to a large number of ZK BadVersion warnings when resubmission is continuously attempted and keeps failing. Two changes should be made: (1) on a resubmission failure caused by BadVersion, the task's lastUpdate time should be bumped; (2) the task's lastUpdate time should be bumped as soon as the nodeDataChanged() watch fires, without waiting for getDataSetWatch() to complete.
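A minimal sketch of the two proposed changes, in simplified form - the Task class, field names, and callback wiring below are assumptions for illustration, not SplitLogManager's real internals:

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of the two HBASE-5618 changes: bump lastUpdate (1) when
// a resubmit fails with BadVersion, and (2) as soon as the nodeDataChanged()
// watch fires, before the follow-up getDataSetWatch() completes.
public class SplitTaskHeartbeats {
  static final class Task {
    volatile long lastUpdate; // last time we heard from the worker
  }

  private final ConcurrentMap<String, Task> tasks = new ConcurrentHashMap<>();

  // (2) The watch fired: the worker touched the znode, so treat that as a
  // heartbeat immediately instead of waiting for the getData round trip.
  void nodeDataChanged(String path) {
    Task task = tasks.get(path);
    if (task != null) {
      task.lastUpdate = System.currentTimeMillis();
    }
    getDataSetWatch(path); // still refetch the data and re-arm the watch
  }

  // (1) The resubmit's versioned setData lost the race: some worker updated
  // the znode concurrently, which is itself evidence the worker is alive.
  void resubmitFailedWithBadVersion(String path) {
    Task task = tasks.get(path);
    if (task != null) {
      task.lastUpdate = System.currentTimeMillis();
    }
  }

  private void getDataSetWatch(String path) {
    // async zk.getData(...) elided
  }
}
{code}

With (1) and (2) in place, the timeout monitor sees a fresh lastUpdate right after each heartbeat, so it stops re-attempting resubmits that can only fail with BadVersion.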
[jira] [Commented] (HBASE-5618) SplitLogManager - prevent unnecessary attempts to resubmits
[ https://issues.apache.org/jira/browse/HBASE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245790#comment-13245790 ]

Prakash Khemani commented on HBASE-5618:

Sorry for the test failure. Fixed and verified.
[jira] [Commented] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost
[ https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245797#comment-13245797 ]

Prakash Khemani commented on HBASE-5606:

Making the deletes synchronous doesn't theoretically remove the race condition. A master could send the delete to the ZK server it is connected to and then die. The next master can (theoretically) still run into the pending-delete race.

SplitLogManger async delete node hangs log splitting when ZK connection is lost
Key: HBASE-5606
URL: https://issues.apache.org/jira/browse/HBASE-5606
Project: HBase
Issue Type: Bug
Components: wal
Affects Versions: 0.92.0
Reporter: Gopinathan A
Assignee: Prakash Khemani
Priority: Critical
Fix For: 0.92.2
Attachments: 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch, 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch

1. One RS died; the ServerShutdownHandler found it out and started the distributed log splitting.
2. All tasks failed because the ZK connection was lost, so all the tasks were deleted asynchronously.
3. ServerShutdownHandler retried the log splitting.
4. The asynchronous deletion from step 2 finally happened, but hit the new task.
5. This left the SplitLogManager in a hanging state, so the .META. region was not assigned for a long time.

{noformat}
hbase-root-master-HOST-192-168-47-204.log.2012-03-14(55413,79):2012-03-14 19:28:47,932 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up splitlog task at znode /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89303,79):2012-03-14 19:34:32,387 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up splitlog task at znode /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
{noformat}

{noformat}
hbase-root-master-HOST-192-168-47-204.log.2012-03-14(80417,99):2012-03-14 19:34:31,196 DEBUG org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89456,99):2012-03-14 19:34:32,497 DEBUG org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
{noformat}
[jira] [Commented] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost
[ https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238794#comment-13238794 ]

Prakash Khemani commented on HBASE-5606:

@Jimmy This is similar to HBASE-5081 w.r.t. what goes wrong - a pending delete creates havoc on the next create. But it is different from HBASE-5081 because the pending delete is created at a different point in the code - in the timeoutMonitor, and not when the task actually fails ...
[jira] [Commented] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost
[ https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234592#comment-13234592 ]

Prakash Khemani commented on HBASE-5606:

@Chinna It is the TimeoutMonitor that causes so many deletes to be queued. The fix will be the following: in TimeoutMonitor, do not call getDataSetWatch() if the task has already failed, and ignore the call to getDataSetWatch() if there is already a pending getDataSetWatch() against the task. Thanks for finding this issue.
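A minimal sketch of that fix under assumed names (Task, chore(), the pending flag); the real TimeoutMonitor differs:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical TimeoutMonitor guard: never probe the znode of an
// already-failed task, and never stack a second getDataSetWatch() on top of
// a pending one.
public class TimeoutMonitorGuard {
  enum Status { IN_PROGRESS, SUCCESS, FAILURE }

  static final class Task {
    volatile Status status = Status.IN_PROGRESS;
    final AtomicBoolean getDataPending = new AtomicBoolean(false);
  }

  private final Map<String, Task> tasks = new ConcurrentHashMap<>();

  void chore() {
    for (Map.Entry<String, Task> e : tasks.entrySet()) {
      Task task = e.getValue();
      if (task.status == Status.FAILURE) {
        continue; // its znode is about to be deleted; probing it only queues churn
      }
      if (task.getDataPending.compareAndSet(false, true)) {
        getDataSetWatch(e.getKey()); // the callback must reset getDataPending
      }
    }
  }

  private void getDataSetWatch(String path) {
    // async zk.getData(...) elided; its callback resets getDataPending to false
  }
}
{code}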
[jira] [Commented] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost
[ https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235297#comment-13235297 ]

Prakash Khemani commented on HBASE-5606:

The getDataSetWatch() call in the timeout monitor is only done to check whether the znode still exists. If there is a failure in getting to the znode, then we should ignore that failure. How about implementing the following in the timeout monitor: call getDataSetWatch() only if the task has not already failed (this is just an optimization and can be done without any locking), and for this particular getDataSetWatch() call, store an IGNORE-ZK-ERROR flag in the ZK async context. If a ZK error happens, silently do nothing.
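The IGNORE-ZK-ERROR idea can ride on ZooKeeper's async-callback context object; a sketch, with the flag value and callback body as assumptions:

{code:java}
import org.apache.zookeeper.AsyncCallback;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Hypothetical sketch: the timeout monitor's existence probe passes an
// "ignore errors" flag through the ZK async context, so the shared callback
// can swallow failures for this call only.
public class IgnorableGetData {
  private final ZooKeeper zk;

  IgnorableGetData(ZooKeeper zk) {
    this.zk = zk;
  }

  void probeFromTimeoutMonitor(String path) {
    // ctx = Boolean.TRUE marks this getData as an ignorable existence check.
    zk.getData(path, true, new GetDataAsyncCallback(), Boolean.TRUE);
  }

  static class GetDataAsyncCallback implements AsyncCallback.DataCallback {
    @Override
    public void processResult(int rc, String path, Object ctx, byte[] data, Stat stat) {
      if (rc != KeeperException.Code.OK.intValue()) {
        if (Boolean.TRUE.equals(ctx)) {
          return; // IGNORE-ZK-ERROR: the probe failed, silently do nothing
        }
        // normal error handling (resubmit / mark failure) elided
        return;
      }
      // normal data handling (heartbeat bookkeeping, re-arm watch) elided
    }
  }
}
{code}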
[jira] [Commented] (HBASE-5528) Retry splitting log if failed in the process of ServerShutdownHandler, and abort master when retries exhausted
[ https://issues.apache.org/jira/browse/HBASE-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223475#comment-13223475 ]

Prakash Khemani commented on HBASE-5528:

I think the log-splitting retry logic is already there in ServerShutdownHandler ... in ServerShutdownHandler.process(), the handler is requeued in case of error:

{code}
try {
  if (this.shouldSplitHlog) {
    LOG.info("Splitting logs for " + serverName);
    this.services.getMasterFileSystem().splitLog(serverName);
  } else {
    LOG.info("Skipping log splitting for " + serverName);
  }
} catch (IOException ioe) {
  this.services.getExecutorService().submit(this);
  this.deadServers.add(serverName);
  throw new IOException("failed log splitting for " + serverName + ", will retry", ioe);
}
{code}

Retry splitting log if failed in the process of ServerShutdownHandler, and abort master when retries exhausted
Key: HBASE-5528
URL: https://issues.apache.org/jira/browse/HBASE-5528
Project: HBase
Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
Attachments: hbase-5528.patch, hbase-5528v2.patch

We will retry splitting the log if it fails in splitLogAfterStartup() when the master starts. However, there is no retry for failed log splitting in the process of ServerShutdownHandler. Also, if we finally fail to split the log, we should abort the master even if the filesystem is OK, to prevent data loss.
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219413#comment-13219413 ]

Prakash Khemani commented on HBASE-5270:

@Stack If we presume that the list of servers that joinCluster() received contains the server hosting .META., then the next step that you outlined in your scenario cannot be allowed. If we are splitting logs for .META. then we have determined that the meta server was not running, and therefore it cannot be taking edits. The problem you are outlining is probably still there, but the scenario has to be refined.

Anyway, my point was: at startup the master should determine once which servers are up and which are not. This should include whether ROOT and META are assigned or not. It should then initialize everything based on that knowledge, which must not change during initialization. Anything that changes during initialization should be taken care of by the normal server-shutdown handlers. But I have to admit, I don't understand the assignment complexities very well ... I will read up some more.

Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
Key: HBASE-5270
URL: https://issues.apache.org/jira/browse/HBASE-5270
Project: HBase
Issue Type: Sub-task
Components: master
Reporter: Zhihong Yu
Assignee: chunhui shen
Fix For: 0.92.1, 0.94.0
Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, hbase-5270v7.patch, hbase-5270v8.patch, sampletest.txt

This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK:

Reviewing 0.92v17: isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere. Does isDeadRootServerInProgress need to be public? Ditto for the meta version. The method param names are not right: 'definitiveRootServer' - what is meant by definitive? Do they need this qualifier? Is there anything in place to stop us expiring a server twice if it's carrying root and meta? What is the difference between asking the assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta.

I think I've asked for this a few times - onlineServers needs to be explained... either in javadoc or in a comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all. It looks like we get the list by trawling ZK for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here?

Though distributed split log is configured, we will do single-process splitting in the master under some conditions with this patch. It's not explained in the code why we would do this. Why do we think master log splitting 'high priority' when it could very well be slower? Should we only go this route if distributed splitting is not going on? Do we know if concurrent distributed log splitting and master splitting works? Why would we have dead servers in progress here in master startup? Because a servershutdownhandler fired?

This patch is different from the patch for 0.90. It should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and a new issue for more work on this trunk patch? This patch needs to have the v18 differences applied.
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218407#comment-13218407 ]

Prakash Khemani commented on HBASE-5270:

Assuming that the master uses the saved region-server list in joinCluster, can you then please outline the scenario where problems can still happen? There is some handling of META and ROOT not being available in ServerShutdownHandler, and I am wondering why that is not sufficient.
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217968#comment-13217968 ]

Prakash Khemani commented on HBASE-5270:

(I haven't read through the comments carefully, and I am sorry for the noise if I am way off the mark.) The problem as I see it is that the master's understanding of which region servers are online changes from the time that it calls splitLogAfterStartup() to the time it calls rebuildUserRegions() in joinCluster(). I feel that it might be a lot simpler if the master saved the list of region servers that it had given to splitLogAfterStartup(), and later used the same list for rebuilding user regions. That should fix this issue, won't it?
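A sketch of the saved-list idea; the class and method names here are illustrative assumptions, not the actual HMaster code:

{code:java}
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: snapshot the set of live region servers once at
// startup, then drive both log splitting and region rebuilding from the same
// snapshot so the master's view cannot shift between the two steps.
public class StartupServerSnapshot {
  private final Set<String> onlineAtStartup;

  StartupServerSnapshot(Set<String> serversCheckedInViaZk) {
    this.onlineAtStartup = Collections.unmodifiableSet(new TreeSet<>(serversCheckedInViaZk));
  }

  void joinCluster() {
    splitLogAfterStartup(onlineAtStartup);  // split logs of servers NOT in the snapshot
    rebuildUserRegions(onlineAtStartup);    // rebuild against the very same snapshot
  }

  private void splitLogAfterStartup(Set<String> online) { /* elided */ }

  private void rebuildUserRegions(Set<String> online) { /* elided */ }
}
{code}

Servers that come or go after the snapshot is taken would be handled by the normal server-shutdown handlers, as the comment above suggests.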
[jira] [Commented] (HBASE-4932) Block cache can be mistakenly instantiated by tools
[ https://issues.apache.org/jira/browse/HBASE-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215947#comment-13215947 ]

Prakash Khemani commented on HBASE-4932:

Yes ... it is a good-to-have patch. Thanks.

Block cache can be mistakenly instantiated by tools
Key: HBASE-4932
URL: https://issues.apache.org/jira/browse/HBASE-4932
Project: HBase
Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani
Fix For: 0.94.0
Attachments: HBASE-4932.patch

MapReduce tasks that create a writer to write HFiles inadvertently end up creating the block cache.
[jira] [Commented] (HBASE-5347) GC free memory management in Level-1 Block Cache
[ https://issues.apache.org/jira/browse/HBASE-5347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215085#comment-13215085 ]

Prakash Khemani commented on HBASE-5347:

Lars, you are right. I have been trying to induce a full GC but without any success. (I can induce a full GC if I artificially hold some key-values in a queue and force them to be tenured.)

On 89-fb, my test case is doing random increments over a space of slightly more than 40GB worth of key-value data. The heap is set to 36GB. The LRU cache has high and low watermarks of 0.98 and 0.85. The region server spawns 1000 threads that continuously do the increments. The eviction thread manages to keep the block cache at about 85% always. Cache-on-write is turned on to induce more cache churn. All 12 disks are pegged at close to 100% reads. GC takes 60% of the CPU (sum of user times in 1000 lines of GC log / (elapsed time * #cpus)). Compactions that get started never complete while the load is on.

I guess I have to change the dynamics of the test case to induce GC pauses.

GC free memory management in Level-1 Block Cache
Key: HBASE-5347
URL: https://issues.apache.org/jira/browse/HBASE-5347
Project: HBase
Issue Type: Improvement
Reporter: Prakash Khemani
Assignee: Prakash Khemani
Attachments: D1635.5.patch

On eviction of a block from the block cache, instead of waiting for the garbage collector to reuse its memory, reuse the block right away. This will require us to keep reference counts on the HFile blocks. Once we have the reference counts in place we can do our own simple blocks-out-of-slab allocation for the block cache. This will help us with:
* reducing GC pressure, especially in the old generation
* making it possible to have non-Java-heap memory backing the HFile blocks
[jira] [Commented] (HBASE-5332) Deterministic Compaction Jitter
[ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212858#comment-13212858 ]

Prakash Khemani commented on HBASE-5332:

The major compactions are jittered so that too many of them don't happen at the same time. Rather than relying on random jitter, why can't the compaction thread simply ensure that it doesn't schedule too many compactions at the same time?

Deterministic Compaction Jitter
Key: HBASE-5332
URL: https://issues.apache.org/jira/browse/HBASE-5332
Project: HBase
Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
Attachments: D1785.1.patch, D1785.2.patch, D1785.3.patch

Currently, we add jitter to a compaction using delay + jitter*(1 - 2*Math.random()). Since this is non-deterministic, we can get major compaction storms on server restart, as half the Stores that were set to delay + jitter will now be set to delay - jitter. We need a more deterministic way to jitter major compactions so this information can persist across server restarts.
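One deterministic alternative, sketched below: seed the PRNG from a stable property of the store so the computed jitter survives restarts. The seed choice (a hash of a store key) is an assumption for illustration, not the committed fix:

{code:java}
import java.util.Random;

// Hypothetical deterministic compaction jitter: the same store always computes
// the same offset, so a restart cannot flip "delay + jitter" into
// "delay - jitter" and trigger a compaction storm.
public final class CompactionJitter {
  private CompactionJitter() {
  }

  /**
   * @param delay    base major-compaction period, ms
   * @param jitter   maximum absolute jitter, ms
   * @param storeKey stable identifier, e.g. "table,region,family" (assumed)
   */
  public static long jitteredDelay(long delay, long jitter, String storeKey) {
    Random rnd = new Random(storeKey.hashCode()); // deterministic per store
    double u = 1.0 - 2.0 * rnd.nextDouble();      // uniform in (-1, 1]
    return delay + (long) (jitter * u);
  }

  public static void main(String[] args) {
    // The same store key yields the same schedule across restarts.
    System.out.println(jitteredDelay(86_400_000L, 3_600_000L, "t1,abc,cf"));
    System.out.println(jitteredDelay(86_400_000L, 3_600_000L, "t1,abc,cf"));
  }
}
{code}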
[jira] [Commented] (HBASE-5347) GC free memory management in Level-1 Block Cache
[ https://issues.apache.org/jira/browse/HBASE-5347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205647#comment-13205647 ]

Prakash Khemani commented on HBASE-5347:

Another advantage of this approach will be that we will be able to get rid of the low/high watermarks in LRUBlockCache and make block eviction synchronous with demand. The default values of the watermarks are 75% and 85% (in 89). That means we waste somewhere around 20% of the block cache today because of asynchronous garbage collection.
[jira] [Commented] (HBASE-5347) GC free memory management in Level-1 Block Cache
[ https://issues.apache.org/jira/browse/HBASE-5347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203871#comment-13203871 ]

Prakash Khemani commented on HBASE-5347:

Initial diff for feedback: https://reviews.facebook.net/D1635
[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression
[ https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203918#comment-13203918 ]

Prakash Khemani commented on HBASE-5313:

The values can be kept compressed in memory. We can uncompress them on demand when writing out the key-values during RPC or compactions.

The key has to have a pointer to the values. The pointer can be implicit - derived from the value lengths - if all the values are stored in the same order as the keys. The value pointer has to be explicit if the values are stored in a different order than the keys. We might want to write out the values in a different order if we want to do per-column compression.

While writing out the HFileBlock the following can be done: group all the values by their column identifier, independently compress and write out each group of values, then go back to the keys and update the value pointers.

Restructure hfiles layout for better compression
Key: HBASE-5313
URL: https://issues.apache.org/jira/browse/HBASE-5313
Project: HBase
Issue Type: Improvement
Components: io
Reporter: dhruba borthakur
Assignee: dhruba borthakur

An HFile block contains a stream of key-values. Can we organize these KVs on disk in a better way so that we get much greater compression ratios? One option (thanks Prakash) is to store all the keys in the beginning of the block (let's call this the key-section) and then store all their corresponding values towards the end of the block. This will allow us to not even decompress the values when we are scanning and skipping over rows in the block. Any other ideas?
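A rough sketch of the write path described in that comment; all names (KV, writeBlock, the per-group deflate streams) are illustrative assumptions, not HFile internals:

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.DeflaterOutputStream;

// Hypothetical block layout: [key section with explicit value pointers]
// [per-column, independently compressed value groups].
public class KeySectionBlockWriter {

  static final class KV {
    final byte[] key; final String column; final byte[] value;
    KV(byte[] k, String c, byte[] v) { key = k; column = c; value = v; }
  }

  static byte[] writeBlock(List<KV> kvs) throws IOException {
    // 1. Group values by column identifier so each group compresses well.
    Map<String, List<byte[]>> groups = new LinkedHashMap<>();
    for (KV kv : kvs) {
      groups.computeIfAbsent(kv.column, c -> new ArrayList<>()).add(kv.value);
    }

    // 2. Compress each group independently; record where each group starts.
    ByteArrayOutputStream valueSection = new ByteArrayOutputStream();
    Map<String, Integer> groupOffset = new LinkedHashMap<>();
    for (Map.Entry<String, List<byte[]>> e : groups.entrySet()) {
      groupOffset.put(e.getKey(), valueSection.size());
      DeflaterOutputStream dos = new DeflaterOutputStream(valueSection);
      for (byte[] v : e.getValue()) dos.write(v);
      dos.finish(); // terminate this group's compressed stream
    }

    // 3. Write keys with explicit value pointers (group offset + slot within
    //    the group), then append the compressed value section.
    ByteArrayOutputStream block = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(block);
    Map<String, Integer> nextIndex = new LinkedHashMap<>();
    out.writeInt(kvs.size());
    for (KV kv : kvs) {
      int idx = nextIndex.merge(kv.column, 1, Integer::sum) - 1;
      out.writeInt(kv.key.length);
      out.write(kv.key);
      out.writeInt(groupOffset.get(kv.column)); // value pointer: group start
      out.writeInt(idx);                        // value pointer: slot in group
    }
    byte[] values = valueSection.toByteArray();
    out.writeInt(values.length);
    out.write(values);
    return block.toByteArray();
  }
}
{code}

Here the group offsets are computed before the key section is written, which sidesteps the back-patching step the comment mentions; a streaming writer would instead patch the pointers after the value groups are laid out.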
[jira] [Commented] (HBASE-5010) Filter HFiles based on TTL
[ https://issues.apache.org/jira/browse/HBASE-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194454#comment-13194454 ]

Prakash Khemani commented on HBASE-5010:

This change doesn't break HBASE-4721. HBASE-4721 introduced another parameter, hbase.hstore.time.to.purge.deletes, to keep deletes even after major compactions. But hbase.hstore.time.to.purge.deletes doesn't override the TTL of the store. Pasting the comment from the code, which hopefully makes it clear that this diff works with HBASE-4721:

{code}
// By default, when hbase.hstore.time.to.purge.deletes is 0ms, a delete
// marker is always removed during a major compaction. If set to a non-zero
// value then major compaction will try to keep a delete marker around for
// the given number of milliseconds. We want to keep the delete markers
// around a bit longer because old puts might appear out-of-order. For
// example, during log replication between two clusters.
//
// If the delete marker has lived longer than its column-family's TTL then
// the delete marker will be removed even if time.to.purge.deletes has not
// passed. This is because all the Puts that this delete marker can influence
// would have also expired. (Removing of delete markers on col family TTL will
// not happen if min-versions is set to non-zero)
{code}

Filter HFiles based on TTL
Key: HBASE-5010
URL: https://issues.apache.org/jira/browse/HBASE-5010
Project: HBase
Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Fix For: 0.94.0
Attachments: 5010.patch, D1017.1.patch, D1017.2.patch, D909.1.patch, D909.2.patch, D909.3.patch, D909.4.patch, D909.5.patch, D909.6.patch

In ScanWildcardColumnTracker we have:

{code:java}
this.oldestStamp = EnvironmentEdgeManager.currentTimeMillis() - ttl;
...
private boolean isExpired(long timestamp) {
  return timestamp < oldestStamp;
}
{code}

but this time-range filtering does not participate in HFile selection. In one real case this caused next() calls to time out because all KVs in a table had expired, but next() had to iterate over the whole table to find that out. We should be able to filter out those HFiles right away. I think a reasonable approach is to add a default timerange filter to every scan for a CF with a finite TTL and utilize the existing filtering in StoreFile.Reader.passesTimerangeFilter.
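The rule in that pasted comment, restated as a small predicate; the method name and the Long.MAX_VALUE sentinel for "no TTL" are assumptions for illustration:

{code:java}
// Hypothetical restatement of the delete-marker purge rule described above.
public final class DeleteMarkerRule {
  private DeleteMarkerRule() {
  }

  static boolean canPurgeDeleteMarker(long markerTs, long now, long familyTtlMs,
      long timeToPurgeDeletesMs, int minVersions) {
    // The marker outlived its column family's TTL: every Put it could mask
    // has expired too, so it can go -- unless min-versions forces retention.
    if (minVersions == 0 && familyTtlMs != Long.MAX_VALUE
        && markerTs < now - familyTtlMs) {
      return true;
    }
    // Otherwise honor hbase.hstore.time.to.purge.deletes (0 means the marker
    // is purged by the next major compaction right away).
    return markerTs < now - timeToPurgeDeletesMs;
  }
}
{code}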
[jira] [Commented] (HBASE-5136) Redundant MonitoredTask instances in case of distributed log splitting retry
[ https://issues.apache.org/jira/browse/HBASE-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181777#comment-13181777 ]

Prakash Khemani commented on HBASE-5136:

It will be a lot simpler to do status.cleanup() in the finally block in splitLogDistributed().

Redundant MonitoredTask instances in case of distributed log splitting retry
Key: HBASE-5136
URL: https://issues.apache.org/jira/browse/HBASE-5136
Project: HBase
Issue Type: Task
Reporter: Zhihong Yu
Assignee: Zhihong Yu
Attachments: 5136.txt

In case of log splitting retry, the following code would be executed multiple times:

{code}
public long splitLogDistributed(final List<Path> logDirs) throws IOException {
  MonitoredTask status = TaskMonitor.get().createStatus(
      "Doing distributed log split in " + logDirs);
{code}

leading to multiple MonitoredTask instances. Users may get confused by multiple distributed log splitting entries for the same region server on the master UI.
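A sketch of that suggestion - the method shape follows the snippet above, with the body elided and status.cleanup() taken from the comment (assumed to idempotently retire the MonitoredTask):

{code:java}
public long splitLogDistributed(final List<Path> logDirs) throws IOException {
  MonitoredTask status = TaskMonitor.get().createStatus(
      "Doing distributed log split in " + logDirs);
  try {
    // install tasks, wait for the workers, tally errors ... (elided)
    return totalSplitSize; // assumed result variable
  } finally {
    // Runs on success, failure, and every retry alike, so a retried split
    // never leaves a stale MonitoredTask entry behind on the master UI.
    status.cleanup();
  }
}
{code}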
[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry
[ https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179891#comment-13179891 ]

Prakash Khemani commented on HBASE-5081:

The retry logic is in HMaster.splitLogAfterStartup(). I will remove the OrphanLogException handling from MasterFileSystem.

Distributed log splitting deleteNode races against splitLog retry
Key: HBASE-5081
URL: https://issues.apache.org/jira/browse/HBASE-5081
Project: HBase
Issue Type: Bug
Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Prakash Khemani
Fix For: 0.92.0
Attachments: 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, patch_for_92_v3.txt

Recently, during 0.92 RC testing, we found that distributed log splitting hangs there forever. Please see the attached screenshot. I looked into it and here is what I think happened:

1. One RS died; the ServerShutdownHandler found it out and started the distributed log splitting.
2. All three tasks failed, so the three tasks were deleted, asynchronously.
3. ServerShutdownHandler retried the log splitting.
4. During the retrial, it created these three tasks again and put them in a hashmap (tasks).
5. The asynchronous deletion from step 2 finally happened for one task; in the callback, it removed one task from the hashmap.
6. One of the newly submitted tasks' zookeeper watchers found that the task is unassigned and not in the hashmap, so it created a new orphan task.
7. All three tasks failed, but the task created in step 6 is an orphan, so the batch.err counter was one short; the log splitting hangs there and keeps waiting for the last task to finish, which is never going to happen.

So I think the problem is step 2. The fix is to make the deletion sync instead of async, so that the retry will have a clean start. An async deleteNode will mess up the split-log retrial. In an extreme situation, if the async deleteNode doesn't happen soon enough, some node created during the retrial could be deleted. deleteNode should be sync.
[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry
[ https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179894#comment-13179894 ]

Prakash Khemani commented on HBASE-5081:

Will look into the test failure. I am not sure I know where to find the test run's output logs.
[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry
[ https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179926#comment-13179926 ]

Prakash Khemani commented on HBASE-5081:

If there is a spurious wakeup before the status has changed to DELETED, then the code will return an error (oldtask) to the caller.

Regarding the hung TestSplitLogManager test in https://builds.apache.org/job/PreCommit-HBASE-Build/665/console - I couldn't find what failed or what hung. https://builds.apache.org/job/PreCommit-HBASE-Build/665//testReport/org.apache.hadoop.hbase.master/TestSplitLogManager/ shows that everything passed.
[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry
[ https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180167#comment-13180167 ]

Prakash Khemani commented on HBASE-5081:

{code}
+        while (oldtask.status == FAILURE) {
+          // wait for status to change to DELETED
+          try {
+            oldtask.wait();
+          } catch (InterruptedException e) {
+            Thread.currentThread().interrupt();
+            LOG.warn("Interrupted when waiting for znode delete callback");
+            // fall through to return failure
+          }
         }
-        oldtask.setBatch(batch);
       }
{code}

Changing the 'if' to 'while' is OK. But in case of InterruptedException you should exit the while loop, fall through, and return. If you don't return on interrupt then there is a good possibility of deadlock when the process is trying to exit. Also, there is no point calling oldtask.wait() with the thread's interrupt flag set - it will immediately throw InterruptedException again.
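The loop with the suggested fix applied - break out on interrupt instead of re-entering wait() with the interrupt flag set (oldtask, FAILURE, and LOG are assumed from the patch above):

{code:java}
synchronized (oldtask) {
  while (oldtask.status == FAILURE) {
    // Wait for the async delete callback to flip the status to DELETED.
    try {
      oldtask.wait();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      LOG.warn("Interrupted when waiting for znode delete callback");
      // Exit the loop: with the interrupt flag set, wait() would throw again
      // immediately, spinning here and blocking process shutdown.
      break;
    }
  }
}
// Fall through: the caller sees the old (failed) task and returns an error.
return oldtask;
{code}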
[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry
[ https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180206#comment-13180206 ]

Prakash Khemani commented on HBASE-5081:

The latest patch uploaded by Ted looks good. I will try to develop a test case for the delayed-delete handling.
[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry
[ https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179320#comment-13179320 ] Prakash Khemani commented on HBASE-5081: Assuming splitlog failed, the delete of the zk-task-node is queued up, splitlog is retried, and createTaskIfAbsent() is called. The following piece of code in createTaskIfAbsent() will be hit (because the oldtask status is neither IN_PROGRESS nor SUCCESS; the oldtask status is FAILED):
{code}
LOG.warn("Transient problem. Failure because previously failed task"
    + " state still present. Waiting for znode delete callback"
    + " path=" + path);
return oldtask;
{code}
The splitlog retry will fail immediately with an IOException ("duplicate log split scheduled for ..."). The caller (the master) will wait and retry again. Distributed log splitting deleteNode races against splitLog retry -- Key: HBASE-5081 URL: https://issues.apache.org/jira/browse/HBASE-5081 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Prakash Khemani Fix For: 0.92.0 Attachments: 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, patch_for_92_v3.txt Recently, during 0.92 RC testing, we found that distributed log splitting hangs there forever. Please see the attached screen shot. I looked into it, and here is what happened, I think: 1. One RS died; the ServerShutdownHandler found it out and started the distributed log splitting; 2. All three tasks failed, so the three tasks were deleted, asynchronously; 3. The ServerShutdownHandler retried the log splitting; 4. During the retry, it created these three tasks again and put them in a hashmap (tasks); 5. The asynchronous deletion in step 2 finally happened for one task; in the callback, it removed that task from the hashmap; 6. One of the newly submitted tasks' zookeeper watcher found out that the task is unassigned and not in the hashmap, so it created a new orphan task. 7. All three tasks failed, but the task created in step 6 is an orphan, so the batch.err counter was one short, and the log splitting hangs there, forever waiting for the last task to finish, which is never going to happen. So I think the problem is step 2. The fix is to make the deletion sync, instead of async, so that the retry will have a clean start. Async deleteNode will interfere with the split log retry. In an extreme situation, if the async deleteNode doesn't happen soon enough, some node created during the retry could be deleted. deleteNode should be sync. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
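For readers without the source at hand, here is a hedged, self-contained model of the control flow being described; the names mirror SplitLogManager.createTaskIfAbsent(), but the bodies are simplified illustrations, not the real code:
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, simplified model of the race described above: a retried
// splitLog() finds the old FAILED task still in the map because its async
// znode delete has not completed yet, so the retry fails fast.
class SplitTaskBookkeeping {
  enum Status { IN_PROGRESS, SUCCESS, FAILED }

  static class Task {
    volatile Status status = Status.FAILED;
  }

  private final ConcurrentMap<String, Task> tasks = new ConcurrentHashMap<>();

  /** Returns null if a fresh task was installed, or the stale old task. */
  Task createTaskIfAbsent(String path) {
    Task oldtask = tasks.putIfAbsent(path, new Task());
    if (oldtask == null) {
      return null; // clean start: caller proceeds with the new task
    }
    if (oldtask.status != Status.IN_PROGRESS && oldtask.status != Status.SUCCESS) {
      // FAILED task still present: the znode delete callback hasn't fired yet.
      // The caller maps the non-null return to an IOException ("duplicate log
      // split scheduled ...") and the master waits and retries.
      System.err.println("Transient problem: previously failed task still"
          + " present, waiting for znode delete callback, path=" + path);
    }
    return oldtask;
  }
}
{code}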
[jira] [Commented] (HBASE-5029) TestDistributedLogSplitting fails on occasion
[ https://issues.apache.org/jira/browse/HBASE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13172613#comment-13172613 ] Prakash Khemani commented on HBASE-5029: I have been running this test in a loop on my laptop for a while, but I haven't been able to reproduce it. I looked at the code and could not figure out why the test would fail. The test can fail if, for some reason, the task that has been put up in zookeeper doesn't get acquired for 30 seconds. It would be easier to fix this if I had all the logs. We have been running distributed log splitting in production for quite some time. And yes, I have tested a region server aborting while it is executing a task a number of times. TestDistributedLogSplitting fails on occasion - Key: HBASE-5029 URL: https://issues.apache.org/jira/browse/HBASE-5029 Project: HBase Issue Type: Bug Reporter: stack Assignee: Prakash Khemani Priority: Critical Attachments: 0001-HBASE-5029-jira-TestDistributedLogSplitting-fails-on.patch, 5029-addingignore.txt, HBASE-5029.D891.1.patch, HBASE-5029.D891.2.patch This is how it usually fails: https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/lastCompletedBuild/testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testWorkerAbort/ Assigning mighty Prakash since he offered to take a looksee. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5029) TestDistributedLogSplitting fails on occasion
[ https://issues.apache.org/jira/browse/HBASE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13172677#comment-13172677 ] Prakash Khemani commented on HBASE-5029: The cause of this error appears to be the following DFSClient exception:
{code}
2011-12-17 01:14:48,369 ERROR [SplitLogWorker-janus.apache.org,53708,1324084461889] regionserver.SplitLogWorker(169): unexpected error
java.lang.NullPointerException
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeThreads(DFSClient.java:3831)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3874)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3809)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
	at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1017)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.close(SequenceFileLogWriter.java:214)
	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:458)
	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:351)
	at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:113)
	at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:266)
	at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
	at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
	at java.lang.Thread.run(Thread.java:662)
{code}
(Much earlier in the logs, while the SplitLogWorker was trying to recover the lease on the log file, it had received a Thread.interrupt() because the region server was exiting. That Thread.interrupt() was unsuccessful in interrupting the recoverLease() call. It is also possible that the interrupt was eaten up during the recoverLease() call.) The split-log-worker thread continued to split the log file. It successfully split the file, but in the end it hit this exception. It is possible that the file system was already closed by the time the above exception happened. The fix probably requires some more checking in DFSClient$DFSOutputStream.closeInternal() for a closed file system. The more difficult task is to make sure that recoverLease() handles interrupts correctly. TestDistributedLogSplitting fails on occasion - Key: HBASE-5029 URL: https://issues.apache.org/jira/browse/HBASE-5029 Project: HBase Issue Type: Bug Reporter: stack Assignee: Prakash Khemani Priority: Critical Attachments: 0001-HBASE-5029-jira-TestDistributedLogSplitting-fails-on.patch, 5029-addingignore.txt, HBASE-5029.D891.1.patch, HBASE-5029.D891.2.patch This is how it usually fails: https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/lastCompletedBuild/testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testWorkerAbort/ Assigning mighty Prakash since he offered to take a looksee. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
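The second half of that fix, making lease recovery honor interrupts, might look roughly like the following. This is a hypothetical sketch (the Attempt interface and recoverLease() wrapper are made-up stand-ins, not the actual HDFS or HBase change); the point is to surface the interrupt instead of silently eating it:
{code}
import java.io.IOException;
import java.io.InterruptedIOException;

// Hypothetical retry wrapper showing the desired interrupt behavior:
// stop retrying and report the interrupt rather than splitting on.
class LeaseRecovery {
  /** Stand-in for one call to the real recoverLease(); true = recovered. */
  interface Attempt { boolean run() throws IOException; }

  void recoverLease(Attempt attempt, long retrySleepMs) throws IOException {
    while (true) {
      if (Thread.currentThread().isInterrupted()) {
        // The region server is exiting; don't keep working on this log.
        throw new InterruptedIOException("interrupted during lease recovery");
      }
      if (attempt.run()) {
        return; // lease recovered; safe to proceed with the split
      }
      try {
        Thread.sleep(retrySleepMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserved for the check above
      }
    }
  }
}
{code}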
[jira] [Commented] (HBASE-5013) NPE in HBaseClient$Connection.receiveResponse
[ https://issues.apache.org/jira/browse/HBASE-5013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13168168#comment-13168168 ] Prakash Khemani commented on HBASE-5013: It could be HBASE-4980. HBASE-4980 is not synced into the internal fb branch. My analysis could be wrong because I might be trying to match the stack trace against the wrong build. I will cross-check. NPE in HBaseClient$Connection.receiveResponse - Key: HBASE-5013 URL: https://issues.apache.org/jira/browse/HBASE-5013 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Assignee: Prakash Khemani We have the following NPE:
{code}
java.io.IOException: Call to hbasedev003.snc3.facebook.com/10.26.1.198:60020 failed on local exception: java.io.IOException: Unexpected exception receiving call responses
	at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:916)
	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:885)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:149)
	at $Proxy6.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:182)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:295)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:272)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:324)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:228)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1197)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1154)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1141)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:872)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:768)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:742)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:978)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:772)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:736)
	at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:207)
	at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:177)
	at com.facebook.BulkImporter.VerifyAssocs.<init>(VerifyAssocs.java:248)
	at com.facebook.BulkImporter.VerifyAssocs$AssocVerifierMapper.setup(VerifyAssocs.java:138)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:624)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:159)
Caused by: java.io.IOException: Unexpected exception receiving call responses
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:494)
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:571)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:490)
{code}
=== Just by looking at the code, the NPE shouldn't have happened:
HBaseClient$Connection.setUpIOstreams() sets up in and out, and then starts the Connection thread. The Connection, in its run() method, calls receiveResponse(). In receiveResponse(), the NPE happens at int id = in.readInt(). As per the java.util.concurrent docs, the initialization of in should have been visible in the Connection thread's run() method, so I don't know how in ended up being null. === While looking into this issue I noticed a small problem in the closeConnection() method. I will soon upload a diff. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
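The visibility guarantee being appealed to is the happens-before edge created by Thread.start(): under the Java Memory Model, a field assigned before start() is guaranteed visible inside run(), even without volatile or synchronization. A minimal self-contained illustration (not HBaseClient code; the 4-byte payload simulating a call id is made up):
{code}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;

// Minimal illustration of the happens-before argument in the comment above.
class Connection extends Thread {
  private DataInputStream in; // deliberately non-volatile, like HBaseClient's

  void setupIOstreams() {
    // Write to 'in' before start(): visible in run() per the Java Memory Model.
    in = new DataInputStream(new ByteArrayInputStream(new byte[] {0, 0, 0, 42}));
    start();
  }

  @Override
  public void run() {
    try {
      int id = in.readInt(); // must not NPE: 'in' was set before start()
      System.out.println("call id = " + id);
    } catch (Exception e) {
      e.printStackTrace();
    }
  }

  public static void main(String[] args) {
    new Connection().setupIOstreams();
  }
}
{code}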
[jira] [Commented] (HBASE-4987) wrong use of incarnation var in SplitLogManager
[ https://issues.apache.org/jira/browse/HBASE-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165766#comment-13165766 ] Prakash Khemani commented on HBASE-4987: Old issue HBASE-4855 wrong use of incarnation var in SplitLogManager --- Key: HBASE-4987 URL: https://issues.apache.org/jira/browse/HBASE-4987 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Assignee: Prakash Khemani @Ramakrishna found and analyzed an issue in SplitLogManager. But I don't think that the fix is correct. Will upload a patch shortly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4823) long running scans lose benefit of bloomfilters and timerange hints
[ https://issues.apache.org/jira/browse/HBASE-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153655#comment-13153655 ] Prakash Khemani commented on HBASE-4823: https://issues.apache.org/jira/browse/HBASE-3415 is also related. long running scans lose benefit of bloomfilters and timerange hints --- Key: HBASE-4823 URL: https://issues.apache.org/jira/browse/HBASE-4823 Project: HBase Issue Type: Bug Reporter: Kannan Muthukkaruppan Assignee: Kannan Muthukkaruppan When you have a long-running scan, due to say an MR job, you can lose the benefit of timerange hints and bloom filters midway if your scanner gets reset. [Note: the scanners can get reset, say, due to a flush or compaction.] In one of our workloads, we periodically want to do rollups on the most recent 15 minutes of data in a column family... but the timerange hint benefit is lost midway when this resetScannerStack (shown below) happens. And the end result is that we end up reading all the old HFiles rather than just the recent HFiles.
{code}
private void resetScannerStack(KeyValue lastTopKey) throws IOException {
  if (heap != null) {
    throw new RuntimeException("StoreScanner.reseek run on an existing heap!");
  }
  /* When we have the scan object, should we not pass it to getScanners()
   * to get a limited set of scanners? We did so in the constructor and we
   * could have done it now by storing the scan object from the constructor
   */
  List<KeyValueScanner> scanners = getScanners();
{code}
The comment in the code seems to be aware of this issue and even has the suggested fix! -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
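The fix the code comment hints at is to remember the scan's time range from construction and reapply it when the scanner stack is rebuilt. Here is a hedged, standalone model of that idea (the class, StoreFile fields, and selection method are illustrative stand-ins, not StoreScanner's actual API):
{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone model: keep the scan's time range from the
// "constructor" and re-filter store files by it on every scanner reset,
// so old files can still be skipped after a flush/compaction reset.
class TimeRangeScanModel {
  static class StoreFile {
    final long minTs, maxTs; // timestamp range covered by the file
    StoreFile(long minTs, long maxTs) { this.minTs = minTs; this.maxTs = maxTs; }
  }

  private final long scanMinTs, scanMaxTs; // saved at construction time
  private final List<StoreFile> files;

  TimeRangeScanModel(long scanMinTs, long scanMaxTs, List<StoreFile> files) {
    this.scanMinTs = scanMinTs;
    this.scanMaxTs = scanMaxTs;
    this.files = files;
  }

  /** What resetScannerStack should do: re-filter files by the saved range. */
  List<StoreFile> getScannersForReset() {
    List<StoreFile> selected = new ArrayList<StoreFile>();
    for (StoreFile f : files) {
      if (f.maxTs >= scanMinTs && f.minTs <= scanMaxTs) {
        selected.add(f); // overlaps the scan's time range; keep it
      }
    }
    return selected; // old files outside the range are skipped again
  }
}
{code}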
[jira] [Commented] (HBASE-4721) Retain Delete Markers after Major Compaction
[ https://issues.apache.org/jira/browse/HBASE-4721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13145785#comment-13145785 ] Prakash Khemani commented on HBASE-4721: I had started at a point where I thought I would independently assign TTLs to delete markers. But now I have realized that it doesn't make any sense to give a different TTL to the delete-markers. (Giving the delete-markers a smaller TTL than the puts would be incorrect; giving them a larger TTL than the puts would be pointless, because then the delete-markers would be deleting already-expired puts.) HBASE-4536 will work, but only if the keep-deleted-kvs flag is set on the column family (or is it the table?). Do you think it makes sense to make it the default behavior that, regardless of whether point-in-time queries are being supported or not, major compaction will not remove the delete-markers? A delete-marker would only be removed when it expires or when enough put versions accumulate before it. Concerns that people have raised if we stopped removing all delete markers in a major compaction: (1) Space wastage. I am not sure if this is a big concern. (2) The bigger issue is that the user will never be able to insert a Put beyond the delete marker. Today, if the user makes a mistake, the admin can go in, delete the puts, do a major compaction, and then the user can reinsert the correct Puts. This workflow would be nullified if we kept delete-markers even after major compaction. (3) Today the user doesn't even know that there are delete markers. But that will have to change if we start keeping delete-markers beyond major compactions. === I don't get the reasoning behind why we need to keep deleted puts when syncing logs from one cluster to another. The problem that I am concerned about is the following: (1) A delete marker arrives from the source cluster. (2) A major compaction happens on the target cluster, which gets rid of the delete marker. (3) The deleted put arrives from the source cluster. Now that the delete marker is not there, this put will become visible on the target cluster. Retain Delete Markers after Major Compaction Key: HBASE-4721 URL: https://issues.apache.org/jira/browse/HBASE-4721 Project: HBase Issue Type: New Feature Reporter: Prakash Khemani Assignee: Prakash Khemani There is a need to provide long TTLs for delete markers. This is useful when replicating HBase logs from one cluster to another. The receiving cluster shouldn't compact away the delete markers because the affected key-values might still be on the way. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
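For reference, the HBASE-4536 behavior mentioned above is a per-column-family flag. Something along these lines should enable it with the 0.94-era client API (names worth double-checking against the exact release; the table and family names here are made up):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Create a table whose column family retains delete markers (and the cells
// they mask) until the TTL expires, instead of dropping them at the next
// major compaction.
public class KeepDeletedExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor table = new HTableDescriptor("replicated_table");
    HColumnDescriptor cf = new HColumnDescriptor("cf");
    cf.setKeepDeletedCells(true);    // the HBASE-4536 flag discussed above
    cf.setTimeToLive(7 * 24 * 3600); // e.g. keep a week of history
    table.addFamily(cf);
    admin.createTable(table);
  }
}
{code}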
[jira] [Commented] (HBASE-4674) splitLog silently fails
[ https://issues.apache.org/jira/browse/HBASE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13139857#comment-13139857 ] Prakash Khemani commented on HBASE-4674: Stack, it is pretty obvious in the code. And yes, I have seen lost edits a number of times. A simple way to reproduce this issue: Create a table. Kill the namenode. That kills all region servers. The master doesn't die. The master tries to split logs and fails, but it ignores the failure and moves on to assign regions. Start the namenode. Start the regionservers. The regions get assigned w/o their logs getting replayed. == BTW, the fix for this is being posted by Nicolas in HBASE-2312. splitLog silently fails --- Key: HBASE-4674 URL: https://issues.apache.org/jira/browse/HBASE-4674 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Environment: splitLog() can fail silently and a region can open w/o its edits getting replayed. Reporter: Prakash Khemani Assignee: Prakash Khemani Priority: Blocker -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
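The shape of the bug, and of the kind of fix referenced on HBASE-2312, is that log splitting failures must block region assignment. A simplified, hypothetical sketch of the desired control flow (the interfaces are stand-ins, not the actual master code):
{code}
import java.io.IOException;

// Hypothetical sketch: keep retrying the split and never fall through to
// assignment on failure, so regions cannot open without replayed edits.
class ShutdownHandlerSketch {
  interface LogSplitter { void splitLog(String serverName) throws IOException; }
  interface Assigner   { void assignRegions(String serverName); }

  void process(String deadServer, LogSplitter splitter, Assigner assigner)
      throws InterruptedException {
    while (true) {
      try {
        splitter.splitLog(deadServer);
        break; // only reached when the split actually succeeded
      } catch (IOException e) {
        // Previously the failure was effectively ignored and assignment
        // proceeded, silently losing edits. Instead: log, wait, retry.
        System.err.println("log split failed for " + deadServer + ": " + e);
        Thread.sleep(1000);
      }
    }
    assigner.assignRegions(deadServer); // safe: logs have been split
  }
}
{code}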
[jira] [Commented] (HBASE-4696) HRegionThriftServer might have to indefinitely do redirects
[ https://issues.apache.org/jira/browse/HBASE-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138891#comment-13138891 ] Prakash Khemani commented on HBASE-4696: looks good to me. thanks. HRegionThriftServer might have to indefinitely do redirects - Key: HBASE-4696 URL: https://issues.apache.org/jira/browse/HBASE-4696 Project: HBase Issue Type: Improvement Affects Versions: 0.94.0 Reporter: Prakash Khemani Assignee: Jonathan Gray Fix For: 0.94.0 Attachments: HBASE-4696-v1.patch HRegionThriftServer.getRowWithColumnsTs() redirects the request to the correct region server if it has landed on the wrong region server. With this approach the smart client will never get a NotServingRegionException, so it will never be able to invalidate its cache; it will indefinitely send the request to the wrong region server, and the wrong region server will always be redirecting it. Either redirects should be turned off and the client should react to NotServingRegionExceptions, or somehow a flag should be set in the response telling the client to refresh its cache. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
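Of the two options in the description, the first might look like this hedged sketch: surface the miss to the client instead of redirecting, so the smart client invalidates its cached region location. All names here are illustrative stand-ins, not the actual HRegionThriftServer code:
{code}
import java.io.IOException;

// Hypothetical sketch of option 1: throw rather than silently forward,
// so the client can refresh its region cache before retrying.
class ThriftGetSketch {
  interface RegionLookup { boolean isLocal(byte[] row); }
  interface LocalGet     { byte[] get(byte[] row) throws IOException; }

  static class NotServingRegionException extends IOException {
    NotServingRegionException(String msg) { super(msg); }
  }

  byte[] getRowWithColumnsTs(byte[] row, RegionLookup lookup, LocalGet local)
      throws IOException {
    if (!lookup.isLocal(row)) {
      // No redirect: let the smart client see the miss and invalidate
      // its cached region location.
      throw new NotServingRegionException("region not hosted here");
    }
    return local.get(row);
  }
}
{code}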