[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870692#comment-13870692 ]

Gustavo Anatoly commented on HBASE-9948:

Hi, [~yuzhih...@gmail.com]

Issue Status:
- I'm working on a solution based on [~jeffreyz]'s suggestion: skip scheduling duplicate log split requests, but wait for them to finish.
- I tried to reproduce this exception, without success, so I'll write a test to simulate the condition.

Thank you.

HMaster should handle duplicate log split requests

Key: HBASE-9948
URL: https://issues.apache.org/jira/browse/HBASE-9948
Project: HBase
Issue Type: Bug
Reporter: Ted Yu
Assignee: Gustavo Anatoly
Attachments: HBASE-9948-v2.patch, HBASE-9948.patch, runtest.sh

I saw the following in test output for TestRestartCluster:
{code}
2013-11-11 19:59:55,538 DEBUG [M:0;kiyo:36213] master.SplitLogManager(327): Scheduling batch of logs to split
2013-11-11 19:59:55,538 INFO [M:0;kiyo:36213] master.SplitLogManager(329): started splitting 1 logs in [hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting]
2013-11-11 19:59:55,538 WARN [M:0;kiyo:36213] master.SplitLogManager(1048): Failure because two threads can't wait for the same task; path=/hbase/splitWAL/WALs%2Fkk%2C44962%2C138410193-splitting%2Fkk%252C44962%252C138410193.138413702.meta
2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2188): Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2193): Unhandled exception. Starting shutdown.
java.io.IOException: duplicate log split scheduled for hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting/kk%2C44962%2C138410193.138413702.meta
  at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:343)
  at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:409)
  at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301)
  at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292)
  at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:1038)
  at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:868)
  at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:605)
  at java.lang.Thread.run(Thread.java:724)
2013-11-11 19:59:55,539 INFO [M:0;kiyo:36213] master.HMaster(2386): Aborting
2013-11-11 19:59:55,539 DEBUG [M:0;kiyo:36213] master.HMaster(1234): Stopping service threads
{code}
HMaster should handle duplicate log split requests, instead of aborting.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865722#comment-13865722 ]

Ted Yu commented on HBASE-9948:

Thanks for the patch.
{code}
+} catch (DuplicatedSplitLogException dsle) {
+  LOG.warn(dsle.getMessage());
+}
{code}
Should the method return in the catch block? There is no need to record metrics for the duplicate log split request.
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865765#comment-13865765 ]

Gustavo Anatoly commented on HBASE-9948:

Hi, [~yuzhih...@gmail.com]

You're right. I can change this block to:
{code}
+} catch (DuplicatedSplitLogException dsle) {
+  return;
+}
{code}
Thanks for the review.
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865797#comment-13865797 ]

Jeffrey Zhong commented on HBASE-9948:

We should first check why TestRestartCluster hits the duplicate log split request error at all. The current patch doesn't work because SplitLog has to be a blocking call until the requested logs have completed the log splitting process; otherwise region assignment could happen before log splitting completes, which would cause data loss. I'd suggest that the fix skip scheduling duplicate log splits but still wait for them to finish.
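The suggestion above (skip scheduling the duplicate, but still block until the split finishes) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the HBASE-9948 patch and not real SplitLogManager code: the class `DedupSplitSketch`, its `splitLog` method, the `inFlight` map, and the `scheduled`/`skipped` counters are all hypothetical names. It keeps one future per log path; the first caller schedules the work, and any later caller for the same path waits on the existing future instead of scheduling again or throwing.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not HBase code: all names below are invented for illustration.
class DedupSplitSketch {
  // One future per log path that currently has a split in flight.
  private final ConcurrentMap<String, CompletableFuture<Void>> inFlight =
      new ConcurrentHashMap<>();
  private final ExecutorService workers;
  final AtomicInteger scheduled = new AtomicInteger(); // splits actually scheduled
  final AtomicInteger skipped = new AtomicInteger();   // duplicate requests that only waited

  DedupSplitSketch(ExecutorService workers) {
    this.workers = workers;
  }

  /** Blocks until the log at logPath is split; schedules work at most once per in-flight path. */
  void splitLog(String logPath, Runnable splitWork) {
    CompletableFuture<Void> mine = new CompletableFuture<>();
    CompletableFuture<Void> task = inFlight.putIfAbsent(logPath, mine);
    if (task == null) {
      // First request for this path: schedule the split.
      task = mine;
      scheduled.incrementAndGet();
      workers.execute(() -> {
        try {
          splitWork.run();
          mine.complete(null);
        } catch (Throwable t) {
          mine.completeExceptionally(t);
        } finally {
          inFlight.remove(logPath, mine);
        }
      });
    } else {
      // Duplicate request: do not schedule again, just wait on the existing task.
      skipped.incrementAndGet();
    }
    task.join(); // the call stays blocking, so callers cannot proceed early
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    DedupSplitSketch sketch = new DedupSplitSketch(pool);
    CountDownLatch release = new CountDownLatch(1);
    Runnable slowSplit = () -> {
      try {
        release.await(); // hold the split open until the duplicate has arrived
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    };
    Thread first = new Thread(() -> sketch.splitLog("wal-1", slowSplit));
    Thread duplicate = new Thread(() -> sketch.splitLog("wal-1", slowSplit));
    first.start();
    while (sketch.scheduled.get() < 1) Thread.sleep(1);
    duplicate.start();
    while (sketch.skipped.get() < 1) Thread.sleep(1);
    release.countDown();
    first.join();
    duplicate.join();
    pool.shutdown();
    // prints "scheduled=1 skipped=1"
    System.out.println("scheduled=" + sketch.scheduled.get() + " skipped=" + sketch.skipped.get());
  }
}
```

Both callers return only after the single scheduled split has completed, which is the property the comment asks for: nothing that depends on the split, such as region assignment, can run ahead of it. A failed split surfaces to every waiter as an exception from `join()`; how the real master should react to that is outside the scope of this sketch.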
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865806#comment-13865806 ]

Hadoop QA commented on HBASE-9948:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12622005/HBASE-9948.patch
  against trunk revision .
  ATTACHMENT ID: 12622005

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.

{color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:red}-1 findbugs{color}. The patch appears to cause Findbugs (version 1.3.9) to fail.

{color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings).

{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

{color:red}-1 site{color}. The patch appears to cause mvn site goal to fail.

{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8366//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8366//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8366//console

This message is automatically generated.
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865848#comment-13865848 ]

Gustavo Anatoly commented on HBASE-9948:

Hi, [~jeffreyz].

I will follow your suggestions. Indeed, to avoid data loss the log split request should be an atomic operation, so the best approach is to investigate the root cause of the duplicate log split.

[~yuzhih...@gmail.com], how can I reproduce this scenario?

Thank you, [~jeffreyz].
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865851#comment-13865851 ]

Ted Yu commented on HBASE-9948:

Let me search my computer to see if I have the test output.
Meanwhile, you can loop TestRestartCluster and see if the duplicate message appears in test output.

Thanks
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865858#comment-13865858 ]

Gustavo Anatoly commented on HBASE-9948:

Thanks, [~yuzhih...@gmail.com]
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865864#comment-13865864 ]

Ted Yu commented on HBASE-9948:

Please modify the following line in the script to look for 'duplicate log split scheduled':
{code}
grep NullPointerException hbase-server/target/surefire-reports/*${test[$j]%\#*}-output.txt
{code}
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865867#comment-13865867 ]

Gustavo Anatoly commented on HBASE-9948:

Okay
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865910#comment-13865910 ]

Hadoop QA commented on HBASE-9948:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12622017/HBASE-9948-v2.patch
  against trunk revision .
  ATTACHMENT ID: 12622017

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.

{color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings).

{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

{color:red}-1 site{color}. The patch appears to cause mvn site goal to fail.

{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//console

This message is automatically generated.
HMaster should handle duplicate log split requests -- Key: HBASE-9948 URL: https://issues.apache.org/jira/browse/HBASE-9948 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Gustavo Anatoly Attachments: HBASE-9948-v2.patch, HBASE-9948.patch, runtest.sh I saw the following in test output for TestRestartCluster: {code} 2013-11-11 19:59:55,538 DEBUG [M:0;kiyo:36213] master.SplitLogManager(327): Scheduling batch of logs to split 2013-11-11 19:59:55,538 INFO [M:0;kiyo:36213] master.SplitLogManager(329): started splitting 1 logs in [hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting] 2013-11-11 19:59:55,538 WARN [M:0;kiyo:36213] master.SplitLogManager(1048): Failure because two threads can't wait for the same task; path=/hbase/splitWAL/WALs%2Fkk%2C44962%2C138410193-splitting%2Fkk%252C44962%252C138410193.138413702.meta 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2188): Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2193): Unhandled exception. Starting shutdown. java.io.IOException: duplicate log split scheduled for hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting/kk%2C44962%2C138410193.138413702.meta at
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857467#comment-13857467 ]

Gustavo Anatoly commented on HBASE-9948:

Hi, Ted. Could I work on this issue?

HMaster should handle duplicate log split requests
--
Key: HBASE-9948
URL: https://issues.apache.org/jira/browse/HBASE-9948
Project: HBase
Issue Type: Bug
Reporter: Ted Yu

I saw the following in test output for TestRestartCluster:
{code}
2013-11-11 19:59:55,538 DEBUG [M:0;kiyo:36213] master.SplitLogManager(327): Scheduling batch of logs to split
2013-11-11 19:59:55,538 INFO [M:0;kiyo:36213] master.SplitLogManager(329): started splitting 1 logs in [hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting]
2013-11-11 19:59:55,538 WARN [M:0;kiyo:36213] master.SplitLogManager(1048): Failure because two threads can't wait for the same task; path=/hbase/splitWAL/WALs%2Fkk%2C44962%2C138410193-splitting%2Fkk%252C44962%252C138410193.138413702.meta
2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2188): Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2193): Unhandled exception. Starting shutdown.
java.io.IOException: duplicate log split scheduled for hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting/kk%2C44962%2C138410193.138413702.meta
  at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:343)
  at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:409)
  at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301)
  at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292)
  at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:1038)
  at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:868)
  at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:605)
  at java.lang.Thread.run(Thread.java:724)
2013-11-11 19:59:55,539 INFO [M:0;kiyo:36213] master.HMaster(2386): Aborting
2013-11-11 19:59:55,539 DEBUG [M:0;kiyo:36213] master.HMaster(1234): Stopping service threads
{code}
HMaster should handle duplicate log split requests, instead of aborting.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857506#comment-13857506 ]

Ted Yu commented on HBASE-9948:

That would be great. Thanks
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857340#comment-13857340 ]

Ted Yu commented on HBASE-9948:

HMaster should handle IOException for duplicate split request so that it doesn't abort.
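
One way to read the request in this thread (do not abort on a duplicate split request; skip scheduling it and wait for the split already in progress) is sketched below. This is only an illustration under stated assumptions, not the attached patch and not HBase code: the class SplitRequestDeduplicator, its requestSplit method, and the Supplier standing in for the real split work are all hypothetical names. It uses only java.util.concurrent from the JDK.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

/**
 * Hypothetical helper (NOT part of HBase): collapses duplicate split
 * requests for the same log path into one in-flight task.
 */
final class SplitRequestDeduplicator {
  // One shared future per log path; the entry is removed when the split ends.
  private final ConcurrentMap<String, CompletableFuture<Boolean>> inFlight =
      new ConcurrentHashMap<>();

  /**
   * The first caller for a path starts the work through {@code splitter}.
   * A caller that arrives while that split is still running gets the same
   * future back instead of an exception, so it can simply wait on it.
   */
  CompletableFuture<Boolean> requestSplit(
      String logPath, Supplier<CompletableFuture<Boolean>> splitter) {
    CompletableFuture<Boolean> mine = new CompletableFuture<>();
    CompletableFuture<Boolean> existing = inFlight.putIfAbsent(logPath, mine);
    if (existing != null) {
      return existing; // duplicate request: reuse the scheduled split
    }
    try {
      // Assumed stand-in for the real split; completes when the split ends.
      splitter.get().whenComplete((ok, err) -> {
        inFlight.remove(logPath, mine); // allow a later, fresh request
        if (err != null) {
          mine.completeExceptionally(err);
        } else {
          mine.complete(ok);
        }
      });
    } catch (RuntimeException e) {
      // Starting the split failed: clean up and report through the future.
      inFlight.remove(logPath, mine);
      mine.completeExceptionally(e);
    }
    return mine;
  }

  /** Number of log paths with a split currently in flight. */
  int inFlightCount() {
    return inFlight.size();
  }
}
```

With something like this in front of the scheduling call, a second request for the same log path would share the outcome of the first one rather than surfacing as the "duplicate log split scheduled" IOException shown in the report. Whether that fits the actual SplitLogManager locking and task bookkeeping is not something this sketch establishes.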