[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469872#comment-13469872 ] Hudson commented on HBASE-6649: --- Integrated in HBase-0.94-security-on-Hadoop-23 #8 (See [https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/8/]) HBASE-6847 HBASE-6649 broke replication (Devaraj Das via JD) (Revision 1388160) HBASE-6649 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] (Revision 1381289) Result = FAILURE jdcryans : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java stack : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Priority: Blocker Fix For: 0.92.3, 0.94.2, 0.96.0 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-fix-io-exception-handling-1.patch, 6649-fix-io-exception-handling-1-trunk.patch, 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469983#comment-13469983 ] Hudson commented on HBASE-6649: --- Integrated in HBase-0.92-security #143 (See [https://builds.apache.org/job/HBase-0.92-security/143/]) HBASE-6847 HBASE-6649 broke replication (Devaraj Das via JD) (Revision 1388159) Fixing the CHANGES.txt after 0.92.2's release and adding HBASE-6649 (Revision 1388157) HBASE-6649 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] (Revision 1381291) Result = FAILURE jdcryans : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java jdcryans : Files : * /hbase/branches/0.92/CHANGES.txt stack : Files : * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Priority: Blocker Fix For: 0.92.3, 0.94.2, 0.96.0 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-fix-io-exception-handling-1.patch, 6649-fix-io-exception-handling-1-trunk.patch, 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460750#comment-13460750 ] Hudson commented on HBASE-6649: --- Integrated in HBase-0.94-security #53 (See [https://builds.apache.org/job/HBase-0.94-security/53/]) HBASE-6847 HBASE-6649 broke replication (Devaraj Das via JD) (Revision 1388160) Result = SUCCESS jdcryans : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Priority: Blocker Fix For: 0.92.3, 0.94.2, 0.96.0 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-fix-io-exception-handling-1.patch, 6649-fix-io-exception-handling-1-trunk.patch, 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13459733#comment-13459733 ] Lars Hofhansl commented on HBASE-6649: -- +1 on last patch. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Priority: Blocker Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-fix-io-exception-handling-1.patch, 6649-fix-io-exception-handling-1-trunk.patch, 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13459754#comment-13459754 ] Lars Hofhansl commented on HBASE-6649: -- J-D, any objections to committing this? [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Priority: Blocker Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-fix-io-exception-handling-1.patch, 6649-fix-io-exception-handling-1-trunk.patch, 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13459758#comment-13459758 ] Jean-Daniel Cryans commented on HBASE-6649: --- I'm going to create a new jira first (should have done that when I found that problem) and post the patches there with a small nit fixed. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Priority: Blocker Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-fix-io-exception-handling-1.patch, 6649-fix-io-exception-handling-1-trunk.patch, 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13459917#comment-13459917 ] Hudson commented on HBASE-6649: --- Integrated in HBase-TRUNK #3360 (See [https://builds.apache.org/job/HBase-TRUNK/3360/]) HBASE-6847 HBASE-6649 broke replication (Devaraj Das via JD) (Revision 1388161) Result = FAILURE jdcryans : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Priority: Blocker Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-fix-io-exception-handling-1.patch, 6649-fix-io-exception-handling-1-trunk.patch, 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13459923#comment-13459923 ] Hudson commented on HBASE-6649: --- Integrated in HBase-0.94 #476 (See [https://builds.apache.org/job/HBase-0.94/476/]) HBASE-6847 HBASE-6649 broke replication (Devaraj Das via JD) (Revision 1388160) Result = FAILURE jdcryans : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Priority: Blocker Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-fix-io-exception-handling-1.patch, 6649-fix-io-exception-handling-1-trunk.patch, 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13459977#comment-13459977 ] Hudson commented on HBASE-6649: --- Integrated in HBase-0.92 #583 (See [https://builds.apache.org/job/HBase-0.92/583/]) HBASE-6847 HBASE-6649 broke replication (Devaraj Das via JD) (Revision 1388159) Fixing the CHANGES.txt after 0.92.2's release and adding HBASE-6649 (Revision 1388157) Result = SUCCESS jdcryans : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java jdcryans : Files : * /hbase/branches/0.92/CHANGES.txt [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Priority: Blocker Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-fix-io-exception-handling-1.patch, 6649-fix-io-exception-handling-1-trunk.patch, 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460080#comment-13460080 ] Hudson commented on HBASE-6649: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #184 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/184/]) HBASE-6847 HBASE-6649 broke replication (Devaraj Das via JD) (Revision 1388161) Result = FAILURE jdcryans : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Priority: Blocker Fix For: 0.92.3, 0.94.2, 0.96.0 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-fix-io-exception-handling-1.patch, 6649-fix-io-exception-handling-1-trunk.patch, 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458839#comment-13458839 ] Jean-Daniel Cryans commented on HBASE-6649: --- bq. This would be a dataloss issue without the fix. bq. I have seen dataloss issues (via the unit test) without this patch.. FWIW if there was indeed dataloss caused by this, it would have been when recovering logs. During normal operation that exception was retried until we're able to read the file. bq. could you please try this patch out in your cluster. It's not exactly a test cluster, more like prod-ish, so I'll put it on only one machine. I assume it might take the whole day to hit the condition. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458850#comment-13458850 ] Devaraj Das commented on HBASE-6649: Thanks, JD [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13459137#comment-13459137 ] Jean-Daniel Cryans commented on HBASE-6649: --- The server that has the patch did a Break on IOE twice, and it seems to work: {noformat} 2012-09-19 21:26:50,104 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication va1r6s44%2C10304%2C1348088378534.1348089931722 at 21992487 2012-09-19 21:26:50,110 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Break on IOE: hdfs://va1r5s41:10101/va1-backup/.logs/va1r6s44,10304,1348088378534/va1r6s44%2C10304%2C1348088378534.1348089931722, entryStart=21993911, pos=22058496, end=22058496, edit=5 2012-09-19 21:26:50,110 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: currentNbOperations:783007 and seenEntries:5 and size: 64585 2012-09-19 21:26:50,110 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicating 5 2012-09-19 21:26:50,119 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Going to report log #va1r6s44%2C10304%2C1348088378534.1348089931722 for position 21993911 in hdfs://va1r5s41:10101/va1-backup/.logs/va1r6s44,10304,1348088378534/va1r6s44%2C10304%2C1348088378534.1348089931722 2012-09-19 21:26:50,129 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Removing 0 logs in the list: [] 2012-09-19 21:26:50,129 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicated in total: 145502 2012-09-19 21:26:50,129 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication va1r6s44%2C10304%2C1348088378534.1348089931722 at 21993911 {noformat} One thing that I saw that this patch breaks is the size in currentNbOperations:783007 and seenEntries:5 and size: 64585 because it relies on this.position being the position at the beginning. I often see that number at 0 while having edits to replicate. It's minor since in HBASE-6804 I'm removing that log message altogether but we may want to either remove the size or keep track of what it is at the beginning of the loop within the context of this jira. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13459163#comment-13459163 ] Devaraj Das commented on HBASE-6649: Good to know, JD. I'll submit a patch with the logging addressed in a bit. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458121#comment-13458121 ] Jean-Daniel Cryans commented on HBASE-6649: --- We applied this patch on a cluster that replicates and about all the nodes stopped replicated after some time. This is what I see in the logs: {noformat} 2012-09-17 20:04:08,111 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication va1r3s24%2C10304%2C1347911704238.1347911706318 at 78617132 2012-09-17 20:04:08,120 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Break on IOE: hdfs://va1r5s41:10101/va1-backup/.logs/va1r3s24,10304,1347911704238/va1r3s24%2C10304%2C1347911704238.1347911706318, entryStart=78641557, pos=78771200, end=78771200, edit=84 2012-09-17 20:04:08,120 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: currentNbOperations:164529 and seenEntries:84 and size: 154068 2012-09-17 20:04:08,120 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicating 84 2012-09-17 20:04:08,146 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Going to report log #va1r3s24%2C10304%2C1347911704238.1347911706318 for position 78771200 in hdfs://va1r5s41:10101/va1-backup/.logs/va1r3s24,10304,1347911704238/va1r3s24%2C10304%2C1347911704238.1347911706318 2012-09-17 20:04:08,158 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Removing 0 logs in the list: [] 2012-09-17 20:04:08,158 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicated in total: 93234 2012-09-17 20:04:08,158 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication va1r3s24%2C10304%2C1347911704238.1347911706318 at 78771200 2012-09-17 20:04:08,163 ERROR org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Unexpected exception in ReplicationSource, currentPath=hdfs://va1r5s41:10101/va1-backup/.logs/va1r3s24,10304,1347911704238/va1r3s24%2C10304%2C1347911704238.1347911706318 java.lang.IndexOutOfBoundsException at java.io.DataInputStream.readFully(DataInputStream.java:175) at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63) at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101) at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2001) at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1901) at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1947) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:235) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:394) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:307) {noformat} The file is still in HDFS and it's about double the size we see up there, so it wasn't the end of the file. Looking at other nodes, we always get Break on IOE before getting the exception that kills replication. This is why I think that this patch is the issue. Somehow reading up to the end is reading too far. We need to fix or backport. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458131#comment-13458131 ] Lars Hofhansl commented on HBASE-6649: -- You fix or rollback (the change)? [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458134#comment-13458134 ] Devaraj Das commented on HBASE-6649: Looking at the logs/patch more closely.. Will get back soon. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458135#comment-13458135 ] Jean-Daniel Cryans commented on HBASE-6649: --- [~lhofhansl] Trying to figure out what the problem is first although if we're in a hurry we can just rollback. (not backport, doh!) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458185#comment-13458185 ] Jean-Daniel Cryans commented on HBASE-6649: --- [~devaraj] I'm still trying to figure out exactly how we get the IndexOutOfBoundsException (I'd say the file didn't get new data and we started reading exactly at the end and the DFSClient doesn't like that? Or it's missing something at the end?), but if it's a case of reading the tail of a recovered log then we *could* add a check like this: {code} try { entry = this.reader.next(entriesArray[currentNbEntries]); } catch (IOException ie) { if (queueRecovered) { LOG.debug(Break on IOE: + ie.getMessage()); break; } else { throw ie; } } {code} [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458192#comment-13458192 ] Jean-Daniel Cryans commented on HBASE-6649: --- But now that I think about it, it may crap out when coming back to read even on a recovered file. The data will all make it to the other cluster but that source will never be fully cleaned up. Which leads me to think that this is a bug in DFSClient. It's expecting something it's not getting. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458222#comment-13458222 ] Devaraj Das commented on HBASE-6649: Yeah, [~jdcryans] not sure how one could get a IndexOutOfBounds exception. I can't see how the patch would make it surface as well .. The patch only catches and ignores IOE (as opposed to *all* exceptions).. But yeah give me another hour please. Let me dig some more. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458251#comment-13458251 ] Devaraj Das commented on HBASE-6649: Has there been any change in your cluster environment (hadoop version, etc. using different version of dfs client causing the issue to surface)? [Not sure which hadoop version you are on, but there is no chance you are hitting HDFS-1108, right?] [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458274#comment-13458274 ] Devaraj Das commented on HBASE-6649: Okay a plausible explanation - 1. ReplicationSource.readAllEntriesToReplicateOrNextFile throws an IOException (which causes the log Break on IOE: to print), but ignores the exception. 2. When readAllEntriesToReplicateOrNextFile returns, the reader's file-pointer position is queried and 'this.position' is set to that (the reader's file-pointer is possibly pointing to gibberish) 3. Eventually, readAllEntriesToReplicateOrNextFile gets called again, and this time this.reader.next inside throws IndexOutOfBounds exception because it read gibberish (looking at the code of DataInputStream.java, it seems like one of the cases where the IndexOutOfBounds is thrown is when the length passed to readFully is less than 0). The fix I can think of is to reset the reader's 'position' to the last valid position (upon return from the method readAllEntriesToReplicateOrNextFile). Thoughts on the above? Does the analysis make sense? [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458352#comment-13458352 ] Lars Hofhansl commented on HBASE-6649: -- Should we pull HBASE-6719 into this? [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458378#comment-13458378 ] Jean-Daniel Cryans commented on HBASE-6649: --- bq. The patch only catches and ignores IOE (as opposed to all exceptions) What it does do is permitting to read up to the end of the file. bq. [Not sure which hadoop version you are on, but there is no chance you are hitting HDFS-1108, right?] We are on CDH3u3, didn't change when we applied the patch. bq. Okay a plausible explanation - It's plausible but unless we really understand what that gibberish is at the end of the file, we can't truly make a fix. I don't know why that IOE is throw out but normally we just silently finish reading from the file. There is some special case here. bq. Should we pull HBASE-6719 into this? I think it's separate issues. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458404#comment-13458404 ] Lars Hofhansl commented on HBASE-6649: -- I say we revert from 0.94.2 and retry in 0.94.3. Although from DD's comment: bq. If the second call (within the while loop) throws an exception (like EOFException), it basically destroys the work done up until then. Therefore, some rows would never be replicated. This would be a dataloss issue without the fix. I find that a bit confusion. Since J-D saw the ignored exception in the test cluster eventually on all machines, it seems there was data lost in all versions before 0.94.2? That seems very unlikely. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455031#comment-13455031 ] Hudson commented on HBASE-6649: --- Integrated in HBase-0.94-security #52 (See [https://builds.apache.org/job/HBase-0.94-security/52/]) HBASE-6649 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] (Revision 1381289) Result = SUCCESS stack : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455143#comment-13455143 ] Lars Hofhansl commented on HBASE-6649: -- Just failed again: https://builds.apache.org/job/PreCommit-HBASE-Build/2852//testReport/ [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454157#comment-13454157 ] Jean-Daniel Cryans commented on HBASE-6649: --- Oh I see what you mean. Very good find! I wonder what's that gibberish at the end of the file. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454181#comment-13454181 ] Devaraj Das commented on HBASE-6649: bq. Oh I see what you mean. Very good find! I wonder what's that gibberish at the end of the file. Thanks! Are you referring to the log file? I see the following at the end (no gibberish): {noformat} 2012-08-17 15:35:01,161 DEBUG [RegionServer:1;vesta.apache.org,40480,1345217521368-EventThread.replicationSource,2] regionserver.ReplicationSource(474): Opening log for replication vesta.apache.org%2C40480%2C1345217521368.1345217648386 at 258 2012-08-17 15:35:01,164 DEBUG [RegionServer:1;vesta.apache.org,40480,1345217521368-EventThread.replicationSource,2] regionserver.ReplicationSource(429): currentNbOperations:13022 and seenEntries:0 and size: 0 2012-08-17 15:35:01,164 DEBUG [RegionServer:1;vesta.apache.org,40480,1345217521368-EventThread.replicationSource,2] regionserver.ReplicationSource(549): Nothing to replicate, sleeping 100 times 10 {noformat} [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454189#comment-13454189 ] Jean-Daniel Cryans commented on HBASE-6649: --- What I meant is that the reader gets this 10 times: {noformat} java.io.EOFException: hdfs://localhost:60044/user/hudson/hbase/.oldlogs/vesta.apache.org%2C57779%2C1345217521341.1345217601487, entryStart=40929, pos=40960, end=40960, edit=3 {noformat} So if I'm reading this correctly it's able to read the file and got 3 edits but gets an EOF. Is something half written? Then it gives up on the file: {noformat} 2012-08-17 15:33:50,099 INFO [ReplicationExecutor-0.replicationSource,2-vesta.apache.org,57779,1345217521341] regionserver.ReplicationSourceManager(352): Done with the recovered queue 2-vesta.apache.org,57779,1345217521341 {noformat} And there's data loss. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454347#comment-13454347 ] Devaraj Das commented on HBASE-6649: This log file belongs to a crashed RS, and yes, it seems like the last record wasn't completely written to the file before the RS crashed. That should be fine, i.e., no dataloss should happen - in the queueFailover test, the client would have got exceptions to the flushCommit call and it would have retried the batch of 'put' and the corresponding records would have ended up in another RS. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452581#comment-13452581 ] Jean-Daniel Cryans commented on HBASE-6649: --- bq. This is because of multiple calls to reader.next within readAllEntriesToReplicateOrNextFile. If the second call (within the while loop) throws an exception (like EOFException), it basically destroys the work done up until then. Therefore, some rows would never be replicated. The position in the log is updated in ZK only once the edits are replicated hence, even if you fail on the second or hundredth edit, the next region server that will be in charge of that log will pick up where the previous RS was (even if that means re-reading some edits). [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448512#comment-13448512 ] stack commented on HBASE-6649: -- This patch makes sense to me. We replicate all up to the exception and then next time in, we should pick up the IOE again. Want me to commit this DD? [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.92.3 Attachments: 6649-1.patch, 6649-2.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448907#comment-13448907 ] Devaraj Das commented on HBASE-6649: [~zhi...@ebaysf.com]This patch fixes a specific problem to do with replication missing rows, and in my observations, that leads to somewhat frequent TestReplication.queueFailover failures. On trunk, do you know which test hangs? There probably are more issues to fix in the replication area, and we should have follow up jiras (and this jira is part-1 :)). [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.92.3 Attachments: 6649-1.patch, 6649-2.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448914#comment-13448914 ] Ted Yu commented on HBASE-6649: --- target/surefire-reports/org.apache.hadoop.hbase.replication.TestReplication.txt was 0 length. There was no JVM left from TestReplication by the time I got back to computer. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.92.3 Attachments: 6649-1.patch, 6649-2.txt, 6649-trunk.patch, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448924#comment-13448924 ] Lars Hofhansl commented on HBASE-6649: -- Patch looks good to me. (As Ted points out there might other issues as well) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-1.patch, 6649-2.txt, 6649-trunk.patch, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448942#comment-13448942 ] Ted Yu commented on HBASE-6649: --- @J-D: What do you think ? nit: {code} + } catch (IOException ie) { +break; {code} A log statement is desirable before break. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-1.patch, 6649-2.txt, 6649-trunk.patch, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448987#comment-13448987 ] Himanshu Vashishtha commented on HBASE-6649: lgtm. The exception will be re-thrown in the next try, so +0 on adding a log statement before break. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-1.patch, 6649-2.txt, 6649-trunk.patch, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448988#comment-13448988 ] stack commented on HBASE-6649: -- J-D on vacation. Let me commit this. Will add the log message Ted suggests though my sense it overkill, lets see. Would suggest new issue for other 'parts' DD. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-1.patch, 6649-2.txt, 6649-trunk.patch, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449035#comment-13449035 ] Hudson commented on HBASE-6649: --- Integrated in HBase-0.94 #450 (See [https://builds.apache.org/job/HBase-0.94/450/]) HBASE-6649 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] (Revision 1381289) Result = FAILURE stack : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449069#comment-13449069 ] Hudson commented on HBASE-6649: --- Integrated in HBase-TRUNK #3307 (See [https://builds.apache.org/job/HBase-TRUNK/3307/]) HBASE-6649 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] (Revision 1381287) Result = FAILURE stack : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449147#comment-13449147 ] Hudson commented on HBASE-6649: --- Integrated in HBase-0.92 #557 (See [https://builds.apache.org/job/HBase-0.92/557/]) HBASE-6649 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] (Revision 1381291) Result = SUCCESS stack : Files : * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448117#comment-13448117 ] Hadoop QA commented on HBASE-6649: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12543752/6649-1.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2779//console This message is automatically generated. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.92.3 Attachments: 6649-1.patch, 6649-2.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira