[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-10-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469872#comment-13469872
 ] 

Hudson commented on HBASE-6649:
---

Integrated in HBase-0.94-security-on-Hadoop-23 #8 (See 
[https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/8/])
HBASE-6847  HBASE-6649 broke replication (Devaraj Das via JD) (Revision 
1388160)
HBASE-6649 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails 
[Part-1] (Revision 1381289)

 Result = FAILURE
jdcryans : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java

stack : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java


 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
 ---

 Key: HBASE-6649
 URL: https://issues.apache.org/jira/browse/HBASE-6649
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
Priority: Blocker
 Fix For: 0.92.3, 0.94.2, 0.96.0

 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 
 6649-fix-io-exception-handling-1.patch, 
 6649-fix-io-exception-handling-1-trunk.patch, 
 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 
 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 
 #502 test - queueFailover [Jenkins].html


 Have seen it twice in the recent past: http://bit.ly/MPCykB  
 http://bit.ly/O79Dq7 .. 
 Looking briefly at the logs hints at a pattern - in both the failed test 
 instances, there was an RS crash while the test was running.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-10-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469983#comment-13469983
 ] 

Hudson commented on HBASE-6649:
---

Integrated in HBase-0.92-security #143 (See 
[https://builds.apache.org/job/HBase-0.92-security/143/])
HBASE-6847  HBASE-6649 broke replication (Devaraj Das via JD) (Revision 
1388159)
Fixing the CHANGES.txt after 0.92.2's release and adding HBASE-6649 (Revision 
1388157)
HBASE-6649 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails 
[Part-1] (Revision 1381291)

 Result = FAILURE
jdcryans : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java

jdcryans : 
Files : 
* /hbase/branches/0.92/CHANGES.txt

stack : 
Files : 
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java




[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460750#comment-13460750
 ] 

Hudson commented on HBASE-6649:
---

Integrated in HBase-0.94-security #53 (See 
[https://builds.apache.org/job/HBase-0.94-security/53/])
HBASE-6847  HBASE-6649 broke replication (Devaraj Das via JD) (Revision 
1388160)

 Result = SUCCESS
jdcryans : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java




[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459733#comment-13459733
 ] 

Lars Hofhansl commented on HBASE-6649:
--

+1 on last patch.



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459754#comment-13459754
 ] 

Lars Hofhansl commented on HBASE-6649:
--

J-D, any objections to committing this?



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-20 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459758#comment-13459758
 ] 

Jean-Daniel Cryans commented on HBASE-6649:
---

I'm going to create a new jira first (should have done that when I found that 
problem) and post the patches there with a small nit fixed.



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459917#comment-13459917
 ] 

Hudson commented on HBASE-6649:
---

Integrated in HBase-TRUNK #3360 (See 
[https://builds.apache.org/job/HBase-TRUNK/3360/])
HBASE-6847  HBASE-6649 broke replication (Devaraj Das via JD) (Revision 
1388161)

 Result = FAILURE
jdcryans : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java




[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459923#comment-13459923
 ] 

Hudson commented on HBASE-6649:
---

Integrated in HBase-0.94 #476 (See 
[https://builds.apache.org/job/HBase-0.94/476/])
HBASE-6847  HBASE-6649 broke replication (Devaraj Das via JD) (Revision 
1388160)

 Result = FAILURE
jdcryans : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java




[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459977#comment-13459977
 ] 

Hudson commented on HBASE-6649:
---

Integrated in HBase-0.92 #583 (See 
[https://builds.apache.org/job/HBase-0.92/583/])
HBASE-6847  HBASE-6649 broke replication (Devaraj Das via JD) (Revision 
1388159)
Fixing the CHANGES.txt after 0.92.2's release and adding HBASE-6649 (Revision 
1388157)

 Result = SUCCESS
jdcryans : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java

jdcryans : 
Files : 
* /hbase/branches/0.92/CHANGES.txt




[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460080#comment-13460080
 ] 

Hudson commented on HBASE-6649:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #184 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/184/])
HBASE-6847  HBASE-6649 broke replication (Devaraj Das via JD) (Revision 
1388161)

 Result = FAILURE
jdcryans : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java




[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-19 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13458839#comment-13458839
 ] 

Jean-Daniel Cryans commented on HBASE-6649:
---

bq. This would be a dataloss issue without the fix.
bq. I have seen dataloss issues (via the unit test) without this patch..

FWIW, if there was indeed dataloss caused by this, it would have been when 
recovering logs. During normal operation that exception was retried until we 
were able to read the file.
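
For illustration only, the retry-until-readable behaviour mentioned above could look roughly like the following. This is a minimal sketch, not the ReplicationSource code: RetryOnIoException, IoCall, retryUntilReadable, the attempt cap and the pause are all hypothetical names and parameters chosen for the example (the comment above describes retrying without a cap).

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper; not part of HBase. */
final class RetryOnIoException {

    /** A read operation that may fail with an IOException. */
    interface IoCall<T> {
        T call() throws IOException;
    }

    /**
     * Calls read until it succeeds or maxAttempts is reached,
     * pausing pauseMillis between attempts.
     */
    static <T> T retryUntilReadable(IoCall<T> read, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return read.call();
            } catch (IOException e) {
                last = e; // remember the failure and try again
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break; // stop retrying if the thread is being shut down
                }
            }
        }
        throw new UncheckedIOException("gave up after retries", last);
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Simulated read that fails twice before the file becomes readable.
        String result = retryUntilReadable(() -> {
            if (++calls[0] < 3) {
                throw new IOException("file not readable yet");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

The cap exists only so the example terminates; a recovery path that gives up instead of retrying is where an unread tail of a log could be skipped.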

bq. could you please try this patch out in your cluster.

It's not exactly a test cluster, more like prod-ish, so I'll put it on only one 
machine. I assume it might take the whole day to hit the condition.



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-19 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13458850#comment-13458850
 ] 

Devaraj Das commented on HBASE-6649:


Thanks, JD



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-19 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459137#comment-13459137
 ] 

Jean-Daniel Cryans commented on HBASE-6649:
---

The server that has the patch did a Break on IOE twice, and it seems to work:

{noformat}
2012-09-19 21:26:50,104 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication va1r6s44%2C10304%2C1348088378534.1348089931722 at 21992487
2012-09-19 21:26:50,110 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Break on IOE: hdfs://va1r5s41:10101/va1-backup/.logs/va1r6s44,10304,1348088378534/va1r6s44%2C10304%2C1348088378534.1348089931722, entryStart=21993911, pos=22058496, end=22058496, edit=5
2012-09-19 21:26:50,110 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: currentNbOperations:783007 and seenEntries:5 and size: 64585
2012-09-19 21:26:50,110 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicating 5
2012-09-19 21:26:50,119 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Going to report log #va1r6s44%2C10304%2C1348088378534.1348089931722 for position 21993911 in hdfs://va1r5s41:10101/va1-backup/.logs/va1r6s44,10304,1348088378534/va1r6s44%2C10304%2C1348088378534.1348089931722
2012-09-19 21:26:50,129 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Removing 0 logs in the list: []
2012-09-19 21:26:50,129 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicated in total: 145502
2012-09-19 21:26:50,129 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication va1r6s44%2C10304%2C1348088378534.1348089931722 at 21993911
{noformat}

One thing I saw this patch break is the size in 
"currentNbOperations:783007 and seenEntries:5 and size: 64585", because that 
value relies on this.position being the position at the beginning of the loop. 
I often see that number at 0 while there are edits to replicate. It's minor, 
since in HBASE-6804 I'm removing that log message altogether, but within the 
context of this jira we may want to either remove the size or keep track of 
what the position is at the beginning of the loop.
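
To make the second option concrete: in the log above, 22058496 - 21993911 = 64585, i.e. the reported size matches pos minus entryStart rather than pos minus the offset the log was opened at (21992487). A minimal sketch of snapshotting the position before the loop follows. ReplicationBatch, readAndMeasure and the intermediate offset 21993000 are hypothetical, picked only for the example; this is not the ReplicationSource code.

```java
/** Hypothetical stand-in for the part of a source that tracks its read position. */
final class ReplicationBatch {

    /** Analogue of this.position: updated as entries are consumed. */
    private long position;

    ReplicationBatch(long startPosition) {
        this.position = startPosition;
    }

    /**
     * Walks over the end offsets of fully read entries and returns the number
     * of bytes covered, measured from a snapshot taken before the loop.
     */
    long readAndMeasure(long[] entryEndOffsets) {
        // Snapshot first, so later updates to position cannot skew the size.
        final long positionAtLoopStart = this.position;
        for (long entryEnd : entryEndOffsets) {
            this.position = entryEnd;
        }
        return this.position - positionAtLoopStart;
    }

    long position() {
        return position;
    }

    public static void main(String[] args) {
        // Opened at 21992487 and last complete entry ends at 21993911 (from the log);
        // the middle offset is made up for illustration.
        ReplicationBatch batch = new ReplicationBatch(21992487L);
        long size = batch.readAndMeasure(new long[] {21993000L, 21993911L});
        System.out.println("size=" + size + " position=" + batch.position());
    }
}
```

Whether the size should count up to the last complete entry or up to the reader's raw position is a separate choice; the sketch only shows that the baseline has to be captured before anything in the loop moves it.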



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-19 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459163#comment-13459163
 ] 

Devaraj Das commented on HBASE-6649:


Good to know, JD. I'll submit a patch with the logging addressed in a bit.



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-18 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13458121#comment-13458121
 ] 

Jean-Daniel Cryans commented on HBASE-6649:
---

We applied this patch on a cluster that replicates, and nearly all the nodes 
stopped replicating after some time. This is what I see in the logs:

{noformat}
2012-09-17 20:04:08,111 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication va1r3s24%2C10304%2C1347911704238.1347911706318 at 78617132
2012-09-17 20:04:08,120 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Break on IOE: hdfs://va1r5s41:10101/va1-backup/.logs/va1r3s24,10304,1347911704238/va1r3s24%2C10304%2C1347911704238.1347911706318, entryStart=78641557, pos=78771200, end=78771200, edit=84
2012-09-17 20:04:08,120 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: currentNbOperations:164529 and seenEntries:84 and size: 154068
2012-09-17 20:04:08,120 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicating 84
2012-09-17 20:04:08,146 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Going to report log #va1r3s24%2C10304%2C1347911704238.1347911706318 for position 78771200 in hdfs://va1r5s41:10101/va1-backup/.logs/va1r3s24,10304,1347911704238/va1r3s24%2C10304%2C1347911704238.1347911706318
2012-09-17 20:04:08,158 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Removing 0 logs in the list: []
2012-09-17 20:04:08,158 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicated in total: 93234
2012-09-17 20:04:08,158 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication va1r3s24%2C10304%2C1347911704238.1347911706318 at 78771200
2012-09-17 20:04:08,163 ERROR org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Unexpected exception in ReplicationSource, currentPath=hdfs://va1r5s41:10101/va1-backup/.logs/va1r3s24,10304,1347911704238/va1r3s24%2C10304%2C1347911704238.1347911706318
java.lang.IndexOutOfBoundsException
    at java.io.DataInputStream.readFully(DataInputStream.java:175)
    at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
    at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2001)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1901)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1947)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:235)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:394)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:307)
{noformat}

The file is still in HDFS and it's about double the size we see up there, so it 
wasn't the end of the file. Looking at other nodes, we always get Break on 
IOE before getting the exception that kills replication. This is why I think 
that this patch is the issue. Somehow reading up to the end is reading too far.

We need to fix or backport.
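
One way to picture the "reading too far" suspicion: in the log above, the position reported after Break on IOE is 78771200 (pos), not entryStart=78641557, and the next open at 78771200 is the one that blows up. The sketch below is an assumption about the general remedy, not the actual patch: it uses a toy length-prefixed record format in place of SequenceFile, and ResumeAtEntryStart, ReadResult, encode and readEntries are hypothetical names. When an entry cannot be read in full, the resume offset stays at the start of that entry instead of wherever the stream stopped.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy log reader over length-prefixed records; not HBase code. */
final class ResumeAtEntryStart {

    /** Complete entries read in one pass, plus the offset to reopen at. */
    static final class ReadResult {
        final List<byte[]> entries = new ArrayList<>();
        int resumeOffset;
    }

    /** Encodes each payload as a 4-byte length followed by its UTF-8 bytes. */
    static byte[] encode(String... payloads) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String payload : payloads) {
                byte[] raw = payload.getBytes(StandardCharsets.UTF_8);
                out.writeInt(raw.length);
                out.write(raw);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads entries from startOffset until the data runs out. */
    static ReadResult readEntries(byte[] data, int startOffset) {
        ReadResult result = new ReadResult();
        result.resumeOffset = startOffset;
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(data, startOffset, data.length - startOffset));
        try {
            while (true) {
                int length = in.readInt();
                byte[] payload = new byte[length];
                in.readFully(payload);
                result.entries.add(payload);
                // Advance only after the whole entry was read successfully.
                result.resumeOffset += 4 + length;
            }
        } catch (EOFException e) {
            // Ran out of bytes: resumeOffset still points at the start of the
            // entry that could not be read in full (or at the clean end of data).
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] full = encode("a", "bb", "ccc");       // 18 bytes, third entry starts at 11
        byte[] partial = Arrays.copyOf(full, 16);     // writer is mid-way through entry 3
        ReadResult first = readEntries(partial, 0);
        System.out.println(first.entries.size() + " entries, resume at " + first.resumeOffset);
        ReadResult second = readEntries(full, first.resumeOffset);
        System.out.println(second.entries.size() + " entries, resume at " + second.resumeOffset);
    }
}
```

Reopening at the entry boundary means the next pass sees a well-formed record header; reopening in the middle of a record would have the reader interpret payload bytes as a length, which fits the kind of IndexOutOfBoundsException seen in the stack trace above, though that link is a guess.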



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-18 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13458131#comment-13458131
 ] 

Lars Hofhansl commented on HBASE-6649:
--

You fix or rollback (the change)?



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-18 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13458134#comment-13458134
 ] 

Devaraj Das commented on HBASE-6649:


Looking at the logs/patch more closely.. Will get back soon.



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-18 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458135#comment-13458135
 ] 

Jean-Daniel Cryans commented on HBASE-6649:
---

[~lhofhansl] Trying to figure out what the problem is first, although if we're 
in a hurry we can just roll back. (not backport, doh!)



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-18 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458185#comment-13458185
 ] 

Jean-Daniel Cryans commented on HBASE-6649:
---

[~devaraj] I'm still trying to figure out exactly how we get the 
IndexOutOfBoundsException (I'd say the file didn't get new data and we started 
reading exactly at the end and the DFSClient doesn't like that? Or it's missing 
something at the end?), but if it's a case of reading the tail of a recovered 
log then we *could* add a check like this:

{code}
  try {
    entry = this.reader.next(entriesArray[currentNbEntries]);
  } catch (IOException ie) {
    if (queueRecovered) {
      // Tail of a recovered log: stop reading instead of failing.
      LOG.debug("Break on IOE: " + ie.getMessage());
      break;
    } else {
      throw ie;
    }
  }
{code}



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-18 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458192#comment-13458192
 ] 

Jean-Daniel Cryans commented on HBASE-6649:
---

But now that I think about it, it may crap out when coming back to read even on 
a recovered file. The data will all make it to the other cluster but that 
source will never be fully cleaned up.

Which leads me to think that this is a bug in DFSClient. It's expecting 
something it's not getting.



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-18 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458222#comment-13458222
 ] 

Devaraj Das commented on HBASE-6649:


Yeah, [~jdcryans] not sure how one could get an IndexOutOfBoundsException. I 
can't see how the patch would make it surface either. The patch only catches 
and ignores IOE (as opposed to *all* exceptions). But yeah, give me another 
hour please. Let me dig some more.



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-18 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458251#comment-13458251
 ] 

Devaraj Das commented on HBASE-6649:


Has there been any change in your cluster environment (Hadoop version, etc.; 
a different version of the DFS client could cause the issue to surface)? [Not 
sure which Hadoop version you are on, but there is no chance you are hitting 
HDFS-1108, right?]



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-18 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458274#comment-13458274
 ] 

Devaraj Das commented on HBASE-6649:


Okay, a plausible explanation:
1. this.reader.next inside ReplicationSource.readAllEntriesToReplicateOrNextFile 
throws an IOException (which causes the log line "Break on IOE:" to print), but 
the method ignores the exception.
2. When readAllEntriesToReplicateOrNextFile returns, the reader's file-pointer 
position is queried and 'this.position' is set to that (the reader's 
file-pointer is possibly pointing to gibberish).
3. Eventually, readAllEntriesToReplicateOrNextFile gets called again, and this 
time this.reader.next inside throws an IndexOutOfBoundsException because it read 
gibberish (looking at the code of DataInputStream.java, it seems like one of 
the cases where IndexOutOfBoundsException is thrown is when the length passed 
to readFully is less than 0).

The fix I can think of is to reset the reader's 'position' to the last valid 
position (upon return from the method readAllEntriesToReplicateOrNextFile).

Thoughts on the above? Does the analysis make sense?
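
A minimal sketch of the idea above, not the actual ReplicationSource code: the class, the Batch type, and the length-prefixed record layout are all hypothetical stand-ins for the real log reader. It shows (a) that DataInputStream.readFully throws IndexOutOfBoundsException when handed a negative length read from garbage, and (b) how remembering the offset after the last complete entry avoids resuming in the middle of a half-written one.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model: entries are a 4-byte length followed by that many payload bytes. */
final class LastValidPositionDemo {

  /** Edits read in one pass plus the offset the next pass should start from. */
  static final class Batch {
    final List<String> edits;
    final int resumeAt;

    Batch(List<String> edits, int resumeAt) {
      this.edits = edits;
      this.resumeAt = resumeAt;
    }
  }

  /** Reads complete entries from 'start' and reports the last valid offset. */
  static Batch readBatch(byte[] log, int start) {
    List<String> edits = new ArrayList<>();
    int lastValid = start;
    byte[] buf = new byte[64];
    DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(log, start, log.length - start));
    try {
      while (lastValid < log.length) {
        int len = in.readInt();
        // A negative (or oversized) len makes readFully throw IndexOutOfBoundsException.
        in.readFully(buf, 0, len);
        edits.add(new String(buf, 0, len, StandardCharsets.UTF_8));
        lastValid += 4 + len; // advance only after a complete entry
      }
    } catch (IOException ioe) {
      // Partial entry at the tail: keep what was read, do not advance lastValid.
    }
    return new Batch(edits, lastValid);
  }

  public static void main(String[] args) {
    // One complete entry ("hi"), then an entry that claims 5 bytes but has only 2.
    byte[] log = {0, 0, 0, 2, 'h', 'i', 0, 0, 0, 5, 'p', 'a'};
    Batch batch = readBatch(log, 0);
    System.out.println(batch.edits + " resumeAt=" + batch.resumeAt);

    // Starting on garbage: the first four bytes decode to a negative length.
    byte[] garbage = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFE};
    try {
      readBatch(garbage, 0);
    } catch (IndexOutOfBoundsException e) {
      System.out.println("IndexOutOfBoundsException on garbage length");
    }
  }
}
```

In this toy layout the stream pointer after the failed read sits at offset 12, while the returned resumeAt stays at 6, the end of the last complete entry. Whether the real reader can be repositioned the same way is exactly the open question in this thread.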



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-18 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458352#comment-13458352
 ] 

Lars Hofhansl commented on HBASE-6649:
--

Should we pull HBASE-6719 into this?

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
 ---

 Key: HBASE-6649
 URL: https://issues.apache.org/jira/browse/HBASE-6649
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.96.0, 0.92.3, 0.94.2

 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 
 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 
 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 
 #502 test - queueFailover [Jenkins].html


 Have seen it twice in the recent past: http://bit.ly/MPCykB  
 http://bit.ly/O79Dq7 .. 
 Looking briefly at the logs hints at a pattern - in both the failed test 
 instances, there was an RS crash while the test was running.



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-18 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458378#comment-13458378
 ] 

Jean-Daniel Cryans commented on HBASE-6649:
---

bq. The patch only catches and ignores IOE (as opposed to all exceptions)

What it does do is permit reading up to the end of the file.

bq. [Not sure which hadoop version you are on, but there is no chance you are 
hitting HDFS-1108, right?]

We are on CDH3u3, didn't change when we applied the patch.

bq. Okay a plausible explanation -

It's plausible, but unless we really understand what that gibberish is at the 
end of the file, we can't truly make a fix. I don't know why that IOE is 
thrown, but normally we just silently finish reading from the file. There is 
some special case here.

bq. Should we pull HBASE-6719 into this?

I think it's separate issues.



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-18 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458404#comment-13458404
 ] 

Lars Hofhansl commented on HBASE-6649:
--

I say we revert from 0.94.2 and retry in 0.94.3.

Although from DD's comment:
bq. If the second call (within the while loop) throws an exception (like 
EOFException), it basically destroys the work done up until then. Therefore, 
some rows would never be replicated.

This would be a dataloss issue without the fix.

I find that a bit confusing. Since J-D saw the ignored exception in the test 
cluster eventually on all machines, it seems there was data lost in all 
versions before 0.94.2? That seems very unlikely.




[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455031#comment-13455031
 ] 

Hudson commented on HBASE-6649:
---

Integrated in HBase-0.94-security #52 (See 
[https://builds.apache.org/job/HBase-0.94-security/52/])
HBASE-6649 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally 
fails [Part-1] (Revision 1381289)

 Result = SUCCESS
stack : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java




[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455143#comment-13455143
 ] 

Lars Hofhansl commented on HBASE-6649:
--

Just failed again: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2852//testReport/



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-12 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454157#comment-13454157
 ] 

Jean-Daniel Cryans commented on HBASE-6649:
---

Oh I see what you mean. Very good find! I wonder what's that gibberish at the 
end of the file.



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-12 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454181#comment-13454181
 ] 

Devaraj Das commented on HBASE-6649:


bq. Oh I see what you mean. Very good find! I wonder what's that gibberish at 
the end of the file.

Thanks! Are you referring to the log file? I see the following at the end (no 
gibberish):

{noformat}
2012-08-17 15:35:01,161 DEBUG 
[RegionServer:1;vesta.apache.org,40480,1345217521368-EventThread.replicationSource,2]
 regionserver.ReplicationSource(474): Opening log for replication 
vesta.apache.org%2C40480%2C1345217521368.1345217648386 at 258
2012-08-17 15:35:01,164 DEBUG 
[RegionServer:1;vesta.apache.org,40480,1345217521368-EventThread.replicationSource,2]
 regionserver.ReplicationSource(429): currentNbOperations:13022 and 
seenEntries:0 and size: 0
2012-08-17 15:35:01,164 DEBUG 
[RegionServer:1;vesta.apache.org,40480,1345217521368-EventThread.replicationSource,2]
 regionserver.ReplicationSource(549): Nothing to replicate, sleeping 100 times 
10
{noformat}



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-12 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454189#comment-13454189
 ] 

Jean-Daniel Cryans commented on HBASE-6649:
---

What I meant is that the reader gets this 10 times:

{noformat}
java.io.EOFException: 
hdfs://localhost:60044/user/hudson/hbase/.oldlogs/vesta.apache.org%2C57779%2C1345217521341.1345217601487,
 entryStart=40929, pos=40960, end=40960, edit=3
{noformat}

So if I'm reading this correctly it's able to read the file and got 3 edits but 
gets an EOF. Is something half written? Then it gives up on the file:

{noformat}
2012-08-17 15:33:50,099 INFO  
[ReplicationExecutor-0.replicationSource,2-vesta.apache.org,57779,1345217521341]
 regionserver.ReplicationSourceManager(352): Done with the recovered queue 
2-vesta.apache.org,57779,1345217521341
{noformat}

And there's data loss.



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-12 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454347#comment-13454347
 ] 

Devaraj Das commented on HBASE-6649:


This log file belongs to a crashed RS, and yes, it seems like the last record 
wasn't completely written to the file before the RS crashed. That should be 
fine, i.e., no data loss should happen: in the queueFailover test, the client 
would have got exceptions from the flushCommits call, it would have retried the 
batch of 'put', and the corresponding records would have ended up in another RS.
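
To illustrate why a half-written last record shows up as an EOFException after some edits were read fine, here is a small sketch. It assumes a made-up layout (4-byte length plus payload), not the real HLog/SequenceFile format, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a log of length-prefixed records. */
final class TruncatedLogDemo {

  /** Writes each record as a 4-byte length followed by its UTF-8 payload. */
  static byte[] writeRecords(String... records) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (String record : records) {
        byte[] payload = record.getBytes(StandardCharsets.UTF_8);
        out.writeInt(payload.length);
        out.write(payload);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Returns every complete record; a cut-short tail ends the read quietly. */
  static List<String> readCompleteRecords(byte[] log) {
    List<String> edits = new ArrayList<>();
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(log));
    try {
      while (in.available() > 0) {
        int len = in.readInt();
        byte[] payload = new byte[len];
        in.readFully(payload); // EOFException if fewer than len bytes remain
        edits.add(new String(payload, StandardCharsets.UTF_8));
      }
    } catch (EOFException eof) {
      // Last record was only partially written; keep the edits read so far.
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return edits;
  }

  public static void main(String[] args) {
    byte[] full = writeRecords("edit-1", "edit-2", "edit-3", "edit-4");
    // Simulate the writer crashing: the final record loses its last 3 bytes.
    byte[] truncated = Arrays.copyOf(full, full.length - 3);
    System.out.println(readCompleteRecords(truncated));
  }
}
```

The point of the sketch is only that the complete records before the damaged tail remain readable; whether the dropped tail matters depends on the client retry described above.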



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-10 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452581#comment-13452581
 ] 

Jean-Daniel Cryans commented on HBASE-6649:
---

bq. This is because of multiple calls to reader.next within 
readAllEntriesToReplicateOrNextFile. If the second call (within the while loop) 
throws an exception (like EOFException), it basically destroys the work done up 
until then. Therefore, some rows would never be replicated.

The position in the log is updated in ZK only once the edits are replicated; 
hence, even if you fail on the second or hundredth edit, the next region server 
that will be in charge of that log will pick up where the previous RS was (even 
if that means re-reading some edits).
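
A rough model of that argument, with an AtomicInteger standing in for the position kept in ZK and a plain list standing in for the peer cluster. All names here are hypothetical; the sketch only shows the "checkpoint after ship" ordering, not the real ReplicationSource logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of persisting the log position only after edits are shipped. */
final class CheckpointAfterShipDemo {

  /**
   * Ships edits in batches starting at the persisted position. When
   * crashBeforeCheckpoint is true, the source dies after shipping one batch
   * but before recording the new position.
   */
  static void runSource(List<String> log, AtomicInteger persistedPos, List<String> sink,
                        int batchSize, boolean crashBeforeCheckpoint) {
    int pos = persistedPos.get();
    while (pos < log.size()) {
      int end = Math.min(pos + batchSize, log.size());
      sink.addAll(log.subList(pos, end)); // ship the batch to the peer
      if (crashBeforeCheckpoint) {
        return; // position was never persisted
      }
      persistedPos.set(end); // checkpoint only after a successful ship
      pos = end;
    }
  }

  public static void main(String[] args) {
    List<String> log = Arrays.asList("e1", "e2", "e3", "e4", "e5");
    AtomicInteger persistedPos = new AtomicInteger(0); // stands in for the ZK position
    List<String> sink = new ArrayList<>();

    runSource(log, persistedPos, sink, 2, true);  // first source crashes mid-way
    runSource(log, persistedPos, sink, 2, false); // next source resumes from the old position

    System.out.println(sink);
  }
}
```

In this model the first batch is shipped twice, but no edit is skipped: re-reading is the price of never advancing the position ahead of what was actually replicated.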



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448512#comment-13448512
 ] 

stack commented on HBASE-6649:
--

This patch makes sense to me.  We replicate all up to the exception and then 
next time in, we should pick up the IOE again.  Want me to commit this DD?

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
 ---

 Key: HBASE-6649
 URL: https://issues.apache.org/jira/browse/HBASE-6649
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.92.3

 Attachments: 6649-1.patch, 6649-2.txt, HBase-0.92 #495 test - 
 queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover 
 [Jenkins].html


 Have seen it twice in the recent past: http://bit.ly/MPCykB  
 http://bit.ly/O79Dq7 .. 
 Looking briefly at the logs hints at a pattern - in both the failed test 
 instances, there was an RS crash while the test was running.



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-05 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448907#comment-13448907
 ] 

Devaraj Das commented on HBASE-6649:


[~zhi...@ebaysf.com] This patch fixes a specific problem to do with replication 
missing rows, and in my observations, that leads to somewhat frequent 
TestReplication.queueFailover failures. On trunk, do you know which test hangs? 
There probably are more issues to fix in the replication area, and we should 
have follow-up JIRAs (and this JIRA is part-1 :)).


[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-05 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448914#comment-13448914
 ] 

Ted Yu commented on HBASE-6649:
---

target/surefire-reports/org.apache.hadoop.hbase.replication.TestReplication.txt 
was 0 length.
There was no JVM left from TestReplication by the time I got back to the computer.

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
 ---

 Key: HBASE-6649
 URL: https://issues.apache.org/jira/browse/HBASE-6649
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.92.3

 Attachments: 6649-1.patch, 6649-2.txt, 6649-trunk.patch, HBase-0.92 
 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - 
 queueFailover [Jenkins].html


 Have seen it twice in the recent past: http://bit.ly/MPCykB  
 http://bit.ly/O79Dq7 .. 
 Looking briefly at the logs hints at a pattern - in both the failed test 
 instances, there was an RS crash while the test was running.



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-05 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448924#comment-13448924
 ] 

Lars Hofhansl commented on HBASE-6649:
--

Patch looks good to me.
(As Ted points out, there might be other issues as well.)

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
 ---

 Key: HBASE-6649
 URL: https://issues.apache.org/jira/browse/HBASE-6649
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.96.0, 0.92.3, 0.94.2

 Attachments: 6649-1.patch, 6649-2.txt, 6649-trunk.patch, HBase-0.92 
 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - 
 queueFailover [Jenkins].html




[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-05 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448942#comment-13448942
 ] 

Ted Yu commented on HBASE-6649:
---

@J-D:
What do you think ?

nit:
{code}
+  } catch (IOException ie) {
+break;
{code}
A log statement is desirable before break.
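
For illustration only, a minimal sketch of the log-before-break pattern suggested above. This is not the code from the patch: EntryReader, readAvailable, and the message text are hypothetical names, and java.util.logging stands in for whatever logger the real class uses.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogBeforeBreakSketch {
  private static final Logger LOG =
      Logger.getLogger(LogBeforeBreakSketch.class.getName());

  /** Hypothetical stand-in for the code that reads the next entry; not an HBase API. */
  public interface EntryReader {
    /** Returns the next entry, or null when nothing more is available. */
    String next() throws IOException;
  }

  /**
   * Reads entries until the reader is exhausted or fails. On failure the
   * exception is logged before the loop is left, so the reason for stopping
   * early is visible in the logs.
   */
  public static List<String> readAvailable(EntryReader reader) {
    List<String> entries = new ArrayList<>();
    while (true) {
      try {
        String entry = reader.next();
        if (entry == null) {
          break; // normal end of input, nothing to report
        }
        entries.add(entry);
      } catch (IOException ie) {
        // The point of the nit: record why we are leaving the loop.
        LOG.log(Level.WARNING,
            "Stopping read after IOException; entries read so far: " + entries.size(), ie);
        break;
      }
    }
    return entries;
  }

  public static void main(String[] args) {
    final int[] calls = {0};
    // Simulated reader: two good entries, then a failure.
    List<String> got = readAvailable(() -> {
      if (calls[0]++ < 2) {
        return "edit-" + calls[0];
      }
      throw new IOException("simulated read failure");
    });
    System.out.println(got);
  }
}
```

Whether the extra log line is worth it depends on whether the same exception surfaces again on the next attempt, which is the trade-off raised later in this thread.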



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-05 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448987#comment-13448987
 ] 

Himanshu Vashishtha commented on HBASE-6649:


lgtm. 
The exception will be re-thrown in the next try, so +0 on adding a log 
statement before break.



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448988#comment-13448988
 ] 

stack commented on HBASE-6649:
--

J-D is on vacation.  Let me commit this.  I will add the log message Ted 
suggests, though my sense is that it is overkill; let's see.  I would suggest 
a new issue for the other 'parts', DD.



[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449035#comment-13449035
 ] 

Hudson commented on HBASE-6649:
---

Integrated in HBase-0.94 #450 (See 
[https://builds.apache.org/job/HBase-0.94/450/])
HBASE-6649 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally 
fails [Part-1] (Revision 1381289)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java


 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
 ---

 Key: HBASE-6649
 URL: https://issues.apache.org/jira/browse/HBASE-6649
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.96.0, 0.92.3, 0.94.2

 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 
 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - 
 queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover 
 [Jenkins].html




[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449069#comment-13449069
 ] 

Hudson commented on HBASE-6649:
---

Integrated in HBase-TRUNK #3307 (See 
[https://builds.apache.org/job/HBase-TRUNK/3307/])
HBASE-6649 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally 
fails [Part-1] (Revision 1381287)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java




[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449147#comment-13449147
 ] 

Hudson commented on HBASE-6649:
---

Integrated in HBase-0.92 #557 (See 
[https://builds.apache.org/job/HBase-0.92/557/])
HBASE-6649 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally 
fails [Part-1] (Revision 1381291)

 Result = SUCCESS
stack : 
Files : 
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java




[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2012-09-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448117#comment-13448117
 ] 

Hadoop QA commented on HBASE-6649:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12543752/6649-1.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2779//console

This message is automatically generated.
