[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-11-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490369#comment-13490369
 ] 

Hudson commented on HBASE-6733:
---

Integrated in HBase-0.94-security-on-Hadoop-23 #9 (See 
[https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/9/])
HBASE-6733 TestReplication.queueFailover occasionally fails [Part-2] 
(Revision 1401130)

 Result = FAILURE
enis : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java


 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.94.3, 0.96.0

 Attachments: 6733-1.patch, 6733-2.patch, 6733-3.patch, 
 HBASE-6733-0.94.patch


 The failure is in TestReplication.queueFailover (fails due to unreplicated 
 rows). I have come across two problems:
 1. The sleepMultiplier is not properly reset when the currentPath is changed 
 (in ReplicationSource.java).
 2. ReplicationExecutor sometimes removes files to replicate from the queue too 
 early, so the corresponding edits go missing. The log-file length that the 
 replication executor finds is not the most recent one, so it doesn't read 
 anything from the file; when there is a log roll, the replication queue gets 
 a new entry and the executor drops the old entry out of the queue.
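
The first problem can be pictured with a minimal sketch. It is not the HBase code: the class, the string-based log queue, and the names logQueue, finishCurrentLog, iterate and entriesRead are assumptions made up for illustration. Only currentPath, sleepMultiplier, getNextPath, oldPath and hasCurrentPath are names that appear in this thread.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical stand-in for the replication source loop; not HBase code. */
final class SleepMultiplierSketch {
    private final Queue<String> logQueue = new ArrayDeque<>(); // assumed queue of log names
    private String currentPath;       // null while no log is selected
    private int sleepMultiplier = 1;  // backoff factor; grows while nothing is shipped

    void enqueue(String logPath) {
        logQueue.add(logPath);
    }

    /** Mimics currentPath being set to null once a log has been fully read. */
    void finishCurrentLog() {
        currentPath = null;
    }

    /** Selects the next queued log if none is current; true if a log is selected. */
    boolean getNextPath() {
        if (currentPath == null) {
            currentPath = logQueue.poll();
        }
        return currentPath != null;
    }

    /** One loop iteration; entriesRead is the number of edits read from the log. */
    void iterate(int entriesRead) {
        String oldPath = currentPath;
        boolean hasCurrentPath = getNextPath();
        if (hasCurrentPath && !currentPath.equals(oldPath)) {
            sleepMultiplier = 1;  // switched to a different log: restart the backoff
        }
        if (!hasCurrentPath || entriesRead == 0) {
            sleepMultiplier++;    // nothing to ship this round: back off further
        }
    }

    int sleepMultiplier() {
        return sleepMultiplier;
    }
}
```

Without the reset branch, a multiplier that grew while the old log was idle would carry over to the new log and delay its first read.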

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-10-22 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481612#comment-13481612
 ] 

Sergey Shelukhin commented on HBASE-6733:
-

Should be ok in this one, better for git commit tracking. Thanks!

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.96.0

 Attachments: 6733-1.patch, 6733-2.patch, 6733-3.patch, 
 HBASE-6733-0.94.patch




[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-10-22 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481974#comment-13481974
 ] 

Enis Soztutar commented on HBASE-6733:
--

Ran TestReplication a couple of times and committed the backport patch to 
0.94. Thanks Sergey for providing the patch and Lars for the review. 

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.96.0

 Attachments: 6733-1.patch, 6733-2.patch, 6733-3.patch, 
 HBASE-6733-0.94.patch




[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482044#comment-13482044
 ] 

Hudson commented on HBASE-6733:
---

Integrated in HBase-0.94 #547 (See 
[https://builds.apache.org/job/HBase-0.94/547/])
HBASE-6733 TestReplication.queueFailover occasionally fails [Part-2] 
(Revision 1401130)

 Result = SUCCESS
enis : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java


 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.94.3, 0.96.0

 Attachments: 6733-1.patch, 6733-2.patch, 6733-3.patch, 
 HBASE-6733-0.94.patch




[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-10-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480896#comment-13480896
 ] 

Lars Hofhansl commented on HBASE-6733:
--

Looks good. I can commit this to 0.94 (either here or in a new porting jira).

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.96.0

 Attachments: 6733-1.patch, 6733-2.patch, 6733-3.patch, 
 HBASE-6733-0.94.patch




[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-10-10 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473029#comment-13473029
 ] 

Lars Hofhansl commented on HBASE-6733:
--

I think we want this in 0.94 as well.

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.96.0

 Attachments: 6733-1.patch, 6733-2.patch, 6733-3.patch




[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-10-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473152#comment-13473152
 ] 

Hudson commented on HBASE-6733:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #216 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/216/])
HBASE-6733  TestReplication.queueFailover occasionally fails [Part-2] 
(Devaraj Das via JD) (Revision 1396463)

 Result = FAILURE
jdcryans : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java


 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.96.0

 Attachments: 6733-1.patch, 6733-2.patch, 6733-3.patch




[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-10-10 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473396#comment-13473396
 ] 

Jean-Daniel Cryans commented on HBASE-6733:
---

[~lhofhansl] want me to hold off until 0.94.2 is released?

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.96.0

 Attachments: 6733-1.patch, 6733-2.patch, 6733-3.patch




[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-10-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473012#comment-13473012
 ] 

Hudson commented on HBASE-6733:
---

Integrated in HBase-TRUNK #3440 (See 
[https://builds.apache.org/job/HBase-TRUNK/3440/])
HBASE-6733  TestReplication.queueFailover occasionally fails [Part-2] 
(Devaraj Das via JD) (Revision 1396463)

 Result = SUCCESS
jdcryans : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java


 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.96.0

 Attachments: 6733-1.patch, 6733-2.patch, 6733-3.patch




[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-10-03 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468868#comment-13468868
 ] 

Jean-Daniel Cryans commented on HBASE-6733:
---

The log switching code really needs to be cleaned up, but my understanding is 
that this patch won't do anything. {{processEndOfFile}} always sets the 
{{currentPath}} to {{null}} so this:

{code}
+  Path oldPath = getCurrentPath();
{code}

would always return null in the case where we're switching logs? 

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.92.3

 Attachments: 6733-1.patch, 6733-2.patch




[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-10-03 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468895#comment-13468895
 ] 

Devaraj Das commented on HBASE-6733:


bq. would always return null in the case where we're switching log?
That's true.. But the patch still works :-) The check _if (getCurrentPath() != 
null && !getCurrentPath().equals(oldPath))_ would return true (after a call to 
getNextPath()) and the sleepMultiplier would be reset..
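
A minimal sketch of why the check holds even when oldPath is null. LogPath is a hypothetical stand-in class, not the Hadoop Path; it only assumes that equals returns false for a null argument, as the general contract of Object.equals requires. The method name switchedLog is also made up for illustration.

```java
/** Illustration only: hypothetical names, not HBase or Hadoop code. */
final class PathChangeCheck {

    /** Stand-in for a log path; only equals/hashCode matter here. */
    static final class LogPath {
        private final String name;

        LogPath(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            // instanceof is false for null, so equals(null) returns false
            // instead of throwing.
            return o instanceof LogPath && ((LogPath) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }
    }

    /** True when a log is selected and it differs from the previous one. */
    static boolean switchedLog(LogPath oldPath, LogPath currentPath) {
        return currentPath != null && !currentPath.equals(oldPath);
    }
}
```

With oldPath == null and a non-null current path, equals(null) is false, so the negation is true and the reset branch runs. With an unchanged path the check is false and the multiplier is left alone.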

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.92.3

 Attachments: 6733-1.patch, 6733-2.patch




[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-10-03 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468909#comment-13468909
 ] 

Devaraj Das commented on HBASE-6733:


The patch should continue to work if, at some point, the log switching 
behavior is changed so that the currentPath always points to a valid non-null 
path. For now, null works as well (I have checked in the Hadoop code that the 
equals implementation handles a null argument).

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.92.3

 Attachments: 6733-1.patch, 6733-2.patch




[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-10-03 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468920#comment-13468920
 ] 

Jean-Daniel Cryans commented on HBASE-6733:
---

You are right, but I'd rather have the code expose what it's really doing.

Also, reading more, this looks weird:

{code}
+  boolean pathNull = getNextPath();
...
-  if (!getNextPath()) {
+  if (!pathNull) {
{code}

{{getNextPath}} returns true if the path was not null so shouldn't the variable 
be named pathNotNull or hasCurrentPath and then remove the exclamation point? 

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.92.3

 Attachments: 6733-1.patch, 6733-2.patch




[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-10-03 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468933#comment-13468933
 ] 

Devaraj Das commented on HBASE-6733:


bq. but I'd rather have the code expose what it's really doing.

Do you want me to put a comment or something?

bq. the variable be named pathNotNull or hasCurrentPath and then remove the 
exclamation point?

Agree. I'll rename pathNull to hasCurrentPath (but the check will remain the 
same - if (!hasCurrentPath) ..)

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.92.3

 Attachments: 6733-1.patch, 6733-2.patch




[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-10-03 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468936#comment-13468936
 ] 

Jean-Daniel Cryans commented on HBASE-6733:
---

bq.  (but the check will remain the same - if (!hasCurrentPath) ..)

Ah geez yeah keep that. Damn double negations.

bq. Do you want me to put a comment or something?

Check for null if that's what you expect I'd say.

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.92.3

 Attachments: 6733-1.patch, 6733-2.patch




[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-10-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469016#comment-13469016
 ] 

Hadoop QA commented on HBASE-6733:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12547620/6733-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
83 warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient
   org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor
   org.apache.hadoop.hbase.regionserver.TestAtomicOperation

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2997//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2997//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2997//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2997//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2997//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2997//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2997//console

This message is automatically generated.

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.92.3

 Attachments: 6733-1.patch, 6733-2.patch, 6733-3.patch




[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-09-28 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465596#comment-13465596
 ] 

Devaraj Das commented on HBASE-6733:


BTW I tested the patch on real clusters (based on 0.92) and observed no 
problems.

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.92.3

 Attachments: 6733-1.patch, 6733-2.patch




[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-09-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460645#comment-13460645
 ] 

stack commented on HBASE-6733:
--

[~jdcryans] Review this boss?

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.92.3

 Attachments: 6733-1.patch, 6733-2.patch




[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-09-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458107#comment-13458107
 ] 

stack commented on HBASE-6733:
--

+1

Want me to commit this and try it DD?  Want me to leave the issue open after 
commit so we can see how it does over a few builds up on jenkins?

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.92.3

 Attachments: 6733-1.patch, 6733-2.patch




[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-09-18 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458125#comment-13458125
 ] 

Devaraj Das commented on HBASE-6733:


Yeah, this is committable IMO stack. I think we can close the issue, and open 
follow ups if and when we spot issues.

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.92.3

 Attachments: 6733-1.patch, 6733-2.patch




[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-09-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13458127#comment-13458127
 ] 

stack commented on HBASE-6733:
--

OK.  Let me get the component chief to take a look before I commit J-D?

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.92.3

 Attachments: 6733-1.patch, 6733-2.patch




[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-09-11 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453175#comment-13453175
 ] 

Devaraj Das commented on HBASE-6733:


Stack, yes. Unless it is a field, it seems a lot of refactoring would be needed 
to pass it into methods and get it back. It didn't seem worth it.

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.92.3

 Attachments: 6733-1.patch




[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-09-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452674#comment-13452674
 ] 

Hadoop QA commented on HBASE-6733:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12544562/6733-1.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The patch appears to cause mvn compile goal to fail.

-1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2844//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2844//console

This message is automatically generated.

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.92.3

 Attachments: 6733-1.patch




[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-09-10 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452678#comment-13452678
 ] 

stack commented on HBASE-6733:
--

Do we have to add the sleep multiplier as a data member, DD?  There are a lot 
of instances of it as a local variable.  Having it as a data member could 
confuse?  Looking at it, it could be hard NOT having it as a data member, as 
that would require a bunch of refactoring of the code flow.  It does look odd 
though having the data member passed into functions: once it's a data member, 
there is no need for it to be passed in any more?  What do you reckon?

Good debugging by the way.

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.92.3

 Attachments: 6733-1.patch




[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-09-07 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450714#comment-13450714
 ] 

Devaraj Das commented on HBASE-6733:


For the first problem, the sequence is:
1. The replicator thread in ReplicationSource fails to find anything to 
replicate for maxRetriesMultiplier attempts in a row. From then on the thread 
sleeps for sleepForRetries times the max value of sleepMultiplier, over and 
over. In every iteration of the thread's run method, 
readAllEntriesToReplicateOrNextFile gets called, and at the end of the method, 
processEndOfFile gets called. 

2. At some point the log roller enqueues a WAL file to replicate.

3. Now when processEndOfFile is called, the currentPath is set to null, and the 
thread's run method gets a new file to replicate (the output of 
ReplicationSource.getNextPath() call). 

4. But the sleepMultiplier is still set to the max value that was set in (1).

5. If there is an exception while reading the new WAL file (enqueued in (2)), 
the file is penalized too heavily, since the sleepMultiplier is still set to 
the max. An example is below:

{noformat}
2012-08-31 19:16:19,029 INFO  [main] wal.HLog(620): Roll 
/user/hortonde/hbase/.logs/foo.net,50437,1346440555753/foo.net%2C50437%2C1346440555753.1346440556675,
 entries=2, filesize=626.  for 
/user/hortonde/hbase/.logs/foo.net,50437,1346440555753/foo.net%2C50437%2C1346440555753.1346440579013
2012-08-31 19:16:19,032 DEBUG [main] wal.SequenceFileLogWriter(126): using new 
createWriter -- HADOOP-6840
2012-08-31 19:16:19,032 DEBUG [main] wal.SequenceFileLogWriter(136): 
Path=hdfs://localhost:34512/user/hortonde/hbase/.logs/foo.net,44638,1346440555781/foo.net%2C44638%2C1346440555781.1346440579029,
 syncFs=true, hflush=false
2012-08-31 19:16:19,033 DEBUG 
[RegionServer:0;foo.net,50437,1346440555753.replicationSource,2] 
regionserver.ReplicationSource(474): Opening log for replication 
foo.net%2C50437%2C1346440555753.1346440556675 at 626
2012-08-31 19:16:19,036 INFO  
[RegionServer:0;foo.net,50437,1346440555753.replicationSource,2] 
wal.SequenceFileLogReader(217): 
hdfs://localhost:34512/user/hortonde/hbase/.logs/foo.net,50437,1346440555753/foo.net%2C50437%2C1346440555753.1346440556675,
 entryStart=626, pos=626, end=626, edit=0
2012-08-31 19:16:19,036 DEBUG 
[RegionServer:0;foo.net,50437,1346440555753.replicationSource,2] 
regionserver.ReplicationSource(429): currentNbOperations:0 and seenEntries:0 
and size: 0
2012-08-31 19:16:19,036 DEBUG 
[RegionServer:0;foo.net,50437,1346440555753.replicationSource,2] 
regionserver.ReplicationSource(474): Opening log for replication 
foo.net%2C50437%2C1346440555753.1346440579013 at 0
2012-08-31 19:16:19,037 WARN  
[RegionServer:0;foo.net,50437,1346440555753.replicationSource,2] 
regionserver.ReplicationSource(530): 2 Got:
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at java.io.DataInputStream.readFully(DataInputStream.java:152)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)
at 
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
at 
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
at 
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:58)
at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:166)
at 
org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:686)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:478)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:289)
2012-08-31 19:16:19,038 WARN  
[RegionServer:0;foo.net,50437,1346440555753.replicationSource,2] 
regionserver.ReplicationSource(534): Waited too long for this file, considering 
dumping
{noformat} 
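
The sequence above can be sketched as a toy model. This is a minimal sketch and 
not the actual ReplicationSource code: the class SleepMultiplierSketch, its 
methods and the resetOnPathChange flag are hypothetical names introduced for 
illustration. Only sleepMultiplier, sleepForRetries, maxRetriesMultiplier and 
currentPath come from the description, and resetting the multiplier to 1 on a 
path change is an assumed remedy.

```java
/** Toy model of the retry backoff; hypothetical, not the real ReplicationSource. */
class SleepMultiplierSketch {
    private final int maxRetriesMultiplier;
    private final boolean resetOnPathChange; // hypothetical flag to compare both behaviours
    private int sleepMultiplier = 1;
    private String currentPath;

    SleepMultiplierSketch(int maxRetriesMultiplier, boolean resetOnPathChange) {
        this.maxRetriesMultiplier = maxRetriesMultiplier;
        this.resetOnPathChange = resetOnPathChange;
    }

    /** Step 1: nothing to replicate, so back off further, capped at the max. */
    void nothingToReplicate() {
        if (sleepMultiplier < maxRetriesMultiplier) {
            sleepMultiplier++;
        }
    }

    /** Steps 2-3: the old file is done and a newly enqueued file becomes current. */
    void switchToNextPath(String nextPath) {
        currentPath = nextPath;
        if (resetOnPathChange) {
            sleepMultiplier = 1; // assumed remedy: the new file starts with a clean slate
        }
    }

    /** Step 5: on a read error, is this file already treated as "waited too long"? */
    boolean waitedTooLong() {
        return sleepMultiplier >= maxRetriesMultiplier;
    }

    /** Sleep time for the next retry: sleepForRetries times the current multiplier. */
    long sleepMillis(long sleepForRetries) {
        return sleepForRetries * sleepMultiplier;
    }

    String currentPath() {
        return currentPath;
    }
}
```

With resetOnPathChange set to false, a source that idled up to the max still 
reports waitedTooLong() right after switching to a new file (step 4). With 
true, the new file starts from a multiplier of 1.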

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
 Fix For: 0.92.3



[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]

2012-09-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13451102#comment-13451102
 ] 

stack commented on HBASE-6733:
--

You are finding bugs in replication DD.  Good on you.

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
 Fix For: 0.92.3

