subject:"\[jira\] \[Commented\] \(HBASE\-6435\) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes"


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13438551#comment-13438551
 ] 

nkeywal commented on HBASE-6435:


v14: version I'm going to commit as soon as the local tests (in progress) are 
ok.

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.96.0

 Attachments: 6435.unfinished.patch, 6435.v10.patch, 6435.v10.patch, 
 6435.v12.patch, 6435.v12.patch, 6435.v12.patch, 6435-v12.txt, 6435.v13.patch, 
 6435.v14.patch, 6435.v2.patch, 6435.v7.patch, 6435.v8.patch, 6435.v9.patch, 
 6435.v9.patch, 6535.v11.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13438563#comment-13438563
 ] 

nkeywal commented on HBASE-6435:


Ok, local tests said:
Tests in error:
  testGetRowVersions(org.apache.hadoop.hbase.TestMultiVersions): Shutting down
  testScanMultipleVersions(org.apache.hadoop.hbase.TestMultiVersions): 
org.apache.hadoop.hbase.MasterNotRunningException: Can create a proxy to 
master, but it is not running

Not reproduced (tried once).

Committed revision 1375451.

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.96.0

 Attachments: 6435.unfinished.patch, 6435.v10.patch, 6435.v10.patch, 
 6435.v12.patch, 6435.v12.patch, 6435.v12.patch, 6435-v12.txt, 6435.v13.patch, 
 6435.v14.patch, 6435.v2.patch, 6435.v7.patch, 6435.v8.patch, 6435.v9.patch, 
 6435.v9.patch, 6535.v11.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13438567#comment-13438567
 ] 

nkeywal commented on HBASE-6435:


+ Committed revision 1375454.
As I forgot to add the new test in svn initially.

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.96.0

 Attachments: 6435.unfinished.patch, 6435.v10.patch, 6435.v10.patch, 
 6435.v12.patch, 6435.v12.patch, 6435.v12.patch, 6435-v12.txt, 6435.v13.patch, 
 6435.v14.patch, 6435.v2.patch, 6435.v7.patch, 6435.v8.patch, 6435.v9.patch, 
 6435.v9.patch, 6535.v11.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure. This log is 
 written on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-08-21 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13438575#comment-13438575
]

Hadoop QA commented on HBASE-6435:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12541734/6435.v14.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 11 new or modified tests.

+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

+1 javadoc. The javadoc tool did not generate any warning messages.

-1 javac. The applied patch generated 5 javac compiler warnings (more than
the trunk's current 4 warnings).

-1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:

org.apache.hadoop.hbase.io.encoding.TestUpgradeFromHFileV1ToEncoding

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/2638//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2638//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2638//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2638//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2638//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2638//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/2638//console

This message is automatically generated.

Reading WAL files after a recovery leads to time lost in HDFS timeouts when
using dead datanodes

Attachments: 6435.unfinished.patch, 6435.v10.patch, 6435.v10.patch,
6435.v12.patch, 6435.v12.patch, 6435.v12.patch, 6435-v12.txt, 6435.v13.patch,
6435.v14.patch, 6435.v2.patch, 6435.v7.patch, 6435.v8.patch, 6435.v9.patch,
6435.v9.patch, 6535.v11.patch

HBase writes a Write-Ahead-Log to revover from hardware failure. This log is
written on hdfs.
Through ZooKeeper, HBase gets informed usually in 30s that it should start
the recovery process.
This means reading the Write-Ahead-Log to replay the edits on the other
servers.
In standards deployments, HBase process (regionserver) are deployed on the
same box as the datanodes.
It means that when the box stops, we've actually lost one of the edits, as we
lost both the regionserver and the datanode.
As HDFS marks a node as dead after ~10 minutes, it appears as available when
we try to read the blocks to recover. As such, we are delaying the recovery
process by 60 seconds as the read will usually fail with a socket timeout. If
the file is still opened for writing, it adds an extra 20s + a risk of losing
edits if we connect with ipc to the dead DN.
Possible solutions are:
- shorter dead datanodes detection by the NN. Requires a NN code change.
- better dead datanodes management in DFSClient. Requires a DFS code change.
- NN customisation to write the WAL files on another DN instead of the local
one.
- reordering the blocks returned by the NN on the client side to put the
blocks on the same DN as the dead RS at the end of the priority queue.
Requires a DFS code change or a kind of workaround.
The solution retained is the last one. Compared to what was discussed on the
mailing list, the proposed patch will not modify HDFS source code but adds a
proxy. This for two reasons:
- Some HDFS functions managing block orders are static
(MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would
require to implement partially the fix, change the DFS interface to make this
function non static, or put the hook static. None of these solution is very
clean.
- Adding a proxy allows to put all the code in HBase, simplifying dependency
management.
Nevertheless, it would be better to have this in HDFS. But this solution
allows to target the last version only, and this could

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-08-21 Thread Hudson (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13438599#comment-13438599
]

Hudson commented on HBASE-6435:
---

Integrated in HBase-TRUNK #3247 (See
[https://builds.apache.org/job/HBase-TRUNK/3247/])
HBASE-6435 Reading WAL files after a recovery leads to time lost in HDFS
timeouts when using dead datanodes - addendum TestBlockReorder.java (Revision
1375454)
HBASE-6435 Reading WAL files after a recovery leads to time lost in HDFS
timeouts when using dead datanodes (Revision 1375451)

Result = FAILURE
nkeywal :
Files :
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/fs
*
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/fs/TestBlockReorder.java

nkeywal :
Files :
*
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ServerName.java
*
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java
*
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
*
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
*
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
*
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java
*
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java

Reading WAL files after a recovery leads to time lost in HDFS timeouts when
using dead datanodes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13438605#comment-13438605
 ] 

nkeywal commented on HBASE-6435:


Résultats des tests (2 échecs / ±0)
org.apache.hadoop.hbase.TestMultiVersions.testGetRowVersions
org.apache.hadoop.hbase.TestMultiVersions.testScanMultipleVersions

Hum. It's the same error as the one I had in my fist local test. But it's so 
unrelated, and moreover we had this error in build #3242 as well; so I think 
it's ok. Marking as resolved.

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.96.0

 Attachments: 6435.unfinished.patch, 6435.v10.patch, 6435.v10.patch, 
 6435.v12.patch, 6435.v12.patch, 6435.v12.patch, 6435-v12.txt, 6435.v13.patch, 
 6435.v14.patch, 6435.v2.patch, 6435.v7.patch, 6435.v8.patch, 6435.v9.patch, 
 6435.v9.patch, 6535.v11.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure. This log is 
 written on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-08-21 Thread Hudson (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13438623#comment-13438623
]

Hudson commented on HBASE-6435:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #140 (See
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/140/])
HBASE-6435 Reading WAL files after a recovery leads to time lost in HDFS
timeouts when using dead datanodes - addendum TestBlockReorder.java (Revision
1375454)
HBASE-6435 Reading WAL files after a recovery leads to time lost in HDFS
timeouts when using dead datanodes (Revision 1375451)

Result = FAILURE
nkeywal :
Files :
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/fs
*
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/fs/TestBlockReorder.java

Reading WAL files after a recovery leads to time lost in HDFS timeouts when
using dead datanodes

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-08-19 Thread nkeywal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13437485#comment-13437485
 ] 

nkeywal commented on HBASE-6435:


release notes done.

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.96.0

 Attachments: 6435.unfinished.patch, 6435.v10.patch, 6435.v10.patch, 
 6435.v12.patch, 6435.v12.patch, 6435.v12.patch, 6435-v12.txt, 6435.v13.patch, 
 6435.v2.patch, 6435.v7.patch, 6435.v8.patch, 6435.v9.patch, 6435.v9.patch, 
 6535.v11.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-08-19 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13437590#comment-13437590
 ] 

stack commented on HBASE-6435:
--

@N Nice note.  You should write a blog on it and your other findings.  You 
going to commit?

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.96.0

 Attachments: 6435.unfinished.patch, 6435.v10.patch, 6435.v10.patch, 
 6435.v12.patch, 6435.v12.patch, 6435.v12.patch, 6435-v12.txt, 6435.v13.patch, 
 6435.v2.patch, 6435.v7.patch, 6435.v8.patch, 6435.v9.patch, 6435.v9.patch, 
 6535.v11.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-08-18 Thread nkeywal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13437295#comment-13437295
 ] 

nkeywal commented on HBASE-6435:


bq. ... but not used. Fix on commit.
Ok

bq.modifiersField.setInt(nf, nf.getModifiers()  ~Modifier.FINAL);
The field is final, we're changing this as we're changing its value.

bq. + // We have a rack to get always the same location order but it does not 
work.
I could remove it on commit... I wanted to use racks to have always the same 
order, but it does not work; the racks are not taken into account in this case, 
I don't know why...

Thanks for the review Ted and Stack, I will commit it beginning of next week if 
I don't have another feedback.

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.96.0

 Attachments: 6435.unfinished.patch, 6435.v10.patch, 6435.v10.patch, 
 6435.v12.patch, 6435.v12.patch, 6435.v12.patch, 6435-v12.txt, 6435.v13.patch, 
 6435.v2.patch, 6435.v7.patch, 6435.v8.patch, 6435.v9.patch, 6435.v9.patch, 
 6535.v11.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-08-18 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13437419#comment-13437419
 ] 

stack commented on HBASE-6435:
--

+1 on commit w/ above two edits.  Needs nice fat release note.  Good on you N.

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.96.0

 Attachments: 6435.unfinished.patch, 6435.v10.patch, 6435.v10.patch, 
 6435.v12.patch, 6435.v12.patch, 6435.v12.patch, 6435-v12.txt, 6435.v13.patch, 
 6435.v2.patch, 6435.v7.patch, 6435.v8.patch, 6435.v9.patch, 6435.v9.patch, 
 6535.v11.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-08-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13437131#comment-13437131
 ] 

stack commented on HBASE-6435:
--

+1 on commit.  Some notes below that you can address on commit.  Tests look 
good on cursory glance.  Comprehensive.  Nice hacking N.

HMaster does an import of this:

+import org.apache.hadoop.hbase.fs.HFileSystem;

... but not used.  Fix on commit.

Not important, but I'd check its DFS before I'd check reorder enabled flag.  
Next time.

Whats this about?

{code}+  modifiersField.setInt(nf, nf.getModifiers()  
~Modifier.FINAL);{code}

Some of this patch probably belongs in compatibility layers.  One day the 
reorder will be in hadoop  We can address in new issue.

What does this mean?

+// We have a rack to get always the same location order but it does not 
work.





 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.96.0

 Attachments: 6435.unfinished.patch, 6435.v10.patch, 6435.v10.patch, 
 6435.v12.patch, 6435.v12.patch, 6435.v12.patch, 6435-v12.txt, 6435.v13.patch, 
 6435.v2.patch, 6435.v7.patch, 6435.v8.patch, 6435.v9.patch, 6435.v9.patch, 
 6535.v11.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-08-15 Thread nkeywal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13435102#comment-13435102
 ] 

nkeywal commented on HBASE-6435:


v13 takes into account the comments from the review board.

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.96.0

 Attachments: 6435.unfinished.patch, 6435.v10.patch, 6435.v10.patch, 
 6435.v12.patch, 6435.v12.patch, 6435.v12.patch, 6435-v12.txt, 6435.v13.patch, 
 6435.v2.patch, 6435.v7.patch, 6435.v8.patch, 6435.v9.patch, 6435.v9.patch, 
 6535.v11.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-08-15 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13435164#comment-13435164
]

Hadoop QA commented on HBASE-6435:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12541051/6435.v13.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 11 new or modified tests.

+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

+1 javadoc. The javadoc tool did not generate any warning messages.

-1 javac. The applied patch generated 5 javac compiler warnings (more than
the trunk's current 4 warnings).

-1 findbugs. The patch appears to introduce 9 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:

org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
org.apache.hadoop.hbase.master.TestMasterNoCluster

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/2586//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2586//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2586//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2586//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2586//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2586//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/2586//console

This message is automatically generated.

Reading WAL files after a recovery leads to time lost in HDFS timeouts when
using dead datanodes

Attachments: 6435.unfinished.patch, 6435.v10.patch, 6435.v10.patch,
6435.v12.patch, 6435.v12.patch, 6435.v12.patch, 6435-v12.txt, 6435.v13.patch,
6435.v2.patch, 6435.v7.patch, 6435.v8.patch, 6435.v9.patch, 6435.v9.patch,
6535.v11.patch

HBase writes a Write-Ahead-Log to revover from hardware failure.
This log is written with 'append' on hdfs.
Through ZooKeeper, HBase gets informed usually in 30s that it should start
the recovery process.
This means reading the Write-Ahead-Log to replay the edits on the other
servers.
In standards deployments, HBase process (regionserver) are deployed on the
same box as the datanodes.
It means that when the box stops, we've actually lost one of the edits, as we
lost both the regionserver and the datanode.
As HDFS marks a node as dead after ~10 minutes, it appears as available when
we try to read the blocks to recover. As such, we are delaying the recovery
process by 60 seconds as the read will usually fail with a socket timeout. If
the file is still opened for writing, it adds an extra 20s + a risk of losing
edits if we connect with ipc to the dead DN.
Possible solutions are:
- shorter dead datanodes detection by the NN. Requires a NN code change.
- better dead datanodes management in DFSClient. Requires a DFS code change.
- NN customisation to write the WAL files on another DN instead of the local
one.
- reordering the blocks returned by the NN on the client side to put the
blocks on the same DN as the dead RS at the end of the priority queue.
Requires a DFS code change or a kind of workaround.
The solution retained is the last one. Compared to what was discussed on the
mailing list, the proposed patch will not modify HDFS source code but adds a
proxy. This for two reasons:
- Some HDFS functions managing block orders are static
(MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would
require to implement partially the fix, change the DFS interface to make this
function non static, or put the hook static. None of these solution is very
clean.
- Adding a proxy allows to put all the code in HBase, simplifying dependency
management.
Nevertheless, it would be better to have this in HDFS. But this

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-08-15 Thread Zhihong Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13435392#comment-13435392
 ] 

Zhihong Ted Yu commented on HBASE-6435:
---

+1 on v13 if TestSplitTransactionOnCluster#testSplitBeforeSettingSplittingInZK 
passes.

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.96.0

 Attachments: 6435.unfinished.patch, 6435.v10.patch, 6435.v10.patch, 
 6435.v12.patch, 6435.v12.patch, 6435.v12.patch, 6435-v12.txt, 6435.v13.patch, 
 6435.v2.patch, 6435.v7.patch, 6435.v8.patch, 6435.v9.patch, 6435.v9.patch, 
 6535.v11.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-08-10 Thread nkeywal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432715#comment-13432715
 ] 

nkeywal commented on HBASE-6435:


https://reviews.apache.org/r/6522/

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435-v12.txt, 6435.unfinished.patch, 6435.v10.patch, 
 6435.v10.patch, 6435.v12.patch, 6435.v12.patch, 6435.v12.patch, 
 6435.v2.patch, 6435.v7.patch, 6435.v8.patch, 6435.v9.patch, 6435.v9.patch, 
 6535.v11.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-08-04 Thread Zhihong Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428612#comment-13428612
 ] 

Zhihong Ted Yu commented on HBASE-6435:
---

@N:
Can you put patch on review board ?

Thanks

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch, 6435.v10.patch, 6435.v10.patch, 
 6435.v12.patch, 6435.v12.patch, 6435.v12.patch, 6435.v2.patch, 6435.v7.patch, 
 6435.v8.patch, 6435.v9.patch, 6435.v9.patch, 6535.v11.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-08-04 Thread nkeywal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428638#comment-13428638
 ] 

nkeywal commented on HBASE-6435:


Ok. Tried. Something broke! (Error 500), I will retry later.

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch, 6435.v10.patch, 6435.v10.patch, 
 6435.v12.patch, 6435.v12.patch, 6435.v12.patch, 6435.v2.patch, 6435.v7.patch, 
 6435.v8.patch, 6435.v9.patch, 6435.v9.patch, 6535.v11.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-08-03 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427867#comment-13427867
]

Hadoop QA commented on HBASE-6435:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12538999/6435.v12.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 11 new or modified tests.

+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

+1 javadoc. The javadoc tool did not generate any warning messages.

-1 javac. The applied patch generated 5 javac compiler warnings (more than
the trunk's current 4 warnings).

-1 findbugs. The patch appears to introduce 10 new Findbugs (version
1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

+1 core tests. The patch passed unit tests in .

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/2496//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2496//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2496//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2496//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2496//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2496//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/2496//console

This message is automatically generated.

Reading WAL files after a recovery leads to time lost in HDFS timeouts when
using dead datanodes

Key: HBASE-6435
URL: https://issues.apache.org/jira/browse/HBASE-6435
Project: HBase
Issue Type: Improvement
Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Attachments: 6435.unfinished.patch, 6435.v10.patch, 6435.v10.patch,
6435.v12.patch, 6435.v12.patch, 6435.v12.patch, 6435.v2.patch, 6435.v7.patch,
6435.v8.patch, 6435.v9.patch, 6435.v9.patch, 6535.v11.patch

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-08-02 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427348#comment-13427348
]

Hadoop QA commented on HBASE-6435:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12538906/6435.v10.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 8 new or modified tests.

+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

+1 javadoc. The javadoc tool did not generate any warning messages.

-1 javac. The applied patch generated 5 javac compiler warnings (more than
the trunk's current 4 warnings).

-1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.fs.TestBlockReorder

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/2481//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2481//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2481//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2481//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2481//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2481//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/2481//console

This message is automatically generated.

Reading WAL files after a recovery leads to time lost in HDFS timeouts when
using dead datanodes

Key: HBASE-6435
URL: https://issues.apache.org/jira/browse/HBASE-6435
Project: HBase
Issue Type: Improvement
Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Attachments: 6435.unfinished.patch, 6435.v10.patch, 6435.v10.patch,
6435.v2.patch, 6435.v7.patch, 6435.v8.patch, 6435.v9.patch, 6435.v9.patch

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-08-02 Thread nkeywal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427640#comment-13427640
 ] 

nkeywal commented on HBASE-6435:


I had to change the test to make it more hadoop-qa friendly. In one of my 
numerous attempts, I added the possibility to start a miniCluster with a 
specific HMaster or HRegionServer class. I finally didn't use it here, but I 
kept it in the patch as it may be useful later...

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch, 6435.v10.patch, 6435.v10.patch, 
 6435.v12.patch, 6435.v2.patch, 6435.v7.patch, 6435.v8.patch, 6435.v9.patch, 
 6435.v9.patch, 6535.v11.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-08-02 Thread Zhihong Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427752#comment-13427752
 ] 

Zhihong Ted Yu commented on HBASE-6435:
---

PreCommit build #2490 got aborted:
{code}
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/dev-support/test-patch.sh:
 line 353: 14017 Aborted $MVN clean test help:active-profiles 
-X -DskipTests -Dhadoop.profile=2.0 -D${PROJECT_NAME}PatchProcess  
$PATCH_DIR/trunk2.0JavacWarnings.txt 21
{code}

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch, 6435.v10.patch, 6435.v10.patch, 
 6435.v12.patch, 6435.v2.patch, 6435.v7.patch, 6435.v8.patch, 6435.v9.patch, 
 6435.v9.patch, 6535.v11.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-08-02 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427816#comment-13427816
]

Hadoop QA commented on HBASE-6435:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12538981/6435.v12.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 11 new or modified tests.

+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

+1 javadoc. The javadoc tool did not generate any warning messages.

-1 javac. The applied patch generated 5 javac compiler warnings (more than
the trunk's current 4 warnings).

-1 findbugs. The patch appears to introduce 10 new Findbugs (version
1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.client.TestFromClientSide

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/2494//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2494//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2494//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2494//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2494//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2494//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/2494//console

This message is automatically generated.

Reading WAL files after a recovery leads to time lost in HDFS timeouts when
using dead datanodes

Key: HBASE-6435
URL: https://issues.apache.org/jira/browse/HBASE-6435
Project: HBase
Issue Type: Improvement
Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Attachments: 6435.unfinished.patch, 6435.v10.patch, 6435.v10.patch,
6435.v12.patch, 6435.v12.patch, 6435.v2.patch, 6435.v7.patch, 6435.v8.patch,
6435.v9.patch, 6435.v9.patch, 6535.v11.patch

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-08-02 Thread nkeywal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427850#comment-13427850
 ] 

nkeywal commented on HBASE-6435:


bq.  org.apache.hadoop.hbase.client.TestFromClientSide
I think it's unrelated. Let's retry.

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch, 6435.v10.patch, 6435.v10.patch, 
 6435.v12.patch, 6435.v12.patch, 6435.v12.patch, 6435.v2.patch, 6435.v7.patch, 
 6435.v8.patch, 6435.v9.patch, 6435.v9.patch, 6535.v11.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426371#comment-13426371
 ] 

nkeywal commented on HBASE-6435:


Thanks for the review and the test failure analysis, Ted. v9 takes the comments 
into account.

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch, 6435.v2.patch, 6435.v7.patch, 
 6435.v8.patch, 6435.v9.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-08-01 Thread Zhihong Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426595#comment-13426595
 ] 

Zhihong Ted Yu commented on HBASE-6435:
---

From PreCommit build #2470, look like compilation against Hadoop 2.0 failed. 

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch, 6435.v2.patch, 6435.v7.patch, 
 6435.v8.patch, 6435.v9.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426656#comment-13426656
 ] 

nkeywal commented on HBASE-6435:


Is there a way to have more info on the failure?
Locally
{noformat}
mvn test -Dhadoop.profile=2.0
{noformat}

says
{noformat}
Tests in error:
  testSimpleCase(org.apache.hadoop.hbase.mapreduce.TestImportExport)
  testWithDeletes(org.apache.hadoop.hbase.mapreduce.TestImportExport)

Tests run: 719, Failures: 0, Errors: 2, Skipped: 2
{noformat}

and .TestBlockReorder is ok (executed 5 times)

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch, 6435.v2.patch, 6435.v7.patch, 
 6435.v8.patch, 6435.v9.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-08-01 Thread Zhihong Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426682#comment-13426682
 ] 

Zhihong Ted Yu commented on HBASE-6435:
---

The compilation in PreCommit build was aborted.
I couldn't reproduce the issue.

Suggest re-attaching patch v9.

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch, 6435.v2.patch, 6435.v7.patch, 
 6435.v8.patch, 6435.v9.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426683#comment-13426683
 ] 

nkeywal commented on HBASE-6435:


done :-)

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch, 6435.v2.patch, 6435.v7.patch, 
 6435.v8.patch, 6435.v9.patch, 6435.v9.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-08-01 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426714#comment-13426714
]

Hadoop QA commented on HBASE-6435:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12538789/6435.v9.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 8 new or modified tests.

+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

+1 javadoc. The javadoc tool did not generate any warning messages.

-1 javac. The applied patch generated 5 javac compiler warnings (more than
the trunk's current 4 warnings).

-1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.client.TestAdmin
org.apache.hadoop.hbase.fs.TestBlockReorder

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/2472//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2472//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2472//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2472//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2472//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2472//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/2472//console

This message is automatically generated.

Reading WAL files after a recovery leads to time lost in HDFS timeouts when
using dead datanodes

Key: HBASE-6435
URL: https://issues.apache.org/jira/browse/HBASE-6435
Project: HBase
Issue Type: Improvement
Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Attachments: 6435.unfinished.patch, 6435.v2.patch, 6435.v7.patch,
6435.v8.patch, 6435.v9.patch, 6435.v9.patch

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426738#comment-13426738
 ] 

nkeywal commented on HBASE-6435:


I was expecting the name to be locahost, but it's not the case on hadoop-qa env:
{noformat}
/asf011.sp2.ygridcore.net,43631,1343836299404/asf011.sp2.ygridcore.net%2C43631%2C1343836299404.1343836318993
 is an HLog file, so reordering blocks, last hostname will 
be:asf011.sp2.ygridcore.net
{noformat}

So the trick used to check location ordering on a mini cluster does not work. I 
will find another way...

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch, 6435.v2.patch, 6435.v7.patch, 
 6435.v8.patch, 6435.v9.patch, 6435.v9.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425878#comment-13425878
 ] 

nkeywal commented on HBASE-6435:


Tested on a real cluster by adding validation code on a region server, went ok. 
I don't have a real idea on how to activate it just for some hadoop versions, 
so I will do a last clean-up on the logs and propose a final version.


 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch, 6435.v2.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425959#comment-13425959
 ] 

nkeywal commented on HBASE-6435:


Ok for review...

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch, 6435.v2.patch, 6435.v7.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425980#comment-13425980
 ] 

Zhihong Ted Yu commented on HBASE-6435:
---

Just started to look at the patch.
It doesn't compile against hadoop 2.0:
{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile (default-compile) 
on project hbase-server: Compilation failure: Compilation failure:
[ERROR] 
/Users/zhihyu/trunk-hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:[214,12]
 namenode is not public in org.apache.hadoop.hdfs.DFSClient; cannot be accessed 
from outside package
[ERROR] 
[ERROR] 
/Users/zhihyu/trunk-hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:[221,52]
 namenode is not public in org.apache.hadoop.hdfs.DFSClient; cannot be accessed 
from outside package
[ERROR] 
[ERROR] 
/Users/zhihyu/trunk-hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:[289,81]
 cannot find symbol
[ERROR] symbol  : method getHost()
[ERROR] location: class org.apache.hadoop.hdfs.protocol.DatanodeInfo
{code}
Can we give the following a more meaningful name ?
{code}
+if (!conf.getBoolean(hbase.hdfs.jira6435, true)){  // activated by 
default
{code}
Comment from Todd would be appreciated.

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch, 6435.v2.patch, 6435.v7.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425988#comment-13425988
 ] 

nkeywal commented on HBASE-6435:


I will have a look at the hadoop2 stuff.

for 
bq. Can we give the following a more meaningful name ?
Do you have an idea?

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch, 6435.v2.patch, 6435.v7.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425995#comment-13425995
 ] 

Zhihong Ted Yu commented on HBASE-6435:
---

How about 'hbase.filesystem.reorder.blocks' ?

BTW replacing 'Hack' with some form of 'Intercept' would be better IMHO.

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch, 6435.v2.patch, 6435.v7.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426015#comment-13426015
 ] 

nkeywal commented on HBASE-6435:


Ok.
I wanted to make clear it was a temporary workaround.

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch, 6435.v2.patch, 6435.v7.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426102#comment-13426102
 ] 

nkeywal commented on HBASE-6435:


v8 works ok with hadoop 1  hadoop 2 and other Ted's comments. I tried the v3 
profile, but got errors in the pom.xml.

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch, 6435.v2.patch, 6435.v7.patch, 
 6435.v8.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426151#comment-13426151
 ] 

Zhihong Ted Yu commented on HBASE-6435:
---

{code}
+  private static ClientProtocol createReordoringProxy(final ClientProtocol cp,
{code}
Usually spelling would be nit. But this spelling mistake was in method name :-)
{code}
+  public static ServerName getServerNameFromHLogDirectoryName(Configuration 
conf, String path) throws IOException {
{code}
The above line is too long.
{code}
+  LOG.debug(Moved the location +toLast.getHostName()+ to the 
last place. +
+   locations size was +dnis.length);
{code}
I think the above log may appear many times.
{code}
+LOG.fatal( REORDER);
{code}
The above can be made a debug log.


 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch, 6435.v2.patch, 6435.v7.patch, 
 6435.v8.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-31 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426158#comment-13426158
]

Hadoop QA commented on HBASE-6435:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12538610/6435.v8.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 8 new or modified tests.

+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

+1 javadoc. The javadoc tool did not generate any warning messages.

-1 javac. The applied patch generated 5 javac compiler warnings (more than
the trunk's current 4 warnings).

-1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.fs.TestBlockReorder

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/2464//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2464//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2464//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2464//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2464//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2464//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/2464//console

This message is automatically generated.

Reading WAL files after a recovery leads to time lost in HDFS timeouts when
using dead datanodes

--
This message is automatically generated by JIRA.
If you

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426225#comment-13426225
 ] 

Zhihong Ted Yu commented on HBASE-6435:
---

For the test failure:
{code}
org.junit.ComparisonFailure: expected:[localhost] but was:[host2]
at org.junit.Assert.assertEquals(Assert.java:125)
at org.junit.Assert.assertEquals(Assert.java:147)
at 
org.apache.hadoop.hbase.fs.TestBlockReorder.testFromDFS(TestBlockReorder.java:320)
at 
org.apache.hadoop.hbase.fs.TestBlockReorder.testHBaseCluster(TestBlockReorder.java:271)
{code}
testFromDFS() should have utilized the done flag for the while loop below:
{code}
+for (int y = 0; y  l.getLocatedBlocks().size()  done; y++) {
+  done = (l.get(y).getLocations().length == 3);
+}
+  } while (l.get(0).getLocations().length != 3);
{code}
When l.getLocatedBlocks().size() is greater than 1, the above loop may exit 
prematurely.

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch, 6435.v2.patch, 6435.v7.patch, 
 6435.v8.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-26 Thread nkeywal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13423280#comment-13423280
 ] 

nkeywal commented on HBASE-6435:


v2. May need some clean up on logs + a check to unactivate it for hadoop 2 for 
example.

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch, 6435.v2.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-26 Thread nkeywal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13423282#comment-13423282
 ] 

nkeywal commented on HBASE-6435:


+ I need to test it on a real cluster (emulating locations on a mini cluster 
can be dangerous...)

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch, 6435.v2.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-22 Thread Todd Lipcon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13420346#comment-13420346
 ] 

Todd Lipcon commented on HBASE-6435:


Good points. We should probably move this discussion over to an HDFS JIRA. 
Having a global DFSClient-wide ability to mark nodes un-preferred is probably 
advantageous.

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-21 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419871#comment-13419871
 ] 

stack commented on HBASE-6435:
--

@Todd Given suggested Interface, how we map from an hbase session expiration to 
a Replica?  What if the DN died but RS didn't?  Won't the fact that DFSClient 
under the wraps is banging its head timingout against a dead DN -- once per 
DFSInputStream -- be hidden from the RS since its being handled down in 
DFSClient? Don't we need more knowledge on DFSClient workings than suggested 
API exposes if we are to avoid dead DNs?  If we do figure we have a bad DN, do 
we then per open DFSInputStream iterate updating priorities?

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419414#comment-13419414
 ] 

nkeywal commented on HBASE-6435:


The patch is not finished. Actually, it contains for code for the hdfs hook and 
the related test, but not the code for defining the location order from the 
file name. But as it is different from what we initially discussed, I post it 
here in case someone sees something I missed.

It does not mean it should not be fixed in hdfs as well, just that this is 
likely to be much simpler than patching the 1.0 branch...

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-20 Thread Todd Lipcon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419446#comment-13419446
 ] 

Todd Lipcon commented on HBASE-6435:


I'm -1 on this kind of hack going into HBase before we add the feature to HDFS. 
I agree that adding to HDFS proper means we have to wait for a release, but 
this kind of code is likely to be really fragile. Also, without HBase driving 
requirements of HDFS, it will never evolve to natively have these kind of 
features, and HBase will devolve into a mess of reflection hacks to change 
around the HDFS internals.

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-20 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419551#comment-13419551
 ] 

stack commented on HBASE-6435:
--

Yeah, we should do both (I'd think that whats added to HDFS is more general 
than just this workaround scheme where local gets moved to the end of the list; 
i.e. we add being able to intercept the order returned by the NN and let a 
client-side policy alter it based on local knowledge if wanted Could add 
other customizations like being able to set timeout per DFSInput/OutputStream 
as you've suggested up on dev list N).  Would be sweet if the 'hack' were 
available meantime while we wait on an hdfs release.

Looking at patch, looks like inventive hackery; good on you.

Do we have to do this in both master and regionserver?  Can't do it in 
HFileSystem constructor assuming it takes a Conf (or that'd be too late?)

+  HFileSystem.addLocationOrderHack(conf);

Rather than have it called a reorderProxy, call it an HBaseDFSClient?  Might 
want to add more customizations while waiting on HDFS fix to arrive.


 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419581#comment-13419581
 ] 

nkeywal commented on HBASE-6435:


My thinking was it could make it on a hdfs release that accepts changing public 
interfaces. I fully agree with you Todd, we need to do our homeworks and push 
hdfs to ensure that what we need is understood and makes it to a release. On 
the other hand, if I look at how it worked for much simpler stuff like JUnit 
and surefire, our changes are in theie trunk for a few months and we're still 
waiting. These things take time. But I will do my homeworks on hdfs, I promise 
(I may need your help actually). The Jira will be created next week and if I 
have enough feedback I will propose a patch.

I was also wondering if proposing natively to have interceptors would not be 
interesting for hdfs. It was available a long time in an orb called orbix and 
was great to use. But they would need to be per conf, so cannot be available 
with static stuff.

bq. Do we have to do this in both master and regionserver? Can't do it in 
HFileSystem constructor assuming it takes a Conf (or that'd be too late?)
It can be put pretty late, basically before we start a recovery process. But we 
don't want it client side, so I will check this.

bq. Rather than have it called a reorderProxy, call it an HBaseDFSClient? Might 
want to add more customizations while waiting on HDFS fix to arrive.
I've intercepted a lower level call: I'm between the DFSClient and the 
namenode. This because the DFSClient does more than just transferring calls: it 
contains some logic. Hence going in front of the namenode. But yes, I could 
make it more generic.


 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-20 Thread Todd Lipcon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419588#comment-13419588
 ] 

Todd Lipcon commented on HBASE-6435:


I think there's a good motivation to add these kind of APIs generally to 
DFSInputStream. In particular, I think something like the following:

public ListReplica getAvailableReplica(long pos); // return the list of 
available replicas at given file offset, in priority order
public void prioritizeReplica(Replica r); // move given replica to front of list
public void blacklistReplica(Replica r); // move replica to back of list
(or something of this sort)

The Replica API would then expose the datanode IDs (and after HDFS-3672, the 
disk ID).
So, in HBase we could simply open the file, enumerate the replicas, 
deprioritize the one on the suspected node, and move on with the normal code 
paths.

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419608#comment-13419608
 ] 

nkeywal commented on HBASE-6435:


I understand that you don't want to expose the internal nor something like the 
DatanodeInfo. The same type of API would be useful for the outputstream, 
putting priorities on nodes (and so reusing some knowledge for the dead nodes, 
or, for the wal, remove the local writes). It simple and efficient.

With the current DFSClient implementation, a callback would ease cases like 
opening a file already opened for writing, or when a node list is cleared when 
they all failed. But may be it can be changed as well.



 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-20 Thread Todd Lipcon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419623#comment-13419623
 ] 

Todd Lipcon commented on HBASE-6435:


bq. With the current DFSClient implementation, a callback would ease cases like 
opening a file already opened for writing, or when a node list is cleared when 
they all failed. But may be it can be changed as well.

Can you explain further what you mean here? What would you use these callbacks 
for?

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch


 HBase writes a Write-Ahead-Log to revover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standards deployments, HBase process (regionserver) are deployed on the 
 same box as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require to implement partially the fix, change the DFS interface to make this 
 function non static, or put the hook static. None of these solution is very 
 clean. 
 - Adding a proxy allows to put all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows to target the last version only, and this could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to the non local DN would be an even better 
 solution long term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes