[jira] [Updated] (HDFS-8150) Make getFileChecksum fail for blocks under construction
[ https://issues.apache.org/jira/browse/HDFS-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.Andreina updated HDFS-8150:
-----------------------------
    Attachment: HDFS-8150.1.patch

Attached an initial patch. Please review.

Make getFileChecksum fail for blocks under construction
-------------------------------------------------------
                 Key: HDFS-8150
                 URL: https://issues.apache.org/jira/browse/HDFS-8150
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Kihwal Lee
            Assignee: J.Andreina
            Priority: Critical
         Attachments: HDFS-8150.1.patch

We have seen cases where a data copy was validated with checksums and the content of the target later changed. It turned out the target had not been closed successfully, so it was still under construction; an hour later a lease recovery kicked in and truncated the block. Although this can be prevented in many ways, if there is no valid use case for getting the file checksum of under-construction blocks, can it be disabled? E.g. the Datanode can throw an exception if the replica is not in the finalized state.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
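The behavior proposed in the description (Datanode throws if the replica is not finalized) can be sketched roughly as below. This is a minimal illustration assuming a simplified replica-state enum; `ChecksumGuard` and `checkFinalized` are hypothetical names, not the code in HDFS-8150.1.patch.

```java
import java.io.IOException;

// Sketch of the proposal in the description: refuse to compute a file
// checksum for a replica that is not FINALIZED. The enum mirrors the
// datanode replica states; ChecksumGuard/checkFinalized are hypothetical
// names, not the actual HDFS-8150 patch.
public class ChecksumGuard {
    public enum ReplicaState { FINALIZED, RBW, RWR, RUR, TEMPORARY }

    /** Throw if a checksum is requested for a non-finalized replica. */
    public static void checkFinalized(String block, ReplicaState state) throws IOException {
        if (state != ReplicaState.FINALIZED) {
            throw new IOException("Cannot compute checksum for " + block
                + ": replica state is " + state + ", expected FINALIZED");
        }
    }
}
```

With this check in place, the client's getFileChecksum call fails fast instead of returning a checksum that a later lease recovery may invalidate.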
[jira] [Updated] (HDFS-8150) Make getFileChecksum fail for blocks under construction
[ https://issues.apache.org/jira/browse/HDFS-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.Andreina updated HDFS-8150:
-----------------------------
    Status: Patch Available  (was: Open)
[jira] [Comment Edited] (HDFS-7348) Erasure Coding: striped block recovery
[ https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517080#comment-14517080 ]

Yi Liu edited comment on HDFS-7348 at 4/28/15 2:20 PM:
-------------------------------------------------------
{noformat}
Recover one or more missed striped blocks in a striped block group; the number
of live striped blocks must be no less than the number of data blocks.

 |<----------- Striped Block Group ----------->|
  blk_0      blk_1      blk_2(*)   blk_3  ...   <- A striped block group
    |          |          |          |
    v          v          v          v
 +------+   +------+   +------+   +------+
 |cell_0|   |cell_1|   |cell_2|   |cell_3|  ...  <- The striped cell group
 +------+   +------+   +------+   +------+          (cell_0, cell_1, ...)
 |cell_4|   |cell_5|   |cell_6|   |cell_7|  ...
 +------+   +------+   +------+   +------+
 |cell_8|   |cell_9|   |cell10|   |cell11|  ...
 +------+   +------+   +------+   +------+
   ...        ...        ...        ...

We use the following steps to recover the striped cell groups sequentially:
step1: read the minimum striped cells required by recovery.
step2: decode cells for the targets.
step3: transfer cells to the targets.

In step1, we try to read the minimum number of striped cells; if there are
corrupt or stale sources, a read from a new source is scheduled. The best
sources are remembered for the next round and may be updated in each round.

In step2, decoding is blocked by HADOOP-11847; currently we only fill 1... to
the target block for testing. Typically, if the source blocks we read are all
data blocks, we need to call encode, and if one of them is a parity block, we
need to call decode. Notice we read only once and recover all missed striped
blocks even if there are more than one.

In step3, we send the recovered cells to the targets by constructing packets
and sending them directly. As with continuous block replication, we don't
check the packet ack. Since the datanode doing the recovery work is one of
the source datanodes, the recovered cells are sent remotely.

There are some points where we can make further improvements in the next phase:
1. We can read the block file directly on the local datanode; currently we use
   a remote block reader. (Notice short-circuit is not a good choice; see the
   inline comments.)
2. Should we check the packet ack for EC recovery? Since EC recovery is more
   expensive than continuous block replication and needs to read from several
   other datanodes, should we make sure the recovered result is received by
   the targets?
{noformat}

Erasure Coding: striped block recovery
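The three-step loop described in the comment can be sketched as follows for the simplest, single-parity case, with XOR standing in for the real erasure decoder (which the comment notes is blocked by HADOOP-11847). All class and method names here are illustrative, not HDFS APIs.

```java
import java.util.List;

// Minimal sketch of the recovery loop above (read minimum cells, decode,
// transfer), for XOR parity only. With XOR, encode and decode are the same
// operation. StripedRecoverySketch/recoverMissingCell are hypothetical names.
public class StripedRecoverySketch {

    // step1 supplies 'liveCells': the minimum set of live cells of one stripe
    // (data and/or parity). step2 XORs them to rebuild the one missing cell.
    public static byte[] recoverMissingCell(List<byte[]> liveCells) {
        byte[] out = new byte[liveCells.get(0).length];
        for (byte[] cell : liveCells) {
            for (int i = 0; i < out.length; i++) {
                out[i] ^= cell[i];
            }
        }
        // step3 would packetize 'out' and send it to the target datanode,
        // without waiting for a packet ack (as with continuous replication).
        return out;
    }
}
```

For example, with data cells {1,2,3} and {4,5,6}, the XOR parity cell is {5,7,5}; XORing the surviving data cell with the parity cell rebuilds the missing one.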
[jira] [Updated] (HDFS-7348) Erasure Coding: striped block recovery
[ https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Liu updated HDFS-7348:
-------------------------
    Description: This JIRA is to recover one or more missed striped blocks in a striped block group.  (was: This assumes the facilities like block reader and writer are ready, implements and performs erasure decoding/recovery work in *stripping* case utilizing erasure codec and coder provided by the codec framework.)

Erasure Coding: striped block recovery
--------------------------------------
                 Key: HDFS-7348
                 URL: https://issues.apache.org/jira/browse/HDFS-7348
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: datanode
            Reporter: Kai Zheng
            Assignee: Yi Liu
         Attachments: ECWorker.java, HDFS-7348.001.patch

This JIRA is to recover one or more missed striped blocks in a striped block group.
[jira] [Commented] (HDFS-5574) Remove buffer copy in BlockReader.skip
[ https://issues.apache.org/jira/browse/HDFS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516940#comment-14516940 ]

Binglin Chang commented on HDFS-5574:
-------------------------------------
Strange: the test error is caused by a NoSuchMethodError, which should not happen if the code compiled successfully. Is there a bug in the test-patch process?
{code}
java.lang.NoSuchMethodError: org.apache.hadoop.fs.FSInputChecker.readAndDiscard(I)I
	at org.apache.hadoop.hdfs.RemoteBlockReader.read(RemoteBlockReader.java:128)
	at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:740)
	at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:796)
	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:856)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:899)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:700)
	at org.apache.hadoop.hdfs.TestDFSInputStream.testSkipInner(TestDFSInputStream.java:61)
	at org.apache.hadoop.hdfs.TestDFSInputStream.testSkipWithRemoteBlockReader(TestDFSInputStream.java:76)
{code}

Remove buffer copy in BlockReader.skip
--------------------------------------
                 Key: HDFS-5574
                 URL: https://issues.apache.org/jira/browse/HDFS-5574
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Binglin Chang
            Assignee: Binglin Chang
            Priority: Trivial
         Attachments: HDFS-5574.006.patch, HDFS-5574.007.patch, HDFS-5574.008.patch, HDFS-5574.v1.patch, HDFS-5574.v2.patch, HDFS-5574.v3.patch, HDFS-5574.v4.patch, HDFS-5574.v5.patch

BlockReaderLocal.skip and RemoteBlockReader.skip read data into a temporary buffer, which is unnecessary.
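For context, the optimization under review can be modeled like this: skip by reading into one small reusable scratch buffer and discarding the data, rather than allocating a temporary buffer per skip. This is a sketch only; the real patch adds `FSInputChecker.readAndDiscard(int)` (visible in the stack trace above) and changes the block readers, and `SkipSketch` is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;

// Sketch of the idea behind the HDFS-5574 patch: skip bytes by reading into a
// reusable scratch buffer and discarding them, so no per-skip temporary
// buffer is allocated. Simplified model, not the actual block reader change.
public class SkipSketch {
    private static final byte[] SCRATCH = new byte[512];  // reused, never handed to callers

    /** Skip n bytes by reading and discarding; returns bytes actually skipped. */
    public static long skipByDiscard(InputStream in, long n) throws IOException {
        long skipped = 0;
        while (skipped < n) {
            int want = (int) Math.min(SCRATCH.length, n - skipped);
            int got = in.read(SCRATCH, 0, want);
            if (got < 0) {
                break;  // EOF before n bytes were skipped
            }
            skipped += got;
        }
        return skipped;
    }
}
```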
[jira] [Commented] (HDFS-7348) Erasure Coding: striped block recovery
[ https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517087#comment-14517087 ]

Yi Liu commented on HDFS-7348:
------------------------------
{{testFileBlocksRecovery}} tests recovery of the file's blocks:
1. Check that the replica is recovered on the target datanode, and verify the block replica's length, generationStamp, and content.
2. Read the file and verify its content.

Decoding is blocked by HADOOP-11847; I will update the test to read the file and verify its content after recovery. Currently we fill the block replica with 1... for testing.
[jira] [Updated] (HDFS-8268) Port conflict log for data node server is not sufficient
[ https://issues.apache.org/jira/browse/HDFS-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Shahid Khan updated HDFS-8268:
---------------------------------------
    Attachment: HDFS-8268

I thought the solution should be as per the attached patch. Please review the same.

Port conflict log for data node server is not sufficient
--------------------------------------------------------
                 Key: HDFS-8268
                 URL: https://issues.apache.org/jira/browse/HDFS-8268
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 2.7.0, 2.8.0
         Environment: x86_64 x86_64 x86_64 GNU/Linux
            Reporter: Mohammad Shahid Khan
            Assignee: Mohammad Shahid Khan
            Priority: Minor
         Attachments: HDFS-8268
   Original Estimate: 24h
  Remaining Estimate: 24h

DataNode startup fails due to a port conflict, but the exception logged by the server is not sufficient to identify the reason for the failure: when the data node HTTP port (dfs.datanode.http.address) conflicts, the log does not say which port is in use.

*Actual:*
2015-04-27 16:48:53,960 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.net.BindException: Address already in use
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:437)
	at sun.nio.ch.Net.bind(Net.java:429)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
	at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:475)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1021)
	at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:455)
	at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:440)
	at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:844)
	at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:194)
	at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:340)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
	at java.lang.Thread.run(Thread.java:745)

*_The above log does not contain the information of the conflicting port._*

*Expected output:*
java.net.BindException: Problem binding to [0.0.0.0:50075] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
	at org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer.start(DatanodeHttpServer.java:160)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.startInfoServer(DataNode.java:795)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1142)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.init(DataNode.java:439)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2420)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2298)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2349)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2540)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2564)
Caused by: java.net.BindException: Address already in use
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:437)
	at sun.nio.ch.Net.bind(Net.java:429)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
	at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:475)
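The expected output above comes from wrapping the raw BindException so its message names the address being bound. Below is a self-contained sketch of that idea; it mirrors, but does not reproduce, what Hadoop's `NetUtils.wrapException` does (visible in the expected stack trace), and `BindDiagnostics` is a hypothetical name.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Sketch of the proposed fix: catch the bare "Address already in use" and
// rethrow it with the bind address and port in the message. Hypothetical
// helper, not the actual DatanodeHttpServer change.
public class BindDiagnostics {
    public static BindException wrapBind(InetSocketAddress addr, BindException cause) {
        BindException wrapped = new BindException(
            "Problem binding to [" + addr + "] " + cause.getMessage()
            + "; For more details see: http://wiki.apache.org/hadoop/BindException");
        wrapped.initCause(cause);
        return wrapped;
    }

    /** Bind, converting a bare BindException into one that names the port. */
    public static ServerSocket bind(InetSocketAddress addr) throws IOException {
        ServerSocket ss = new ServerSocket();
        try {
            ss.bind(addr);
            return ss;
        } catch (BindException e) {
            ss.close();
            throw wrapBind(addr, e);
        }
    }
}
```

The wrapped exception keeps the original as its cause, so the full netty stack trace is still available while the top-level message identifies the conflicting port.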
[jira] [Updated] (HDFS-7348) Erasure Coding: striped block recovery
[ https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Liu updated HDFS-7348:
-------------------------
    Summary: Erasure Coding: striped block recovery  (was: Erasure Coding: perform stripping erasure decoding/recovery work given block reader and writer)
[jira] [Commented] (HDFS-7348) Erasure Coding: striped block recovery
[ https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517080#comment-14517080 ]

Yi Liu commented on HDFS-7348:
------------------------------
{noformat}
DataRecoveryAndTransfer recovers one or more missed striped blocks in the striped block group; the number of live striped blocks must be no less than the number of data blocks.
{noformat}
[jira] [Updated] (HDFS-8268) Port conflict log for data node server is not sufficient
[ https://issues.apache.org/jira/browse/HDFS-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Shahid Khan updated HDFS-8268:
---------------------------------------
    Attachment:  (was: HDFS-8268)
[jira] [Updated] (HDFS-8268) Port conflict log for data node server is not sufficient
[ https://issues.apache.org/jira/browse/HDFS-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Shahid Khan updated HDFS-8268:
---------------------------------------
    Attachment: HDFS-8268.patch
[jira] [Updated] (HDFS-7348) Erasure Coding: perform stripping erasure decoding/recovery work given block reader and writer
[ https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Liu updated HDFS-7348:
-------------------------
    Attachment: HDFS-7348.001.patch
[jira] [Commented] (HDFS-5574) Remove buffer copy in BlockReader.skip
[ https://issues.apache.org/jira/browse/HDFS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516809#comment-14516809 ]

Hadoop QA commented on HDFS-5574:
---------------------------------
| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 17m 23s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | javac | 9m 15s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 11m 31s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 26s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 4m 46s | The applied patch generated 1 additional checkstyle issues. |
| {color:green}+1{color} | install | 1m 42s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 5m 27s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | common tests | 22m 32s | Tests failed in hadoop-common. |
| {color:red}-1{color} | hdfs tests | 174m 54s | Tests failed in hadoop-hdfs. |
| | | 248m 40s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
| | hadoop.hdfs.TestRemoteBlockReader |
| | hadoop.hdfs.TestDFSInputStream |
| | hadoop.hdfs.server.namenode.TestStartup |
| Timed out tests | org.apache.hadoop.ha.TestZKFailoverControllerStress |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12728717/HDFS-5574.008.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / feb68cb |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/10429/artifact/patchprocess/checkstyle-result-diff.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10429/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10429/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10429/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10429/console |

This message was automatically generated.
[jira] [Updated] (HDFS-7678) Erasure coding: DFSInputStream with decode functionality
[ https://issues.apache.org/jira/browse/HDFS-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhe Zhang updated HDFS-7678:
----------------------------
    Target Version/s: HDFS-7285
   Affects Version/s: HDFS-7285
              Status: Patch Available  (was: In Progress)

Erasure coding: DFSInputStream with decode functionality
--------------------------------------------------------
                 Key: HDFS-7678
                 URL: https://issues.apache.org/jira/browse/HDFS-7678
             Project: Hadoop HDFS
          Issue Type: Sub-task
    Affects Versions: HDFS-7285
            Reporter: Li Bo
            Assignee: Zhe Zhang
         Attachments: BlockGroupReader.patch, HDFS-7678.000.patch, HDFS-7678.001.patch

A block group reader will read data from a BlockGroup, whether it is in striping layout or contiguous layout. Corrupt blocks can be known before reading (told by the namenode) or found during reading. The block group reader needs to do decoding work when some blocks are found to be corrupt.
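The decode-on-read behavior described here can be sketched as a fallback path: serve a healthy cell directly, and only reconstruct from the surviving cells when a read fails. XOR stands in for the real decoder, and `CellSource`/`readCell` are illustrative names, not the HDFS-7678 API.

```java
import java.util.Arrays;

// Sketch of a reader that decodes only when a cell read fails, for a stripe
// of k data cells plus one XOR parity cell (index k). Assumes at most one
// missing cell. Illustrative names only, not the HDFS-7678 patch.
public class DecodingReaderSketch {
    public interface CellSource {
        byte[] read(int index);  // returns null if the cell is corrupt/missing
    }

    /** Read cell i of a (k data + 1 parity) stripe, decoding on failure. */
    public static byte[] readCell(CellSource src, int i, int k) {
        byte[] cell = src.read(i);
        if (cell != null) {
            return cell;                      // fast path: healthy block
        }
        byte[] out = null;
        for (int j = 0; j <= k; j++) {        // slow path: XOR surviving cells
            if (j == i) {
                continue;
            }
            byte[] other = src.read(j);
            if (out == null) {
                out = other.clone();
            } else {
                for (int b = 0; b < out.length; b++) {
                    out[b] ^= other[b];
                }
            }
        }
        return out;
    }
}
```

Keeping the decode on the slow path means healthy reads pay no extra cost; only a corrupt or missing block triggers the extra fetches and reconstruction.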
[jira] [Updated] (HDFS-7678) Erasure coding: DFSInputStream with decode functionality
[ https://issues.apache.org/jira/browse/HDFS-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-7678: Attachment: HDFS-7678-HDFS-7285.002.patch New patch with a functional test. Also renaming to trigger Jenkins. Erasure coding: DFSInputStream with decode functionality Key: HDFS-7678 URL: https://issues.apache.org/jira/browse/HDFS-7678 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7285 Reporter: Li Bo Assignee: Zhe Zhang Attachments: BlockGroupReader.patch, HDFS-7678-HDFS-7285.002.patch, HDFS-7678.000.patch, HDFS-7678.001.patch A block group reader will read data from a BlockGroup whether it is in striping layout or contiguous layout. Corrupt blocks can be known before reading (reported by the namenode) or discovered during reading. The block group reader needs to do decoding work when some blocks are found to be corrupt. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517930#comment-14517930 ] Tsz Wo Nicholas Sze commented on HDFS-8204: --- Actually, it does support. ... You are right. Thanks for pointing it out. Mover/Balancer should not schedule two replicas to the same DN -- Key: HDFS-8204 URL: https://issues.apache.org/jira/browse/HDFS-8204 Project: Hadoop HDFS Issue Type: Improvement Components: balancer mover Reporter: Walter Su Assignee: Walter Su Priority: Minor Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, HDFS-8204.003.patch The Balancer moves blocks between Datanodes in older versions (< 2.6). The Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the new version (>= 2.6). The function
{code}
class DBlock extends Locations<StorageGroup>
DBlock.isLocatedOn(StorageGroup loc)
{code}
-is flawed and may cause 2 replicas to end up on the same node after running the balancer.- For example: We have 2 nodes. Each node has two storages. We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK). We have a block with ONE_SSD storage policy. The block has 2 replicas. They are in (DN0,SSD) and (DN1,DISK). The replica in (DN0,SSD) should not be moved to (DN1,SSD) by the Balancer; otherwise DN1 ends up with 2 replicas. -- UPDATE (Thanks [~szetszwo] for pointing it out): {color:red} This bug will *NOT* cause 2 replicas to end up on the same node after running the balancer, thanks to the Datanode rejecting the transfer. {color} We see a lot of ERROR messages when running the test.
{code}
2015-04-27 10:08:15,809 ERROR datanode.DataNode (DataXceiver.java:run(277)) - host1.foo.com:59537:DataXceiver error processing REPLACE_BLOCK operation src: /127.0.0.1:52532 dst: /127.0.0.1:59537
org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-264794661-9.96.1.34-1430100451121:blk_1073741825_1001 already exists in state FINALIZED and thus cannot be created.
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1447)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:186)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1158)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:229)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
    at java.lang.Thread.run(Thread.java:722)
{code}
The Balancer runs 5~20 iterations in the test before it exits. It's inefficient. The Balancer should not *schedule* such a move in the first place, even though it will fail anyway. In the test, it should exit after 5 iterations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
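The scheduling rule implied by the example above can be sketched like this; `MoveGuard` and `shouldSchedule` are invented names for illustration, not the actual `Dispatcher` API:

```java
import java.util.Set;

// Sketch of the guard: before scheduling a move to a target storage group,
// check whether the target *datanode* already holds a replica of the block.
// Such a move would only be rejected later by the DataNode with a
// ReplicaAlreadyExistsException, so it should not be scheduled at all.
class MoveGuard {
    // locations: IDs of the datanodes currently holding a replica of the block
    static boolean shouldSchedule(Set<String> locations,
                                  String sourceDn, String targetDn) {
        if (sourceDn.equals(targetDn)) {
            return true; // moving between storages on the same node is fine
        }
        return !locations.contains(targetDn);
    }
}
```

With the ONE_SSD example above, moving the (DN0,SSD) replica to (DN1,SSD) is refused because DN1 already holds the (DN1,DISK) replica, while an intra-node move from (DN0,SSD) to (DN0,DISK) is still allowed.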
[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-8204: -- Resolution: Fixed Fix Version/s: 2.7.1 Status: Resolved (was: Patch Available) I have committed this. Thanks, Walter! Mover/Balancer should not schedule two replicas to the same DN -- Key: HDFS-8204 URL: https://issues.apache.org/jira/browse/HDFS-8204 Project: Hadoop HDFS Issue Type: Improvement Components: balancer mover Reporter: Walter Su Assignee: Walter Su Priority: Minor Fix For: 2.7.1 Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, HDFS-8204.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517943#comment-14517943 ] Hudson commented on HDFS-8204: -- SUCCESS: Integrated in Hadoop-trunk-Commit #7694 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7694/]) HDFS-8204. Mover/Balancer should not schedule two replicas to the same datanode. Contributed by Walter Su (szetszwo: rev 5639bf02da716b3ecda785979b3d08cdca15972d) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Mover/Balancer should not schedule two replicas to the same DN -- Key: HDFS-8204 URL: https://issues.apache.org/jira/browse/HDFS-8204 Project: Hadoop HDFS Issue Type: Improvement Components: balancer mover Reporter: Walter Su Assignee: Walter Su Priority: Minor Fix For: 2.7.1 Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, HDFS-8204.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7348) Erasure Coding: striped block recovery
[ https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518362#comment-14518362 ] Zhe Zhang commented on HDFS-7348: - Please find detailed comments below: Logic: # Since recovering multiple missing blocks at once is a pretty rare case, should we just reconstruct all missing blocks and use {{DataNode#DataTransfer}} to push them out? # I filed HDFS-8282 to move {{StripedReadResult}} and {{waitNextCompletion}} to {{StripedBlockUtil}}. # In foreground recovery we read in parallel to minimize latency. It's an interesting design question whether we should do the same in background recovery. More discussion is needed here. # If we do choose to read source blocks in parallel, how should we design the unit of sync-and-decode? Right now the readers read a cell at a time. Another option is to read entire blocks and then decode. The drawback is larger temporary memory usage. The benefits are: i) simpler logic (no need to recreate reading threads) and avoiding the overhead of initializing connections to source DNs; ii) connections stay open for as short a time as possible (fast readers don't need to wait for slow ones); iii) does it save CPU to decode in big chunks? [~drankye] Could you advise? # Should we save a copy of the reconstructed block locally? More space will be used, but it avoids re-decoding if the push fails. Nits: # Could use {{ArrayList}} {code} stripedReaders = new ArrayList<StripedReader>(sources.length); {code} # Maybe we can move {{getBlock}} to {{StripedBlockUtil}} too; it's a useful util to only parse the {{Block}}. If it sounds good to you I'll move it in HDFS-8282. 
Erasure Coding: striped block recovery -- Key: HDFS-7348 URL: https://issues.apache.org/jira/browse/HDFS-7348 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Kai Zheng Assignee: Yi Liu Attachments: ECWorker.java, HDFS-7348.001.patch This JIRA is to recover one or more missed striped block in the striped block group. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
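As a toy illustration of the cell-at-a-time sync-and-decode unit discussed in the comments above, the sketch below reconstructs one missing block with plain XOR parity standing in for a real erasure codec; `CellRecovery` and its signature are invented for the example, and real EC uses Reed-Solomon rather than XOR:

```java
// Toy decoder: parity is the XOR of all source blocks, so a single missing
// block is recovered by XOR-ing the parity with the surviving sources, one
// cell (fixed-size chunk) at a time.
class CellRecovery {
    // sources[i] == null marks the missing block
    static byte[] reconstruct(byte[][] sources, byte[] parity, int cellSize) {
        byte[] out = new byte[parity.length];
        for (int off = 0; off < parity.length; off += cellSize) {
            int end = Math.min(off + cellSize, parity.length);
            // decode one cell: in the real flow this is where reader threads
            // synchronize before handing their data to the decoder
            for (int j = off; j < end; j++) {
                byte b = parity[j];
                for (byte[] src : sources) {
                    if (src != null) b ^= src[j];
                }
                out[j] = b;
            }
        }
        return out;
    }
}
```

Reading entire blocks before decoding, as debated above, would amount to widening `cellSize` to the block length, at the cost of buffering everything in memory.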
[jira] [Updated] (HDFS-8273) FSNamesystem#Delete() should not call logSync() when holding the lock
[ https://issues.apache.org/jira/browse/HDFS-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8273: - Resolution: Fixed Fix Version/s: 2.7.1 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed the patch to trunk and branch-2. Thanks Jing for the reviews. FSNamesystem#Delete() should not call logSync() when holding the lock - Key: HDFS-8273 URL: https://issues.apache.org/jira/browse/HDFS-8273 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Jing Zhao Assignee: Haohui Mai Priority: Blocker Fix For: 2.7.1 Attachments: HDFS-8273.000.patch, HDFS-8273.001.patch HDFS-7573 moves the logSync call inside of the write lock by accident. We should move it out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8284) Add usage of tracing originated in DFSClient to doc
Masatake Iwasaki created HDFS-8284: -- Summary: Add usage of tracing originated in DFSClient to doc Key: HDFS-8284 URL: https://issues.apache.org/jira/browse/HDFS-8284 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Tracing originated in DFSClient uses configuration keys prefixed with dfs.client.htrace after HDFS-8213. Server side tracing uses conf keys prefixed with dfs.htrace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
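Assuming the documentation follows the usual hdfs-site.xml conventions, the client/server split described above might look like the fragment below. Only the `dfs.htrace` and `dfs.client.htrace` prefixes come from the issue; the `spanreceiver.classes` suffix and the receiver class are assumed key/value names for illustration.

```xml
<!-- Server-side tracing (NameNode/DataNode): dfs.htrace.* -->
<property>
  <name>dfs.htrace.spanreceiver.classes</name> <!-- assumed key name -->
  <value>org.apache.htrace.impl.LocalFileSpanReceiver</value>
</property>

<!-- Client-side tracing (DFSClient), after HDFS-8213: dfs.client.htrace.* -->
<property>
  <name>dfs.client.htrace.spanreceiver.classes</name> <!-- assumed key name -->
  <value>org.apache.htrace.impl.LocalFileSpanReceiver</value>
</property>
```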
[jira] [Commented] (HDFS-7687) Change fsck to support EC files
[ https://issues.apache.org/jira/browse/HDFS-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518265#comment-14518265 ] Tsz Wo Nicholas Sze commented on HDFS-7687: --- Both JIRAs are merged to the branch now. Change fsck to support EC files --- Key: HDFS-7687 URL: https://issues.apache.org/jira/browse/HDFS-7687 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Takanobu Asanuma We need to change fsck so that it can detect under replicated and corrupted EC files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7758) Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead
[ https://issues.apache.org/jira/browse/HDFS-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518394#comment-14518394 ] Colin Patrick McCabe commented on HDFS-7758: Can you rebase the patch on trunk? Thanks. Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead - Key: HDFS-7758 URL: https://issues.apache.org/jira/browse/HDFS-7758 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7758.000.patch, HDFS-7758.001.patch, HDFS-7758.002.patch, HDFS-7758.003.patch, HDFS-7758.004.patch, HDFS-7758.005.patch HDFS-7496 introduced reference counting of the volume instances in use, to prevent race conditions when hot swapping a volume. However, {{FsDatasetSpi#getVolumes()}} can still leak a volume instance without increasing its reference count. In this JIRA, we retire {{FsDatasetSpi#getVolumes()}} and propose {{FsDatasetSpi#getVolumeRefs()}} and related methods to access {{FsVolume}}, making sure that a consumer of {{FsVolume}} always holds a correct reference count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8283) DataStreamer cleanup and some minor improvement
[ https://issues.apache.org/jira/browse/HDFS-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-8283: -- Attachment: h8283_20150428.patch h8283_20150428.patch: 1st patch. DataStreamer cleanup and some minor improvement --- Key: HDFS-8283 URL: https://issues.apache.org/jira/browse/HDFS-8283 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h8283_20150428.patch - When throwing an exception: -* always set lastException -* always create a new exception so that it has the new stack trace - Add LOG. - Add final to isAppend and favoredNodes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
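The second bullet above ("always create a new exception so that it has the new stack trace") can be illustrated with a minimal sketch; `Rethrow` and `freshCopy` are invented names, not DataStreamer code:

```java
import java.io.IOException;

// Rethrowing a stored exception object reuses the stack trace captured where
// it was originally constructed. Wrapping it in a *new* exception records the
// current call site while keeping the original as the cause.
class Rethrow {
    static IOException freshCopy(IOException last) {
        return new IOException(last.getMessage(), last);
    }
}
```

The caller would throw `freshCopy(lastException)` instead of `lastException` itself, so each throw site is visible in the resulting trace.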
[jira] [Created] (HDFS-8283) DataStreamer cleanup and some minor improvement
Tsz Wo Nicholas Sze created HDFS-8283: - Summary: DataStreamer cleanup and some minor improvement Key: HDFS-8283 URL: https://issues.apache.org/jira/browse/HDFS-8283 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor - When throwing an exception: -* always set lastException -* always create a new exception so that it has the new stack trace - Add LOG. - Add final to isAppend and favoredNodes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8273) logSync() is called inside of write lock for delete op
[ https://issues.apache.org/jira/browse/HDFS-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518492#comment-14518492 ] Jing Zhao commented on HDFS-8273: - +1 for the latest patch. Thanks for the fix, Haohui! logSync() is called inside of write lock for delete op -- Key: HDFS-8273 URL: https://issues.apache.org/jira/browse/HDFS-8273 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Jing Zhao Assignee: Haohui Mai Priority: Blocker Attachments: HDFS-8273.000.patch, HDFS-8273.001.patch HDFS-7573 moves the logSync call inside of the write lock by accident. We should move it out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7980) Incremental BlockReport will dramatically slow down the startup of a namenode
[ https://issues.apache.org/jira/browse/HDFS-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-7980: Attachment: HDFS-7980.004.patch Incremental BlockReport will dramatically slow down the startup of a namenode -- Key: HDFS-7980 URL: https://issues.apache.org/jira/browse/HDFS-7980 Project: Hadoop HDFS Issue Type: Bug Reporter: Hui Zheng Assignee: Walter Su Attachments: HDFS-7980.001.patch, HDFS-7980.002.patch, HDFS-7980.003.patch, HDFS-7980.004.patch In the current implementation the datanode calls the reportReceivedDeletedBlocks() method, which is an incremental block report, before calling the bpNamenode.blockReport() method. So in a large (several thousands of datanodes) and busy cluster it will slow down the startup of the namenode by more than an hour.
{code}
List<DatanodeCommand> blockReport() throws IOException {
  // send block report if timer has expired.
  final long startTime = now();
  if (startTime - lastBlockReport <= dnConf.blockReportInterval) {
    return null;
  }
  final ArrayList<DatanodeCommand> cmds = new ArrayList<DatanodeCommand>();

  // Flush any block information that precedes the block report. Otherwise
  // we have a chance that we will miss the delHint information
  // or we will report an RBW replica after the BlockReport already reports
  // a FINALIZED one.
  reportReceivedDeletedBlocks();
  lastDeletedReport = startTime;
  ...
  // Send the reports to the NN.
  int numReportsSent = 0;
  int numRPCs = 0;
  boolean success = false;
  long brSendStartTime = now();
  try {
    if (totalBlockCount < dnConf.blockReportSplitThreshold) {
      // Below split threshold, send all reports in a single message.
      DatanodeCommand cmd = bpNamenode.blockReport(
          bpRegistration, bpos.getBlockPoolId(), reports);
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7770) Need document for storage type label of data node storage locations under dfs.data.dir
[ https://issues.apache.org/jira/browse/HDFS-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518454#comment-14518454 ] Hadoop QA commented on HDFS-7770: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 32s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 29s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 32s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | site | 2m 54s | Site still builds. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | native | 3m 12s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 163m 21s | Tests failed in hadoop-hdfs. 
| | | | 206m 31s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.namenode.TestFileTruncate | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12728926/HDFS-7770.02.patch | | Optional Tests | javadoc javac unit site | | git revision | trunk / 5190923 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10437/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10437/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10437/console | This message was automatically generated. Need document for storage type label of data node storage locations under dfs.data.dir -- Key: HDFS-7770 URL: https://issues.apache.org/jira/browse/HDFS-7770 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-7700.01.patch, HDFS-7770.00.patch, HDFS-7770.02.patch HDFS-2832 enables support for heterogeneous storages in HDFS, which allows a DN to be a collection of storages with different types. However, I can't find documentation on how to label different storage types in the following two documents; I found the information in the design spec. It would be good to document this for admins and users of the related archival storage and storage policy features. http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml This JIRA is opened to add documentation for the new storage type labels. 1. 
Add an example under the ArchivalStorage.html#Configuration section:
{code}
<property>
  <name>dfs.data.dir</name>
  <value>[DISK]file:///hddata/dn/disk0,[SSD]file:///hddata/dn/ssd0,[ARCHIVE]file:///hddata/dn/archive0</value>
</property>
{code}
2. Add a short description of the [DISK/SSD/ARCHIVE/RAM_DISK] options in hdfs-default.xml#dfs.data.dir, and document DISK as the default storage type when no storage type is labeled in the data node storage location configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8283) DataStreamer cleanup and some minor improvement
[ https://issues.apache.org/jira/browse/HDFS-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-8283: -- Status: Patch Available (was: Open) DataStreamer cleanup and some minor improvement --- Key: HDFS-8283 URL: https://issues.apache.org/jira/browse/HDFS-8283 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h8283_20150428.patch - When throwing an exception -* always set lastException -* always creating a new exception so that it has the new stack trace - Add LOG. - Add final to isAppend and favoredNodes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8273) logSync() is called inside of write lock for delete op
[ https://issues.apache.org/jira/browse/HDFS-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518468#comment-14518468 ] Hadoop QA commented on HDFS-8273: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 34s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 27s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 7m 13s | There were no new checkstyle issues. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 3s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | native | 3m 14s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 164m 40s | Tests passed in hadoop-hdfs. 
| | | | 212m 19s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12728929/HDFS-8273.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 5190923 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10436/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10436/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10436/console | This message was automatically generated. logSync() is called inside of write lock for delete op -- Key: HDFS-8273 URL: https://issues.apache.org/jira/browse/HDFS-8273 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Jing Zhao Assignee: Haohui Mai Priority: Blocker Attachments: HDFS-8273.000.patch, HDFS-8273.001.patch HDFS-7573 moves the logSync call inside of the write lock by accident. We should move it out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8272) Erasure Coding: simplify the retry logic in DFSStripedInputStream
[ https://issues.apache.org/jira/browse/HDFS-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8272: Attachment: h8272-HDFS-7285.001.patch Thanks again for the review, Zhe! Update the patch to address your comments (including DFSInputStream changes). The main change is to only refetch the key/token once for the group. About the encryption key retry logic, I think it is handled while creating the block reader. More specifically, while creating the TCP peer in {{BlockReaderFactory#getRemoteBlockReaderFromTcp}}, the sasl protocol is triggered during which the encryptionKey can be refetched. Erasure Coding: simplify the retry logic in DFSStripedInputStream - Key: HDFS-8272 URL: https://issues.apache.org/jira/browse/HDFS-8272 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: h8272-HDFS-7285.000.patch, h8272-HDFS-7285.001.patch Currently in DFSStripedInputStream the retry logic is still the same with DFSInputStream. More specifically, every failed read will try to search for another source node. And an exception is thrown when no new source node can be identified. This logic is not appropriate for EC inputstream and can be simplified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7281) Missing block is marked as corrupted block
[ https://issues.apache.org/jira/browse/HDFS-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518376#comment-14518376 ] Yongjun Zhang commented on HDFS-7281: - Hi [~mingma], I think we can get this fix into trunk targeting 3.0, and follow up with other improvements like [~andrew.wang] proposed in the email thread. Would you please take a look at the comment at https://issues.apache.org/jira/browse/HDFS-7281?focusedCommentId=14510451&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14510451 ? Thanks. Missing block is marked as corrupted block -- Key: HDFS-7281 URL: https://issues.apache.org/jira/browse/HDFS-7281 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Labels: supportability Attachments: HDFS-7281-2.patch, HDFS-7281-3.patch, HDFS-7281-4.patch, HDFS-7281.patch In the situation where a block has lost all its replicas, fsck shows the block as missing as well as corrupted. Perhaps it is better not to mark the block corrupted in this case. The reason it is marked as corrupted is that numCorruptNodes == numNodes == 0 in the following code.
{noformat}
// BlockManager
final boolean isCorrupt = numCorruptNodes == numNodes;
{noformat}
Would like to clarify whether it is intentional to mark a missing block as corrupted, or whether it is just a bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
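The fix direction suggested above, not reporting a block as corrupt when it simply has no replicas, amounts to a one-line guard. This is an illustrative sketch, not the actual BlockManager patch:

```java
// A block with zero replicas is "missing", not "corrupt": only call it
// corrupt when replicas exist and every one of them is corrupt.
class CorruptCheck {
    static boolean isCorrupt(int numCorruptNodes, int numNodes) {
        return numNodes > 0 && numCorruptNodes == numNodes;
    }
}
```

With this guard, the `numCorruptNodes == numNodes == 0` case from the description is classified as missing only.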
[jira] [Updated] (HDFS-7758) Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead
[ https://issues.apache.org/jira/browse/HDFS-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-7758: Attachment: HDFS-7758.006.patch Thanks for looking into this, [~cmccabe]. Uploaded a rebased patch. Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead - Key: HDFS-7758 URL: https://issues.apache.org/jira/browse/HDFS-7758 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7758.000.patch, HDFS-7758.001.patch, HDFS-7758.002.patch, HDFS-7758.003.patch, HDFS-7758.004.patch, HDFS-7758.005.patch, HDFS-7758.006.patch HDFS-7496 introduced reference counting of the volume instances in use, to prevent race conditions when hot swapping a volume. However, {{FsDatasetSpi#getVolumes()}} can still leak a volume instance without increasing its reference count. In this JIRA, we retire {{FsDatasetSpi#getVolumes()}} and propose {{FsDatasetSpi#getVolumeRefs()}} and related methods to access {{FsVolume}}, making sure that a consumer of {{FsVolume}} always holds a correct reference count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
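The reference-counted access pattern this JIRA moves toward can be sketched as below; `Volume`, `VolumeRef`, and `obtainReference` are illustrative names modeled on the description, not the actual `FsDatasetSpi` API:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Callers never touch the raw volume list; they obtain a closeable reference,
// so every use is bracketed by an increment/decrement of the count, and a
// hot-swap can safely wait for the count to drain to zero.
class Volume {
    final AtomicInteger refCount = new AtomicInteger();

    VolumeRef obtainReference() {
        refCount.incrementAndGet();
        return new VolumeRef(this);
    }
}

class VolumeRef implements AutoCloseable {
    private final Volume volume;
    VolumeRef(Volume volume) { this.volume = volume; }
    @Override public void close() { volume.refCount.decrementAndGet(); }
}
```

Used as `try (VolumeRef ref = volume.obtainReference()) { ... }`, the count is released even when an exception is thrown, which is exactly the leak that handing out raw volumes via `getVolumes()` allowed.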
[jira] [Commented] (HDFS-7397) The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading
[ https://issues.apache.org/jira/browse/HDFS-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518487#comment-14518487 ] Hadoop QA commented on HDFS-7397: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 23s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 45s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 56s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | native | 3m 19s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 164m 55s | Tests failed in hadoop-hdfs. 
| | | | 203m 59s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.web.TestWebHdfsFileSystemContract | | Timed out tests | org.apache.hadoop.hdfs.server.mover.TestMover | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12728676/HDFS-7397-002.patch | | Optional Tests | javadoc javac unit | | git revision | trunk / 5190923 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10438/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10438/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10438/console | This message was automatically generated. The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading Key: HDFS-7397 URL: https://issues.apache.org/jira/browse/HDFS-7397 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Brahma Reddy Battula Priority: Minor Attachments: HDFS-7397-002.patch, HDFS-7397.patch For dfs.client.read.shortcircuit.streams.cache.size, is it in MB or KB? Interestingly, it is neither in MB nor KB. It is the number of shortcircuit streams. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
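A minimal sketch of the point under discussion, using a plain map instead of Hadoop's Configuration class (the 256 default matches DFSConfigKeys at the time, but treat it as an assumption): the value of this key is a count of cached short-circuit stream pairs, not a size in megabytes or kilobytes.

```java
import java.util.Map;

public class ShortCircuitCacheConfSketch {
    // The key reads like a byte size because it ends in ".size", but the
    // value is the number of cached short-circuit stream pairs.
    static int cachedStreams(Map<String, String> conf) {
        return Integer.parseInt(conf.getOrDefault(
            "dfs.client.read.shortcircuit.streams.cache.size", "256"));
    }

    public static void main(String[] args) {
        System.out.println(cachedStreams(Map.of())); // a count, not bytes
    }
}
```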
[jira] [Work started] (HDFS-8282) Erasure coding: move striped reading logic to StripedBlockUtil
[ https://issues.apache.org/jira/browse/HDFS-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-8282 started by Zhe Zhang. --- Erasure coding: move striped reading logic to StripedBlockUtil -- Key: HDFS-8282 URL: https://issues.apache.org/jira/browse/HDFS-8282 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8282) Erasure coding: move striped reading logic to StripedBlockUtil
Zhe Zhang created HDFS-8282: --- Summary: Erasure coding: move striped reading logic to StripedBlockUtil Key: HDFS-8282 URL: https://issues.apache.org/jira/browse/HDFS-8282 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7678) Erasure coding: DFSInputStream with decode functionality
[ https://issues.apache.org/jira/browse/HDFS-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518415#comment-14518415 ] Hadoop QA commented on HDFS-7678: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 30s | Pre-patch HDFS-7285 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 27s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 15s | The applied patch generated 1 release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 31s | The applied patch generated 3 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 55s | The patch appears to introduce 9 new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | native | 3m 13s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 196m 58s | Tests failed in hadoop-hdfs. | | {color:red}-1{color} | hdfs tests | 0m 20s | Tests failed in hadoop-hdfs-client. 
| | | | 243m 54s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs-client | | | org.apache.hadoop.hdfs.protocol.LocatedStripedBlock.getBlockIndices() may expose internal representation by returning LocatedStripedBlock.blockIndices At LocatedStripedBlock.java:by returning LocatedStripedBlock.blockIndices At LocatedStripedBlock.java:[line 57] | | FindBugs | module:hadoop-hdfs | | | Inconsistent synchronization of org.apache.hadoop.hdfs.DFSOutputStream.streamer; locked 89% of time Unsynchronized access at DFSOutputStream.java:89% of time Unsynchronized access at DFSOutputStream.java:[line 142] | | | Result of integer multiplication cast to long in org.apache.hadoop.hdfs.DFSStripedInputStream.planReadPortions(int, int, long, int, int) At DFSStripedInputStream.java:to long in org.apache.hadoop.hdfs.DFSStripedInputStream.planReadPortions(int, int, long, int, int) At DFSStripedInputStream.java:[line 101] | | | Dead store to offSuccess in org.apache.hadoop.hdfs.StripedDataStreamer.endBlock() At StripedDataStreamer.java:org.apache.hadoop.hdfs.StripedDataStreamer.endBlock() At StripedDataStreamer.java:[line 104] | | | Result of integer multiplication cast to long in org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped.spaceConsumed() At BlockInfoStriped.java:to long in org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped.spaceConsumed() At BlockInfoStriped.java:[line 208] | | | Possible null pointer dereference of arr$ in org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStripedUnderConstruction.initializeBlockRecovery(long) Dereferenced at BlockInfoStripedUnderConstruction.java:arr$ in org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStripedUnderConstruction.initializeBlockRecovery(long) Dereferenced at BlockInfoStripedUnderConstruction.java:[line 206] | | | Found reliance on default encoding in org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.createErasureCodingZone(String, ECSchema):in 
org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.createErasureCodingZone(String, ECSchema): String.getBytes() At ErasureCodingZoneManager.java:[line 116] | | | Found reliance on default encoding in org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.getECZoneInfo(INodesInPath):in org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.getECZoneInfo(INodesInPath): new String(byte[]) At ErasureCodingZoneManager.java:[line 81] | | | Result of integer multiplication cast to long in org.apache.hadoop.hdfs.util.StripedBlockUtil.constructInternalBlock(LocatedStripedBlock, int, int, int, int) At StripedBlockUtil.java:to long in org.apache.hadoop.hdfs.util.StripedBlockUtil.constructInternalBlock(LocatedStripedBlock, int, int, int, int) At StripedBlockUtil.java:[line 75] | | Failed unit tests | hadoop.hdfs.server.namenode.TestFileTruncate | | | hadoop.hdfs.TestDFSStripedInputStream | | | hadoop.hdfs.TestReadStripedFile | | Timed out tests | org.apache.hadoop.hdfs.server.namenode.TestFSEditLogLoader | | | org.apache.hadoop.hdfs.server.namenode.TestAddStripedBlocks | | Failed build | hadoop-hdfs-client | \\ \\ || Subsystem || Report/Notes || | Patch URL |
[jira] [Commented] (HDFS-8273) logSync() is called inside of write lock for delete op
[ https://issues.apache.org/jira/browse/HDFS-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518444#comment-14518444 ] Hadoop QA commented on HDFS-8273: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 29s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 27s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 33s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 7m 46s | There were no new checkstyle issues. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 4s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | native | 3m 14s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 163m 40s | Tests failed in hadoop-hdfs. 
| | | | 211m 44s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12728919/HDFS-8273.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / bc1bd7e | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10435/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10435/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10435/console | This message was automatically generated. logSync() is called inside of write lock for delete op -- Key: HDFS-8273 URL: https://issues.apache.org/jira/browse/HDFS-8273 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Jing Zhao Assignee: Haohui Mai Priority: Blocker Attachments: HDFS-8273.000.patch, HDFS-8273.001.patch HDFS-7573 moves the logSync call inside of the write lock by accident. We should move it out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
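The ordering being fixed can be sketched as follows; the lock, log, and method names are illustrative, not the FSNamesystem internals. The edit is appended while holding the write lock, but logSync() — which blocks on disk I/O — runs only after the lock is released, so other namespace operations are not stalled behind the flush.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LogSyncOrderingSketch {
    final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    final StringBuilder editLog = new StringBuilder();

    void delete(String path) {
        fsLock.writeLock().lock();
        try {
            // Mutate the namespace and append the edit in memory:
            editLog.append("OP_DELETE ").append(path).append('\n');
        } finally {
            fsLock.writeLock().unlock();
        }
        // Durability work happens outside the lock so concurrent
        // operations are not blocked behind disk I/O.
        logSync();
    }

    void logSync() { /* flush editLog to stable storage */ }

    public static void main(String[] args) {
        LogSyncOrderingSketch fs = new LogSyncOrderingSketch();
        fs.delete("/tmp/f");
        System.out.println(fs.editLog.toString().trim());
    }
}
```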
[jira] [Commented] (HDFS-7678) Erasure coding: DFSInputStream with decode functionality
[ https://issues.apache.org/jira/browse/HDFS-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518465#comment-14518465 ] Zhe Zhang commented on HDFS-7678: - Thanks Andrew for the review; it's very helpful. Some quick feedback while I work on the harder parts: bq. waitNextCompletion, shouldn't the read timeout be an overall timeout Great idea. Otherwise the timeout policy is too strict in the beginning and too loose toward the end. bq. throwing InterruptedException on empty futures This part (also the main {{waitNextCompletion}} logic) was actually inherited from {{DFSInputStream#getFirstToComplete}}. I think we should take care of this issue together with other {{InterruptedException}} updates in the planned follow-on JIRA (against trunk). I will update {{waitNextCompletion}} to get rid of this {{InterruptedException}} under this JIRA. bq. Do we actually need missingBlkIndices or the non-success cases? It's the set complement of fetchedBlkIndices. Not really. {{missingBlkIndices}} has all _confirmed_ missing blocks while {{fetchedBlkIndices}} has all fetched blocks that _cover the max missing span_. For example, if the cell size is 4k, you want to read range 2k~8k, and block #1 is missing, then {{missingBlkIndices}} should contain only _1_ and {{fetchedBlkIndices}} is empty, since block #0 needs to be refetched (we only have half of it for recovery). bq. We always go through a function called fetchExtraBlks... Good catch, and I think that's what fails {{TestReadStripedFile}} and {{TestDFSStripedInputStream}}. bq. I wonder if it'd be better to do all the fetching first (including parity if necessary), It's an appealing idea. [~hitliuyi] has an interesting logic of inserting a new Future when finding a failed Future under HDFS-7348. I'll try to leverage that. 
Erasure coding: DFSInputStream with decode functionality Key: HDFS-7678 URL: https://issues.apache.org/jira/browse/HDFS-7678 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7285 Reporter: Li Bo Assignee: Zhe Zhang Attachments: BlockGroupReader.patch, HDFS-7678-HDFS-7285.002.patch, HDFS-7678.000.patch, HDFS-7678.001.patch A block group reader will read data from BlockGroup no matter in striping layout or contiguous layout. The corrupt blocks can be known before reading(told by namenode), or just be found during reading. The block group reader needs to do decoding work when some blocks are found corrupt. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
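Zhe's 2k~8k example can be checked with a small sketch (the helper below is hypothetical; the real mapping lives in StripedBlockUtil): with 4 KB cells striped round-robin over the data blocks, the range touches cells 0 and 1 and therefore blocks #0 and #1 of the group — which is why losing block #1 forces block #0 to be fetched in full for recovery even though only half of it was in the requested range.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class StripedRangeSketch {
    // In a striped layout each cellSize-byte cell i lives on data block
    // (i % dataBlocks). Returns the block indices a byte range touches.
    static Set<Integer> blocksForRange(long start, long endExclusive,
                                       int cellSize, int dataBlocks) {
        Set<Integer> blocks = new LinkedHashSet<>();
        for (long cell = start / cellSize;
             cell <= (endExclusive - 1) / cellSize; cell++) {
            blocks.add((int) (cell % dataBlocks));
        }
        return blocks;
    }

    public static void main(String[] args) {
        // 4 KB cells, 6 data blocks, read range 2k..8k from the comment:
        System.out.println(blocksForRange(2048, 8192, 4096, 6)); // [0, 1]
    }
}
```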
[jira] [Commented] (HDFS-7995) Implement chmod in the HDFS Web UI
[ https://issues.apache.org/jira/browse/HDFS-7995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518482#comment-14518482 ] Haohui Mai commented on HDFS-7995: -- Sorry for the late reply. Thanks for the work. {code} + <div class="modal" id="perm-info" tabindex="-1" role="dialog" aria-hidden="true"> {code} It makes sense to give the id a prefix (e.g. {{explorer}}) to avoid confusion. The same comment applies to things like {{perm-heading}}, etc. {code} +<td><span class="explorer-perm-links editable-click"> + {type|helper_to_directory}{permission|helper_to_permission} + {aclBit|helper_to_acl_bit} +</span></td> -<td><a style="cursor:pointer" inode-type="{type}" class="explorer-browse-links" inode-path="{pathSuffix}">{pathSuffix}</a></td> +<td><a style="cursor:pointer" inode-type="{type}" class="explorer-browse-links">{pathSuffix}</a></td> {code} The change seems unnecessary. {code} + function view_perm_details(filename, abs_path, perms) { {code} There is no need to parse the permission from the string as the original data is available in the {{LISTSTATUS}} call. What you can do is to expose it through a data field, e.g., {code} + <tr inode-path="{pathSuffix}" data-permission="{permission}"> {code} {code} + function convertCheckboxesToOctalPermissions() { {code} It is easier to calculate the permission by exposing the location of the bit using an attribute, e.g., {code} var p = 0; $.each('perm inputbox:checked', function() { p += 1 << (+$(this).attr('data-bit')); }); return p.toString(8); {code} Implement chmod in the HDFS Web UI -- Key: HDFS-7995 URL: https://issues.apache.org/jira/browse/HDFS-7995 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: HDFS-7995.01.patch, HDFS-7995.02.patch We should let users change the permissions of files and directories using the HDFS Web UI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
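Haohui's accumulation — adding 1 shifted left by each checked bit's position, then printing in base 8 — can be sanity-checked with a standalone sketch. The bit-position convention below (8 for user read down to 0 for other execute) is an assumption matching POSIX mode bits, not something specified in the patch.

```java
public class OctalPermSketch {
    // Each checked permission contributes 1 << bitPosition. Bit 8 is the
    // user "r" bit and bit 0 is the other "x" bit, so rw-r--r-- = 644.
    static String toOctal(int... checkedBits) {
        int p = 0;
        for (int bit : checkedBits) {
            p += 1 << bit;
        }
        return Integer.toString(p, 8);
    }

    public static void main(String[] args) {
        // user rw (bits 8, 7), group r (bit 5), other r (bit 2):
        System.out.println(toOctal(8, 7, 5, 2)); // 644
    }
}
```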
[jira] [Commented] (HDFS-8056) Decommissioned dead nodes should continue to be counted as dead after NN restart
[ https://issues.apache.org/jira/browse/HDFS-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518357#comment-14518357 ] Ming Ma commented on HDFS-8056: --- [~andrew.wang] and others, appreciate any input you might have. Decommissioned dead nodes should continue to be counted as dead after NN restart Key: HDFS-8056 URL: https://issues.apache.org/jira/browse/HDFS-8056 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-8056-2.patch, HDFS-8056.patch We had some offline discussion with [~andrew.wang] and [~cmccabe] about this. Bringing this up for more input before getting the patch in place. Dead nodes are tracked by {{DatanodeManager}}'s {{datanodeMap}}. However, after the NN restarts, nodes that were dead before the restart won't be in {{datanodeMap}}. {{DatanodeManager}}'s {{getDatanodeListForReport}} will add those dead nodes, but not if they are in the exclude file. {noformat} if (listDeadNodes) { for (InetSocketAddress addr : includedNodes) { if (foundNodes.matchedBy(addr) || excludedNodes.match(addr)) { continue; } // The remaining nodes are ones that are referenced by the hosts // files but that we do not know about, ie that we have never // heard from. Eg. an entry that is no longer part of the cluster // or a bogus entry was given in the hosts files // // If the host file entry specified the xferPort, we use that. // Otherwise, we guess that it is the default xfer port. // We can't ask the DataNode what it had configured, because it's // dead. DatanodeDescriptor dn = new DatanodeDescriptor(new DatanodeID(addr.getAddress().getHostAddress(), addr.getHostName(), "", addr.getPort() == 0 ? defaultXferPort : addr.getPort(), defaultInfoPort, defaultInfoSecurePort, defaultIpcPort)); setDatanodeDead(dn); nodes.add(dn); } } {noformat} The issue here is that the decommissioned dead node JMX output will differ after NN restart. It might be better to make it consistent across NN restarts. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7995) Implement chmod in the HDFS Web UI
[ https://issues.apache.org/jira/browse/HDFS-7995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518370#comment-14518370 ] Allen Wittenauer commented on HDFS-7995: +1 lgtm Implement chmod in the HDFS Web UI -- Key: HDFS-7995 URL: https://issues.apache.org/jira/browse/HDFS-7995 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: HDFS-7995.01.patch, HDFS-7995.02.patch We should let users change the permissions of files and directories using the HDFS Web UI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8213: --- Attachment: HDFS-8213.002.patch DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Colin Patrick McCabe Priority: Critical Attachments: HDFS-8213.001.patch, HDFS-8213.002.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518389#comment-14518389 ] Colin Patrick McCabe commented on HDFS-8213: Thanks for the review, [~iwasakims]. I attached a patch. Let's do the hdfs-default.xml and other docs stuff later since it's not directly related to this DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Colin Patrick McCabe Priority: Critical Attachments: HDFS-8213.001.patch, HDFS-8213.002.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7995) Implement chmod in the HDFS Web UI
[ https://issues.apache.org/jira/browse/HDFS-7995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518425#comment-14518425 ] Haohui Mai commented on HDFS-7995: -- I think the code requires some more clean up on unused ids in the HTML. Maybe we should replace the {{closest()}} call with something more performant. Implement chmod in the HDFS Web UI -- Key: HDFS-7995 URL: https://issues.apache.org/jira/browse/HDFS-7995 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: HDFS-7995.01.patch, HDFS-7995.02.patch We should let users change the permissions of files and directories using the HDFS Web UI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8269) getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime
[ https://issues.apache.org/jira/browse/HDFS-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8269: - Attachment: HDFS-8269.003.patch getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime - Key: HDFS-8269 URL: https://issues.apache.org/jira/browse/HDFS-8269 Project: Hadoop HDFS Issue Type: Bug Reporter: Yesha Vora Assignee: Haohui Mai Priority: Blocker Attachments: HDFS-8269.000.patch, HDFS-8269.001.patch, HDFS-8269.002.patch, HDFS-8269.003.patch When {{FSNamesystem#getBlockLocations}} updates the access time of the INode, it uses the path passed from the client, which generates incorrect edit log entries: {noformat} <RECORD> <OPCODE>OP_TIMES</OPCODE> <DATA> <TXID>5085</TXID> <LENGTH>0</LENGTH> <PATH>/.reserved/.inodes/18230</PATH> <MTIME>-1</MTIME> <ATIME>1429908236392</ATIME> </DATA> </RECORD> {noformat} Note that the NN does not resolve the {{/.reserved}} path when processing the edit log, so it eventually leads to an NPE when loading the edit logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8280) Code Cleanup in DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518498#comment-14518498 ] Hadoop QA commented on HDFS-8280: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 7s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 38s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 44s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 26s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 4m 1s | The applied patch generated 1 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 11s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | native | 3m 18s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 165m 44s | Tests passed in hadoop-hdfs. 
| | | | 211m 23s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12728938/HDFS-8280.000.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 5190923 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/10439/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10439/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10439/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10439/console | This message was automatically generated. Code Cleanup in DFSInputStream -- Key: HDFS-8280 URL: https://issues.apache.org/jira/browse/HDFS-8280 Project: Hadoop HDFS Issue Type: Improvement Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-8280.000.patch This is some code cleanup separate from HDFS-8272: # Avoid duplicated block reader creation code # If no new source DN can be found, {{getBestNodeDNAddrPair}} returns null instead of throwing Exception. Whether to throw Exception or not should be determined by {{getBestNodeDNAddrPair}}'s caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7397) The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading
[ https://issues.apache.org/jira/browse/HDFS-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518554#comment-14518554 ] Brahma Reddy Battula commented on HDFS-7397: Test case failures are unrelated to this patch. The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading Key: HDFS-7397 URL: https://issues.apache.org/jira/browse/HDFS-7397 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Brahma Reddy Battula Priority: Minor Attachments: HDFS-7397-002.patch, HDFS-7397.patch For dfs.client.read.shortcircuit.streams.cache.size, is it in MB or KB? Interestingly, it is neither in MB nor KB. It is the number of shortcircuit streams. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518572#comment-14518572 ] Neeta Garimella commented on HDFS-3107: --- Could someone comment on why truncate is not exposed via the FileSystem class? Are we expecting applications that use this to call the DFSClient interface directly? HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Lei Chang Assignee: Plamen Jeliazkov Fix For: 2.7.0 Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard POSIX operation), which is the reverse of append; this makes upper-layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518570#comment-14518570 ] Hadoop QA commented on HDFS-8213: - (!) The patch artifact directory on has been removed! This is a fatal error for test-patch.sh. Aborting. Jenkins (node H4) information at https://builds.apache.org/job/PreCommit-HDFS-Build/10444/ may provide some hints. DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Colin Patrick McCabe Priority: Critical Attachments: HDFS-8213.001.patch, HDFS-8213.002.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8273) FSNamesystem#Delete() should not call logSync() when holding the lock
[ https://issues.apache.org/jira/browse/HDFS-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518503#comment-14518503 ]
Hadoop QA commented on HDFS-8273:
---------------------------------
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 15m 6s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:red}-1{color} | javac | 7m 43s | The applied patch generated 2 additional warning messages. |
| {color:green}+1{color} | javadoc | 9m 50s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 5m 28s | There were no new checkstyle issues. |
| {color:green}+1{color} | install | 1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 3m 9s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | native | 3m 18s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 164m 23s | Tests failed in hadoop-hdfs. |
| | | | 211m 37s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.namenode.TestFileTruncate |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12728929/HDFS-8273.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 5190923 |
| javac | https://builds.apache.org/job/PreCommit-HDFS-Build/10440/artifact/patchprocess/diffJavacWarnings.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10440/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10440/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10440/console |
This message was automatically generated.

FSNamesystem#Delete() should not call logSync() when holding the lock
---------------------------------------------------------------------
Key: HDFS-8273
URL: https://issues.apache.org/jira/browse/HDFS-8273
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.7.0
Reporter: Jing Zhao
Assignee: Haohui Mai
Priority: Blocker
Fix For: 2.7.1
Attachments: HDFS-8273.000.patch, HDFS-8273.001.patch

HDFS-7573 moves the logSync call inside of the write lock by accident. We should move it out.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
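The locking pattern at issue can be sketched as follows. The names (writeLock, journal, logSync) are stand-ins, not the actual FSNamesystem fields: the point is that the durable sync is a slow blocking operation, so it must run after the namespace write lock is released, or every other namespace operation stalls behind it.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: mutate the namespace and buffer the edit under the lock, but do
// the expensive durable sync only after releasing it.
public class DeleteOpSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final StringBuilder journal = new StringBuilder(); // stand-in edit log

    public boolean delete(String path) {
        boolean removed;
        lock.writeLock().lock();
        try {
            removed = true;                                        // mutate in-memory namespace
            journal.append("OP_DELETE ").append(path).append('\n'); // buffer the edit
        } finally {
            lock.writeLock().unlock();   // release BEFORE the expensive sync
        }
        logSync();                       // durable sync happens outside the lock
        return removed;
    }

    private void logSync() {
        // In HDFS this flushes buffered edits to disk / journal nodes.
    }
}
```

Calling logSync() inside the try block above is exactly the accidental regression described: correctness-wise equivalent, but it serializes all namespace operations behind each disk sync.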
[jira] [Updated] (HDFS-8280) Code Cleanup in DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8280: - Resolution: Fixed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) I've committed the patch to trunk and branch-2. Thanks [~jingzhao] for the contribution. Code Cleanup in DFSInputStream -- Key: HDFS-8280 URL: https://issues.apache.org/jira/browse/HDFS-8280 Project: Hadoop HDFS Issue Type: Improvement Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Fix For: 2.8.0 Attachments: HDFS-8280.000.patch This is some code cleanup separate from HDFS-8272: # Avoid duplicated block reader creation code # If no new source DN can be found, {{getBestNodeDNAddrPair}} returns null instead of throwing Exception. Whether to throw Exception or not should be determined by {{getBestNodeDNAddrPair}}'s caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
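The second cleanup point, returning null and letting the caller decide whether the condition is fatal, can be sketched like this. The method names (bestNode, chooseDataNode) are illustrative, not the patch's actual signatures:

```java
import java.io.IOException;
import java.util.List;
import java.util.Set;

// Sketch: the lookup helper reports "no candidate" as null; each caller
// decides whether that is an error worth throwing for.
public class NodeChoiceSketch {
    /** Returns the first node not in the ignored set, or null if none is left. */
    static String bestNode(List<String> nodes, Set<String> ignored) {
        for (String n : nodes) {
            if (!ignored.contains(n)) {
                return n;
            }
        }
        return null; // no candidate: the caller, not this helper, decides to throw
    }

    /** A caller for which "no node" is fatal turns null into an exception. */
    static String chooseDataNode(List<String> nodes, Set<String> ignored)
            throws IOException {
        String node = bestNode(nodes, ignored);
        if (node == null) {
            throw new IOException("No live nodes contain the current block");
        }
        return node;
    }
}
```

This keeps the exception policy at the call site, so a caller that can retry or fall back is not forced to catch an exception the helper chose for it.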
[jira] [Commented] (HDFS-8214) Secondary NN Web UI shows wrong date for Last Checkpoint
[ https://issues.apache.org/jira/browse/HDFS-8214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518528#comment-14518528 ]
Andrew Wang commented on HDFS-8214:
-----------------------------------
+1 LGTM, thanks Charles. I rekicked Jenkins, should come back clean.

Secondary NN Web UI shows wrong date for Last Checkpoint
--------------------------------------------------------
Key: HDFS-8214
URL: https://issues.apache.org/jira/browse/HDFS-8214
Project: Hadoop HDFS
Issue Type: Bug
Components: HDFS, namenode
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Attachments: HDFS-8214.001.patch, HDFS-8214.002.patch, HDFS-8214.003.patch

SecondaryNamenode is using Time.monotonicNow() to display Last Checkpoint in the web UI. This causes weird times, generally just after the epoch, to be displayed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
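The class of bug described is easy to reproduce with plain JDK clocks (a hedged sketch; Hadoop's Time.monotonicNow() is, to my understanding, nanoTime-based like this stand-in): monotonic clocks measure elapsed time from an arbitrary origin, so formatting one as a calendar date typically yields a time near January 1970, while a "Last Checkpoint" display needs wall-clock time.

```java
import java.util.Date;

// Sketch: formatting a monotonic reading as a Date gives nonsense near the
// epoch; only wall-clock time is meaningful as a displayed timestamp.
public class CheckpointTimeSketch {
    static long monotonicNow() {
        return System.nanoTime() / 1_000_000; // elapsed ms, NOT an epoch timestamp
    }

    public static void main(String[] args) {
        System.out.println("monotonic as date:  " + new Date(monotonicNow()));
        System.out.println("wall clock as date: " + new Date(System.currentTimeMillis()));
    }
}
```

Monotonic time is the right tool for measuring intervals (it never jumps when the system clock is adjusted); wall-clock time is the right tool for displaying when something happened.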
[jira] [Commented] (HDFS-8273) FSNamesystem#Delete() should not call logSync() when holding the lock
[ https://issues.apache.org/jira/browse/HDFS-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518596#comment-14518596 ] Hudson commented on HDFS-8273: -- FAILURE: Integrated in Hadoop-trunk-Commit #7697 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7697/]) HDFS-8273. FSNamesystem#Delete() should not call logSync() when holding the lock. Contributed by Haohui Mai. (wheat9: rev c79e7f7d997596e0c38ae4cddff2bd0910581c16) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirDeleteOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
[jira] [Commented] (HDFS-8280) Code Cleanup in DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518595#comment-14518595 ] Hudson commented on HDFS-8280: -- FAILURE: Integrated in Hadoop-trunk-Commit #7697 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7697/]) HDFS-8280. Code Cleanup in DFSInputStream. Contributed by Jing Zhao. (wheat9: rev 439614b0c8a3df3d8b7967451c5331a0e034e13a) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518598#comment-14518598 ] Masatake Iwasaki commented on HDFS-8213: Thanks for the update, [~cmccabe]. I'm +1 (non-binding) for 002. bq. Let's do the hdfs-default.xml and other docs stuff later since it's not directly related to this Yeah. I filed HDFS-8284.
[jira] [Commented] (HDFS-8232) Missing datanode counters when using Metrics2 sink interface
[ https://issues.apache.org/jira/browse/HDFS-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516888#comment-14516888 ] Hudson commented on HDFS-8232: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #177 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/177/]) HDFS-8232. Missing datanode counters when using Metrics2 sink interface. Contributed by Anu Engineer. (cnauroth: rev feb68cb5470dc3e6c16b6bc1549141613e360601) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetricHelper.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeFSDataSetSink.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/FSDatasetMBean.java Missing datanode counters when using Metrics2 sink interface Key: HDFS-8232 URL: https://issues.apache.org/jira/browse/HDFS-8232 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.1 Reporter: Anu Engineer Assignee: Anu Engineer Fix For: 2.8.0 Attachments: hdfs-8232.001.patch, hdfs-8232.002.patch When using the Metrics2 sink interface, none of the counters declared under Datanode:FSDataSetBean are visible. They are visible if you use JMX or if you query http://host:port/jmx. The expected behavior is that they be part of the sink interface and accessible in the putMetrics callback. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8205) CommandFormat#parse() should not parse option as value of option
[ https://issues.apache.org/jira/browse/HDFS-8205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516895#comment-14516895 ]
Hudson commented on HDFS-8205:
------------------------------
FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #177 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/177/]) HDFS-8205. CommandFormat#parse() should not parse option as value of option. (Contributed by Peter Shi and Xiaoyu Yao) (arp: rev 0d5b0143cc003e132ce454415e35d55d46311416) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/shell/TestCount.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandFormat.java HDFS-8205. Fix CHANGES.txt (arp: rev 6bae5962cd70ac33fe599c50fb2a906830e5d4b2) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

CommandFormat#parse() should not parse option as value of option
----------------------------------------------------------------
Key: HDFS-8205
URL: https://issues.apache.org/jira/browse/HDFS-8205
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Peter Shi
Assignee: Peter Shi
Priority: Blocker
Fix For: 2.8.0
Attachments: HDFS-8205.01.patch, HDFS-8205.02.patch, HDFS-8205.patch

{code}
./hadoop fs -count -q -t -h -v /
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT   CONTENT_SIZE PATHNAME
15/04/21 15:20:19 INFO hdfs.DFSClient: Sets dfs.client.block.write.replace-datanode-on-failure.replication to 0
9223372036854775807 9223372036854775763            none             inf           31           13           1230 /
{code}

This blocks query quota by storage type and clear quota by storage type.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
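The parsing rule the title names can be sketched with a simplified parser (the real class is org.apache.hadoop.fs.shell.CommandFormat; everything below is an invented illustration of the rule, not its code): a token that itself looks like an option, such as the "-h" following "-t" in the transcript above, must never be swallowed as the value of the preceding value-taking option.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Sketch: parse "-opt [value]" style arguments, where a value-taking option
// only consumes the next token if that token is not itself an option.
public class OptionParseSketch {
    static Map<String, String> parse(String[] args, Set<String> takesValue) {
        Map<String, String> opts = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (!arg.startsWith("-")) {
                continue; // positional argument (e.g. a path)
            }
            String name = arg.substring(1);
            String value = null;
            // Only consume the next token as a value if it does not itself
            // look like an option -- the rule this issue is about.
            if (takesValue.contains(name) && i + 1 < args.length
                    && !args[i + 1].startsWith("-")) {
                value = args[++i];
            }
            opts.put(name, value);
        }
        return opts;
    }
}
```

Without the startsWith("-") check, "-t -h" would bind "-h" as the value of "-t", and the "-h" flag would silently disappear, which is the reported breakage.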
[jira] [Commented] (HDFS-7613) Block placement policy for erasure coding groups
[ https://issues.apache.org/jira/browse/HDFS-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517015#comment-14517015 ] Junping Du commented on HDFS-7613: -- Thanks [~zhz], the multi-policies implementation here sounds reasonable to me. Some quick questions: do we want DFS_BLOCK_PLACEMENT_EC_CLASSNAME_DEFAULT to be BlockPlacementPolicyEC rather than BlockPlacementPolicyDefault? I didn't check the details of BlockPlacementPolicyEC, so I'm not sure whether BlockPlacementPolicyDefault can cover all the cases BlockPlacementPolicyEC is meant for. Also, I see BlockPlacementPolicyEC supports the rack layer only; do we have a plan to support the NodeGroup layer as well? It would be great to make EC suitable for broader scenarios. Block placement policy for erasure coding groups Key: HDFS-7613 URL: https://issues.apache.org/jira/browse/HDFS-7613 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Walter Su Attachments: HDFS-7613.001.patch Blocks in an erasure coding group should be placed in different failure domains -- different DataNodes at the minimum, and different racks ideally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
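The placement constraint in the description ("different DataNodes at the minimum, different racks ideally") can be sketched as a toy target chooser. This is purely illustrative and has nothing to do with the real BlockPlacementPolicy API: pick one node per rack first, and reuse a rack (but never a node) only when the group is larger than the rack count.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: choose up to n distinct datanodes for one EC block group,
// spreading across as many racks as possible before reusing any rack.
public class EcPlacementSketch {
    /** nodeToRack maps datanode name -> rack id. */
    static List<String> chooseTargets(Map<String, String> nodeToRack, int n) {
        List<String> chosen = new ArrayList<>();
        Set<String> usedRacks = new HashSet<>();
        // First pass: at most one node per rack.
        for (Map.Entry<String, String> e : nodeToRack.entrySet()) {
            if (chosen.size() == n) break;
            if (usedRacks.add(e.getValue())) {
                chosen.add(e.getKey());
            }
        }
        // Second pass: allow rack reuse (never node reuse) if the group is
        // larger than the number of racks.
        for (String node : nodeToRack.keySet()) {
            if (chosen.size() == n) break;
            if (!chosen.contains(node)) {
                chosen.add(node);
            }
        }
        return chosen;
    }
}
```

A NodeGroup-aware variant, as asked about in the comment, would add a middle tier between node and rack and prefer distinct node groups before reusing one.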
[jira] [Commented] (HDFS-8205) CommandFormat#parse() should not parse option as value of option
[ https://issues.apache.org/jira/browse/HDFS-8205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516865#comment-14516865 ] Hudson commented on HDFS-8205: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2109 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2109/]) HDFS-8205. CommandFormat#parse() should not parse option as value of option. (Contributed by Peter Shi and Xiaoyu Yao) (arp: rev 0d5b0143cc003e132ce454415e35d55d46311416) * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandFormat.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/shell/TestCount.java HDFS-8205. Fix CHANGES.txt (arp: rev 6bae5962cd70ac33fe599c50fb2a906830e5d4b2) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-8232) Missing datanode counters when using Metrics2 sink interface
[ https://issues.apache.org/jira/browse/HDFS-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516858#comment-14516858 ] Hudson commented on HDFS-8232: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2109 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2109/]) HDFS-8232. Missing datanode counters when using Metrics2 sink interface. Contributed by Anu Engineer. (cnauroth: rev feb68cb5470dc3e6c16b6bc1549141613e360601) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/FSDatasetMBean.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetricHelper.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeFSDataSetSink.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
[jira] [Commented] (HDFS-8205) CommandFormat#parse() should not parse option as value of option
[ https://issues.apache.org/jira/browse/HDFS-8205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516879#comment-14516879 ] Hudson commented on HDFS-8205: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #168 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/168/]) HDFS-8205. CommandFormat#parse() should not parse option as value of option. (Contributed by Peter Shi and Xiaoyu Yao) (arp: rev 0d5b0143cc003e132ce454415e35d55d46311416) * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/shell/TestCount.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandFormat.java * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt HDFS-8205. Fix CHANGES.txt (arp: rev 6bae5962cd70ac33fe599c50fb2a906830e5d4b2) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-8232) Missing datanode counters when using Metrics2 sink interface
[ https://issues.apache.org/jira/browse/HDFS-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516872#comment-14516872 ] Hudson commented on HDFS-8232: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #168 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/168/]) HDFS-8232. Missing datanode counters when using Metrics2 sink interface. Contributed by Anu Engineer. (cnauroth: rev feb68cb5470dc3e6c16b6bc1549141613e360601) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/FSDatasetMBean.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetricHelper.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeFSDataSetSink.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
[jira] [Commented] (HDFS-8205) CommandFormat#parse() should not parse option as value of option
[ https://issues.apache.org/jira/browse/HDFS-8205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516917#comment-14516917 ] Hudson commented on HDFS-8205: -- FAILURE: Integrated in Hadoop-Yarn-trunk #911 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/911/]) HDFS-8205. CommandFormat#parse() should not parse option as value of option. (Contributed by Peter Shi and Xiaoyu Yao) (arp: rev 0d5b0143cc003e132ce454415e35d55d46311416) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/shell/TestCount.java * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandFormat.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt HDFS-8205. Fix CHANGES.txt (arp: rev 6bae5962cd70ac33fe599c50fb2a906830e5d4b2) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-8232) Missing datanode counters when using Metrics2 sink interface
[ https://issues.apache.org/jira/browse/HDFS-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516910#comment-14516910 ] Hudson commented on HDFS-8232: -- FAILURE: Integrated in Hadoop-Yarn-trunk #911 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/911/]) HDFS-8232. Missing datanode counters when using Metrics2 sink interface. Contributed by Anu Engineer. (cnauroth: rev feb68cb5470dc3e6c16b6bc1549141613e360601) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetricHelper.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeFSDataSetSink.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/FSDatasetMBean.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
[jira] [Commented] (HDFS-7678) Erasure coding: DFSInputStream with decode functionality
[ https://issues.apache.org/jira/browse/HDFS-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518401#comment-14518401 ]
Andrew Wang commented on HDFS-7678:
-----------------------------------
Thanks for the patch Zhe, some nice functionality here. Some review comments:
Nits:
* Extra imports in DFSStripedInputStream
* Some lines longer than 80 chars
Rest:
* I see us swallowing InterruptedException, which is quite naughty, but a lot of other input stream code does the same. It's a code smell; we really should be cleaning up and rethrowing the exception. Think about it at least for this patch, and we should file a follow-on for trunk and potentially the rest of the EC code.
* waitNextCompletion: shouldn't the read timeout be an overall timeout, not a per-task timeout? Users at least want an overall timeout.
* Throwing InterruptedException on empty futures is semantically incorrect; why not return null?
* waitNextCompletion and its usage seem kind of complicated. Let's think about simplifying it.
* Do we actually need missingBlkIndices or the non-success cases? It's the set complement of fetchedBlkIndices. Can determine it after.
* If we enforce the overall timeout in fetchBlockByteRange, we can do the futures cleanup there too. Pass the delta timeout down to waitNextCompletion. This feels better, since it links the timeout case with the timeout cleanup. Maybe another wrapper function to encapsulate this, since waitNextCompletion is used in two places.
* Comments all over this logic would be good.
* Is it possible to have a 0 rp.getReadLength()? Precondition check this?
* In general I would prefer to see Precondition checks rather than asserts, since asserts are disabled outside of tests.
* We always go through a function called fetchExtraBlks... even if we successfully got all the blocks we need the first time. No early exit?
* Also it seems we have some code duplication between fetch and fetchExtra; let's think about breaking out some shared functions. I wonder if it'd be better to do all the fetching first (including parity if necessary), then pass it over to a decode function (if necessary).
* found is not used

Erasure coding: DFSInputStream with decode functionality
--------------------------------------------------------
Key: HDFS-7678
URL: https://issues.apache.org/jira/browse/HDFS-7678
Project: Hadoop HDFS
Issue Type: Sub-task
Affects Versions: HDFS-7285
Reporter: Li Bo
Assignee: Zhe Zhang
Attachments: BlockGroupReader.patch, HDFS-7678-HDFS-7285.002.patch, HDFS-7678.000.patch, HDFS-7678.001.patch

A block group reader will read data from a BlockGroup whether it is in striping layout or contiguous layout. The corrupt blocks may be known before reading (told by the namenode) or found during reading. The block group reader needs to do decoding work when some blocks are found corrupt.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
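The "overall timeout, pass the delta down" suggestion in the review can be sketched as follows. This is an invented illustration of the shape, not the patch's code: the caller fixes one deadline for the whole striped read, and each wait on the completion service gets only the time remaining until that deadline, rather than a fresh full timeout per task. Note also that InterruptedException is declared and propagated, not swallowed.

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: one deadline governs all per-chunk waits of a striped read.
public class DeadlineWaitSketch {
    static <T> T nextWithDeadline(CompletionService<T> cs, long deadlineNanos)
            throws InterruptedException, ExecutionException, TimeoutException {
        long remaining = deadlineNanos - System.nanoTime();
        if (remaining <= 0) {
            throw new TimeoutException("striped read deadline exceeded");
        }
        // Wait only for the remaining delta, never a fresh full timeout.
        Future<T> f = cs.poll(remaining, TimeUnit.NANOSECONDS);
        if (f == null) {
            throw new TimeoutException("striped read deadline exceeded");
        }
        return f.get(); // propagates the task's own failure, if any
    }
}
```

Enforcing the deadline at one place also gives a natural spot to cancel the still-outstanding futures on timeout, which is the cleanup pairing the review asks for.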
[jira] [Commented] (HDFS-7995) Implement chmod in the HDFS Web UI
[ https://issues.apache.org/jira/browse/HDFS-7995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518419#comment-14518419 ]
Hadoop QA commented on HDFS-7995:
---------------------------------
| (/) *{color:green}+1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 0m 0s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | release audit | 0m 14s | The applied patch does not increase the total number of release audit warnings. |
| | | 0m 20s | |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12728954/HDFS-7995.02.patch |
| Optional Tests | |
| git revision | trunk / 5190923 |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10442/console |
This message was automatically generated.

Implement chmod in the HDFS Web UI
----------------------------------
Key: HDFS-7995
URL: https://issues.apache.org/jira/browse/HDFS-7995
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Attachments: HDFS-7995.01.patch, HDFS-7995.02.patch

We should let users change the permissions of files and directories using the HDFS Web UI.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HDFS-8269) getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime
[ https://issues.apache.org/jira/browse/HDFS-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518424#comment-14518424 ]
Hadoop QA commented on HDFS-8269:
---------------------------------
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 55s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | javac | 7m 40s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 4m 2s | The applied patch generated 1 additional checkstyle issues. |
| {color:green}+1{color} | install | 1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 3m 12s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | native | 3m 19s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 163m 53s | Tests failed in hadoop-hdfs. |
| | | | 209m 18s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestFileCreationClient |
| | hadoop.hdfs.server.namenode.TestFsck |
| Timed out tests | org.apache.hadoop.hdfs.server.mover.TestMover |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12728915/HDFS-8269.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / bc1bd7e |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/10434/artifact/patchprocess/checkstyle-result-diff.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10434/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10434/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10434/console |
This message was automatically generated.

getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime
-----------------------------------------------------------------------------------------------------------------
Key: HDFS-8269
URL: https://issues.apache.org/jira/browse/HDFS-8269
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Yesha Vora
Assignee: Haohui Mai
Priority: Blocker
Attachments: HDFS-8269.000.patch, HDFS-8269.001.patch, HDFS-8269.002.patch

When {{FSNamesystem#getBlockLocations}} updates the access time of the INode, it uses the path passed from the client, which generates incorrect edit log entries:
{noformat}
<RECORD>
  <OPCODE>OP_TIMES</OPCODE>
  <DATA>
    <TXID>5085</TXID>
    <LENGTH>0</LENGTH>
    <PATH>/.reserved/.inodes/18230</PATH>
    <MTIME>-1</MTIME>
    <ATIME>1429908236392</ATIME>
  </DATA>
</RECORD>
{noformat}
Note that the NN does not resolve the {{/.reserved}} path when processing the edit log, so it eventually leads to an NPE when loading the edit logs.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
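The idea of the fix can be sketched as follows. Everything here is an invented illustration (the method and the inode-to-path map are not the NameNode's real structures): resolve a "/.reserved/.inodes/&lt;id&gt;" client path to the inode's actual path before the atime update is written to the edit log, so replay never encounters the reserved prefix.

```java
import java.util.Map;

// Sketch: translate a reserved inode path to its real path before logging.
public class ReservedPathSketch {
    static final String PREFIX = "/.reserved/.inodes/";

    static String resolve(String src, Map<Long, String> inodeToPath) {
        if (!src.startsWith(PREFIX)) {
            return src; // already a normal path
        }
        long inodeId = Long.parseLong(src.substring(PREFIX.length()));
        String real = inodeToPath.get(inodeId);
        if (real == null) {
            throw new IllegalArgumentException("unknown inode " + inodeId);
        }
        return real; // log this resolved path, never the reserved one
    }
}
```

With this translation applied before logging, the OP_TIMES record above would carry the inode's real path, and replay would not need any special handling for {{/.reserved}}.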
[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode
[ https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-7847: --- Attachment: HDFS-7847.004.patch .004 is rebased onto trunk. Modify NNThroughputBenchmark to be able to operate on a remote NameNode --- Key: HDFS-7847 URL: https://issues.apache.org/jira/browse/HDFS-7847 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7836 Reporter: Colin Patrick McCabe Assignee: Charles Lamb Fix For: HDFS-7836 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, make_blocks.tar.gz Modify NNThroughputBenchmark to be able to operate on a NN that is not in process. A followon Jira will modify it some more to allow quantifying native and java heap sizes, and some latency numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8273) FSNamesystem#Delete() should not call logSync() when holding the lock
[ https://issues.apache.org/jira/browse/HDFS-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8273: - Summary: FSNamesystem#Delete() should not call logSync() when holding the lock (was: logSync() is called inside of write lock for delete op) FSNamesystem#Delete() should not call logSync() when holding the lock - Key: HDFS-8273 URL: https://issues.apache.org/jira/browse/HDFS-8273 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Jing Zhao Assignee: Haohui Mai Priority: Blocker Attachments: HDFS-8273.000.patch, HDFS-8273.001.patch HDFS-7573 moves the logSync call inside of the write lock by accident. We should move it out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
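For readers unfamiliar with the pattern being restored here: the namespace mutation and the in-memory edit append happen under the FSNamesystem write lock, while the expensive logSync() (which forces edits to stable storage) runs after the lock is released. The sketch below illustrates that shape with simplified stand-in classes; DeleteOp, its fields, and the string edits are illustrative, not the actual FSNamesystem API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified sketch of the locking pattern HDFS-8273 restores: mutate the
// namespace and append the edit under the write lock, but flush the edit
// log (logSync) only after the lock is released. All names here are
// illustrative stand-ins, not the real FSNamesystem API.
public class DeleteOp {
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    private final List<String> namespace = new ArrayList<>();
    private final List<String> pendingEdits = new ArrayList<>();
    public final List<String> syncedEdits = new ArrayList<>();

    public void create(String path) {
        fsLock.writeLock().lock();
        try {
            namespace.add(path);
        } finally {
            fsLock.writeLock().unlock();
        }
        logSync();
    }

    public boolean delete(String path) {
        boolean removed;
        fsLock.writeLock().lock();
        try {
            removed = namespace.remove(path);
            if (removed) {
                pendingEdits.add("OP_DELETE " + path); // logEdit: in-memory only
            }
        } finally {
            fsLock.writeLock().unlock();
        }
        // logSync() runs here, outside the lock, so other handlers are not
        // blocked behind a disk sync while the edit is forced to storage.
        logSync();
        return removed;
    }

    private void logSync() {
        syncedEdits.addAll(pendingEdits);
        pendingEdits.clear();
    }
}
```

Holding the write lock across logSync() serializes every other namespace operation behind a disk sync, which is why moving the call out of the locked section matters for throughput.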
[jira] [Commented] (HDFS-8201) Add an end to end test for stripping file writing and reading
[ https://issues.apache.org/jira/browse/HDFS-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518545#comment-14518545 ] Xinwei Qin commented on HDFS-8201: --- [~zhz] I have noticed that JIRA. I think your suggestion is very good. Add an end to end test for stripping file writing and reading - Key: HDFS-8201 URL: https://issues.apache.org/jira/browse/HDFS-8201 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8201.001.patch According to off-line discussion with [~zhz] and [~xinwei], we need to implement an end to end test for stripping file support: * Create an EC zone; * Create a file in the zone; * Write various typical sizes of content to the file, each size maybe a test method; * Read the written content back; * Compare the written content and read content to ensure it's good; The test facility is subject to add more steps for erasure encoding and recovering. Will open separate issue for it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8277) Safemode enter fails when Standby NameNode is down
Hari Sekhon created HDFS-8277: - Summary: Safemode enter fails when Standby NameNode is down Key: HDFS-8277 URL: https://issues.apache.org/jira/browse/HDFS-8277 Project: Hadoop HDFS Issue Type: Bug Components: ha, HDFS, namenode Affects Versions: 2.6.0 Environment: HDP 2.2.0 Reporter: Hari Sekhon HDFS fails to enter safemode when the Standby NameNode is down (eg. due to AMBARI-10536). {code}hdfs dfsadmin -safemode enter safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused{code} This appears to be a bug in that it's not trying both NameNodes like the standard hdfs client code does, and is instead stopping after getting a connection refused from nn1 which is down. I verified normal hadoop fs writes and reads via cli did work at this time, using nn2. I happened to run this command as the hdfs user on nn2 which was the surviving Active NameNode. After I re-bootstrapped the Standby NN to fix it the command worked as expected again. Hari Sekhon http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8232) Missing datanode counters when using Metrics2 sink interface
[ https://issues.apache.org/jira/browse/HDFS-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517164#comment-14517164 ] Hudson commented on HDFS-8232: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #178 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/178/]) HDFS-8232. Missing datanode counters when using Metrics2 sink interface. Contributed by Anu Engineer. (cnauroth: rev feb68cb5470dc3e6c16b6bc1549141613e360601) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetricHelper.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeFSDataSetSink.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/FSDatasetMBean.java Missing datanode counters when using Metrics2 sink interface Key: HDFS-8232 URL: https://issues.apache.org/jira/browse/HDFS-8232 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.1 Reporter: Anu Engineer Assignee: Anu Engineer Fix For: 2.8.0 Attachments: hdfs-8232.001.patch, hdfs-8232.002.patch When using the Metric2 Sink interface none of the counters declared under Dataanode:FSDataSetBean are visible. They are visible if you use JMX or if you do http://host:port/jmx. Expected behavior is that they be part of Sink interface and accessible in the putMetrics call back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8271) NameNode should bind on both IPv6 and IPv4 if running on dual-stack machine and IPv6 enabled
[ https://issues.apache.org/jira/browse/HDFS-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517276#comment-14517276 ] Steve Loughran commented on HDFS-8271: -- Nate -why not create an Über-JIRA to cover the overall problem of Hadoop to support IPv6 .. all these things could be grouped underneath. HADOOP-11574 is mainly about recognise network problems and provide diagnostics, rather than actual IPv6 support NameNode should bind on both IPv6 and IPv4 if running on dual-stack machine and IPv6 enabled Key: HDFS-8271 URL: https://issues.apache.org/jira/browse/HDFS-8271 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Nate Edel Assignee: Nate Edel Labels: ipv6 NameNode works properly on IPv4 or IPv6 single stack (assuming in the latter case that scripts have been changed to disable preferIPv4Stack, and dependent on the client/data node fix in HDFS-8078). On dual-stack machines, NameNode listens only on IPv4 (even ignoring preferIPv6Addresses being set.) Our initial use case for IPv6 is IPv6-only clusters, but ideally we'd support binding to both the IPv4 and IPv6 machine addresses so that we can support heterogenous clusters (some dual-stack and some IPv6-only machines.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8205) CommandFormat#parse() should not parse option as value of option
[ https://issues.apache.org/jira/browse/HDFS-8205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517239#comment-14517239 ] Hudson commented on HDFS-8205: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2127 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2127/]) HDFS-8205. CommandFormat#parse() should not parse option as value of option. (Contributed by Peter Shi and Xiaoyu Yao) (arp: rev 0d5b0143cc003e132ce454415e35d55d46311416) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandFormat.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/shell/TestCount.java * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml HDFS-8205. Fix CHANGES.txt (arp: rev 6bae5962cd70ac33fe599c50fb2a906830e5d4b2) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt CommandFormat#parse() should not parse option as value of option Key: HDFS-8205 URL: https://issues.apache.org/jira/browse/HDFS-8205 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.8.0 Reporter: Peter Shi Assignee: Peter Shi Priority: Blocker Fix For: 2.8.0 Attachments: HDFS-8205.01.patch, HDFS-8205.02.patch, HDFS-8205.patch
{code}
./hadoop fs -count -q -t -h -v /
      QUOTA  REM_QUOTA  SPACE_QUOTA  REM_SPACE_QUOTA  DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME
15/04/21 15:20:19 INFO hdfs.DFSClient: Sets dfs.client.block.write.replace-datanode-on-failure.replication to 0
9223372036854775807  9223372036854775763  none  inf  31  13  1230  /
{code}
This blocks querying quota by storage type and clearing quota by storage type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
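The bug shown in the quoted command is that {{-h}}, itself an option, was consumed as the value of {{-t}}, silently disabling the remaining flags. Below is a minimal sketch of the corrected parsing rule (a token that is itself a recognized option is never swallowed as another option's value); OptionParser and all of its methods are illustrative stand-ins, not the real CommandFormat API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch (not the real CommandFormat) of the rule HDFS-8205
// enforces: when scanning "-count -q -t -h -v", a token that is itself a
// recognized option must not be swallowed as the value of the option
// before it.
public class OptionParser {
    private final Set<String> flags = new HashSet<>();
    private final Set<String> valuedOpts = new HashSet<>();

    public OptionParser flag(String f) { flags.add(f); return this; }
    public OptionParser valued(String o) { valuedOpts.add(o); return this; }

    public Map<String, String> parse(String... args) {
        Map<String, String> parsed = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            String a = args[i];
            if (flags.contains(a)) {
                parsed.put(a, "");
            } else if (valuedOpts.contains(a)) {
                String next = (i + 1 < args.length) ? args[i + 1] : null;
                // The fix: a following token that is itself a known option
                // is NOT consumed as this option's value.
                if (next != null && !flags.contains(next)
                        && !valuedOpts.contains(next)) {
                    parsed.put(a, next);
                    i++;
                } else {
                    parsed.put(a, ""); // option present, value omitted
                }
            }
            // anything else is treated as a positional argument here
        }
        return parsed;
    }
}
```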
[jira] [Commented] (HDFS-8232) Missing datanode counters when using Metrics2 sink interface
[ https://issues.apache.org/jira/browse/HDFS-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517232#comment-14517232 ] Hudson commented on HDFS-8232: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2127 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2127/]) HDFS-8232. Missing datanode counters when using Metrics2 sink interface. Contributed by Anu Engineer. (cnauroth: rev feb68cb5470dc3e6c16b6bc1549141613e360601) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/FSDatasetMBean.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeFSDataSetSink.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetricHelper.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Missing datanode counters when using Metrics2 sink interface Key: HDFS-8232 URL: https://issues.apache.org/jira/browse/HDFS-8232 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.1 Reporter: Anu Engineer Assignee: Anu Engineer Fix For: 2.8.0 Attachments: hdfs-8232.001.patch, hdfs-8232.002.patch When using the Metric2 Sink interface none of the counters declared under Dataanode:FSDataSetBean are visible. They are visible if you use JMX or if you do http://host:port/jmx. Expected behavior is that they be part of Sink interface and accessible in the putMetrics call back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8205) CommandFormat#parse() should not parse option as value of option
[ https://issues.apache.org/jira/browse/HDFS-8205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517171#comment-14517171 ] Hudson commented on HDFS-8205: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #178 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/178/]) HDFS-8205. CommandFormat#parse() should not parse option as value of option. (Contributed by Peter Shi and Xiaoyu Yao) (arp: rev 0d5b0143cc003e132ce454415e35d55d46311416) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/shell/TestCount.java * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandFormat.java HDFS-8205. Fix CHANGES.txt (arp: rev 6bae5962cd70ac33fe599c50fb2a906830e5d4b2) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt CommandFormat#parse() should not parse option as value of option Key: HDFS-8205 URL: https://issues.apache.org/jira/browse/HDFS-8205 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.8.0 Reporter: Peter Shi Assignee: Peter Shi Priority: Blocker Fix For: 2.8.0 Attachments: HDFS-8205.01.patch, HDFS-8205.02.patch, HDFS-8205.patch
{code}
./hadoop fs -count -q -t -h -v /
      QUOTA  REM_QUOTA  SPACE_QUOTA  REM_SPACE_QUOTA  DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME
15/04/21 15:20:19 INFO hdfs.DFSClient: Sets dfs.client.block.write.replace-datanode-on-failure.replication to 0
9223372036854775807  9223372036854775763  none  inf  31  13  1230  /
{code}
This blocks querying quota by storage type and clearing quota by storage type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HDFS-8246) Get HDFS file name based on block pool id and block id
[ https://issues.apache.org/jira/browse/HDFS-8246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-8246 started by feng xu. - Get HDFS file name based on block pool id and block id -- Key: HDFS-8246 URL: https://issues.apache.org/jira/browse/HDFS-8246 Project: Hadoop HDFS Issue Type: New Feature Components: HDFS, hdfs-client, namenode Reporter: feng xu Assignee: feng xu Attachments: HDFS-8246.0.patch This feature provides an HDFS shell command and C/Java APIs to retrieve the HDFS file name for a given block pool id and block id. 1. The Java API in class DistributedFileSystem public String getFileName(String poolId, long blockId) throws IOException 2. The C API in hdfs.c char* hdfsGetFileName(hdfsFS fs, const char* poolId, int64_t blockId) 3. The HDFS shell command hdfs dfs [generic options] -fn poolId blockId This feature is useful if you have an HDFS block file name in the local file system and want to find the related HDFS file name in the HDFS name space (http://stackoverflow.com/questions/10881449/how-to-find-file-from-blockname-in-hdfs-hadoop). Each HDFS block file name in the local file system contains both the block pool id and the block id; for example, in the HDFS block file name /hdfs/1/hadoop/hdfs/data/current/BP-97622798-10.3.11.84-1428081035160/current/finalized/subdir0/subdir0/blk_1073741825, the block pool id is BP-97622798-10.3.11.84-1428081035160 and the block id is 1073741825. The block pool id is uniquely related to an HDFS name node/name space, and the block id is uniquely related to an HDFS file within an HDFS name node/name space, so the combination of a block pool id and a block id uniquely identifies an HDFS file name. The shell command and C/Java APIs do not map the block pool id to a name node, so it's the user's responsibility to talk to the correct name node in a federation environment that has multiple name nodes. The block pool id is used by the name node to check whether the user is talking to the correct name node. The implementation is straightforward. 
The client request to get the HDFS file name reaches the new method String getFileName(String poolId, long blockId) in FSNamesystem in the name node through RPC, and the new method does the following: (1) Validate the block pool id. (2) Create a Block based on the block id. (3) Get the BlockInfoContiguous from the Block. (4) Get the BlockCollection from the BlockInfoContiguous. (5) Get the file name from the BlockCollection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
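The feature leans on the fact that a local block file path carries both identifiers. As a small illustration (a standalone helper written for this note, not part of the proposed API), the two values can be pulled out of a path like the example above:

```java
// Standalone helper extracting the block pool id and block id from a local
// block file path of the form .../BP-<pool>/current/finalized/.../blk_<id>.
// Illustrative only; the proposed getFileName API consumes these values,
// it does not include this parser.
public class BlockPathParser {
    public static String blockPoolId(String path) {
        for (String part : path.split("/")) {
            if (part.startsWith("BP-")) {
                return part;
            }
        }
        throw new IllegalArgumentException("no block pool id in " + path);
    }

    public static long blockId(String path) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        if (!name.startsWith("blk_")) {
            throw new IllegalArgumentException("not a block file: " + path);
        }
        // meta files look like blk_<id>_<genstamp>.meta; keep only the id
        String id = name.substring("blk_".length()).split("_")[0];
        return Long.parseLong(id);
    }
}
```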
[jira] [Commented] (HDFS-8248) Store INodeId instead of the INodeFile object in BlockInfoContiguous
[ https://issues.apache.org/jira/browse/HDFS-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518615#comment-14518615 ] Hadoop QA commented on HDFS-8248: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 31s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 7 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 27s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 7m 54s | The applied patch generated 4 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 7s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | native | 3m 21s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 165m 2s | Tests passed in hadoop-hdfs. 
| | | | 213m 29s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12728950/HDFS-8248.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 5190923 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/10441/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10441/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10441/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10441/console | This message was automatically generated. Store INodeId instead of the INodeFile object in BlockInfoContiguous Key: HDFS-8248 URL: https://issues.apache.org/jira/browse/HDFS-8248 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8248.000.patch, HDFS-8248.001.patch, HDFS-8248.002.patch, HDFS-8248.003.patch Currently the namespace and the block manager are tightly coupled together. There are two couplings in terms of implementation: 1. The {{BlockInfoContiguous}} stores a reference of the {{INodeFile}} that owns the block, so that the block manager can look up the corresponding file when replicating blocks, recovering from pipeline failures, etc. 1. The {{INodeFile}} stores {{BlockInfoContiguous}} objects that the file owns. Decoupling the namespace and the block manager allows the BM to be separated out from the Java heap or even as a standalone process. This jira proposes to remove the first coupling by storing the id of the inode instead of the object reference of {{INodeFile}} in the {{BlockInfoContiguous}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
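The decoupling the patch proposes can be pictured with a toy model: the block-side record keeps only the owning inode's id (a long), and the block manager resolves file information through a narrow lookup interface rather than holding an object reference into the namespace. All names below are simplified stand-ins for the real {{BlockInfoContiguous}} and {{INodeFile}} classes.

```java
// Toy model of storing an inode id instead of an object reference.
// BlockRecord, Namespace, and BlockOwnerLookup are illustrative names,
// not the actual HDFS classes.
public class BlockOwnerLookup {
    public static class BlockRecord {
        public final long blockId;
        public final long inodeId; // was: a direct INodeFile reference

        public BlockRecord(long blockId, long inodeId) {
            this.blockId = blockId;
            this.inodeId = inodeId;
        }
    }

    // The only dependency the block manager keeps on the namespace:
    // a lookup from inode id to file, which could live in another
    // heap or process.
    public interface Namespace {
        String fileName(long inodeId);
    }

    private final Namespace ns;

    public BlockOwnerLookup(Namespace ns) {
        this.ns = ns;
    }

    public String ownerOf(BlockRecord b) {
        return ns.fileName(b.inodeId);
    }
}
```

The indirection is the point: because the block manager only knows an id and an interface, the namespace side is free to move off the Java heap without touching block-manager code.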
[jira] [Updated] (HDFS-7281) Missing block is marked as corrupted block
[ https://issues.apache.org/jira/browse/HDFS-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-7281: -- Release Note: The patch improves the reporting around missing blocks and corrupted blocks. 1. A block is missing if and only if all DNs of its expected replicas are dead. 2. A block is corrupted if and only if all its available replicas are corrupted. So if a block has 3 replicas, one of the DNs is dead, and the other two replicas are corrupted, it will be marked as corrupted. 3. A new line is added to fsck output to display the corrupt block size per file. 4. A new line is added to fsck output to display the number of missing blocks in the summary section. Missing block is marked as corrupted block -- Key: HDFS-7281 URL: https://issues.apache.org/jira/browse/HDFS-7281 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Labels: supportability Attachments: HDFS-7281-2.patch, HDFS-7281-3.patch, HDFS-7281-4.patch, HDFS-7281.patch In the situation where a block has lost all its replicas, fsck shows the block as missing as well as corrupted. Perhaps it is better not to mark the block corrupted in this case. The reason it is marked as corrupted is numCorruptNodes == numNodes == 0 in the following code.
{noformat}
// BlockManager
final boolean isCorrupt = numCorruptNodes == numNodes;
{noformat}
Would like to clarify whether it is the intent to mark a missing block as corrupted or whether it is just a bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
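The release-note rules reduce to two predicates over the replicas that are still reachable. A compact sketch, with plain counters standing in for the real BlockManager state ("available" meaning replicas on live DNs):

```java
// Sketch of the classification rules from the release note, not the
// actual BlockManager code: a block is missing only when no replica is
// available at all, and corrupt only when every available replica is
// corrupt. Counter arguments are illustrative stand-ins.
public class BlockHealth {
    // All DNs of the expected replicas are dead: nothing reachable.
    public static boolean isMissing(int availableReplicas) {
        return availableReplicas == 0;
    }

    // At least one replica is reachable, and every reachable one is corrupt.
    public static boolean isCorrupt(int availableReplicas, int corruptReplicas) {
        return availableReplicas > 0 && corruptReplicas == availableReplicas;
    }
}
```

With the example from the release note (3 replicas, one DN dead, the two available replicas corrupt), isCorrupt(2, 2) holds while isMissing(2) does not; a block with no available replicas is missing and, unlike the behavior questioned in the issue, no longer reported as corrupt.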
[jira] [Commented] (HDFS-8283) DataStreamer cleanup and some minor improvement
[ https://issues.apache.org/jira/browse/HDFS-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518736#comment-14518736 ] Hadoop QA commented on HDFS-8283: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 58s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 44s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 46s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 4m 3s | The applied patch generated 1 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 47s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | common tests | 23m 40s | Tests passed in hadoop-common. | | {color:red}-1{color} | hdfs tests | 164m 29s | Tests failed in hadoop-hdfs. 
| | | | 231m 58s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestAppendSnapshotTruncate | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12728994/h8283_20150428.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c79e7f7 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/10445/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10445/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10445/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10445/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10445/console | This message was automatically generated. DataStreamer cleanup and some minor improvement --- Key: HDFS-8283 URL: https://issues.apache.org/jira/browse/HDFS-8283 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h8283_20150428.patch - When throwing an exception -* always set lastException -* always creating a new exception so that it has the new stack trace - Add LOG. - Add final to isAppend and favoredNodes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7348) Erasure Coding: striped block recovery
[ https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518630#comment-14518630 ] Yi Liu commented on HDFS-7348: -- Thanks [~zhz] for the review! The comments are helpful. {quote} At the DN level I don't think we need to care about cellSize? Since we always recover entire blocks, the client-side logic taking care of cells can be simplified here. {quote} Yes, the DN doesn't need to care about cellSize; here we just use it as a read buffer size, and it is a multiple of {{bytesPerChecksum}}, so it's convenient for CRC calculation. {quote} Since recovering multiple missing blocks at once is a pretty rare case, should we just reconstruct all missing blocks and use DataNode#DataTransfer to push them out? Should we save a copy of the reconstructed block locally? More space will be used; but it will avoid re-decoding if the push fails. {quote} Good question and discussion. The best way to avoid re-decoding when a push fails is to check the packet ack before we discard the decoded result and start the next round of decoding. Saving a copy locally would increase the DataNode's burden (i.e., it affects performance, disk space/management, calculates the CRC multiple times, and so on) and increase management overhead. If we don't check the packet ack, we can't know whether the recovered block was transferred correctly; if we choose to check the packet ack, we don't need to save it locally. As I described in the design above and in comments inline in the code, currently we don't check the packet ack, which is similar to continuous block replication, but EC recovery is more expensive, so we could consider checking the packet ack as a further improvement. I can do it (check the packet ack) in a separate JIRA, maybe in phase 2; of course we can discuss more here. {quote} I filed HDFS-8282 to move StripedReadResult and waitNextCompletion to StripedBlockUtil. {quote} That's good, I will review that JIRA after it's ready. 
{quote} In foreground recovery we read in parallel to minimize latency. It's an interesting design question whether we should do the same in background recovery. More discussions are needed here. {quote} We can discuss this point more here. I think it's OK and I don't see a downside: if we don't recover it as soon as possible on the DN, clients also need to do on-line read recovery, which may cause more network IO (multiple clients). {quote} Another option is to read entire blocks and then decode {quote} That's a big issue for memory, especially since there may be multiple striped block recoveries at the same time; I think we should not do this. On the other hand, the fast way to decode is to use native code and utilize CPU instructions as planned in the design. I have experience writing native decryption code for the HDFS encryption-at-rest feature, where we also have a buffer (default 64KB) to invoke JNI. {quote} Maybe we can move getBlock to StripedBlockUtil too; it's a useful util to only parse the Block. If it sounds good to you I'll move it in HDFS-8282. {quote} It's good for me if you move it in HDFS-8282; I think we will also need to use it in the future :) I will fix the {{ArrayList}} initialization in the next patch. Erasure Coding: striped block recovery -- Key: HDFS-7348 URL: https://issues.apache.org/jira/browse/HDFS-7348 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Kai Zheng Assignee: Yi Liu Attachments: ECWorker.java, HDFS-7348.001.patch This JIRA is to recover one or more missing striped blocks in a striped block group. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
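The buffered-recovery approach discussed above (read and decode a fixed-size buffer at a time instead of whole blocks) can be sketched as follows. XOR stands in for the real Reed-Solomon decoder purely to keep the example self-contained; the buffer size and all names are illustrative, not the actual ECWorker code.

```java
// Sketch of buffered recovery: work through the surviving sources one
// fixed-size buffer at a time so memory stays bounded regardless of block
// size. XOR is a stand-in for the real erasure decoder; with XOR parity,
// folding all surviving sources together reproduces the lost one.
public class ChunkedRecovery {
    public static byte[] recover(byte[][] sources, int bufSize) {
        int len = sources[0].length; // assumes at least one equal-length source
        byte[] recovered = new byte[len];
        for (int off = 0; off < len; off += bufSize) {
            int n = Math.min(bufSize, len - off);
            // decode one buffer's worth from all surviving sources
            for (byte[] src : sources) {
                for (int i = 0; i < n; i++) {
                    recovered[off + i] ^= src[off + i];
                }
            }
        }
        return recovered;
    }
}
```

Only one buffer per source is live at any moment, which is the memory argument made above against reading entire blocks before decoding.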
[jira] [Commented] (HDFS-7980) Incremental BlockReport will dramatically slow down the startup of a namenode
[ https://issues.apache.org/jira/browse/HDFS-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518628#comment-14518628 ] Walter Su commented on HDFS-7980: - The 004 patch fixes the failed test. By the way: 1. If a block exists in the Full BR but not in the IBR, it should be processed by {{processFullFullBlockReport(..)}}. (This is how it has been treated since long ago.) 2. If a block exists in the Full BR and also in the IBR, it's OK for it to be processed by {{processFullFullBlockReport(..)}}. (Because it's well handled by the IBR processing logic.) 3. If a block is not in the Full BR but is in the IBR, it was created and deleted very quickly; such a block doesn't relate to this issue. I think the test included in the patch is OK. If you have any ideas for adding more tests, please let me know. Thanks. Incremental BlockReport will dramatically slow down the startup of a namenode -- Key: HDFS-7980 URL: https://issues.apache.org/jira/browse/HDFS-7980 Project: Hadoop HDFS Issue Type: Bug Reporter: Hui Zheng Assignee: Walter Su Attachments: HDFS-7980.001.patch, HDFS-7980.002.patch, HDFS-7980.003.patch, HDFS-7980.004.patch In the current implementation the datanode calls the reportReceivedDeletedBlocks() method (an incremental block report) before calling the bpNamenode.blockReport() method. So in a large (several thousand datanodes) and busy cluster it will slow down the startup of the namenode by more than one hour.
{code}
List<DatanodeCommand> blockReport() throws IOException {
  // send block report if timer has expired.
  final long startTime = now();
  if (startTime - lastBlockReport <= dnConf.blockReportInterval) {
    return null;
  }

  final ArrayList<DatanodeCommand> cmds = new ArrayList<DatanodeCommand>();

  // Flush any block information that precedes the block report. Otherwise
  // we have a chance that we will miss the delHint information
  // or we will report an RBW replica after the BlockReport already reports
  // a FINALIZED one.
  reportReceivedDeletedBlocks();
  lastDeletedReport = startTime;
  ...

  // Send the reports to the NN.
  int numReportsSent = 0;
  int numRPCs = 0;
  boolean success = false;
  long brSendStartTime = now();
  try {
    if (totalBlockCount < dnConf.blockReportSplitThreshold) {
      // Below split threshold, send all reports in a single message.
      DatanodeCommand cmd = bpNamenode.blockReport(
          bpRegistration, bpos.getBlockPoolId(), reports);
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
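The split-threshold decision at the end of the quoted snippet can be isolated into a small sketch: below dfs.blockreport.split.threshold all per-storage reports go out in one RPC, otherwise one RPC per report. The types below are simplified stand-ins for the real StorageBlockReport machinery.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the report-splitting decision shown in the snippet above.
// Strings stand in for per-storage block reports; ReportBatcher and its
// method are illustrative names, not the actual BPServiceActor API.
public class ReportBatcher {
    public static List<List<String>> batch(List<String> reports,
                                           long totalBlockCount,
                                           long splitThreshold) {
        List<List<String>> rpcs = new ArrayList<>();
        if (totalBlockCount < splitThreshold) {
            rpcs.add(reports); // below threshold: one message for everything
        } else {
            for (String r : reports) {
                List<String> one = new ArrayList<>();
                one.add(r);
                rpcs.add(one); // above threshold: one RPC per storage report
            }
        }
        return rpcs;
    }
}
```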
[jira] [Updated] (HDFS-8277) Safemode enter fails when Standby NameNode is down
[ https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-8277: --- Issue Type: Improvement (was: Bug) Safemode enter fails when Standby NameNode is down -- Key: HDFS-8277 URL: https://issues.apache.org/jira/browse/HDFS-8277 Project: Hadoop HDFS Issue Type: Improvement Components: ha, HDFS, namenode Affects Versions: 2.6.0 Environment: HDP 2.2.0 Reporter: Hari Sekhon Assignee: surendra singh lilhore Priority: Minor HDFS fails to enter safemode when the Standby NameNode is down (eg. due to AMBARI-10536). {code}hdfs dfsadmin -safemode enter safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused{code} This appears to be a bug in that it's not trying both NameNodes like the standard hdfs client code does, and is instead stopping after getting a connection refused from nn1 which is down. I verified normal hadoop fs writes and reads via cli did work at this time, using nn2. I happened to run this command as the hdfs user on nn2 which was the surviving Active NameNode. After I re-bootstrapped the Standby NN to fix it the command worked as expected again. Hari Sekhon http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
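The behavior the reporter asks for, trying each configured NameNode in turn instead of stopping at the first connection refusal, can be sketched generically. The Function stands in for the actual dfsadmin RPC call; the class and method names are hypothetical, not the real failover-proxy implementation.

```java
import java.net.ConnectException;
import java.util.List;
import java.util.function.Function;

// Sketch of trying each configured NameNode on connection refusal, the
// behavior the report says the standard HDFS client has but this dfsadmin
// path lacks. Names are illustrative; the RPC is modeled as a Function
// that wraps ConnectException in a RuntimeException.
public class TryEachNameNode {
    public static <T> T invoke(List<String> namenodes,
                               Function<String, T> call) throws ConnectException {
        ConnectException last = null;
        for (String nn : namenodes) {
            try {
                return call.apply(nn);
            } catch (RuntimeException e) {
                if (e.getCause() instanceof ConnectException) {
                    last = (ConnectException) e.getCause(); // try the next NN
                } else {
                    throw e; // non-connection failures propagate immediately
                }
            }
        }
        throw last != null ? last : new ConnectException("no namenodes given");
    }
}
```

In the scenario from the report, nn1 refuses the connection and the call would fall through to nn2, the surviving Active NameNode, instead of failing outright.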
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518648#comment-14518648 ] Yi Liu commented on HDFS-3107: -- Hi Neeta, please checkout the latest trunk or if you get Hadoop 2.7 version, you will find the truncate API exposed via FileSystem there. HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Lei Chang Assignee: Plamen Jeliazkov Fix For: 2.7.0 Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard Posix operation) which is a reverse operation of append, which makes upper layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8272) Erasure Coding: simplify the retry logic in DFSStripedInputStream
[ https://issues.apache.org/jira/browse/HDFS-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518671#comment-14518671 ] Zhe Zhang commented on HDFS-8272: - Thanks Jing for updating the patch! {{blockSeekTo}} looks good to me now. I also agree we should get rid of the retry in {{readWithStrategy}}. The retry logic in {{DFSInputStream#readBuffer}} will try connecting to the same node: {code} /* possibly retry the same node so that transient errors don't * result in application level failures (e.g. Datanode could have * closed the connection because the client is idle for too long). */ sourceFound = seekToBlockSource(pos); {code} I guess we should keep this part? Erasure Coding: simplify the retry logic in DFSStripedInputStream - Key: HDFS-8272 URL: https://issues.apache.org/jira/browse/HDFS-8272 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: h8272-HDFS-7285.000.patch, h8272-HDFS-7285.001.patch Currently the retry logic in DFSStripedInputStream is still the same as in DFSInputStream. More specifically, every failed read tries to search for another source node, and an exception is thrown when no new source node can be identified. This logic is not appropriate for the EC input stream and can be simplified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
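The behavior the quoted {{readBuffer}} comment describes can be sketched in isolation: retry the same source node once so a transient failure (such as the DataNode closing an idle connection) does not surface as an application-level error. The IntSupplier here is a simplified stand-in for a positioned read against one DataNode, not the real DFSInputStream API.

```java
import java.util.function.IntSupplier;

public class RetrySameNodeSketch {
    /** Tries the read against the same node a second time after one
     *  transient failure, mirroring the "possibly retry the same node"
     *  comment; a second failure propagates to the caller. */
    static int readWithSameNodeRetry(IntSupplier readFromNode) {
        try {
            return readFromNode.getAsInt();
        } catch (RuntimeException transientError) {
            // Seek back to the same block source and try once more.
            return readFromNode.getAsInt();
        }
    }
}
```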
[jira] [Assigned] (HDFS-8277) Safemode enter fails when Standby NameNode is down
[ https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] surendra singh lilhore reassigned HDFS-8277: Assignee: surendra singh lilhore Safemode enter fails when Standby NameNode is down -- Key: HDFS-8277 URL: https://issues.apache.org/jira/browse/HDFS-8277 Project: Hadoop HDFS Issue Type: Bug Components: ha, HDFS, namenode Affects Versions: 2.6.0 Environment: HDP 2.2.0 Reporter: Hari Sekhon Assignee: surendra singh lilhore Priority: Minor HDFS fails to enter safemode when the Standby NameNode is down (eg. due to AMBARI-10536). {code}hdfs dfsadmin -safemode enter safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused{code} This appears to be a bug in that it's not trying both NameNodes like the standard hdfs client code does, and is instead stopping after getting a connection refused from nn1 which is down. I verified normal hadoop fs writes and reads via cli did work at this time, using nn2. I happened to run this command as the hdfs user on nn2 which was the surviving Active NameNode. After I re-bootstrapped the Standby NN to fix it the command worked as expected again. Hari Sekhon http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8277) Safemode enter fails when Standby NameNode is down
[ https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518688#comment-14518688 ] surendra singh lilhore commented on HDFS-8277: -- I would like to work on this. I will update the status soon Safemode enter fails when Standby NameNode is down -- Key: HDFS-8277 URL: https://issues.apache.org/jira/browse/HDFS-8277 Project: Hadoop HDFS Issue Type: Bug Components: ha, HDFS, namenode Affects Versions: 2.6.0 Environment: HDP 2.2.0 Reporter: Hari Sekhon Priority: Minor HDFS fails to enter safemode when the Standby NameNode is down (eg. due to AMBARI-10536). {code}hdfs dfsadmin -safemode enter safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused{code} This appears to be a bug in that it's not trying both NameNodes like the standard hdfs client code does, and is instead stopping after getting a connection refused from nn1 which is down. I verified normal hadoop fs writes and reads via cli did work at this time, using nn2. I happened to run this command as the hdfs user on nn2 which was the surviving Active NameNode. After I re-bootstrapped the Standby NN to fix it the command worked as expected again. Hari Sekhon http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8240) During hdfs client is writing small file, missing block showed in namenode web when ha switch
[ https://issues.apache.org/jira/browse/HDFS-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518687#comment-14518687 ] Ricky Yang commented on HDFS-8240: -- Allen, this scenario appears many times. Why did you remove the Fix Version '2.4.0'? During hdfs client is writing small file, missing block showed in namenode web when ha switch - Key: HDFS-8240 URL: https://issues.apache.org/jira/browse/HDFS-8240 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Environment: Linux 2.6.32-279.el6.x86_64 Reporter: Ricky Yang Assignee: Ricky Yang Attachments: HDFS-8240.txt Original Estimate: 216h Remaining Estimate: 216h Description: While an HDFS client was writing a small file, the active namenode was killed and the standby namenode became active; the namenode web UI then showed many missing blocks in the cluster. Unfortunately, reading the file with the missing block from the hdfs shell also failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8275) Erasure Coding: Implement batched listing of erasure coding zones
[ https://issues.apache.org/jira/browse/HDFS-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu resolved HDFS-8275. -- Resolution: Duplicate Hi Rakesh, this JIRA duplicates HDFS-8087. Erasure Coding: Implement batched listing of erasure coding zones -- Key: HDFS-8275 URL: https://issues.apache.org/jira/browse/HDFS-8275 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R The idea of this jira is to provide a batch API in {{DistributedFileSystem}} to list the {{ECZoneInfo}} objects. API signature: {code} /** * List all ErasureCoding zones. Incrementally fetches results from the server. */ public RemoteIterator<ECZoneInfo> listErasureCodingZones() throws IOException { return dfs.listErasureCodingZones(); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
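The API above returns a RemoteIterator that fetches zones from the server incrementally, one batch per RPC. A self-contained sketch of that incremental-fetch pattern; PageFetcher and the plain String element type are simplified stand-ins for the NameNode RPC and ECZoneInfo, not Hadoop's real classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BatchedZoneIterator implements Iterator<String> {
    /** Stand-in for the server-side RPC: returns the next page of results
     *  starting at the given index; an empty page means no more results. */
    interface PageFetcher {
        List<String> fetchPage(int fromIndex);
    }

    private final PageFetcher fetcher;
    private List<String> batch = new ArrayList<>();
    private int posInBatch = 0;   // cursor within the current batch
    private int fetched = 0;      // total elements fetched so far
    private boolean exhausted = false;

    BatchedZoneIterator(PageFetcher fetcher) {
        this.fetcher = fetcher;
    }

    @Override
    public boolean hasNext() {
        if (posInBatch < batch.size()) {
            return true;          // still draining the local batch
        }
        if (exhausted) {
            return false;
        }
        batch = fetcher.fetchPage(fetched);  // one "RPC" per batch
        posInBatch = 0;
        fetched += batch.size();
        exhausted = batch.isEmpty();
        return !exhausted;
    }

    @Override
    public String next() {
        return batch.get(posInBatch++);      // sketch: no NoSuchElementException guard
    }
}
```

A caller iterates it like any other iterator and never sees the page boundaries, which is the point of the batched listing.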
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518741#comment-14518741 ] Neeta Garimella commented on HDFS-3107: --- Thanks Yi. I will get the latest. HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Lei Chang Assignee: Plamen Jeliazkov Fix For: 2.7.0 Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard Posix operation) which is a reverse operation of append, which makes upper layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7281) Missing block is marked as corrupted block
[ https://issues.apache.org/jira/browse/HDFS-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-7281: -- Attachment: HDFS-7281-5.patch [~yzhangal], good catch. I have updated the patch and the release notes. Missing block is marked as corrupted block -- Key: HDFS-7281 URL: https://issues.apache.org/jira/browse/HDFS-7281 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Labels: supportability Attachments: HDFS-7281-2.patch, HDFS-7281-3.patch, HDFS-7281-4.patch, HDFS-7281-5.patch, HDFS-7281.patch In the situation where the block lost all its replicas, fsck shows the block is missing as well as corrupted. Perhaps it is better not to mark the block corrupted in this case. The reason it is marked as corrupted is numCorruptNodes == numNodes == 0 in the following code. {noformat} BlockManager final boolean isCorrupt = numCorruptNodes == numNodes; {noformat} Would like to clarify if it is the intent to mark missing block as corrupted or it is just a bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
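The distinction at issue above can be shown in isolation: with zero replicas the original check numCorruptNodes == numNodes is vacuously true (0 == 0), so a missing block is also reported as corrupt. Requiring at least one known replica keeps the two states separate. Method names here are illustrative stand-ins, not BlockManager's real API.

```java
public class MissingVsCorrupt {
    /** A block is corrupt only if it has replicas and every one is corrupt;
     *  the extra numNodes > 0 guard excludes the all-replicas-lost case. */
    static boolean isCorrupt(int numCorruptNodes, int numNodes) {
        return numNodes > 0 && numCorruptNodes == numNodes;
    }

    /** A block with no replicas at all is missing, not corrupt. */
    static boolean isMissing(int numNodes) {
        return numNodes == 0;
    }
}
```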
[jira] [Commented] (HDFS-7758) Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead
[ https://issues.apache.org/jira/browse/HDFS-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518706#comment-14518706 ] Hadoop QA commented on HDFS-7758: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 38s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 21 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 28s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 3m 55s | The applied patch generated 1 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 4s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | native | 3m 13s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 165m 33s | Tests passed in hadoop-hdfs. 
| | | | 210m 3s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12728992/HDFS-7758.006.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 5190923 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/10443/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10443/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10443/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10443/console | This message was automatically generated. Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead - Key: HDFS-7758 URL: https://issues.apache.org/jira/browse/HDFS-7758 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7758.000.patch, HDFS-7758.001.patch, HDFS-7758.002.patch, HDFS-7758.003.patch, HDFS-7758.004.patch, HDFS-7758.005.patch, HDFS-7758.006.patch HDFS-7496 introduced reference-counting the volume instances being used to prevent race condition when hot swapping a volume. However, {{FsDatasetSpi#getVolumes()}} can still leak the volume instance without increasing its reference count. In this JIRA, we retire the {{FsDatasetSpi#getVolumes()}} and propose {{FsDatasetSpi#getVolumeRefs()}} and etc. method to access {{FsVolume}}. Thus it makes sure that the consumer of {{FsVolume}} always has correct reference count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8275) Erasure Coding: Implement batched listing of erasure coding zones
[ https://issues.apache.org/jira/browse/HDFS-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518723#comment-14518723 ] Rakesh R commented on HDFS-8275: OK, thanks [~hitliuyi] for pointing this out. Since I have done some background study, I'm happy to take up HDFS-8087. Can you assign it to me if you have not yet started? :) Erasure Coding: Implement batched listing of erasure coding zones -- Key: HDFS-8275 URL: https://issues.apache.org/jira/browse/HDFS-8275 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R The idea of this jira is to provide a batch API in {{DistributedFileSystem}} to list the {{ECZoneInfo}} objects. API signature: {code} /** * List all ErasureCoding zones. Incrementally fetches results from the server. */ public RemoteIterator<ECZoneInfo> listErasureCodingZones() throws IOException { return dfs.listErasureCodingZones(); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8246) Get HDFS file name based on block pool id and block id
[ https://issues.apache.org/jira/browse/HDFS-8246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] feng xu updated HDFS-8246: -- Flags: (was: Patch) Get HDFS file name based on block pool id and block id -- Key: HDFS-8246 URL: https://issues.apache.org/jira/browse/HDFS-8246 Project: Hadoop HDFS Issue Type: New Feature Components: HDFS, hdfs-client, namenode Reporter: feng xu Assignee: feng xu Attachments: HDFS-8246.0.patch This feature provides an HDFS shell command and C/Java APIs to retrieve an HDFS file name based on a block pool id and block id. 1. The Java API in class DistributedFileSystem: public String getFileName(String poolId, long blockId) throws IOException 2. The C API in hdfs.c: char* hdfsGetFileName(hdfsFS fs, const char* poolId, int64_t blockId) 3. The HDFS shell command: hdfs dfs [generic options] -fn poolId blockId This feature is useful if you have an HDFS block file name in the local file system and want to find the related HDFS file name in the HDFS name space (http://stackoverflow.com/questions/10881449/how-to-find-file-from-blockname-in-hdfs-hadoop). Each HDFS block file name in the local file system contains both the block pool id and the block id; for example, in the HDFS block file name /hdfs/1/hadoop/hdfs/data/current/BP-97622798-10.3.11.84-1428081035160/current/finalized/subdir0/subdir0/blk_1073741825, the block pool id is BP-97622798-10.3.11.84-1428081035160 and the block id is 1073741825. The block pool id is uniquely tied to an HDFS name node/name space, and the block id is uniquely tied to an HDFS file within that name node/name space, so the combination of block pool id and block id uniquely identifies an HDFS file name. The shell command and C/Java APIs do not map the block pool id to a name node, so it is the user's responsibility to talk to the correct name node in a federated environment with multiple name nodes. The block pool id is used by the name node to check that the user is talking to the correct name node. 
The implementation is straightforward. The client request to get the HDFS file name reaches the new method String getFileName(String poolId, long blockId) in FSNamesystem on the name node through RPC, and the new method does the following: (1) Validate the block pool id. (2) Create a Block based on the block id. (3) Get the BlockInfoContiguous from the Block. (4) Get the BlockCollection from the BlockInfoContiguous. (5) Get the file name from the BlockCollection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
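The lookup steps (1)-(5) above can be sketched with plain maps standing in for FSNamesystem's Block -> BlockInfoContiguous -> BlockCollection chain; the class, the in-memory index, and the example file path are all illustrative stand-ins, not Hadoop's real implementation.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Map;

public class BlockToFileLookup {
    private final String localPoolId;
    // Stand-in for steps (2)-(5): block id resolved straight to a file name.
    private final Map<Long, String> blockIdToFileName;

    BlockToFileLookup(String poolId, Map<Long, String> index) {
        this.localPoolId = poolId;
        this.blockIdToFileName = index;
    }

    String getFileName(String poolId, long blockId) throws IOException {
        // (1) Validate the block pool id: reject requests meant for another
        //     namespace, as the description says the name node should.
        if (!localPoolId.equals(poolId)) {
            throw new IOException("Wrong block pool id: " + poolId);
        }
        // (2)-(5) Resolve block id -> stored block -> collection -> file name.
        String name = blockIdToFileName.get(blockId);
        if (name == null) {
            throw new FileNotFoundException("No file for block " + blockId);
        }
        return name;
    }
}
```

Using the block pool id and block id from the sample path in the description, a matching pool id resolves to the indexed file name while a mismatched pool id is rejected.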
[jira] [Updated] (HDFS-8246) Get HDFS file name based on block pool id and block id
[ https://issues.apache.org/jira/browse/HDFS-8246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] feng xu updated HDFS-8246: -- Flags: Patch Get HDFS file name based on block pool id and block id -- Key: HDFS-8246 URL: https://issues.apache.org/jira/browse/HDFS-8246 Project: Hadoop HDFS Issue Type: New Feature Components: HDFS, hdfs-client, namenode Reporter: feng xu Assignee: feng xu Attachments: HDFS-8246.0.patch This feature provides an HDFS shell command and C/Java APIs to retrieve an HDFS file name based on a block pool id and block id. 1. The Java API in class DistributedFileSystem: public String getFileName(String poolId, long blockId) throws IOException 2. The C API in hdfs.c: char* hdfsGetFileName(hdfsFS fs, const char* poolId, int64_t blockId) 3. The HDFS shell command: hdfs dfs [generic options] -fn poolId blockId This feature is useful if you have an HDFS block file name in the local file system and want to find the related HDFS file name in the HDFS name space (http://stackoverflow.com/questions/10881449/how-to-find-file-from-blockname-in-hdfs-hadoop). Each HDFS block file name in the local file system contains both the block pool id and the block id; for example, in the HDFS block file name /hdfs/1/hadoop/hdfs/data/current/BP-97622798-10.3.11.84-1428081035160/current/finalized/subdir0/subdir0/blk_1073741825, the block pool id is BP-97622798-10.3.11.84-1428081035160 and the block id is 1073741825. The block pool id is uniquely tied to an HDFS name node/name space, and the block id is uniquely tied to an HDFS file within that name node/name space, so the combination of block pool id and block id uniquely identifies an HDFS file name. The shell command and C/Java APIs do not map the block pool id to a name node, so it is the user's responsibility to talk to the correct name node in a federated environment with multiple name nodes. The block pool id is used by the name node to check that the user is talking to the correct name node. The implementation is straightforward. 
The client request to get the HDFS file name reaches the new method String getFileName(String poolId, long blockId) in FSNamesystem on the name node through RPC, and the new method does the following: (1) Validate the block pool id. (2) Create a Block based on the block id. (3) Get the BlockInfoContiguous from the Block. (4) Get the BlockCollection from the BlockInfoContiguous. (5) Get the file name from the BlockCollection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8277) Safemode enter fails when Standby NameNode is down
[ https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sekhon updated HDFS-8277: -- Priority: Minor (was: Major) Safemode enter fails when Standby NameNode is down -- Key: HDFS-8277 URL: https://issues.apache.org/jira/browse/HDFS-8277 Project: Hadoop HDFS Issue Type: Bug Components: ha, HDFS, namenode Affects Versions: 2.6.0 Environment: HDP 2.2.0 Reporter: Hari Sekhon Priority: Minor HDFS fails to enter safemode when the Standby NameNode is down (eg. due to AMBARI-10536). {code}hdfs dfsadmin -safemode enter safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused{code} This appears to be a bug in that it's not trying both NameNodes like the standard hdfs client code does, and is instead stopping after getting a connection refused from nn1 which is down. I verified normal hadoop fs writes and reads via cli did work at this time, using nn2. I happened to run this command as the hdfs user on nn2 which was the surviving Active NameNode. After I re-bootstrapped the Standby NN to fix it the command worked as expected again. Hari Sekhon http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8277) Safemode enter fails when Standby NameNode is down
[ https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517333#comment-14517333 ] Hari Sekhon commented on HDFS-8277: --- Ah, I have both back up now, so this command works regardless; it won't be a great test. Perhaps this should be labelled an improvement instead of a bug, since other hdfs commands do auto-failover in HA setups. Safemode enter fails when Standby NameNode is down -- Key: HDFS-8277 URL: https://issues.apache.org/jira/browse/HDFS-8277 Project: Hadoop HDFS Issue Type: Bug Components: ha, HDFS, namenode Affects Versions: 2.6.0 Environment: HDP 2.2.0 Reporter: Hari Sekhon HDFS fails to enter safemode when the Standby NameNode is down (eg. due to AMBARI-10536). {code}hdfs dfsadmin -safemode enter safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused{code} This appears to be a bug in that it's not trying both NameNodes like the standard hdfs client code does, and is instead stopping after getting a connection refused from nn1 which is down. I verified normal hadoop fs writes and reads via cli did work at this time, using nn2. I happened to run this command as the hdfs user on nn2 which was the surviving Active NameNode. After I re-bootstrapped the Standby NN to fix it the command worked as expected again. Hari Sekhon http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-8246) Get HDFS file name based on block pool id and block id
[ https://issues.apache.org/jira/browse/HDFS-8246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang reassigned HDFS-8246: - Assignee: Andrew Wang (was: feng xu) Get HDFS file name based on block pool id and block id -- Key: HDFS-8246 URL: https://issues.apache.org/jira/browse/HDFS-8246 Project: Hadoop HDFS Issue Type: New Feature Components: HDFS, hdfs-client, namenode Reporter: feng xu Assignee: Andrew Wang Attachments: HDFS-8246.0.patch This feature provides an HDFS shell command and C/Java APIs to retrieve an HDFS file name based on a block pool id and block id. 1. The Java API in class DistributedFileSystem: public String getFileName(String poolId, long blockId) throws IOException 2. The C API in hdfs.c: char* hdfsGetFileName(hdfsFS fs, const char* poolId, int64_t blockId) 3. The HDFS shell command: hdfs dfs [generic options] -fn poolId blockId This feature is useful if you have an HDFS block file name in the local file system and want to find the related HDFS file name in the HDFS name space (http://stackoverflow.com/questions/10881449/how-to-find-file-from-blockname-in-hdfs-hadoop). Each HDFS block file name in the local file system contains both the block pool id and the block id; for example, in the HDFS block file name /hdfs/1/hadoop/hdfs/data/current/BP-97622798-10.3.11.84-1428081035160/current/finalized/subdir0/subdir0/blk_1073741825, the block pool id is BP-97622798-10.3.11.84-1428081035160 and the block id is 1073741825. The block pool id is uniquely tied to an HDFS name node/name space, and the block id is uniquely tied to an HDFS file within that name node/name space, so the combination of block pool id and block id uniquely identifies an HDFS file name. The shell command and C/Java APIs do not map the block pool id to a name node, so it is the user's responsibility to talk to the correct name node in a federated environment with multiple name nodes. The block pool id is used by the name node to check that the user is talking to the correct name node. 
The implementation is straightforward. The client request to get the HDFS file name reaches the new method String getFileName(String poolId, long blockId) in FSNamesystem on the name node through RPC, and the new method does the following: (1) Validate the block pool id. (2) Create a Block based on the block id. (3) Get the BlockInfoContiguous from the Block. (4) Get the BlockCollection from the BlockInfoContiguous. (5) Get the file name from the BlockCollection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8246) Get HDFS file name based on block pool id and block id
[ https://issues.apache.org/jira/browse/HDFS-8246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-8246: -- Status: Patch Available (was: In Progress) Get HDFS file name based on block pool id and block id -- Key: HDFS-8246 URL: https://issues.apache.org/jira/browse/HDFS-8246 Project: Hadoop HDFS Issue Type: New Feature Components: HDFS, hdfs-client, namenode Reporter: feng xu Assignee: Andrew Wang Attachments: HDFS-8246.0.patch This feature provides an HDFS shell command and C/Java APIs to retrieve an HDFS file name based on a block pool id and block id. 1. The Java API in class DistributedFileSystem: public String getFileName(String poolId, long blockId) throws IOException 2. The C API in hdfs.c: char* hdfsGetFileName(hdfsFS fs, const char* poolId, int64_t blockId) 3. The HDFS shell command: hdfs dfs [generic options] -fn poolId blockId This feature is useful if you have an HDFS block file name in the local file system and want to find the related HDFS file name in the HDFS name space (http://stackoverflow.com/questions/10881449/how-to-find-file-from-blockname-in-hdfs-hadoop). Each HDFS block file name in the local file system contains both the block pool id and the block id; for example, in the HDFS block file name /hdfs/1/hadoop/hdfs/data/current/BP-97622798-10.3.11.84-1428081035160/current/finalized/subdir0/subdir0/blk_1073741825, the block pool id is BP-97622798-10.3.11.84-1428081035160 and the block id is 1073741825. The block pool id is uniquely tied to an HDFS name node/name space, and the block id is uniquely tied to an HDFS file within that name node/name space, so the combination of block pool id and block id uniquely identifies an HDFS file name. The shell command and C/Java APIs do not map the block pool id to a name node, so it is the user's responsibility to talk to the correct name node in a federated environment with multiple name nodes. The block pool id is used by the name node to check that the user is talking to the correct name node. 
The implementation is straightforward. The client request to get the HDFS file name reaches the new method String getFileName(String poolId, long blockId) in FSNamesystem on the name node through RPC, and the new method does the following: (1) Validate the block pool id. (2) Create a Block based on the block id. (3) Get the BlockInfoContiguous from the Block. (4) Get the BlockCollection from the BlockInfoContiguous. (5) Get the file name from the BlockCollection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)