[jira] [Commented] (HDFS-7999) FsDatasetImpl#createTemporary sometimes holds the FSDatasetImpl lock for a very long time
[ https://issues.apache.org/jira/browse/HDFS-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388080#comment-14388080 ] zhouyingchao commented on HDFS-7999: Thank you for looking into the patch. Here is an explanation of the logic of createTemporary() after the patch is applied:
1. If there is no ReplicaInfo in volumeMap for the passed-in ExtendedBlock b, we create one, insert it into volumeMap, and return from line 1443.
2. If there is a ReplicaInfo in volumeMap and its GS is newer than that of the passed-in ExtendedBlock b, we throw ReplicaAlreadyExistsException from line 1447.
3. If there is a ReplicaInfo in volumeMap but its GS is older than that of the passed-in ExtendedBlock b, this is a new write and the earlier writer should be stopped. We release the FsDatasetImpl lock and try to stop the earlier writer without holding the lock.
4. After the earlier writer is stopped, we need to evict its ReplicaInfo from volumeMap, so we re-acquire the FsDatasetImpl lock. However, since this thread released the lock while stopping the earlier writer, another thread might have come in and changed the ReplicaInfo of this block in volumeMap. This situation is unlikely, but we still have to handle it. The loop in the patch handles exactly this case: after re-acquiring the FsDatasetImpl lock, it checks whether the current ReplicaInfo in volumeMap is still the one we saw before stopping the writer. If so, we can simply evict it, create and insert a new one, and return from line 1443. Otherwise, another thread has slipped in and changed the ReplicaInfo while we were stopping the earlier writer. In that case, we check whether that thread inserted a block with an even newer GS than ours; if so, we throw ReplicaAlreadyExistsException from line 1447. Otherwise we need to stop that thread's writer, just as we stopped the earlier writer in step 3.
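The steps above can be sketched as a small model. This is a simplified, hypothetical illustration of the lock-release/re-check loop, not the actual HDFS-7999 patch: Replica and the plain HashMap stand in for ReplicaInfo and volumeMap, and IllegalStateException stands in for ReplicaAlreadyExistsException.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the patched createTemporary() loop described above.
// Not the real FsDatasetImpl code; it only illustrates the structure of
// releasing the lock to stop a writer and re-checking the map afterwards.
class CreateTemporaryModel {
  static class Replica {
    final long genStamp;
    Replica(long genStamp) { this.genStamp = genStamp; }
    void stopWriter() { /* the real code joins the writer thread here */ }
  }

  private final Map<Long, Replica> volumeMap = new HashMap<>();

  Replica createTemporary(long blockId, long genStamp) {
    Replica lastFound = null;
    while (true) {
      synchronized (this) {                       // the FsDatasetImpl lock
        Replica current = volumeMap.get(blockId);
        if (current == null || current == lastFound) {
          // Step 1, or step 4 when nothing changed while we were unlocked:
          // evict the stopped writer's replica (if any) and install ours.
          Replica fresh = new Replica(genStamp);
          volumeMap.put(blockId, fresh);
          return fresh;
        }
        if (current.genStamp >= genStamp) {
          // Step 2: an equal or newer generation stamp already exists.
          throw new IllegalStateException("replica already exists");
        }
        // Step 3: an older writer exists; remember it, then stop it
        // after releasing the lock.
        lastFound = current;
      }
      // Lock is released here: stopping the writer may block for a long
      // time, so it must not happen under the FsDatasetImpl lock.
      lastFound.stopWriter();
    }
  }
}
```

If another thread changes the map while the lock is dropped, `current == lastFound` fails on the next iteration and the loop falls through to steps 2/3 again, which is the retry behavior described above.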
FsDatasetImpl#createTemporary sometimes holds the FSDatasetImpl lock for a very long time - Key: HDFS-7999 URL: https://issues.apache.org/jira/browse/HDFS-7999 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7999-001.patch I'm using 2.6.0 and noticed that sometimes the DN's heartbeats were delayed for a very long time, say more than 100 seconds. I took the jstack twice, and it looks like they are all blocked (at getStorageReport) on the dataset lock, which is held by a thread that is calling createTemporary, which in turn is blocked waiting for the earlier incarnation of the writer to exit. The heartbeat thread stack: java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) - waiting to lock 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) - locked 0x0007b0140ed0 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) at java.lang.Thread.run(Thread.java:662) The DataXceiver thread holds the dataset lock: DataXceiver for client at X daemon prio=10 tid=0x7f14041e6480 nid=0x52bc in Object.wait() [0x7f11d78f7000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1194) locked 0x0007a33b85d8 (a org.apache.hadoop.util.Daemon) at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1231) locked 0x0007b01428c0 (a
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:114) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:179) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at
[jira] [Commented] (HDFS-7889) Subclass DFSOutputStream to support writing striping layout files
[ https://issues.apache.org/jira/browse/HDFS-7889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388090#comment-14388090 ] Li Bo commented on HDFS-7889: - Hi Zhe, {{stripedBlocks[i]}} is an instance of {{BlockingQueue}}, not {{LocatedBlock}}, and I cannot see any code that would add a non-LocatedBlock object to this queue. Is it necessary to check the type of each element retrieved from the queue? Subclass DFSOutputStream to support writing striping layout files - Key: HDFS-7889 URL: https://issues.apache.org/jira/browse/HDFS-7889 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-7889-001.patch, HDFS-7889-002.patch, HDFS-7889-003.patch, HDFS-7889-004.patch, HDFS-7889-005.patch After HDFS-7888, we can subclass {{DFSOutputStream}} to support writing striping layout files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
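The point in the comment above can be illustrated with a small sketch. This is hypothetical (LocatedBlockStub stands in for the real LocatedBlock, and the method name is invented): if the queue is declared with a generic element type, the compiler already guarantees what can be offered to it, so a per-element runtime type check on retrieval is redundant.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// LocatedBlockStub is a stand-in for the real LocatedBlock class.
class LocatedBlockStub {
  final long blockId;
  LocatedBlockStub(long blockId) { this.blockId = blockId; }
}

class StripedQueueExample {
  // With a generically typed queue, only LocatedBlockStub instances can
  // be added, so no instanceof check or cast is needed on retrieval.
  static LocatedBlockStub takeOne(BlockingQueue<LocatedBlockStub> queue) {
    try {
      return queue.take();   // element type is guaranteed by the compiler
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    }
  }
}
```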
[jira] [Updated] (HDFS-7954) TestBalancer#testBalancerWithPinnedBlocks failed on Windows
[ https://issues.apache.org/jira/browse/HDFS-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7954: - Assignee: Xiaoyu Yao Status: Patch Available (was: Open) TestBalancer#testBalancerWithPinnedBlocks failed on Windows --- Key: HDFS-7954 URL: https://issues.apache.org/jira/browse/HDFS-7954 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-7947.00.patch {code} testBalancerWithPinnedBlocks(org.apache.hadoop.hdfs.server.balancer.TestBalancer) Time elapsed: 22.624 sec FAILURE! java.lang.AssertionError: expected:-3 but was:0 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerWithPinnedBlocks(TestBalancer.java:353) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7954) TestBalancer#testBalancerWithPinnedBlocks failed on Windows
[ https://issues.apache.org/jira/browse/HDFS-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7954: - Attachment: HDFS-7947.00.patch Post a patch to skip this test on Windows. TestBalancer#testBalancerWithPinnedBlocks failed on Windows --- Key: HDFS-7954 URL: https://issues.apache.org/jira/browse/HDFS-7954 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Xiaoyu Yao Attachments: HDFS-7947.00.patch {code} testBalancerWithPinnedBlocks(org.apache.hadoop.hdfs.server.balancer.TestBalancer) Time elapsed: 22.624 sec FAILURE! java.lang.AssertionError: expected:-3 but was:0 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerWithPinnedBlocks(TestBalancer.java:353) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8026) Trace FSOutputSummer#writeChecksumChunks rather than DFSOutputStream#writeChunk
[ https://issues.apache.org/jira/browse/HDFS-8026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388100#comment-14388100 ] Hadoop QA commented on HDFS-8026: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12708293/HDFS-8026.001.patch against trunk revision 1a495fb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestSetrepIncreasing org.apache.hadoop.tracing.TestTracing Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/10122//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/10122//console This message is automatically generated. Trace FSOutputSummer#writeChecksumChunks rather than DFSOutputStream#writeChunk --- Key: HDFS-8026 URL: https://issues.apache.org/jira/browse/HDFS-8026 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-8026.001.patch We should trace FSOutputSummer#writeChecksumChunks rather than DFSOutputStream#writeChunk. 
When tracing writeChunk, we get a new trace span every 512 bytes; when tracing writeChecksumChunks, we normally get a new trace span only when the FSOutputSummer buffer is full (9x less often). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7811) Avoid recursive call getStoragePolicyID in INodeFile#computeQuotaUsage
[ https://issues.apache.org/jira/browse/HDFS-7811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388160#comment-14388160 ] Xiaoyu Yao commented on HDFS-7811: -- I can't find an easy way to unit-test that the recursive call does not happen without adding test hooks to production code. Avoid recursive call getStoragePolicyID in INodeFile#computeQuotaUsage -- Key: HDFS-7811 URL: https://issues.apache.org/jira/browse/HDFS-7811 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-7811.00.patch, HDFS-7811.01.patch This is a follow-up based on a comment from [~jingzhao] on HDFS-7723. I just noticed that INodeFile#computeQuotaUsage calls getStoragePolicyID to identify the storage policy id of the file. This may not be very efficient (especially when we're computing the quota usage of a directory) because getStoragePolicyID may recursively check the ancestral INode's storage policy. I think an improvement here could be to pass the lowest parent directory's storage policy down while traversing the tree. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
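The improvement described above (passing the ancestor's policy down rather than walking up per file) could look roughly like the following sketch. Inode and its field names are illustrative stand-ins, not the real INode API, and the quota accounting itself is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the traversal carries the nearest ancestor's
// effective storage policy downward, so no inode ever needs to walk
// back up through its parents to resolve its policy.
class Inode {
  static final byte UNSPECIFIED = 0;
  byte localPolicy = UNSPECIFIED;   // policy set directly on this inode, if any
  byte effectivePolicy;             // resolved during the traversal
  final List<Inode> children = new ArrayList<>();

  void computeQuotaUsage(byte parentPolicy) {
    // Effective policy: this inode's own policy, else the inherited one.
    effectivePolicy = (localPolicy != UNSPECIFIED) ? localPolicy : parentPolicy;
    for (Inode child : children) {
      child.computeQuotaUsage(effectivePolicy);  // pass it down, no upward walk
    }
  }
}
```

Each inode is visited once and resolves its policy in constant time, instead of the O(depth) upward walk that getStoragePolicyID can incur per file.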
[jira] [Commented] (HDFS-7922) ShortCircuitCache#close is not releasing ScheduledThreadPoolExecutors
[ https://issues.apache.org/jira/browse/HDFS-7922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388203#comment-14388203 ] Hadoop QA commented on HDFS-7922: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12708313/004-HDFS-7922.patch against trunk revision cce66ba. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestAuditLogs The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/10123//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/10123//console This message is automatically generated. ShortCircuitCache#close is not releasing ScheduledThreadPoolExecutors - Key: HDFS-7922 URL: https://issues.apache.org/jira/browse/HDFS-7922 Project: Hadoop HDFS Issue Type: Bug Reporter: Rakesh R Assignee: Rakesh R Attachments: 001-HDFS-7922.patch, 002-HDFS-7922.patch, 003-HDFS-7922.patch, 004-HDFS-7922.patch ShortCircuitCache has the following executors. 
It would be good to shut down these pools during ShortCircuitCache#close to avoid leaks. {code} /** * The executor service that runs the cacheCleaner. */ private final ScheduledThreadPoolExecutor cleanerExecutor = new ScheduledThreadPoolExecutor(1, new ThreadFactoryBuilder(). setDaemon(true).setNameFormat("ShortCircuitCache_Cleaner"). build()); /** * The executor service that runs the slotReleaser. */ private final ScheduledThreadPoolExecutor releaserExecutor = new ScheduledThreadPoolExecutor(1, new ThreadFactoryBuilder(). setDaemon(true).setNameFormat("ShortCircuitCache_SlotReleaser"). build()); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
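The cleanup being asked for could look something like this minimal sketch. CacheLike is a hypothetical stand-in, not the actual ShortCircuitCache patch, and the 5-second timeout is an assumption; it only shows the usual shutdown/awaitTermination/shutdownNow pattern for releasing the executors in close().

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a cache owning two scheduled executors,
// mirroring the fields quoted above. close() shuts both down so their
// daemon threads do not leak.
class CacheLike implements AutoCloseable {
  private final ScheduledThreadPoolExecutor cleanerExecutor =
      new ScheduledThreadPoolExecutor(1);
  private final ScheduledThreadPoolExecutor releaserExecutor =
      new ScheduledThreadPoolExecutor(1);

  @Override
  public void close() {
    cleanerExecutor.shutdown();       // stop accepting new tasks
    releaserExecutor.shutdown();
    try {
      // Give in-flight tasks a bounded chance to finish, then force-stop.
      if (!cleanerExecutor.awaitTermination(5, TimeUnit.SECONDS)) {
        cleanerExecutor.shutdownNow();
      }
      if (!releaserExecutor.awaitTermination(5, TimeUnit.SECONDS)) {
        releaserExecutor.shutdownNow();
      }
    } catch (InterruptedException e) {
      cleanerExecutor.shutdownNow();
      releaserExecutor.shutdownNow();
      Thread.currentThread().interrupt();
    }
  }

  boolean isClosed() {
    return cleanerExecutor.isShutdown() && releaserExecutor.isShutdown();
  }
}
```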
[jira] [Commented] (HDFS-5019) Cleanup imports in HDFS project
[ https://issues.apache.org/jira/browse/HDFS-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388238#comment-14388238 ] Tsuyoshi Ozawa commented on HDFS-5019: -- Hi [~djp], thank you for the update. Could you rebase the patch? Cleanup imports in HDFS project --- Key: HDFS-5019 URL: https://issues.apache.org/jira/browse/HDFS-5019 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Junping Du Assignee: Junping Du Priority: Minor Attachments: HDFS-5019-v2.patch, HDFS-5019.patch There are some unused imported packages in the current code base which cause unnecessary Java warnings. Also, imports should be in alphabetical order, and import x.x.* is not recommended. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7939) Two fsimage_rollback_* files are created which are not deleted after rollback.
[ https://issues.apache.org/jira/browse/HDFS-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.Andreina updated HDFS-7939: - Status: Patch Available (was: Open) Two fsimage_rollback_* files are created which are not deleted after rollback. -- Key: HDFS-7939 URL: https://issues.apache.org/jira/browse/HDFS-7939 Project: Hadoop HDFS Issue Type: Bug Reporter: J.Andreina Assignee: J.Andreina Priority: Critical Attachments: HDFS-7939.1.patch During a checkpoint, if the upload to the remote Namenode fails, then restarting the Namenode with the rollingUpgrade started option creates two fsimage_rollback_* files at the Active Namenode. On rolling upgrade rollback, the initially created fsimage_rollback_* file is not deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7999) FsDatasetImpl#createTemporary sometimes holds the FSDatasetImpl lock for a very long time
[ https://issues.apache.org/jira/browse/HDFS-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388273#comment-14388273 ] Xinwei Qin commented on HDFS-7999: --- Yeah, it's a good and necessary idea to keep the createTemporary() method from holding the lock for a long time. FsDatasetImpl#createTemporary sometimes holds the FSDatasetImpl lock for a very long time - Key: HDFS-7999 URL: https://issues.apache.org/jira/browse/HDFS-7999 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7999-001.patch I'm using 2.6.0 and noticed that sometimes the DN's heartbeats were delayed for a very long time, say more than 100 seconds. I took the jstack twice, and it looks like they are all blocked (at getStorageReport) on the dataset lock, which is held by a thread that is calling createTemporary, which in turn is blocked waiting for the earlier incarnation of the writer to exit. The heartbeat thread stack: java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) - waiting to lock 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) - locked 0x0007b0140ed0 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) at java.lang.Thread.run(Thread.java:662) The DataXceiver thread holds the dataset lock: DataXceiver for client at X daemon prio=10 tid=0x7f14041e6480 nid=0x52bc in Object.wait() [0x7f11d78f7000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at
java.lang.Thread.join(Thread.java:1194) locked 0x0007a33b85d8 (a org.apache.hadoop.util.Daemon) at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1231) locked 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:114) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:179) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7933) fsck should also report decommissioning replicas.
[ https://issues.apache.org/jira/browse/HDFS-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7933: - Attachment: (was: HDFS-7933.02.patch) fsck should also report decommissioning replicas. -- Key: HDFS-7933 URL: https://issues.apache.org/jira/browse/HDFS-7933 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Jitendra Nath Pandey Assignee: Xiaoyu Yao Attachments: HDFS-7933.00.patch, HDFS-7933.01.patch Fsck doesn't count replicas that are on decommissioning nodes. If a block has all replicas on the decommissioning nodes, it will be marked as missing, which is alarming for the admins, although the system will replicate them before nodes are decommissioned. Fsck output should also show decommissioning replicas along with the live replicas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7933) fsck should also report decommissioning replicas.
[ https://issues.apache.org/jira/browse/HDFS-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7933: - Attachment: HDFS-7933.02.patch Thanks [~jnp] for reviewing the patch. I've updated the patch based on your feedback. Summary of changes: 1) Added NumberReplicas#decommissioned and NumberReplicas#decommissioning to track the decommissioned and decommissioning replicas, respectively. 2) Deprecated NumberReplicas#decommissionedReplicas() in favor of NumberReplicas#decommissionedAndDecommissioning() to avoid the misleading name. 3) Display decommissioning and decommissioned replicas separately in NamenodeFsck#check(). fsck should also report decommissioning replicas. -- Key: HDFS-7933 URL: https://issues.apache.org/jira/browse/HDFS-7933 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Jitendra Nath Pandey Assignee: Xiaoyu Yao Attachments: HDFS-7933.00.patch, HDFS-7933.01.patch, HDFS-7933.02.patch Fsck doesn't count replicas that are on decommissioning nodes. If a block has all replicas on the decommissioning nodes, it will be marked as missing, which is alarming for the admins, although the system will replicate them before nodes are decommissioned. Fsck output should also show decommissioning replicas along with the live replicas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
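The shape of the NumberReplicas change summarized above might look roughly like this. NumberReplicasLike is a hypothetical stand-in, not the actual HDFS-7933 patch: the two states are counted separately, with a combined accessor taking over from the old misleadingly named decommissionedReplicas().

```java
// Hypothetical sketch of counting decommissioned and decommissioning
// replicas separately while keeping a combined accessor.
class NumberReplicasLike {
  private int decommissioned;
  private int decommissioning;

  void addDecommissioned() { decommissioned++; }
  void addDecommissioning() { decommissioning++; }

  int decommissioned() { return decommissioned; }
  int decommissioning() { return decommissioning; }

  // Combined count, replacing the old decommissionedReplicas().
  int decommissionedAndDecommissioning() {
    return decommissioned + decommissioning;
  }
}
```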
[jira] [Updated] (HDFS-7933) fsck should also report decommissioning replicas.
[ https://issues.apache.org/jira/browse/HDFS-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7933: - Attachment: HDFS-7933.02.patch fsck should also report decommissioning replicas. -- Key: HDFS-7933 URL: https://issues.apache.org/jira/browse/HDFS-7933 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Jitendra Nath Pandey Assignee: Xiaoyu Yao Attachments: HDFS-7933.00.patch, HDFS-7933.01.patch Fsck doesn't count replicas that are on decommissioning nodes. If a block has all replicas on the decommissioning nodes, it will be marked as missing, which is alarming for the admins, although the system will replicate them before nodes are decommissioned. Fsck output should also show decommissioning replicas along with the live replicas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7933) fsck should also report decommissioning replicas.
[ https://issues.apache.org/jira/browse/HDFS-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7933: - Attachment: (was: HDFS-7933.02.patch) fsck should also report decommissioning replicas. -- Key: HDFS-7933 URL: https://issues.apache.org/jira/browse/HDFS-7933 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Jitendra Nath Pandey Assignee: Xiaoyu Yao Attachments: HDFS-7933.00.patch, HDFS-7933.01.patch Fsck doesn't count replicas that are on decommissioning nodes. If a block has all replicas on the decommissioning nodes, it will be marked as missing, which is alarming for the admins, although the system will replicate them before nodes are decommissioned. Fsck output should also show decommissioning replicas along with the live replicas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7933) fsck should also report decommissioning replicas.
[ https://issues.apache.org/jira/browse/HDFS-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7933: - Attachment: HDFS-7933.02.patch fsck should also report decommissioning replicas. -- Key: HDFS-7933 URL: https://issues.apache.org/jira/browse/HDFS-7933 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Jitendra Nath Pandey Assignee: Xiaoyu Yao Attachments: HDFS-7933.00.patch, HDFS-7933.01.patch, HDFS-7933.02.patch Fsck doesn't count replicas that are on decommissioning nodes. If a block has all replicas on the decommissioning nodes, it will be marked as missing, which is alarming for the admins, although the system will replicate them before nodes are decommissioned. Fsck output should also show decommissioning replicas along with the live replicas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7701) Support reporting per storage type quota and usage with hadoop/hdfs shell
[ https://issues.apache.org/jira/browse/HDFS-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388235#comment-14388235 ] Peter Shi commented on HDFS-7701: - Thanks for giving such detailed suggestions, I will upload the fixed patch ASAP. Support reporting per storage type quota and usage with hadoop/hdfs shell - Key: HDFS-7701 URL: https://issues.apache.org/jira/browse/HDFS-7701 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Xiaoyu Yao Assignee: Peter Shi Attachments: HDFS-7701.01.patch, HDFS-7701.02.patch, HDFS-7701.03.patch hadoop fs -count -q or hdfs dfs -count -q currently shows name space/disk space quota and remaining quota information. With HDFS-7584, we want to display per storage type quota and its remaining information as well. The current output format as shown below may not easily accommodate 6 more columns = 3 (existing storage types) * 2 (quota/remaining quota). With new storage types added in future, this will make the output even more crowded. There are also compatibility issues as we don't want to break any existing scripts monitoring hadoop fs -count -q output. $ hadoop fs -count -q -v /test QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME none inf 524288000 5242665691 15 21431 /test Propose to add a -t parameter to display ONLY the storage type quota information of the directory separately. This way, existing scripts will work as-is without using the -t parameter. 1) When -t is not followed by a specific storage type, quota and usage information for all storage types will be displayed. $ hadoop fs -count -q -t -h -v /test SSD_QUOTA REM_SSD_QUOTA DISK_QUOTA REM_DISK_QUOTA ARCHIVAL_QUOTA REM_ARCHIVAL_QUOTA PATHNAME 512MB 256MB none inf none inf /test 2) If -t is followed by a storage type, only the quota and remaining quota of the storage type is displayed. 
$ hadoop fs -count -q -t SSD -h -v /test SSD_QUOTA REM_SSD_QUOTA PATHNAME 512 MB 256 MB /test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HDFS-7949) WebImageViewer need support file size calculation with striped blocks
[ https://issues.apache.org/jira/browse/HDFS-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-7949 started by Rakesh R. -- WebImageViewer need support file size calculation with striped blocks - Key: HDFS-7949 URL: https://issues.apache.org/jira/browse/HDFS-7949 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Hui Zheng Assignee: Rakesh R Priority: Minor Attachments: HDFS-7949-001.patch The file size calculation should be changed when the blocks of the file are striped in WebImageViewer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7701) Support reporting per storage type quota and usage with hadoop/hdfs shell
[ https://issues.apache.org/jira/browse/HDFS-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Shi updated HDFS-7701: Attachment: HDFS-7701.04.patch Support reporting per storage type quota and usage with hadoop/hdfs shell - Key: HDFS-7701 URL: https://issues.apache.org/jira/browse/HDFS-7701 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Xiaoyu Yao Assignee: Peter Shi Attachments: HDFS-7701.01.patch, HDFS-7701.02.patch, HDFS-7701.03.patch, HDFS-7701.04.patch hadoop fs -count -q or hdfs dfs -count -q currently shows name space/disk space quota and remaining quota information. With HDFS-7584, we want to display per storage type quota and its remaining information as well. The current output format as shown below may not easily accommodate 6 more columns = 3 (existing storage types) * 2 (quota/remaining quota). With new storage types added in future, this will make the output even more crowded. There are also compatibility issues as we don't want to break any existing scripts monitoring hadoop fs -count -q output. $ hadoop fs -count -q -v /test QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME none inf 524288000 5242665691 15 21431 /test Propose to add a -t parameter to display ONLY the storage type quota information of the directory separately. This way, existing scripts will work as-is without using the -t parameter. 1) When -t is not followed by a specific storage type, quota and usage information for all storage types will be displayed. $ hadoop fs -count -q -t -h -v /test SSD_QUOTA REM_SSD_QUOTA DISK_QUOTA REM_DISK_QUOTA ARCHIVAL_QUOTA REM_ARCHIVAL_QUOTA PATHNAME 512MB 256MB none inf none inf /test 2) If -t is followed by a storage type, only the quota and remaining quota of the storage type is displayed. $ hadoop fs -count -q -t SSD -h -v /test SSD_QUOTA REM_SSD_QUOTA PATHNAME 512 MB 256 MB /test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8012) Updatable HAR Filesystem
[ https://issues.apache.org/jira/browse/HDFS-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Madhan Sundararajan Devaki updated HDFS-8012: - Description: Is there a plan to support updatable HAR Filesystem? If so, by when is this expected please? The following operations may be supported. + Add new files + Remove existing files + Replace existing files This is required in cases where data is stored in AVRO format in HDFS and the corresponding .avsc files are used to create Hive external tables. This will lead to the small files (.avsc files in this case) problem when there are a large number of tables that need to be loaded into Hive as external tables. was: Is there a plan to support updatable HAR Filesystem? If so, by when is this expected please? The following operations may be supported. + Add new files + Remove existing files + Replace existing files Updatable HAR Filesystem Key: HDFS-8012 URL: https://issues.apache.org/jira/browse/HDFS-8012 Project: Hadoop HDFS Issue Type: Bug Components: datanode, hdfs-client Reporter: Madhan Sundararajan Devaki Priority: Critical Is there a plan to support updatable HAR Filesystem? If so, by when is this expected please? The following operations may be supported. + Add new files + Remove existing files + Replace existing files This is required in cases where data is stored in AVRO format in HDFS and the corresponding .avsc files are used to create Hive external tables. This will lead to the small files (.avsc files in this case) problem when there are a large number of tables that need to be loaded into Hive as external tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7949) WebImageViewer need support file size calculation with striped blocks
[ https://issues.apache.org/jira/browse/HDFS-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-7949: --- Attachment: HDFS-7949-001.patch WebImageViewer need support file size calculation with striped blocks - Key: HDFS-7949 URL: https://issues.apache.org/jira/browse/HDFS-7949 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Hui Zheng Assignee: Rakesh R Priority: Minor Attachments: HDFS-7949-001.patch The file size calculation should be changed when the blocks of the file are striped in WebImageViewer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7716) Erasure Coding: extend BlockInfo to handle EC info
[ https://issues.apache.org/jira/browse/HDFS-7716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7716: Fix Version/s: HDFS-7285 Erasure Coding: extend BlockInfo to handle EC info -- Key: HDFS-7716 URL: https://issues.apache.org/jira/browse/HDFS-7716 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Fix For: HDFS-7285 Attachments: HDFS-7716.000.patch, HDFS-7716.001.patch, HDFS-7716.002.patch, HDFS-7716.003.patch The current BlockInfo's implementation only supports the replication mechanism. To use the same blocksMap handling block group and its data/parity blocks, we need to define a new BlockGroupInfo class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8012) Updatable HAR Filesystem
[ https://issues.apache.org/jira/browse/HDFS-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Madhan Sundararajan Devaki updated HDFS-8012: - Description: Is there a plan to support updatable HAR Filesystem? If so, by when is this expected please? The following operations may be supported. + Add new files + Remove existing files + Replace existing files was:Is there a plan to support updatable HAR Filesystem? If so, by when is this expected please? Updatable HAR Filesystem Key: HDFS-8012 URL: https://issues.apache.org/jira/browse/HDFS-8012 Project: Hadoop HDFS Issue Type: Bug Components: datanode, hdfs-client Reporter: Madhan Sundararajan Devaki Priority: Critical Is there a plan to support updatable HAR Filesystem? If so, by when is this expected please? The following operations may be supported. + Add new files + Remove existing files + Replace existing files -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8012) Updatable HAR Filesystem
[ https://issues.apache.org/jira/browse/HDFS-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Madhan Sundararajan Devaki updated HDFS-8012: - Issue Type: Improvement (was: Bug) Updatable HAR Filesystem Key: HDFS-8012 URL: https://issues.apache.org/jira/browse/HDFS-8012 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs-client Reporter: Madhan Sundararajan Devaki Priority: Critical Is there a plan to support updatable HAR Filesystem? If so, by when is this expected please? The following operations may be supported. + Add new files + Remove existing files + Replace existing files This is required in cases where data is stored in AVRO format in HDFS and the corresponding .avsc files are used to create Hive external tables. This will lead to the small files (.avsc files in this case) problem when there are a large number of tables that need to be loaded into Hive as external tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7652) Process block reports for erasure coded blocks
[ https://issues.apache.org/jira/browse/HDFS-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7652: Fix Version/s: HDFS-7285 Process block reports for erasure coded blocks -- Key: HDFS-7652 URL: https://issues.apache.org/jira/browse/HDFS-7652 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Fix For: HDFS-7285 Attachments: HDFS-7652.001.patch, HDFS-7652.002.patch, HDFS-7652.003.patch, HDFS-7652.004.patch, HDFS-7652.005.patch, HDFS-7652.006.patch HDFS-7339 adds support in NameNode for persisting block groups. For memory efficiency, erasure coded blocks under the striping layout are not stored in {{BlockManager#blocksMap}}. Instead, entire block groups are stored in {{BlockGroupManager#blockGroups}}. When a block report arrives from the DataNode, it should be processed under the block group that it belongs to. The following naming protocol is used to calculate the group of a given block: {code} * HDFS-EC introduces a hierarchical protocol to name blocks and groups: * Contiguous: {reserved block IDs | flag | block ID} * Striped: {reserved block IDs | flag | block group ID | index in group} * * Following n bits of reserved block IDs, The (n+1)th bit in an ID * distinguishes contiguous (0) and striped (1) blocks. For a striped block, * bits (n+2) to (64-m) represent the ID of its block group, while the last m * bits represent its index of the group. The value m is determined by the * maximum number of blocks in a group (MAX_BLOCKS_IN_GROUP). {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
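The bit layout quoted above can be sanity-checked with a small stand-alone sketch. Note the index width here (m = 4 bits, i.e. MAX_BLOCKS_IN_GROUP = 16) is an assumed demo value, not the constant the HDFS-7285 branch actually uses; the masking logic only illustrates how a block group ID and in-group index could be recovered from a striped block ID.

```java
public class StripedBlockIdSketch {
    // Assumption for illustration: m = 4 index bits, so at most
    // MAX_BLOCKS_IN_GROUP = 16 blocks per group. The real constant is
    // defined elsewhere in the HDFS-7285 branch.
    static final int INDEX_BITS = 4;
    static final long INDEX_MASK = (1L << INDEX_BITS) - 1;

    // The bits above the low m bits identify the group, so clearing the
    // low m index bits recovers the block group ID.
    static long blockGroupId(long blockId) {
        return blockId & ~INDEX_MASK;
    }

    // The last m bits of a striped block ID are its index within the group.
    static int indexInGroup(long blockId) {
        return (int) (blockId & INDEX_MASK);
    }
}
```

With this layout, all blocks of one group share the same high bits, so a block report entry can be mapped back to its group with a single mask operation.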
[jira] [Commented] (HDFS-8011) standby nn can't started
[ https://issues.apache.org/jira/browse/HDFS-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388371#comment-14388371 ] Vinayakumar B commented on HDFS-8011: - Hi [~fujie] Can you attach a little more log around the above-mentioned exceptions? standby nn can't started Key: HDFS-8011 URL: https://issues.apache.org/jira/browse/HDFS-8011 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.3.0 Environment: CentOS 6.2 64-bit Reporter: fujie We have seen a crash when starting the standby namenode, with fatal errors. Any solutions, workarounds, or ideas would be helpful for us. 1. Here is the context: At the beginning we have 2 namenodes; take A as active and B as standby. For some reasons, namenode A died, so namenode B is working as active. When we try to restart A after a minute, it can't work. During this time a lot of files were put to HDFS, and a lot of files were renamed. Namenode A crashed while awaiting reported blocks in safemode each time. 2. 
We can see error log below: 1)2015-03-30 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation CloseOp [length=0, inodeId=0, path=/xxx/_temporary/xxx/part-r-00074.bz2, replication=3, mtime=1427699913947, atime=1427699081161, blockSize=268435456, blocks=[blk_2103131025_1100889495739], permissions=dm:dm:rw-r--r--, clientName=, clientMachine=, opCode=OP_CLOSE, txid=7632753612] java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.setGenerationStampAndVerifyReplicas(BlockInfoUnderConstruction.java:247) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.commitBlock(BlockInfoUnderConstruction.java:267) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.forceCompleteBlock(BlockManager.java:639) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:813) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:383) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:209) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:122) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:737) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$0(EditLogTailer.java:302) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:356) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) at 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:413) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292) 2)2015-03-30 FATAL org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error encountered while tailing edits. Shutting down standby N N. java.io.IOException: Failed to apply edit log operation AddBlockOp [path=/xxx/_temporary/xxx/part-m-00121, penultimateBlock=blk_2102331803_1100888911441, lastBlock=blk_2102661068_1100889009168, RpcClientId=, RpcCallId=-2]: error null at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:215) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:122) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:737) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$0(EditLogTailer.java:302) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:356)
[jira] [Commented] (HDFS-7933) fsck should also report decommissioning replicas.
[ https://issues.apache.org/jira/browse/HDFS-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388386#comment-14388386 ] Hadoop QA commented on HDFS-7933: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12708365/HDFS-7933.02.patch against trunk revision 85dc3c1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA org.apache.hadoop.hdfs.server.namenode.TestDefaultBlockPlacementPolicy Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/10126//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/10126//console This message is automatically generated. fsck should also report decommissioning replicas. -- Key: HDFS-7933 URL: https://issues.apache.org/jira/browse/HDFS-7933 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Jitendra Nath Pandey Assignee: Xiaoyu Yao Attachments: HDFS-7933.00.patch, HDFS-7933.01.patch, HDFS-7933.02.patch Fsck doesn't count replicas that are on decommissioning nodes. 
If a block has all replicas on the decommissioning nodes, it will be marked as missing, which is alarming for the admins, although the system will replicate them before nodes are decommissioned. Fsck output should also show decommissioning replicas along with the live replicas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
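The distinction the report asks for can be sketched as a tiny classification helper. This is purely hypothetical illustration code (the names and report strings are invented, not the actual fsck output format): the point is that a block whose only replicas sit on decommissioning nodes is readable and will be re-replicated, so it should not be lumped in with truly missing blocks.

```java
public class FsckReplicaReport {
    // Hypothetical helper: classify a block given its live and
    // decommissioning replica counts. Strings are illustrative only,
    // not the real fsck output format.
    static String classify(int liveReplicas, int decommissioningReplicas) {
        if (liveReplicas == 0 && decommissioningReplicas == 0) {
            // Genuinely no copies anywhere: this is the alarming case.
            return "MISSING";
        }
        if (liveReplicas == 0) {
            // All replicas on decommissioning nodes: still readable, and
            // re-replication happens before decommission completes.
            return "NO LIVE REPLICAS, " + decommissioningReplicas
                    + " DECOMMISSIONING";
        }
        return "OK (" + liveReplicas + " live, "
                + decommissioningReplicas + " decommissioning)";
    }
}
```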
[jira] [Commented] (HDFS-7999) FsDatasetImpl#createTemporary sometimes holds the FSDatasetImpl lock for a very long time
[ https://issues.apache.org/jira/browse/HDFS-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388309#comment-14388309 ] Xinwei Qin commented on HDFS-7999: --- Hi [~cmccabe] Thanks for your comment. {quote} even if we made the heartbeat lockless, there are still many other problems associated with having FsDatasetImpl#createTemporary hold the FSDatasetImpl lock for a very long time. Any thread that needs to read or write from the datanode will be blocked. {quote} Making the heartbeat lockless can avoid DataNodes being declared dead, and I think it is a necessary patch ([https://issues.apache.org/jira/browse/HDFS-7060]). The FSDatasetImpl lock being held for a long time is another problem; maybe the patch of this JIRA can alleviate it. FsDatasetImpl#createTemporary sometimes holds the FSDatasetImpl lock for a very long time - Key: HDFS-7999 URL: https://issues.apache.org/jira/browse/HDFS-7999 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7999-001.patch I'm using 2.6.0 and noticed that sometimes the DN's heartbeats were delayed for a very long time, say more than 100 seconds. I got the jstack twice, and it looks like the threads are all blocked (at getStorageReport) on the dataset lock, which is held by a thread that is calling createTemporary, which in turn is blocked waiting for an earlier incarnation of the writer to exit. 
The heartbeat thread stack: java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) - waiting to lock 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) - locked 0x0007b0140ed0 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) at java.lang.Thread.run(Thread.java:662) The DataXceiver thread holds the dataset lock: DataXceiver for client at X daemon prio=10 tid=0x7f14041e6480 nid=0x52bc in Object.wait() [0x7f11d78f7000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1194) locked 0x0007a33b85d8 (a org.apache.hadoop.util.Daemon) at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1231) locked 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:114) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:179) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) 
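The fix direction discussed in this thread (stop the earlier writer without holding the dataset lock, then re-acquire and re-check) can be sketched as a minimal, single-threaded model. Everything here is a simplified stand-in, not the real FsDatasetImpl code: the map holds only generation stamps, stopWriter is a no-op placeholder for joining the old writer thread, and exceptions are modeled as return strings.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the check / unlock / stop-writer / relock / re-check loop.
// All names (volumeMap, datasetLock, stopWriter) are simplified stand-ins.
public class CreateTemporarySketch {
    static final Object datasetLock = new Object();
    // blockId -> generation stamp; stand-in for the real volumeMap.
    static final Map<Long, Long> volumeMap = new HashMap<>();

    static void stopWriter(long blockId) {
        // In the real patch this joins the old writer thread WITHOUT
        // holding datasetLock, so heartbeats are not blocked meanwhile.
    }

    static String createTemporary(long blockId, long newGS) {
        while (true) {
            Long seenGS;
            synchronized (datasetLock) {
                seenGS = volumeMap.get(blockId);
                if (seenGS == null) {            // no replica yet: create it
                    volumeMap.put(blockId, newGS);
                    return "created";
                }
                if (seenGS >= newGS) {           // existing GS is newer/equal
                    return "ReplicaAlreadyExistsException";
                }
            }
            stopWriter(blockId);                 // old writer stopped, lock released
            synchronized (datasetLock) {
                Long nowGS = volumeMap.get(blockId);
                if (nowGS != null && nowGS.equals(seenGS)) {
                    // Map unchanged while we were unlocked: evict & replace.
                    volumeMap.put(blockId, newGS);
                    return "created";
                }
                // Another thread slipped in while we were unlocked:
                // loop and re-evaluate against the new replica.
            }
        }
    }
}
```

The key property is that the potentially long stopWriter() join happens outside the lock, so threads like the heartbeat sender (blocked at getStorageReports in the stack above) are no longer starved.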
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8027) Erasure Coding: Update CHANGES-HDFS-7285.txt with branch commits
[ https://issues.apache.org/jira/browse/HDFS-8027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-8027: Attachment: HDFS-8027-01.patch Attaching for reference. Jiras are ordered as per Jira resolution date Erasure Coding: Update CHANGES-HDFS-7285.txt with branch commits Key: HDFS-8027 URL: https://issues.apache.org/jira/browse/HDFS-8027 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-8027-01.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8027) Erasure Coding: Update CHANGES-HDFS-7285.txt with branch commits
[ https://issues.apache.org/jira/browse/HDFS-8027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B resolved HDFS-8027. - Resolution: Fixed Fix Version/s: HDFS-7285 Committed to HDFS-7285 branch. Committed directly, as this is only a CHANGES-HDFS-7285.txt update. Erasure Coding: Update CHANGES-HDFS-7285.txt with branch commits Key: HDFS-8027 URL: https://issues.apache.org/jira/browse/HDFS-8027 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Vinayakumar B Fix For: HDFS-7285 Attachments: HDFS-8027-01.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7285) Erasure Coding Support inside HDFS
[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388329#comment-14388329 ] Vinayakumar B commented on HDFS-7285: - Hi, I think most of the commits to HDFS-7285 were not added to CHANGES-HDFS-EC-7285.txt. Keeping it current will help to update CHANGES.txt at the time of merging to trunk, and hence record the contributions. Very happy to see many new people contributing to this work. For all commits till now I have updated CHANGES-HDFS-EC-7285.txt through HDFS-8027. Please take care of it for further commits. Thanks. Erasure Coding Support inside HDFS -- Key: HDFS-7285 URL: https://issues.apache.org/jira/browse/HDFS-7285 Project: Hadoop HDFS Issue Type: New Feature Reporter: Weihua Jiang Assignee: Zhe Zhang Attachments: ECAnalyzer.py, ECParser.py, HDFS-7285-initial-PoC.patch, HDFSErasureCodingDesign-20141028.pdf, HDFSErasureCodingDesign-20141217.pdf, HDFSErasureCodingDesign-20150204.pdf, HDFSErasureCodingDesign-20150206.pdf, fsimage-analysis-20150105.pdf Erasure Coding (EC) can greatly reduce the storage overhead without sacrificing data reliability, compared to the existing HDFS 3-replica approach. For example, if we use a 10+4 Reed-Solomon coding, we can tolerate the loss of 4 blocks, with a storage overhead of only 40%. This makes EC a quite attractive alternative for big data storage, particularly for cold data. Facebook had a related open source project called HDFS-RAID. It used to be one of the contrib packages in HDFS but has been removed since Hadoop 2.0 for maintenance reasons. The drawbacks are: 1) it is on top of HDFS and depends on MapReduce to do encoding and decoding tasks; 2) it can only be used for cold files that are not intended to be appended anymore; 3) the pure-Java EC coding implementation is extremely slow in practical use. Due to these, it might not be a good idea to just bring HDFS-RAID back. 
We (Intel and Cloudera) are working on a design to build EC into HDFS that gets rid of any external dependencies, making it self-contained and independently maintained. This design layers the EC feature on the storage-type support and considers compatibility with existing HDFS features like caching, snapshots, encryption, high availability, etc. This design will also support different EC coding schemes, implementations, and policies for different deployment scenarios. By utilizing advanced libraries (e.g. the Intel ISA-L library), an implementation can greatly improve the performance of EC encoding/decoding and make the EC solution even more attractive. We will post the design document soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
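The storage-overhead arithmetic in the description above is easy to verify: r-way replication stores r copies (overhead r - 1, so 200% for 3 replicas), while an RS(k, p) code stores p parity blocks per k data blocks (overhead p / k, so 40% for 10+4).

```java
// Quick check of the storage-overhead comparison quoted above.
public class EcOverhead {
    // r-way replication keeps r copies: overhead = r - 1 extra copies.
    static double replicationOverhead(int replicas) {
        return replicas - 1.0;
    }

    // RS(k, p) stores p parity blocks per k data blocks: overhead = p / k.
    static double rsOverhead(int dataBlocks, int parityBlocks) {
        return (double) parityBlocks / dataBlocks;
    }
}
```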
[jira] [Updated] (HDFS-8012) Updatable HAR Filesystem
[ https://issues.apache.org/jira/browse/HDFS-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Madhan Sundararajan Devaki updated HDFS-8012: - Description: Is there a plan to support updatable HAR Filesystem? If so, by when is this expected please? The following operations may be supported. + Add new files + Remove existing files + Replace existing files (Optional) This is required in cases where data is stored in AVRO format in HDFS and the corresponding .avsc files are used to create Hive external tables. This will lead to the small files (.avsc files in this case) problem when there are a large number of tables that need to be loaded into Hive as external tables. was: Is there a plan to support updatable HAR Filesystem? If so, by when is this expected please? The following operations may be supported. + Add new files + Remove existing files + Replace existing files This is required in cases where data is stored in AVRO format in HDFS and the corresponding .avsc files are used to create Hive external tables. This will lead to the small files (.avsc files in this case) problem when there are a large number of tables that need to be loaded into Hive as external tables. Updatable HAR Filesystem Key: HDFS-8012 URL: https://issues.apache.org/jira/browse/HDFS-8012 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs-client Reporter: Madhan Sundararajan Devaki Priority: Critical Is there a plan to support updatable HAR Filesystem? If so, by when is this expected please? The following operations may be supported. + Add new files + Remove existing files + Replace existing files (Optional) This is required in cases where data is stored in AVRO format in HDFS and the corresponding .avsc files are used to create Hive external tables. This will lead to the small files (.avsc files in this case) problem when there are a large number of tables that need to be loaded into Hive as external tables. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8012) Updatable HAR Filesystem
[ https://issues.apache.org/jira/browse/HDFS-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Madhan Sundararajan Devaki updated HDFS-8012: - Description: Is there a plan to support updatable HAR Filesystem? If so, by when is this expected please? The following operations may be supported. + Add new files [ -a filename-uri1 filename-uri2 ... / -a dirname-uri1 dirname-uri2 ...] + Remove existing files [ -d filename-uri1 filename-uri2 ... / -d dirname-uri1 dirname-uri2 ...] + Update/Replace existing files (Optional) [ -u old-filename-uri new-filename-uri] This is required in cases where data is stored in AVRO format in HDFS and the corresponding .avsc files are used to create Hive external tables. This will lead to the small files (.avsc files in this case) problem when there are a large number of tables that need to be loaded into Hive as external tables. was: Is there a plan to support updatable HAR Filesystem? If so, by when is this expected please? The following operations may be supported. + Add new files [ -a filename-uri1, filename-uri2, ...] + Remove existing files [ -d filename-uri1, filename-uri2, ...] + Update/Replace existing files (Optional) [ -u old-filename-uri new-filename-uri] This is required in cases where data is stored in AVRO format in HDFS and the corresponding .avsc files are used to create Hive external tables. This will lead to the small files (.avsc files in this case) problem when there are a large number of tables that need to be loaded into Hive as external tables. Updatable HAR Filesystem Key: HDFS-8012 URL: https://issues.apache.org/jira/browse/HDFS-8012 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs-client Reporter: Madhan Sundararajan Devaki Priority: Critical Is there a plan to support updatable HAR Filesystem? If so, by when is this expected please? The following operations may be supported. + Add new files [ -a filename-uri1 filename-uri2 ... / -a dirname-uri1 dirname-uri2 ...] 
+ Remove existing files [ -d filename-uri1 filename-uri2 ... / -d dirname-uri1 dirname-uri2 ...] + Update/Replace existing files (Optional) [ -u old-filename-uri new-filename-uri] This is required in cases where data is stored in AVRO format in HDFS and the corresponding .avsc files are used to create Hive external tables. This will lead to the small files (.avsc files in this case) problem when there are a large number of tables that need to be loaded into Hive as external tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8027) Update CHANGES-HDFS-7285.txt with branch commits
Vinayakumar B created HDFS-8027: --- Summary: Update CHANGES-HDFS-7285.txt with branch commits Key: HDFS-8027 URL: https://issues.apache.org/jira/browse/HDFS-8027 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Vinayakumar B -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8027) Erasure Coding: Update CHANGES-HDFS-7285.txt with branch commits
[ https://issues.apache.org/jira/browse/HDFS-8027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-8027: Summary: Erasure Coding: Update CHANGES-HDFS-7285.txt with branch commits (was: Update CHANGES-HDFS-7285.txt with branch commits) Erasure Coding: Update CHANGES-HDFS-7285.txt with branch commits Key: HDFS-8027 URL: https://issues.apache.org/jira/browse/HDFS-8027 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Vinayakumar B -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8027) Erasure Coding: Update CHANGES-HDFS-7285.txt with branch commits
[ https://issues.apache.org/jira/browse/HDFS-8027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-8027: Description: Latest branch commits are not tracked in CHANGES-HDFS-7285.txt. Erasure Coding: Update CHANGES-HDFS-7285.txt with branch commits Key: HDFS-8027 URL: https://issues.apache.org/jira/browse/HDFS-8027 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Vinayakumar B Fix For: HDFS-7285 Attachments: HDFS-8027-01.patch Latest branch commits are not tracked in CHANGES-HDFS-7285.txt. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8011) standby nn can't started
[ https://issues.apache.org/jira/browse/HDFS-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388330#comment-14388330 ] fujie commented on HDFS-8011: - HDFS-6825's affected version is 2.5.0, but our Hadoop version is 2.3.0. So are you sure it is the same issue? 1. I am sure that the file was deleted. And I have some new findings. Say we have image-file-1, editlog-file-1 and editlog-file-inprogress when starting the standby namenode A. I observed the following behavior of these files: step-1) SNN will load image-file-1 and editlog-file-1 and generate a new image file; take it as image-file-2. step-2) SNN will copy image-file-2 to the active namenode. step-3) editlog-file-inprogress will be renamed to editlog-file-2 and a new editlog-file-inprogress will be opened. step-4) SNN will load editlog-file-2; at the same time datanodes will report heartbeats to both active and standby. The crash happens at step-4. We printed all the failed files and all of them are in editlog-file-2. We also have statistics: 20,000 out of 500,000 operations failed. Then we parsed editlog-file-2, and the failed records all look alike. In all of them, RPC_CLIENTID is null (blank) and RPC_CALLID is -2:

<RECORD>
  <OPCODE>OP_ADD_BLOCK</OPCODE>
  <DATA>
    <TXID>7660428426</TXID>
    <PATH>/workspace/dm/recommend/VideoQuality/VRII/AppList/data/interactivedata_month/_temporary/1/_temporary/attempt_1427018831005_178665_r_02_0/part-r-2</PATH>
    <BLOCK>
      <BLOCK_ID>2107099231</BLOCK_ID>
      <NUM_BYTES>0</NUM_BYTES>
      <GENSTAMP>1100893452304</GENSTAMP>
    </BLOCK>
    <RPC_CLIENTID></RPC_CLIENTID>
    <RPC_CALLID>-2</RPC_CALLID>
  </DATA>
</RECORD>

2. If we restart SNN A again, editlog-file-2 can be loaded correctly, just like editlog-file-1 in the last restart. It's weird. Do the reported heartbeats impact its behavior? But the load process and report process should be asynchronous, shouldn't they? We are looking forward to your reply. 
standby nn can't started Key: HDFS-8011 URL: https://issues.apache.org/jira/browse/HDFS-8011 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.3.0 Environment: CentOS 6.2 64-bit Reporter: fujie We have seen a crash when starting the standby namenode, with fatal errors. Any solutions, workarounds, or ideas would be helpful for us. 1. Here is the context: At the beginning we have 2 namenodes; take A as active and B as standby. For some reasons, namenode A died, so namenode B is working as active. When we try to restart A after a minute, it can't work. During this time a lot of files were put to HDFS, and a lot of files were renamed. Namenode A crashed while awaiting reported blocks in safemode each time. 2. We can see error log below: 1)2015-03-30 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation CloseOp [length=0, inodeId=0, path=/xxx/_temporary/xxx/part-r-00074.bz2, replication=3, mtime=1427699913947, atime=1427699081161, blockSize=268435456, blocks=[blk_2103131025_1100889495739], permissions=dm:dm:rw-r--r--, clientName=, clientMachine=, opCode=OP_CLOSE, txid=7632753612] java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.setGenerationStampAndVerifyReplicas(BlockInfoUnderConstruction.java:247) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.commitBlock(BlockInfoUnderConstruction.java:267) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.forceCompleteBlock(BlockManager.java:639) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:813) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:383) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:209) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:122) at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:737) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$0(EditLogTailer.java:302) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:356) at
[jira] [Commented] (HDFS-7701) Support reporting per storage type quota and usage with hadoop/hdfs shell
[ https://issues.apache.org/jira/browse/HDFS-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388349#comment-14388349 ] Hadoop QA commented on HDFS-7701: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12708388/HDFS-7701.04.patch against trunk revision b5a22e9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/10128//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/10128//console This message is automatically generated. Support reporting per storage type quota and usage with hadoop/hdfs shell - Key: HDFS-7701 URL: https://issues.apache.org/jira/browse/HDFS-7701 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Xiaoyu Yao Assignee: Peter Shi Attachments: HDFS-7701.01.patch, HDFS-7701.02.patch, HDFS-7701.03.patch, HDFS-7701.04.patch hadoop fs -count -q or hdfs dfs -count -q currently shows name space/disk space quota and remaining quota information. With HDFS-7584, we want to display per storage type quota and its remaining information as well. 
The current output format as shown below may not easily accommodate 6 more columns (3 existing storage types * 2 columns each for quota/remaining quota). With new storage types added in the future, this will make the output even more crowded. There are also compatibility issues, as we don't want to break any existing scripts monitoring hadoop fs -count -q output. $ hadoop fs -count -q -v /test QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME none inf 524288000 5242665691 15 21431 /test I propose to add a -t parameter to display ONLY the storage type quota information of the directory separately. This way, existing scripts will work as-is without the -t parameter. 1) When -t is not followed by a specific storage type, quota and usage information for all storage types will be displayed. $ hadoop fs -count -q -t -h -v /test SSD_QUOTA REM_SSD_QUOTA DISK_QUOTA REM_DISK_QUOTA ARCHIVAL_QUOTA REM_ARCHIVAL_QUOTA PATHNAME 512MB 256MB none inf none inf /test 2) If -t is followed by a storage type, only the quota and remaining quota of that storage type are displayed. $ hadoop fs -count -q -t SSD -h -v /test SSD_QUOTA REM_SSD_QUOTA PATHNAME 512 MB 256 MB /test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8012) Updatable HAR Filesystem
[ https://issues.apache.org/jira/browse/HDFS-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Madhan Sundararajan Devaki updated HDFS-8012: - Description: Is there a plan to support updatable HAR Filesystem? If so, by when is this expected please? The following operations may be supported. + Add new files [ -a filename-uri1, filename-uri2, ...] + Remove existing files [ -d filename-uri1, filename-uri2, ...] + Update/Replace existing files (Optional) [ -u old-filename-uri new-filename-uri] This is required in cases where data is stored in AVRO format in HDFS and the corresponding .avsc files are used to create Hive external tables. This will lead to the small files (.avsc files in this case) problem when there are a large number of tables that need to be loaded into Hive as external tables. was: Is there a plan to support updatable HAR Filesystem? If so, by when is this expected please? The following operations may be supported. + Add new files + Remove existing files + Replace existing files (Optional) This is required in cases where data is stored in AVRO format in HDFS and the corresponding .avsc files are used to create Hive external tables. This will lead to the small files (.avsc files in this case) problem when there are a large number of tables that need to be loaded into Hive as external tables. Updatable HAR Filesystem Key: HDFS-8012 URL: https://issues.apache.org/jira/browse/HDFS-8012 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs-client Reporter: Madhan Sundararajan Devaki Priority: Critical Is there a plan to support updatable HAR Filesystem? If so, by when is this expected please? The following operations may be supported. + Add new files [ -a filename-uri1, filename-uri2, ...] + Remove existing files [ -d filename-uri1, filename-uri2, ...] 
+ Update/Replace existing files (Optional) [ -u old-filename-uri new-filename-uri] This is required in cases where data is stored in AVRO format in HDFS and the corresponding .avsc files are used to create Hive external tables. This will lead to the small files (.avsc files in this case) problem when there are a large number of tables that need to be loaded into Hive as external tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7888) Change DataStreamer/DFSOutputStream/DFSPacket for convenience of subclassing
[ https://issues.apache.org/jira/browse/HDFS-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388304#comment-14388304 ] Hadoop QA commented on HDFS-7888: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12708333/HDFS-7888-trunk-001.patch against trunk revision 85dc3c1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/10124//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/10124//console This message is automatically generated. Change DataStreamer/DFSOutputStream/DFSPacket for convenience of subclassing Key: HDFS-7888 URL: https://issues.apache.org/jira/browse/HDFS-7888 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-7888-001.patch, HDFS-7888-trunk-001.patch HDFS-7793 refactors class {{DFSOutputStream}} on trunk which makes {{DFSOutputStream}} a class without any inner classes. 
We want to subclass {{DFSOutputStream}} to support striping layout writing. This JIRA depends upon HDFS-7793 and tries to change DataStreamer/DFSOutputStream/DFSPacket for convenience of subclassing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7954) TestBalancer#testBalancerWithPinnedBlocks failed on Windows
[ https://issues.apache.org/jira/browse/HDFS-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388335#comment-14388335 ] Hadoop QA commented on HDFS-7954: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12708345/HDFS-7947.00.patch against trunk revision 85dc3c1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/10125//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/10125//console This message is automatically generated. TestBalancer#testBalancerWithPinnedBlocks failed on Windows --- Key: HDFS-7954 URL: https://issues.apache.org/jira/browse/HDFS-7954 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-7947.00.patch {code} testBalancerWithPinnedBlocks(org.apache.hadoop.hdfs.server.balancer.TestBalancer) Time elapsed: 22.624 sec FAILURE! 
java.lang.AssertionError: expected:-3 but was:0 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerWithPinnedBlocks(TestBalancer.java:353) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7937) Erasure Coding: INodeFile quota computation unit tests
[ https://issues.apache.org/jira/browse/HDFS-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388482#comment-14388482 ] Rakesh R commented on HDFS-7937: Thanks [~kaisasak], that's a good number of unit test cases. I have a few minor comments; please take a look. # In TestINodeFile#testBlockStripedTotalBlockCount, do we need the below logic in this test case? {code} +INodeFile inf = createINodeFile(HdfsConstants.EC_STORAGE_POLICY_ID); +inf.addStripedBlocksFeature(); {code} # Could you please reverse the {{actual}} and {{expected}} arguments in the assertions? I see this kind of usage in many places; please fix all such cases. For example, case-1) {code} assertEquals(inf.getBlocks().length, 1); can be written as : assertEquals(1, inf.getBlocks().length); {code} Case-2) {code} assertEquals(blockInfoStriped.getTotalBlockNum(), 9); can be written as : assertEquals(9, blockInfoStriped.getTotalBlockNum()); {code} Erasure Coding: INodeFile quota computation unit tests -- Key: HDFS-7937 URL: https://issues.apache.org/jira/browse/HDFS-7937 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Sasaki Assignee: Kai Sasaki Priority: Minor Attachments: HDFS-7937.1.patch, HDFS-7937.2.patch Unit test for [HDFS-7826|https://issues.apache.org/jira/browse/HDFS-7826] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7937) Erasure Coding: INodeFile quota computation unit tests
[ https://issues.apache.org/jira/browse/HDFS-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-7937: --- Status: Open (was: Patch Available) Erasure Coding: INodeFile quota computation unit tests -- Key: HDFS-7937 URL: https://issues.apache.org/jira/browse/HDFS-7937 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Sasaki Assignee: Kai Sasaki Priority: Minor Attachments: HDFS-7937.1.patch, HDFS-7937.2.patch Unit test for [HDFS-7826|https://issues.apache.org/jira/browse/HDFS-7826] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8002) Website refers to /trash directory
[ https://issues.apache.org/jira/browse/HDFS-8002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388414#comment-14388414 ] Hudson commented on HDFS-8002: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #149 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/149/]) HDFS-8002. Website refers to /trash directory. Contributed by Brahma Reddy Battula. (aajisaka: rev e7ea2a8e8f0a7b428ef10552885757b99b59e4dc) * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Website refers to /trash directory -- Key: HDFS-8002 URL: https://issues.apache.org/jira/browse/HDFS-8002 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Mike Drob Assignee: Brahma Reddy Battula Fix For: 2.8.0 Attachments: HDFS-8002.patch, HDFS-8003-002.patch On http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#File_Deletes_and_Undeletes the section on trash refers to files residing in {{/trash}}. I think this is an error, as files actually go to user-specific trash directories like {{/user/hdfs/.Trash}}. Either the site needs to be updated to mention user-specific directories, or if this is a change from previous behaviour then maybe that can be mentioned instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3918) EditLogTailer shouldn't log WARN when other node is in standby mode
[ https://issues.apache.org/jira/browse/HDFS-3918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388415#comment-14388415 ] Hudson commented on HDFS-3918: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #149 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/149/]) HDFS-3918. EditLogTailer shouldn't log WARN when other node is in standby mode. Contributed by Todd Lipcon. (harsh: rev cce66ba3c9ec293e8ba1afd0eb518c7ca0bbc7c9) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java EditLogTailer shouldn't log WARN when other node is in standby mode --- Key: HDFS-3918 URL: https://issues.apache.org/jira/browse/HDFS-3918 Project: Hadoop HDFS Issue Type: Improvement Components: ha Affects Versions: 2.0.3-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 2.8.0 Attachments: hdfs-3918.txt If both nodes are in standby mode, each will be trying to roll the others' logs, which results in errors like: Unable to trigger a roll of the active NN org.apache.hadoop.ipc.StandbyException: Operation category JOURNAL is not supported in state standby We should catch this specific exception and not log it at WARN level, since it's expected behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7261) storageMap is accessed without synchronization in DatanodeDescriptor#updateHeartbeatState()
[ https://issues.apache.org/jira/browse/HDFS-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388413#comment-14388413 ] Hudson commented on HDFS-7261: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #149 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/149/]) HDFS-7261. storageMap is accessed without synchronization in DatanodeDescriptor#updateHeartbeatState() (Brahma Reddy Battula via Colin P. McCabe) (cmccabe: rev 1feb9569f366a29ecb43592d71ee21023162c18f) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt storageMap is accessed without synchronization in DatanodeDescriptor#updateHeartbeatState() --- Key: HDFS-7261 URL: https://issues.apache.org/jira/browse/HDFS-7261 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Brahma Reddy Battula Fix For: 2.8.0 Attachments: HDFS-7261-001.patch, HDFS-7261-002.patch, HDFS-7261.patch Here is the code: {code} failedStorageInfos = new HashSet<DatanodeStorageInfo>( storageMap.values()); {code} In other places, the lock on DatanodeDescriptor.storageMap is held: {code} synchronized (storageMap) { final Collection<DatanodeStorageInfo> storages = storageMap.values(); return storages.toArray(new DatanodeStorageInfo[storages.size()]); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
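The fix follows the pattern already used elsewhere in DatanodeDescriptor: take the snapshot of the map's values while holding the map's monitor, then work on the snapshot outside the lock. A minimal self-contained sketch of that pattern (the class and field names below are illustrative, not the actual HDFS code):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class StorageMapDemo {
    private final Map<String, String> storageMap = new HashMap<>();

    // Writers mutate the map under its own monitor.
    public void addStorage(String id, String state) {
        synchronized (storageMap) {
            storageMap.put(id, state);
        }
    }

    // Readers copy the values under the same monitor, so the copy can
    // never observe a HashMap mid-rehash; iteration then happens on the
    // snapshot, outside the lock.
    public Set<String> snapshotValues() {
        synchronized (storageMap) {
            return new HashSet<>(storageMap.values());
        }
    }
}
```

The unsynchronized `new HashSet<>(storageMap.values())` in updateHeartbeatState() was the one copy taken without the monitor, which is exactly what the patch brings in line with the pattern above.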
[jira] [Commented] (HDFS-7944) Minor cleanup of BlockPoolManager#getAllNamenodeThreads
[ https://issues.apache.org/jira/browse/HDFS-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388422#comment-14388422 ] Hudson commented on HDFS-7944: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #149 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/149/]) HDFS-7944. Minor cleanup of BlockPoolManager#getAllNamenodeThreads. (Arpit Agarwal) (arp: rev 85dc3c14b2ca4b01a93361bb925c39a22a6fd8db) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestTriggerBlockReport.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeExit.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockScanner.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestIncrementalBlockReports.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDeleteBlockPool.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestRefreshNamenodes.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMultipleRegistrations.java Minor cleanup of BlockPoolManager#getAllNamenodeThreads --- Key: HDFS-7944 URL: https://issues.apache.org/jira/browse/HDFS-7944 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal 
Priority: Minor Fix For: 2.8.0 Attachments: HDFS-7944.01.patch, HDFS-7944.02.patch {{BlockPoolManager#getAllNamenodeThreads}} can avoid unnecessary list to array conversion and vice versa by returning an unmodifiable list. Since NN addition/removal is relatively rare we can just use a {{CopyOnWriteArrayList}} for concurrency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
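The cleanup described above (a {{CopyOnWriteArrayList}} plus an unmodifiable view, instead of list-to-array conversions) can be sketched as follows; the class and method names are illustrative, not the actual BlockPoolManager code:

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class NamenodeThreads {
    // NN addition/removal is rare, so copy-on-write is cheap enough,
    // and readers never need a lock: each iterator works on an
    // immutable snapshot of the list.
    private final List<String> nnThreads = new CopyOnWriteArrayList<>();

    public void addNamenode(String name) { nnThreads.add(name); }
    public void removeNamenode(String name) { nnThreads.remove(name); }

    // Callers get a read-only view; no array round trip needed, and
    // any attempt to mutate the returned list fails fast.
    public List<String> getAllNamenodeThreads() {
        return Collections.unmodifiableList(nnThreads);
    }
}
```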
[jira] [Created] (HDFS-8028) TestNNHandlesBlockReportPerStorage/TestNNHandlesCombinedBlockReport Failed after patched HDFS-7704
hongyu bi created HDFS-8028: --- Summary: TestNNHandlesBlockReportPerStorage/TestNNHandlesCombinedBlockReport Failed after patched HDFS-7704 Key: HDFS-8028 URL: https://issues.apache.org/jira/browse/HDFS-8028 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.0 Reporter: hongyu bi Assignee: hongyu bi Priority: Minor HDFS-7704 makes bad block reporting asynchronous; however, BlockReportTestBase#blockreport_02 doesn't wait after the block report. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8028) TestNNHandlesBlockReportPerStorage/TestNNHandlesCombinedBlockReport Failed after patched HDFS-7704
[ https://issues.apache.org/jira/browse/HDFS-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongyu bi updated HDFS-8028: Attachment: HDFS-8028-v0.patch TestNNHandlesBlockReportPerStorage/TestNNHandlesCombinedBlockReport Failed after patched HDFS-7704 -- Key: HDFS-8028 URL: https://issues.apache.org/jira/browse/HDFS-8028 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.0 Reporter: hongyu bi Assignee: hongyu bi Priority: Minor Attachments: HDFS-8028-v0.patch HDFS-7704 makes bad block reporting asynchronous; however, BlockReportTestBase#blockreport_02 doesn't wait after the block report. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7645) Rolling upgrade is restoring blocks from trash multiple times
[ https://issues.apache.org/jira/browse/HDFS-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388416#comment-14388416 ] Hudson commented on HDFS-7645: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #149 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/149/]) HDFS-7645. Rolling upgrade is restoring blocks from trash multiple times (Contributed by Vinayakumar B and Keisuke Ogiwara) (arp: rev 1a495fbb489c9e9a23b341a52696d10e9e272b04) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceStorage.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/RollingUpgradeStatus.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeRollingUpgrade.java * hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.html * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/RollingUpgradeInfo.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java Rolling upgrade is restoring blocks from trash multiple times - Key: HDFS-7645 URL: https://issues.apache.org/jira/browse/HDFS-7645 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Nathan Roberts Assignee: Keisuke Ogiwara Fix For: 2.8.0 Attachments: HDFS-7645.01.patch, HDFS-7645.02.patch, HDFS-7645.03.patch, HDFS-7645.04.patch, HDFS-7645.05.patch, HDFS-7645.06.patch, HDFS-7645.07.patch When performing an HDFS rolling upgrade, the trash directory is getting restored twice when under normal circumstances it shouldn't need to be restored at all. iiuc, the only time these blocks should be restored is if we need to rollback a rolling upgrade. On a busy cluster, this can cause significant and unnecessary block churn both on the datanodes, and more importantly in the namenode. The two times this happens are: 1) restart of DN onto new software {code} private void doTransition(DataNode datanode, StorageDirectory sd, NamespaceInfo nsInfo, StartupOption startOpt) throws IOException { if (startOpt == StartupOption.ROLLBACK && sd.getPreviousDir().exists()) { Preconditions.checkState(!getTrashRootDir(sd).exists(), sd.getPreviousDir() + " and " + getTrashRootDir(sd) + " should not both be present."); doRollback(sd, nsInfo); // rollback if applicable } else { // Restore all the files in the trash. The restored files are retained // during rolling upgrade rollback. They are deleted during rolling // upgrade downgrade. int restored = restoreBlockFilesFromTrash(getTrashRootDir(sd)); LOG.info("Restored " + restored + " block files from trash."); } {code} 2) When heartbeat response no longer indicates a rollingupgrade is in progress {code} /** * Signal the current rolling upgrade status as indicated by the NN. 
* @param inProgress true if a rolling upgrade is in progress */ void signalRollingUpgrade(boolean inProgress) throws IOException { String bpid = getBlockPoolId(); if (inProgress) { dn.getFSDataset().enableTrash(bpid); dn.getFSDataset().setRollingUpgradeMarker(bpid); } else { dn.getFSDataset().restoreTrash(bpid); dn.getFSDataset().clearRollingUpgradeMarker(bpid); } } {code} HDFS-6800 and HDFS-6981 modified this behavior, making it not completely clear whether this is somehow intentional. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7742) favoring decommissioning node for replication can cause a block to stay underreplicated for long periods
[ https://issues.apache.org/jira/browse/HDFS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388412#comment-14388412 ] Hudson commented on HDFS-7742: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #149 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/149/]) HDFS-7742. Favoring decommissioning node for replication can cause a block to stay (kihwal: rev 04ee18ed48ceef34598f954ff40940abc9fde1d2) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java favoring decommissioning node for replication can cause a block to stay underreplicated for long periods Key: HDFS-7742 URL: https://issues.apache.org/jira/browse/HDFS-7742 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Nathan Roberts Assignee: Nathan Roberts Fix For: 2.7.0 Attachments: HDFS-7742-v0.patch When choosing a source node to replicate a block from, a decommissioning node is favored. The reason for the favoritism is that decommissioning nodes aren't servicing any writes so in-theory they are less loaded. However, the same selection algorithm also tries to make sure it doesn't get stuck on any particular node: {noformat} // switch to a different node randomly // this to prevent from deterministically selecting the same node even // if the node failed to replicate the block on previous iterations {noformat} Unfortunately, the decommissioning check is prior to this randomness so the algorithm can get stuck trying to replicate from a decommissioning node. We've seen this in practice where a decommissioning datanode was failing to replicate a block for many days, when other viable replicas of the block were available. 
Given that we limit the number of streams we'll assign to a given node (default soft limit of 2, hard limit of 4), it doesn't seem like favoring a decommissioning node has significant benefit. i.e. when there is significant replication work to do, we'll quickly hit the stream limit of the decommissioning nodes and use other nodes in the cluster anyway; when there isn't significant replication work then in theory we've got plenty of replication bandwidth available so choosing a decommissioning node isn't much of a win. I see two choices: 1) Change the algorithm to still favor decommissioning nodes but with some level of randomness that will avoid always selecting the decommissioning node 2) Remove the favoritism for decommissioning nodes I prefer #2. It simplifies the algorithm, and given the other throttles we have in place, I'm not sure there is a significant benefit to selecting decommissioning nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
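Choice #2 above (drop the favoritism and pick uniformly at random among replicas still under the stream limit) could look roughly like the sketch below. The class, method, and limit names are made up for illustration; this is not the actual BlockManager code:

```java
import java.util.Random;

public class ReplicationSourceChooser {
    static final int MAX_STREAMS = 2; // illustrative soft limit

    // Pick a random source index among replicas below the stream limit,
    // with no preference for decommissioning nodes. The randomness is
    // what prevents deterministically re-picking a node that keeps
    // failing to replicate the block.
    public static int chooseSource(int[] activeStreams, Random rand) {
        int chosen = -1;
        int eligible = 0;
        for (int i = 0; i < activeStreams.length; i++) {
            if (activeStreams[i] >= MAX_STREAMS) {
                continue; // saturated: skip
            }
            eligible++;
            // Reservoir sampling: keep index i with probability 1/eligible,
            // yielding a uniform choice over all eligible replicas.
            if (rand.nextInt(eligible) == 0) {
                chosen = i;
            }
        }
        return chosen; // -1 if every replica is saturated
    }
}
```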
[jira] [Commented] (HDFS-7748) Separate ECN flags from the Status in the DataTransferPipelineAck
[ https://issues.apache.org/jira/browse/HDFS-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388418#comment-14388418 ] Hudson commented on HDFS-7748: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #149 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/149/]) HDFS-7748. Separate ECN flags from the Status in the DataTransferPipelineAck. Contributed by Anu Engineer and Haohui Mai. (wheat9: rev b80457158daf0dc712fbe5695625cc17d70d4bb4) * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PipelineAck.java Addendum for HDFS-7748. (wheat9: rev 0967b1d99d7001cd1d09ebd29b9360f1079410e8) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java Separate ECN flags from the Status in the DataTransferPipelineAck - Key: HDFS-7748 URL: https://issues.apache.org/jira/browse/HDFS-7748 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Anu Engineer Priority: Blocker Attachments: HDFS-7748.007-addendum.patch, HDFS-7748.007.patch, hdfs-7748.001.patch, hdfs-7748.002.patch, hdfs-7748.003.patch, hdfs-7748.004.patch, hdfs-7748.005.patch, hdfs-7748.006.patch, hdfs-7748.branch-2.7.006.patch Prior to the discussions on HDFS-7270, the old clients might fail to talk to the newer server when ECN is turned on. This jira proposes to separate the ECN flags in a separate protobuf field to make the ack compatible on both versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
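The compatibility hazard motivating the separate protobuf field can be illustrated with a toy encoding (this is NOT the real PipelineAck wire format; the constants and bit position are invented): if a new server ORs an ECN bit into the same integer an old client decodes as a bare Status value, the old client sees an unknown value, whereas a separate field leaves the status untouched.

```java
public class AckCompatDemo {
    static final int STATUS_SUCCESS = 0;
    static final int ECN_SUPPORTED = 1 << 10; // hypothetical flag bit

    // "New" server packing the ECN flag into the status field itself.
    static int encodeCombined(int status, boolean ecn) {
        return ecn ? (status | ECN_SUPPORTED) : status;
    }

    // An "old" client that only understands bare status values: any
    // unexpected bit makes the whole ack unreadable to it.
    static boolean oldClientUnderstands(int wireValue) {
        return wireValue == STATUS_SUCCESS;
    }
}
```

Carrying the ECN bits in their own protobuf field means old clients simply skip the unknown field and still read a valid status, which is the approach this jira took.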
[jira] [Updated] (HDFS-8028) TestNNHandlesBlockReportPerStorage/TestNNHandlesCombinedBlockReport Failed after patched HDFS-7704
[ https://issues.apache.org/jira/browse/HDFS-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongyu bi updated HDFS-8028: Attachment: (was: HDFS-8028-v0.patch) TestNNHandlesBlockReportPerStorage/TestNNHandlesCombinedBlockReport Failed after patched HDFS-7704 -- Key: HDFS-8028 URL: https://issues.apache.org/jira/browse/HDFS-8028 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.0 Reporter: hongyu bi Assignee: hongyu bi Priority: Minor Attachments: HDFS-8028-v0.patch HDFS-7704 made bad-block reporting asynchronous; however, BlockReportTestBase#blockreport_02 doesn't wait long enough after the block report. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7261) storageMap is accessed without synchronization in DatanodeDescriptor#updateHeartbeatState()
[ https://issues.apache.org/jira/browse/HDFS-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388433#comment-14388433 ] Hudson commented on HDFS-7261: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #883 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/883/]) HDFS-7261. storageMap is accessed without synchronization in DatanodeDescriptor#updateHeartbeatState() (Brahma Reddy Battula via Colin P. McCabe) (cmccabe: rev 1feb9569f366a29ecb43592d71ee21023162c18f) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java storageMap is accessed without synchronization in DatanodeDescriptor#updateHeartbeatState() --- Key: HDFS-7261 URL: https://issues.apache.org/jira/browse/HDFS-7261 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Brahma Reddy Battula Fix For: 2.8.0 Attachments: HDFS-7261-001.patch, HDFS-7261-002.patch, HDFS-7261.patch Here is the code: {code} failedStorageInfos = new HashSet<DatanodeStorageInfo>( storageMap.values()); {code} In other places, the lock on DatanodeDescriptor.storageMap is held: {code} synchronized (storageMap) { final Collection<DatanodeStorageInfo> storages = storageMap.values(); return storages.toArray(new DatanodeStorageInfo[storages.size()]); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
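The race and the fix pattern described above can be sketched as follows (a minimal illustration, not the actual DatanodeDescriptor code): the copy of storageMap.values() must be taken while holding the storageMap lock, matching the other call sites quoted in the issue.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hedged sketch with an illustrative value type; the real map holds
// DatanodeStorageInfo keyed by storage ID.
public class StorageMapCopy {
    // Guarded by synchronized (storageMap), as in DatanodeDescriptor.
    private final Map<String, String> storageMap = new HashMap<>();

    void addStorage(String id, String info) {
        synchronized (storageMap) {
            storageMap.put(id, info);
        }
    }

    // The fix: take the snapshot while holding the lock, so the HashSet
    // constructor never iterates the values concurrently with a writer
    // (which could otherwise throw ConcurrentModificationException).
    Set<String> failedStorageSnapshot() {
        synchronized (storageMap) {
            return new HashSet<>(storageMap.values());
        }
    }
}
```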
[jira] [Commented] (HDFS-7742) favoring decommissioning node for replication can cause a block to stay underreplicated for long periods
[ https://issues.apache.org/jira/browse/HDFS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388432#comment-14388432 ] Hudson commented on HDFS-7742: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #883 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/883/]) HDFS-7742. Favoring decommissioning node for replication can cause a block to stay (kihwal: rev 04ee18ed48ceef34598f954ff40940abc9fde1d2) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt favoring decommissioning node for replication can cause a block to stay underreplicated for long periods Key: HDFS-7742 URL: https://issues.apache.org/jira/browse/HDFS-7742 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Nathan Roberts Assignee: Nathan Roberts Fix For: 2.7.0 Attachments: HDFS-7742-v0.patch When choosing a source node to replicate a block from, a decommissioning node is favored. The reason for the favoritism is that decommissioning nodes aren't servicing any writes so in-theory they are less loaded. However, the same selection algorithm also tries to make sure it doesn't get stuck on any particular node: {noformat} // switch to a different node randomly // this to prevent from deterministically selecting the same node even // if the node failed to replicate the block on previous iterations {noformat} Unfortunately, the decommissioning check is prior to this randomness so the algorithm can get stuck trying to replicate from a decommissioning node. We've seen this in practice where a decommissioning datanode was failing to replicate a block for many days, when other viable replicas of the block were available. 
Given that we limit the number of streams we'll assign to a given node (default soft limit of 2, hard limit of 4), it doesn't seem like favoring a decommissioning node has significant benefit. When there is significant replication work to do, we'll quickly hit the stream limit of the decommissioning nodes and use other nodes in the cluster anyway; when there isn't, then in theory we've got plenty of replication bandwidth available, so choosing a decommissioning node isn't much of a win. I see two choices: 1) change the algorithm to still favor decommissioning nodes but with some level of randomness that avoids always selecting the same decommissioning node; 2) remove the favoritism for decommissioning nodes. I prefer #2. It simplifies the algorithm, and given the other throttles we have in place, I'm not sure there is a significant benefit to selecting decommissioning nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
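Choice #1 could look roughly like this (a hypothetical sketch; SourcePicker and its method are illustrative, not the actual BlockManager source-selection code):

```java
import java.util.List;
import java.util.Random;

// Hedged sketch: keep some preference for decommissioning nodes (they
// serve no writes) but pick randomly among candidates, so a single node
// cannot be selected deterministically forever when it keeps failing.
public class SourcePicker {
    private final Random rand = new Random();

    /** Assumes at least one candidate exists across the two lists. */
    String chooseSource(List<String> decommissioning, List<String> others) {
        // Flip a coin before committing to a decommissioning node.
        if (!decommissioning.isEmpty()
                && (others.isEmpty() || rand.nextBoolean())) {
            return decommissioning.get(rand.nextInt(decommissioning.size()));
        }
        return others.get(rand.nextInt(others.size()));
    }
}
```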
[jira] [Commented] (HDFS-3918) EditLogTailer shouldn't log WARN when other node is in standby mode
[ https://issues.apache.org/jira/browse/HDFS-3918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388435#comment-14388435 ] Hudson commented on HDFS-3918: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #883 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/883/]) HDFS-3918. EditLogTailer shouldn't log WARN when other node is in standby mode. Contributed by Todd Lipcon. (harsh: rev cce66ba3c9ec293e8ba1afd0eb518c7ca0bbc7c9) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt EditLogTailer shouldn't log WARN when other node is in standby mode --- Key: HDFS-3918 URL: https://issues.apache.org/jira/browse/HDFS-3918 Project: Hadoop HDFS Issue Type: Improvement Components: ha Affects Versions: 2.0.3-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 2.8.0 Attachments: hdfs-3918.txt If both nodes are in standby mode, each will be trying to roll the others' logs, which results in errors like: Unable to trigger a roll of the active NN org.apache.hadoop.ipc.StandbyException: Operation category JOURNAL is not supported in state standby We should catch this specific exception and not log it at WARN level, since it's expected behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
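The fix pattern described above can be sketched as follows (illustrative names, not the actual EditLogTailer code): catch the specific standby-state refusal, treat it as expected, and keep it out of the WARN log while still warning on anything else.

```java
// Hedged sketch of demoting an expected failure below WARN level.
public class RollTrigger {
    static class StandbyException extends Exception {
        StandbyException(String msg) { super(msg); }
    }

    interface NamenodeProxy {
        void rollEditLog() throws StandbyException;
    }

    // Returns the level the event would be logged at (stand-in for a
    // real logger call).
    static String triggerActiveLogRoll(NamenodeProxy nn) {
        try {
            nn.rollEditLog();
            return "INFO: rolled";
        } catch (StandbyException se) {
            // Expected when the other NN is also standby: log quietly.
            return "DEBUG: skipping roll (" + se.getMessage() + ")";
        } catch (Exception e) {
            // Anything else is still worth a WARN.
            return "WARN: unable to trigger a roll: " + e;
        }
    }
}
```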
[jira] [Commented] (HDFS-8002) Website refers to /trash directory
[ https://issues.apache.org/jira/browse/HDFS-8002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388434#comment-14388434 ] Hudson commented on HDFS-8002: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #883 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/883/]) HDFS-8002. Website refers to /trash directory. Contributed by Brahma Reddy Battula. (aajisaka: rev e7ea2a8e8f0a7b428ef10552885757b99b59e4dc) * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Website refers to /trash directory -- Key: HDFS-8002 URL: https://issues.apache.org/jira/browse/HDFS-8002 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Mike Drob Assignee: Brahma Reddy Battula Fix For: 2.8.0 Attachments: HDFS-8002.patch, HDFS-8003-002.patch On http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#File_Deletes_and_Undeletes the section on trash refers to files residing in {{/trash}}. I think this is an error, as files actually go to user-specific trash directories like {{/user/hdfs/.Trash}}. Either the site needs to be updated to mention user-specific directories, or if this is a change from previous behaviour then maybe that can be mentioned instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7645) Rolling upgrade is restoring blocks from trash multiple times
[ https://issues.apache.org/jira/browse/HDFS-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388436#comment-14388436 ] Hudson commented on HDFS-7645: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #883 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/883/]) HDFS-7645. Rolling upgrade is restoring blocks from trash multiple times (Contributed by Vinayakumar B and Keisuke Ogiwara) (arp: rev 1a495fbb489c9e9a23b341a52696d10e9e272b04) * hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.html * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/RollingUpgradeInfo.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceStorage.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/RollingUpgradeStatus.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeRollingUpgrade.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java Rolling upgrade is restoring blocks from trash multiple times - Key: HDFS-7645 URL: https://issues.apache.org/jira/browse/HDFS-7645 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Nathan Roberts Assignee: Keisuke Ogiwara Fix For: 2.8.0 Attachments: HDFS-7645.01.patch, HDFS-7645.02.patch, HDFS-7645.03.patch, HDFS-7645.04.patch, HDFS-7645.05.patch, HDFS-7645.06.patch, HDFS-7645.07.patch When performing an HDFS rolling upgrade, the trash directory is getting restored twice when under normal circumstances it shouldn't need to be restored at all. iiuc, the only time these blocks should be restored is if we need to rollback a rolling upgrade. On a busy cluster, this can cause significant and unnecessary block churn both on the datanodes, and more importantly in the namenode. The two times this happens are: 1) restart of DN onto new software {code} private void doTransition(DataNode datanode, StorageDirectory sd, NamespaceInfo nsInfo, StartupOption startOpt) throws IOException { if (startOpt == StartupOption.ROLLBACK && sd.getPreviousDir().exists()) { Preconditions.checkState(!getTrashRootDir(sd).exists(), sd.getPreviousDir() + " and " + getTrashRootDir(sd) + " should not both be present."); doRollback(sd, nsInfo); // rollback if applicable } else { // Restore all the files in the trash. The restored files are retained // during rolling upgrade rollback. They are deleted during rolling // upgrade downgrade. int restored = restoreBlockFilesFromTrash(getTrashRootDir(sd)); LOG.info("Restored " + restored + " block files from trash."); } {code} 2) When heartbeat response no longer indicates a rollingupgrade is in progress {code} /** * Signal the current rolling upgrade status as indicated by the NN. 
* @param inProgress true if a rolling upgrade is in progress */ void signalRollingUpgrade(boolean inProgress) throws IOException { String bpid = getBlockPoolId(); if (inProgress) { dn.getFSDataset().enableTrash(bpid); dn.getFSDataset().setRollingUpgradeMarker(bpid); } else { dn.getFSDataset().restoreTrash(bpid); dn.getFSDataset().clearRollingUpgradeMarker(bpid); } } {code} HDFS-6800 and HDFS-6981 modified this behavior, making it not completely clear whether this is somehow intentional. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
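One way to make the duplicate restore harmless is a one-shot guard around the restore operation; the following is an illustrative sketch only, not the committed HDFS-7645 change:

```java
// Hedged sketch: guard trash restoration with a flag so that the two call
// sites quoted above (DN restart and the heartbeat signal) cannot restore
// the same block files twice.
public class TrashGuard {
    private boolean trashRestored = false;

    // Returns true only for the call that actually performs the restore;
    // any later invocation becomes a no-op.
    synchronized boolean restoreTrashOnce() {
        if (trashRestored) {
            return false;
        }
        trashRestored = true;
        // ... move block files out of the trash directory here ...
        return true;
    }
}
```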
[jira] [Commented] (HDFS-7944) Minor cleanup of BlockPoolManager#getAllNamenodeThreads
[ https://issues.apache.org/jira/browse/HDFS-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388442#comment-14388442 ] Hudson commented on HDFS-7944: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #883 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/883/]) HDFS-7944. Minor cleanup of BlockPoolManager#getAllNamenodeThreads. (Arpit Agarwal) (arp: rev 85dc3c14b2ca4b01a93361bb925c39a22a6fd8db) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMultipleRegistrations.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestRefreshNamenodes.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeExit.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestIncrementalBlockReports.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestTriggerBlockReport.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockScanner.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDeleteBlockPool.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolManager.java Minor cleanup of BlockPoolManager#getAllNamenodeThreads --- Key: HDFS-7944 URL: https://issues.apache.org/jira/browse/HDFS-7944 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Minor Fix 
For: 2.8.0 Attachments: HDFS-7944.01.patch, HDFS-7944.02.patch {{BlockPoolManager#getAllNamenodeThreads}} can avoid unnecessary list to array conversion and vice versa by returning an unmodifiable list. Since NN addition/removal is relatively rare we can just use a {{CopyOnWriteArrayList}} for concurrency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
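The cleanup described above can be sketched as follows (illustrative element type; the real list holds the BPOfferService actors):

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hedged sketch of the BlockPoolManager cleanup: NN addition/removal is
// rare, so copy-on-write is cheap and iteration is safe without locking.
public class BlockPoolManagerSketch {
    private final List<String> offerServices = new CopyOnWriteArrayList<>();

    void addActor(String actor) {
        offerServices.add(actor);
    }

    // Callers get a read-only view: no list -> array -> list round trips,
    // and callers cannot mutate the manager's internal state.
    List<String> getAllNamenodeThreads() {
        return Collections.unmodifiableList(offerServices);
    }
}
```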
[jira] [Commented] (HDFS-7748) Separate ECN flags from the Status in the DataTransferPipelineAck
[ https://issues.apache.org/jira/browse/HDFS-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388438#comment-14388438 ] Hudson commented on HDFS-7748: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #883 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/883/]) HDFS-7748. Separate ECN flags from the Status in the DataTransferPipelineAck. Contributed by Anu Engineer and Haohui Mai. (wheat9: rev b80457158daf0dc712fbe5695625cc17d70d4bb4) * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PipelineAck.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java Addendum for HDFS-7748. (wheat9: rev 0967b1d99d7001cd1d09ebd29b9360f1079410e8) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java Separate ECN flags from the Status in the DataTransferPipelineAck - Key: HDFS-7748 URL: https://issues.apache.org/jira/browse/HDFS-7748 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Anu Engineer Priority: Blocker Attachments: HDFS-7748.007-addendum.patch, HDFS-7748.007.patch, hdfs-7748.001.patch, hdfs-7748.002.patch, hdfs-7748.003.patch, hdfs-7748.004.patch, hdfs-7748.005.patch, hdfs-7748.006.patch, hdfs-7748.branch-2.7.006.patch Prior to the discussions on HDFS-7270, the old clients might fail to talk to the newer server when ECN is turned on. This jira proposes to separate the ECN flags in a separate protobuf field to make the ack compatible on both versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7939) Two fsimage_rollback_* files are created which are not deleted after rollback.
[ https://issues.apache.org/jira/browse/HDFS-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388445#comment-14388445 ] Hadoop QA commented on HDFS-7939: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12708145/HDFS-7939.1.patch against trunk revision 85dc3c1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/10127//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/10127//console This message is automatically generated. Two fsimage_rollback_* files are created which are not deleted after rollback. 
-- Key: HDFS-7939 URL: https://issues.apache.org/jira/browse/HDFS-7939 Project: Hadoop HDFS Issue Type: Bug Reporter: J.Andreina Assignee: J.Andreina Priority: Critical Attachments: HDFS-7939.1.patch During checkpoint , if any failure in uploading to the remote Namenode then restarting Namenode with rollingUpgrade started option creates 2 fsimage_rollback_* at Active Namenode . On rolling upgrade rollback , initially created fsimage_rollback_* file is not been deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7939) Two fsimage_rollback_* files are created which are not deleted after rollback.
[ https://issues.apache.org/jira/browse/HDFS-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388459#comment-14388459 ] J.Andreina commented on HDFS-7939: -- Testcase failures are not related to this patch. Please review the patch. Two fsimage_rollback_* files are created which are not deleted after rollback. -- Key: HDFS-7939 URL: https://issues.apache.org/jira/browse/HDFS-7939 Project: Hadoop HDFS Issue Type: Bug Reporter: J.Andreina Assignee: J.Andreina Priority: Critical Attachments: HDFS-7939.1.patch During checkpoint, if there is any failure in uploading to the remote Namenode, then restarting the Namenode with the rollingUpgrade started option creates 2 fsimage_rollback_* files at the Active Namenode. On rolling upgrade rollback, the initially created fsimage_rollback_* file is not deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8009) Signal congestion on the DataNode
[ https://issues.apache.org/jira/browse/HDFS-8009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389847#comment-14389847 ] Hadoop QA commented on HDFS-8009: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12708538/HDFS-8009.000.patch against trunk revision e428fea. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-hdfs-project/hadoop-hdfs Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/10133//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/10133//console This message is automatically generated. Signal congestion on the DataNode - Key: HDFS-8009 URL: https://issues.apache.org/jira/browse/HDFS-8009 Project: Hadoop HDFS Issue Type: New Feature Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8009.000.patch The DataNode should signal congestion (i.e. I'm too busy) in the PipelineAck using the mechanism introduced in HDFS-7270. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-8020) Erasure Coding: restore BlockGroup and schema info from stripping coding command
[ https://issues.apache.org/jira/browse/HDFS-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Sasaki reassigned HDFS-8020: Assignee: Kai Sasaki (was: Kai Zheng) Erasure Coding: restore BlockGroup and schema info from stripping coding command Key: HDFS-8020 URL: https://issues.apache.org/jira/browse/HDFS-8020 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Sasaki As a task of HDFS-7344, to process *stripping* coding commands from NameNode or other scheduler services/tools, we need to first be able to restore BlockGroup and schema information in DataNode, which will be used to construct coding work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8035) Move checking replication and get client DN to BM and DM respectively
[ https://issues.apache.org/jira/browse/HDFS-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389944#comment-14389944 ] Hadoop QA commented on HDFS-8035: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12708569/HDFS-8035.000.patch against trunk revision 2daa478. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestReplication org.apache.hadoop.hdfs.TestLeaseRecovery org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshot org.apache.hadoop.hdfs.TestPread org.apache.hadoop.hdfs.server.namenode.TestFavoredNodesEndToEnd org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting org.apache.hadoop.hdfs.server.datanode.TestBlockHasMultipleReplicasOnSameDN org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength org.apache.hadoop.hdfs.TestSafeMode org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport org.apache.hadoop.hdfs.server.namenode.TestNamenodeRetryCache org.apache.hadoop.hdfs.TestParallelShortCircuitRead org.apache.hadoop.hdfs.server.namenode.TestFSEditLogLoader org.apache.hadoop.hdfs.TestFSInputChecker org.apache.hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks org.apache.hadoop.hdfs.tools.TestDebugAdmin org.apache.hadoop.hdfs.TestSetrepIncreasing org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles org.apache.hadoop.fs.TestEnhancedByteBufferAccess org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup org.apache.hadoop.hdfs.TestMultiThreadedHflush org.apache.hadoop.hdfs.TestParallelRead org.apache.hadoop.hdfs.server.namenode.snapshot.TestSetQuotaWithSnapshot org.apache.hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS org.apache.hadoop.hdfs.tools.TestStoragePolicyCommands org.apache.hadoop.hdfs.TestDFSRemove org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot org.apache.hadoop.hdfs.TestHFlush org.apache.hadoop.hdfs.server.namenode.TestHDFSConcat org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics org.apache.hadoop.hdfs.TestSetTimes 
org.apache.hadoop.hdfs.server.namenode.TestAddBlock org.apache.hadoop.hdfs.server.datanode.TestDnRespectsBlockReportSplitThreshold org.apache.hadoop.hdfs.TestMissingBlocksAlert org.apache.hadoop.hdfs.TestParallelShortCircuitReadNoChecksum org.apache.hadoop.hdfs.TestBlocksScheduledCounter org.apache.hadoop.hdfs.TestEncryptedTransfer org.apache.hadoop.hdfs.server.namenode.TestNameEditsConfigs org.apache.hadoop.hdfs.server.mover.TestMover org.apache.hadoop.hdfs.server.namenode.snapshot.TestUpdatePipelineWithSnapshots org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode org.apache.hadoop.fs.TestUnbuffer org.apache.hadoop.hdfs.TestParallelShortCircuitLegacyRead org.apache.hadoop.hdfs.TestQuota org.apache.hadoop.hdfs.TestDFSClientFailover
[jira] [Assigned] (HDFS-8037) WebHDFS: CheckAccess silently accepts certain malformed FsActions
[ https://issues.apache.org/jira/browse/HDFS-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su reassigned HDFS-8037: --- Assignee: Walter Su WebHDFS: CheckAccess silently accepts certain malformed FsActions - Key: HDFS-8037 URL: https://issues.apache.org/jira/browse/HDFS-8037 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.6.0 Reporter: Jake Low Assignee: Walter Su Priority: Minor Labels: easyfix, newbie WebHDFS's {{CHECKACCESS}} operation accepts a parameter called {{fsaction}}, which represents the type(s) of access to check for. According to the documentation, and also the source code, the domain of {{fsaction}} is the set of strings matched by the regex {{[rwx-]{3}}}. This domain is wider than the set of valid {{FsAction}} objects, because it doesn't guarantee sensible ordering of access types. For example, the strings {{rxw}} and {{--r}} are valid {{fsaction}} parameter values, but don't correspond to valid {{FsAction}} instances. The result is that WebHDFS silently accepts {{fsaction}} parameter values which don't match any valid {{FsAction}} instance, but doesn't actually perform any permissions checking in this case. For example, here's a {{CHECKACCESS}} call where we request {{rw-}} access on a file which we only have permission to read and execute. It raises an exception, as it should. 
{code:none} curl -i -X GET "http://localhost:50070/webhdfs/v1/myfile?op=CHECKACCESS&user.name=nobody&fsaction=rw-" HTTP/1.1 403 Forbidden Content-Type: application/json { "RemoteException": { "exception": "AccessControlException", "javaClassName": "org.apache.hadoop.security.AccessControlException", "message": "Permission denied: user=nobody, access=READ_WRITE, inode=\"/myfile\":root:supergroup:drwxr-xr-x" } } {code} But if we instead request {{r-w}} access, the call appears to succeed: {code:none} curl -i -X GET "http://localhost:50070/webhdfs/v1/myfile?op=CHECKACCESS&user.name=nobody&fsaction=r-w" HTTP/1.1 200 OK Content-Length: 0 {code} As I see it, the fix would be to change the regex pattern in {{FsActionParam}} to something like {{[r-][w-][x-]}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
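The difference between the two patterns can be checked directly; here is a quick sketch using java.util.regex that mirrors the proposed fix:

```java
import java.util.regex.Pattern;

// The current (too-wide) fsaction domain versus the proposed per-position
// pattern. Constant names are illustrative, not FsActionParam's own.
public class FsActionParamCheck {
    static final Pattern CURRENT = Pattern.compile("[rwx-]{3}");
    static final Pattern PROPOSED = Pattern.compile("[r-][w-][x-]");
}
```

With the proposed pattern, out-of-order strings such as "r-w" are rejected while every well-formed FsAction string ("rw-", "---", "r-x", ...) still matches.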
[jira] [Commented] (HDFS-7937) Erasure Coding: INodeFile quota computation unit tests
[ https://issues.apache.org/jira/browse/HDFS-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389967#comment-14389967 ] Rakesh R commented on HDFS-7937: Thanks [~kaisasak]! In the latest patch the {{INodeFile#computeQuotaUsageWithStriped}} related changes are missing; could you please tell me if there is any specific reason for this? Apart from that, the latest patch looks pretty good. Also, one general observation: instead of {{SubmitPatch}}, can we do {{StartProgress}}? This would avoid triggering Jenkins and adding a Hudson QA comment to the jira. As [~zhz] mentioned, Jenkins only works on {{trunk}}. Erasure Coding: INodeFile quota computation unit tests -- Key: HDFS-7937 URL: https://issues.apache.org/jira/browse/HDFS-7937 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Sasaki Assignee: Kai Sasaki Priority: Minor Attachments: HDFS-7937.1.patch, HDFS-7937.2.patch, HDFS-7937.3.patch Unit test for [HDFS-7826|https://issues.apache.org/jira/browse/HDFS-7826] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7888) Change DataStreamer/DFSOutputStream/DFSPacket for convenience of subclassing
[ https://issues.apache.org/jira/browse/HDFS-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389983#comment-14389983 ] Li Bo commented on HDFS-7888: - That's a very good improvement of the patch. I will also update the patch of HDFS-7889 Change DataStreamer/DFSOutputStream/DFSPacket for convenience of subclassing Key: HDFS-7888 URL: https://issues.apache.org/jira/browse/HDFS-7888 Project: Hadoop HDFS Issue Type: Improvement Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-7888-001.patch, HDFS-7888-trunk-001.patch, HDFS-7888-trunk-002.patch HDFS-7793 refactors class {{DFSOutputStream}} on trunk which makes {{DFSOutputStream}} a class without any inner classes. We want to subclass {{DFSOutputStream}} to support striping layout writing. This JIRA depends upon HDFS-7793 and tries to change DataStreamer/DFSOutputStream/DFSPacket for convenience of subclassing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7889) Subclass DFSOutputStream to support writing striping layout files
[ https://issues.apache.org/jira/browse/HDFS-7889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo updated HDFS-7889: Attachment: HDFS-7889-006.patch Subclass DFSOutputStream to support writing striping layout files - Key: HDFS-7889 URL: https://issues.apache.org/jira/browse/HDFS-7889 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-7889-001.patch, HDFS-7889-002.patch, HDFS-7889-003.patch, HDFS-7889-004.patch, HDFS-7889-005.patch, HDFS-7889-006.patch After HDFS-7888, we can subclass {{DFSOutputStream}} to support writing striping layout files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8008) Support client-side back off when the datanodes are congested
[ https://issues.apache.org/jira/browse/HDFS-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389808#comment-14389808 ] Hadoop QA commented on HDFS-8008: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12708532/HDFS-8008.000.patch against trunk revision e428fea. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/10132//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/10132//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/10132//console This message is automatically generated. Support client-side back off when the datanodes are congested - Key: HDFS-8008 URL: https://issues.apache.org/jira/browse/HDFS-8008 Project: Hadoop HDFS Issue Type: New Feature Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8008.000.patch HDFS-7270 introduces the mechanism for DataNode to signal congestions. DFSClient should be able to recognize the signals and back off. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8033) Erasure coding: stateful (non-positional) read from files in striped layout
[ https://issues.apache.org/jira/browse/HDFS-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389891#comment-14389891 ] GAO Rui commented on HDFS-8033: --- [~zhz] bq. stateful (non-positional) read means reading the whole file without any position requirement? Erasure coding: stateful (non-positional) read from files in striped layout --- Key: HDFS-8033 URL: https://issues.apache.org/jira/browse/HDFS-8033 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8034) Fix TestDFSClientRetries#testDFSClientConfigurationLocateFollowingBlockInitialDelay for Windows
[ https://issues.apache.org/jira/browse/HDFS-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389950#comment-14389950 ] Hadoop QA commented on HDFS-8034: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12708565/HDFS-8034.00.patch against trunk revision 18a91fe. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/10134//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/10134//console This message is automatically generated. Fix TestDFSClientRetries#testDFSClientConfigurationLocateFollowingBlockInitialDelay for Windows Key: HDFS-8034 URL: https://issues.apache.org/jira/browse/HDFS-8034 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-8034.00.patch TestDFSClientRetries#testDFSClientConfigurationLocateFollowingBlockInitialDelay failed subsequent tests on Windows because this test case fails to shutdown the MiniDFS cluster. I will post a patch for it shortly. {code} testRetryOnChecksumFailure(org.apache.hadoop.hdfs.TestDFSClientRetries) Time elapsed: 0.012 sec ERROR! 
java.io.IOException: Could not fully delete D:\w\hbk\hadoop-hdfs-project\hadoop-hdfs\target\test\data\dfs\name1 at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:943) at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:814) at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:473) at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:432) at org.apache.hadoop.hdfs.TestDFSClientRetries.testRetryOnChecksumFailure(TestDFSClientRetries.java:1091) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8036) Use snapshot path as source when using snapshot diff report in DistCp
[ https://issues.apache.org/jira/browse/HDFS-8036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8036: Attachment: HDFS-8036.000.patch Initial patch to fix. Use snapshot path as source when using snapshot diff report in DistCp - Key: HDFS-8036 URL: https://issues.apache.org/jira/browse/HDFS-8036 Project: Hadoop HDFS Issue Type: Bug Components: distcp Affects Versions: 2.7.0 Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8036.000.patch When using the snapshot diff report for distcp (HDFS-7535), the semantics should be to apply the diff to the target in order to sync the target with source@snapshot2. Therefore, after syncing based on the snapshot diff report, we should append the name of snapshot2 to the original source path and use it as the new source name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8037) WebHDFS: CheckAccess silently accepts certain malformed FsActions
[ https://issues.apache.org/jira/browse/HDFS-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Low updated HDFS-8037: --- Description: WebHDFS's {{CHECKACCESS}} operation accepts a parameter called {{fsaction}}, which represents the type(s) of access to check for. According to the documentation, and also the source code, the domain of {{fsaction}} is the set of strings matched by the regex {{\[rwx-\]{3\}}}. This domain is wider than the set of valid {{FsAction}} objects, because it doesn't guarantee a sensible ordering of access types. For example, the strings {{rxw}} and {{--r}} are valid {{fsaction}} parameter values, but don't correspond to valid {{FsAction}} instances. The result is that WebHDFS silently accepts {{fsaction}} parameter values which don't match any valid {{FsAction}} instance, but doesn't actually perform any permissions checking in this case. For example, here's a {{CHECKACCESS}} call where we request {{rw-}} access on a file which we only have permission to read and execute. It raises an exception, as it should. {code:none} curl -i -X GET 'http://localhost:50070/webhdfs/v1/myfile?op=CHECKACCESS&user.name=nobody&fsaction=rw-' HTTP/1.1 403 Forbidden Content-Type: application/json { "RemoteException": { "exception": "AccessControlException", "javaClassName": "org.apache.hadoop.security.AccessControlException", "message": "Permission denied: user=nobody, access=READ_WRITE, inode=\"/myfile\":root:supergroup:drwxr-xr-x" } } {code} But if we instead request {{r-w}} access, the call appears to succeed: {code:none} curl -i -X GET 'http://localhost:50070/webhdfs/v1/myfile?op=CHECKACCESS&user.name=nobody&fsaction=r-w' HTTP/1.1 200 OK Content-Length: 0 {code} As I see it, the fix would be to change the regex pattern in {{FsActionParam}} to something like {{\[r-\]\[w-\]\[x-\]}}. WebHDFS: CheckAccess silently accepts certain malformed FsActions - Key: HDFS-8037 URL: https://issues.apache.org/jira/browse/HDFS-8037 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.6.0 Reporter: Jake Low Assignee: Walter Su Priority: Minor Labels: easyfix, newbie
[jira] [Commented] (HDFS-7937) Erasure Coding: INodeFile quota computation unit tests
[ https://issues.apache.org/jira/browse/HDFS-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389867#comment-14389867 ] Hadoop QA commented on HDFS-7937: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12708581/HDFS-7937.3.patch against trunk revision 2daa478. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/10137//console This message is automatically generated. Erasure Coding: INodeFile quota computation unit tests -- Key: HDFS-7937 URL: https://issues.apache.org/jira/browse/HDFS-7937 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Sasaki Assignee: Kai Sasaki Priority: Minor Attachments: HDFS-7937.1.patch, HDFS-7937.2.patch, HDFS-7937.3.patch Unit test for [HDFS-7826|https://issues.apache.org/jira/browse/HDFS-7826] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6666) Abort NameNode and DataNode startup if security is enabled but block access token is not enabled.
[ https://issues.apache.org/jira/browse/HDFS-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389948#comment-14389948 ] Arpit Agarwal commented on HDFS-6666: - Hi [~vijaysbhat], thank you for volunteering to help with this issue and adding a test case. You will need to enable the Maven startKdc profile for running secure NN tests. The secure NN tests use ApacheDS but unfortunately the download URL is broken. Looks like we'll need to fix the download URL to get startKdc working. Do you want to give it a shot too? {code} $ mvn -q test -PtestKerberos,startKdc -Dtest=TestSecureNameNode [exec] Result: 1 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (kdc) on project hadoop-common: An Ant BuildException has occured: Can't get http://newverhost.com/pub//directory/apacheds/unstable/1.5/1.5.7/apacheds-1.5.7.tar.gz to /Users/aagarwal/src/hdp/hadoop-common-project/hadoop-common/target/test-classes/kdc/downloads/apacheds-1.5.7.tar.gz [ERROR] around Ant part ...get dest=/Users/aagarwal/src/hdp/hadoop-common-project/hadoop-common/target/test-classes/kdc/downloads skipexisting=true verbose=true src=http://newverhost.com/pub//directory/apacheds/unstable/1.5/1.5.7/apacheds-1.5.7.tar.gz/.. {code} Abort NameNode and DataNode startup if security is enabled but block access token is not enabled. - Key: HDFS-6666 URL: https://issues.apache.org/jira/browse/HDFS-6666 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode, security Affects Versions: 3.0.0, 2.5.0 Reporter: Chris Nauroth Assignee: Vijay Bhat Priority: Minor Currently, if security is enabled by setting hadoop.security.authentication to kerberos, but HDFS block access tokens are disabled by setting dfs.block.access.token.enable to false (which is the default), then the NameNode logs an error and proceeds, and the DataNode proceeds without even logging an error.
This jira proposes that it's invalid to turn on security but not turn on block access tokens, and that it would be better to fail fast and abort the daemons during startup if this happens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
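The proposed fail-fast behaviour can be sketched as a standalone check. This is a hypothetical illustration, not the actual NameNode/DataNode startup code; only the configuration key names (hadoop.security.authentication, dfs.block.access.token.enable) come from the issue description.

```java
public class StartupCheckDemo {
    // Hypothetical validation mirroring the jira's proposal: abort startup
    // when Kerberos security is on but block access tokens are off.
    static void checkSecurityConfig(boolean kerberosEnabled,
                                    boolean blockTokensEnabled) {
        if (kerberosEnabled && !blockTokensEnabled) {
            throw new IllegalStateException(
                "hadoop.security.authentication is kerberos but "
                + "dfs.block.access.token.enable is false; aborting startup");
        }
    }

    public static void main(String[] args) {
        checkSecurityConfig(false, false); // security off: fine
        checkSecurityConfig(true, true);   // both on: fine
        // checkSecurityConfig(true, false) would throw and abort the daemon.
        System.out.println("configuration checks passed");
    }
}
```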
[jira] [Created] (HDFS-8037) WebHDFS: CheckAccess silently accepts certain malformed FsActions
Jake Low created HDFS-8037: -- Summary: WebHDFS: CheckAccess silently accepts certain malformed FsActions Key: HDFS-8037 URL: https://issues.apache.org/jira/browse/HDFS-8037 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.6.0 Reporter: Jake Low Priority: Minor WebHDFS's {{CHECKACCESS}} operation accepts a parameter called {{fsaction}}, which represents the type(s) of access to check for. According to the documentation, and also the source code, the domain of {{fsaction}} is the set of strings matched by the regex {{\[rwx-\]{3\}}}. This domain is wider than the set of valid {{FsAction}} objects, because it doesn't guarantee a sensible ordering of access types. For example, the strings {{rxw}} and {{--r}} are valid {{fsaction}} parameter values, but don't correspond to valid {{FsAction}} instances. The result is that WebHDFS silently accepts {{fsaction}} parameter values which don't match any valid {{FsAction}} instance, but doesn't actually perform any permissions checking in this case. For example, here's a {{CHECKACCESS}} call where we request {{rw-}} access on a file which we only have permission to read and execute. It raises an exception, as it should. {code:none} curl -i -X GET 'http://localhost:50070/webhdfs/v1/myfile?op=CHECKACCESS&user.name=nobody&fsaction=rw-' HTTP/1.1 403 Forbidden Content-Type: application/json { "RemoteException": { "exception": "AccessControlException", "javaClassName": "org.apache.hadoop.security.AccessControlException", "message": "Permission denied: user=nobody, access=READ_WRITE, inode=\"/myfile\":root:supergroup:drwxr-xr-x" } } {code} But if we instead request {{r-w}} access, the call appears to succeed: {code:none} curl -i -X GET 'http://localhost:50070/webhdfs/v1/myfile?op=CHECKACCESS&user.name=nobody&fsaction=r-w' HTTP/1.1 200 OK Content-Length: 0 {code} As I see it, the fix would be to change the regex pattern in {{FsActionParam}} to something like {{\[r-\]\[w-\]\[x-\]}}.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
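For illustration, the behaviour of the current and proposed patterns can be compared with {{java.util.regex}}. This is a hypothetical sketch (the class and helper below are not part of any attached patch); only the two regexes come from the description.

```java
import java.util.regex.Pattern;

public class FsActionParamDemo {
    // Current, too-permissive pattern vs. the proposed replacement.
    static final Pattern CURRENT  = Pattern.compile("[rwx-]{3}");
    static final Pattern PROPOSED = Pattern.compile("[r-][w-][x-]");

    static boolean ok(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Malformed: 'w' in the execute slot is accepted today but
        // rejected by the proposed pattern.
        System.out.println(ok(CURRENT, "rxw"));   // prints true
        System.out.println(ok(PROPOSED, "rxw"));  // prints false
        // Well-formed values still match under the proposed pattern.
        System.out.println(ok(PROPOSED, "r-x"));  // prints true
    }
}
```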
[jira] [Created] (HDFS-8036) Use snapshot path as source when using snapshot diff report in DistCp
Jing Zhao created HDFS-8036: --- Summary: Use snapshot path as source when using snapshot diff report in DistCp Key: HDFS-8036 URL: https://issues.apache.org/jira/browse/HDFS-8036 Project: Hadoop HDFS Issue Type: Bug Components: distcp Affects Versions: 2.7.0 Reporter: Jing Zhao Assignee: Jing Zhao When using the snapshot diff report for distcp (HDFS-7535), the semantics should be to apply the diff to the target in order to sync the target with source@snapshot2. Therefore, after syncing based on the snapshot diff report, we should append the name of snapshot2 to the original source path and use it as the new source name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7937) Erasure Coding: INodeFile quota computation unit tests
[ https://issues.apache.org/jira/browse/HDFS-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Sasaki updated HDFS-7937: - Attachment: HDFS-7937.3.patch Erasure Coding: INodeFile quota computation unit tests -- Key: HDFS-7937 URL: https://issues.apache.org/jira/browse/HDFS-7937 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Sasaki Assignee: Kai Sasaki Priority: Minor Attachments: HDFS-7937.1.patch, HDFS-7937.2.patch, HDFS-7937.3.patch Unit test for [HDFS-7826|https://issues.apache.org/jira/browse/HDFS-7826] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7937) Erasure Coding: INodeFile quota computation unit tests
[ https://issues.apache.org/jira/browse/HDFS-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Sasaki updated HDFS-7937: - Status: Patch Available (was: Open) Erasure Coding: INodeFile quota computation unit tests -- Key: HDFS-7937 URL: https://issues.apache.org/jira/browse/HDFS-7937 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Sasaki Assignee: Kai Sasaki Priority: Minor Attachments: HDFS-7937.1.patch, HDFS-7937.2.patch, HDFS-7937.3.patch Unit test for [HDFS-7826|https://issues.apache.org/jira/browse/HDFS-7826] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7889) Subclass DFSOutputStream to support writing striping layout files
[ https://issues.apache.org/jira/browse/HDFS-7889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389987#comment-14389987 ] Li Bo commented on HDFS-7889: - Patch 006 removes {{getStreamer()}} and switches streamer in {{DFSStripedOutputStream}} Subclass DFSOutputStream to support writing striping layout files - Key: HDFS-7889 URL: https://issues.apache.org/jira/browse/HDFS-7889 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-7889-001.patch, HDFS-7889-002.patch, HDFS-7889-003.patch, HDFS-7889-004.patch, HDFS-7889-005.patch, HDFS-7889-006.patch After HDFS-7888, we can subclass {{DFSOutputStream}} to support writing striping layout files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8035) Move checking replication and get client DN to BM and DM respectively
[ https://issues.apache.org/jira/browse/HDFS-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8035: - Attachment: HDFS-8035.001.patch Move checking replication and get client DN to BM and DM respectively - Key: HDFS-8035 URL: https://issues.apache.org/jira/browse/HDFS-8035 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8035.000.patch, HDFS-8035.001.patch There is functionality in {{FSNameSystem}} to check replication and to get a datanode based on the client name. This jira proposes to move this functionality to {{BlockManager}} and {{DatanodeManager}} respectively. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8036) Use snapshot path as source when using snapshot diff report in DistCp
[ https://issues.apache.org/jira/browse/HDFS-8036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8036: Status: Patch Available (was: Open) Use snapshot path as source when using snapshot diff report in DistCp - Key: HDFS-8036 URL: https://issues.apache.org/jira/browse/HDFS-8036 Project: Hadoop HDFS Issue Type: Bug Components: distcp Affects Versions: 2.7.0 Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8036.000.patch When using the snapshot diff report for distcp (HDFS-7535), the semantics should be to apply the diff to the target in order to sync the target with source@snapshot2. Therefore, after syncing based on the snapshot diff report, we should append the name of snapshot2 to the original source path and use it as the new source name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8036) Use snapshot path as source when using snapshot diff report in DistCp
[ https://issues.apache.org/jira/browse/HDFS-8036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389855#comment-14389855 ] Hadoop QA commented on HDFS-8036: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12708575/HDFS-8036.000.patch against trunk revision 2daa478. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-distcp. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/10136//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/10136//console This message is automatically generated. Use snapshot path as source when using snapshot diff report in DistCp - Key: HDFS-8036 URL: https://issues.apache.org/jira/browse/HDFS-8036 Project: Hadoop HDFS Issue Type: Bug Components: distcp Affects Versions: 2.7.0 Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8036.000.patch When using snapshot diff report for distcp (HDFS-7535), the semantic should be apply the diff to the target in order to sync the target with source@snapshot2. Therefore after syncing based on the snapshot diff report, we should append the name of snapshot2 to the original source path and use it as the new source name. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8004) Use KeyProviderCryptoExtension#warmUpEncryptedKeys when creating an encryption zone
[ https://issues.apache.org/jira/browse/HDFS-8004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-8004: -- Fix Version/s: 2.8.0 thanks Arun, don't forget to set the fix version ;) Use KeyProviderCryptoExtension#warmUpEncryptedKeys when creating an encryption zone --- Key: HDFS-8004 URL: https://issues.apache.org/jira/browse/HDFS-8004 Project: Hadoop HDFS Issue Type: Improvement Components: encryption Affects Versions: 2.6.0 Reporter: Andrew Wang Assignee: Andrew Wang Priority: Trivial Fix For: 2.8.0 Attachments: hdfs-8004.001.patch It'd be slightly better to use the provided warm-up method, even though what we do now (getting and throwing away a key) is functionally the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8011) standby nn can't started
[ https://issues.apache.org/jira/browse/HDFS-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388618#comment-14388618 ] Yongjun Zhang commented on HDFS-8011: - Hi [~fujie], if the file was indeed deleted yet still has an OP_CLOSE in the edit log file, then why "If we restart SNN A again, editlog-file-2 could be loaded correctly just like editlog-file-1 in the last restart operation" holds is indeed mysterious, unless OP_CLOSE silently ignores deleted files. Can we dump the edit log with the oev tool and see whether the file involved in the OP_CLOSE operation that throws the NPE was deleted (either it OR its parent has an OP_DELETE) before the OP_CLOSE? What does it mean that 20,000 operations failed out of 500,000 operations? What are the error symptoms? As Vinayakumar requested, can we analyze the stack traces of all failures to see if they have the same exception stack? Since you mentioned one problem was with OP_ADD_BLOCK, it seems that we are adding a block to a deleted file? If it's a deleted file, I think it's very likely related to delayed block removal, which relates to the fact that a datanode reports heartbeats to both the active and standby at the same time. Thanks. standby nn can't started Key: HDFS-8011 URL: https://issues.apache.org/jira/browse/HDFS-8011 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.3.0 Environment: CentOS 6.2 64bit Reporter: fujie We have seen a crash when starting the standby namenode, with fatal errors. Any solutions, workarounds, or ideas would be helpful for us. 1. Here is the context: At the beginning we had 2 namenodes, take A as active and B as standby. For some reasons, namenode A was dead, so namenode B is working as active. When we try to restart A after a minute, it can't work. During this time a lot of files were put to HDFS, and a lot of files were renamed. Namenode A crashed when awaiting reported blocks in safemode each time. 2.
We can see error log below: 1)2015-03-30 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation CloseOp [length=0, inodeId=0, path=/xxx/_temporary/xxx/part-r-00074.bz2, replication=3, mtime=1427699913947, atime=1427699081161, blockSize=268435456, blocks=[blk_2103131025_1100889495739], permissions=dm:dm:rw-r--r--, clientName=, clientMachine=, opCode=OP_CLOSE, txid=7632753612] java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.setGenerationStampAndVerifyReplicas(BlockInfoUnderConstruction.java:247) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.commitBlock(BlockInfoUnderConstruction.java:267) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.forceCompleteBlock(BlockManager.java:639) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:813) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:383) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:209) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:122) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:737) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$0(EditLogTailer.java:302) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:356) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) at 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:413) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292) 2)2015-03-30 FATAL org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error encountered while tailing edits. Shutting down standby NN. java.io.IOException: Failed to apply edit log operation AddBlockOp [path=/xxx/_temporary/xxx/part-m-00121, penultimateBlock=blk_2102331803_1100888911441, lastBlock=blk_2102661068_1100889009168, RpcClientId=, RpcCallId=-2]: error null at
[jira] [Commented] (HDFS-8002) Website refers to /trash directory
[ https://issues.apache.org/jira/browse/HDFS-8002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388627#comment-14388627 ] Hudson commented on HDFS-8002: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #140 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/140/]) HDFS-8002. Website refers to /trash directory. Contributed by Brahma Reddy Battula. (aajisaka: rev e7ea2a8e8f0a7b428ef10552885757b99b59e4dc) * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Website refers to /trash directory -- Key: HDFS-8002 URL: https://issues.apache.org/jira/browse/HDFS-8002 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Mike Drob Assignee: Brahma Reddy Battula Fix For: 2.8.0 Attachments: HDFS-8002.patch, HDFS-8003-002.patch On http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#File_Deletes_and_Undeletes the section on trash refers to files residing in {{/trash}}. I think this is an error, as files actually go to user-specific trash directories like {{/user/hdfs/.Trash}}. Either the site needs to be updated to mention user-specific directories, or if this is a change from previous behaviour then maybe that can be mentioned instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8029) NPE during disk usage calculation on snapshot directory, after a sub folder is deleted
kanaka kumar avvaru created HDFS-8029: - Summary: NPE during disk usage calculation on snapshot directory, after a sub folder is deleted Key: HDFS-8029 URL: https://issues.apache.org/jira/browse/HDFS-8029 Project: Hadoop HDFS Issue Type: Bug Reporter: kanaka kumar avvaru Assignee: kanaka kumar avvaru ContentSummary computation is causing a NullPointerException on a snapshot directory if some sub directory is deleted. Following are the steps to reproduce the issue. 1. Create a root directory /test 2. Create a sub dir named /test/sub1 3. Create a sub dir in sub1 as /test/sub1/sub2 4. Create a file at /test/sub1/file1 5. Create a file at /test/sub1/sub2/file1 6. Enable snapshot on sub1 (hadoop dfsadmin -allowSnapshot /test/sub1) 7. Create snapshot1 on /test/sub1 8. Delete directory /test/sub1/sub2 (recursively) 9. Create snapshot2 on /test/sub1 10. Execute the du command on /test (hadoop fs -du /test/) This gives a NullPointerException in the CLI. The NameNode logs the exception as ... java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.ContentSummaryComputationContext.getBlockStoragePolicySuite(ContentSummaryComputationContext.java:122) ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7671) hdfs user guide should point to the common rack awareness doc
[ https://issues.apache.org/jira/browse/HDFS-7671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388717#comment-14388717 ] Hudson commented on HDFS-7671: -- FAILURE: Integrated in Hadoop-trunk-Commit #7476 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7476/]) HDFS-7671. hdfs user guide should point to the common rack awareness doc. Contributed by Kai Sasaki. (aajisaka: rev 859cab2f2273f563fd70e3e616758edef91ccf41) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsUserGuide.md hdfs user guide should point to the common rack awareness doc - Key: HDFS-7671 URL: https://issues.apache.org/jira/browse/HDFS-7671 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Allen Wittenauer Assignee: Kai Sasaki Fix For: 2.8.0 Attachments: HDFS-7671.1.patch, HDFS-7671.2.patch, HDFS-7671.3.patch HDFS user guide has a section on rack awareness that should really just be a pointer to the common doc.
[jira] [Commented] (HDFS-8010) Erasure coding: extend UnderReplicatedBlocks to accurately handle striped blocks
[ https://issues.apache.org/jira/browse/HDFS-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388555#comment-14388555 ] Rakesh R commented on HDFS-8010: Minor mistake in my above comments, please read the suggested way as:
{code}
private boolean veryUnderReplicated(int curReplicas, int expectedReplicas,
    boolean isStriped) {
  if (!isStriped) {
    return (curReplicas * 3) < expectedReplicas;
  } else {
    return curReplicas <= HdfsConstants.NUM_DATA_BLOCKS + 2;
  }
}

private boolean highestPriority(int curReplicas, boolean isStriped) {
  if (!isStriped) {
    return curReplicas == 1;
  } else {
    return curReplicas == HdfsConstants.NUM_DATA_BLOCKS;
  }
}
{code}
Erasure coding: extend UnderReplicatedBlocks to accurately handle striped blocks Key: HDFS-8010 URL: https://issues.apache.org/jira/browse/HDFS-8010 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8010-000.patch This JIRA tracks efforts to accurately assess the _risk level_ of striped block groups with missing blocks, when added to {{UnderReplicatedBlocks}}
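The helpers suggested in the comment above can be exercised standalone. This sketch reads the comparison operators (which JIRA's formatting swallowed) as `<` and `<=`, and hard-codes `NUM_DATA_BLOCKS = 6` on the assumption of the RS(6,3) schema used on the HDFS-7285 branch; `StripedPriority` is a stand-in class, not real HDFS code:

```java
// Stand-in for the priority thresholds proposed in the comment.
class StripedPriority {
    static final int NUM_DATA_BLOCKS = 6; // assumed RS(6,3) data-block count

    // Contiguous block: fewer than a third of the expected replicas remain.
    // Striped group: at most two parity blocks to spare beyond the data blocks.
    static boolean veryUnderReplicated(int curReplicas, int expectedReplicas,
                                       boolean isStriped) {
        if (!isStriped) {
            return (curReplicas * 3) < expectedReplicas;
        }
        return curReplicas <= NUM_DATA_BLOCKS + 2;
    }

    // Highest priority: losing one more block means data loss.
    static boolean highestPriority(int curReplicas, boolean isStriped) {
        return isStriped ? curReplicas == NUM_DATA_BLOCKS : curReplicas == 1;
    }

    public static void main(String[] args) {
        // 2 of 9 expected contiguous replicas: 2*3 < 9, so very under-replicated.
        System.out.println(veryUnderReplicated(2, 9, false)); // true
        // Striped group reduced to exactly its data blocks: highest priority.
        System.out.println(highestPriority(6, true));         // true
    }
}
```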
[jira] [Commented] (HDFS-6634) inotify in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388642#comment-14388642 ] Benoit Perroud commented on HDFS-6634: -- We started to migrate our code to this implementation. It's just awesome. Thanks a lot [~james.thomas] for the work! I still have a quick question: any reason why the transaction id is not embedded in the Event object? inotify in HDFS --- Key: HDFS-6634 URL: https://issues.apache.org/jira/browse/HDFS-6634 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs-client, namenode, qjm Reporter: James Thomas Assignee: James Thomas Fix For: 2.6.0 Attachments: HDFS-6634.2.patch, HDFS-6634.3.patch, HDFS-6634.4.patch, HDFS-6634.5.patch, HDFS-6634.6.patch, HDFS-6634.7.patch, HDFS-6634.8.patch, HDFS-6634.9.patch, HDFS-6634.patch, inotify-design.2.pdf, inotify-design.3.pdf, inotify-design.4.pdf, inotify-design.pdf, inotify-intro.2.pdf, inotify-intro.pdf Design a mechanism for applications like search engines to access the HDFS edit stream.
[jira] [Commented] (HDFS-7748) Separate ECN flags from the Status in the DataTransferPipelineAck
[ https://issues.apache.org/jira/browse/HDFS-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388631#comment-14388631 ] Hudson commented on HDFS-7748: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #140 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/140/]) HDFS-7748. Separate ECN flags from the Status in the DataTransferPipelineAck. Contributed by Anu Engineer and Haohui Mai. (wheat9: rev b80457158daf0dc712fbe5695625cc17d70d4bb4) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PipelineAck.java * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java Addendum for HDFS-7748. (wheat9: rev 0967b1d99d7001cd1d09ebd29b9360f1079410e8) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java Separate ECN flags from the Status in the DataTransferPipelineAck - Key: HDFS-7748 URL: https://issues.apache.org/jira/browse/HDFS-7748 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Anu Engineer Priority: Blocker Attachments: HDFS-7748.007-addendum.patch, HDFS-7748.007.patch, hdfs-7748.001.patch, hdfs-7748.002.patch, hdfs-7748.003.patch, hdfs-7748.004.patch, hdfs-7748.005.patch, hdfs-7748.006.patch, hdfs-7748.branch-2.7.006.patch Prior to the discussions on HDFS-7270, the old clients might fail to talk to the newer server when ECN is turned on. This jira proposes to separate the ECN flags in a separate protobuf field to make the ack compatible on both versions.
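The split described above can be sketched as a datatransfer.proto fragment. The field numbers and the shape of the existing fields below are assumptions reconstructed from the description (the ack previously packed ECN bits into the {{Status}} enum), not a verbatim copy of the committed patch:

```proto
// Hypothetical shape of the pipeline ack after the split: ECN no longer
// rides on the Status enum, it travels in its own repeated integer field,
// which an old client can skip as an unknown field.
message PipelineAckProto {
  required sint64 seqno = 1;
  repeated Status reply = 2;
  optional uint64 downstreamAckTimeNanos = 3 [default = 0];
  repeated uint32 flag = 4 [packed = true];  // new: ECN and future flags
}
```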
[jira] [Commented] (HDFS-7944) Minor cleanup of BlockPoolManager#getAllNamenodeThreads
[ https://issues.apache.org/jira/browse/HDFS-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388636#comment-14388636 ] Hudson commented on HDFS-7944: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #140 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/140/]) HDFS-7944. Minor cleanup of BlockPoolManager#getAllNamenodeThreads. (Arpit Agarwal) (arp: rev 85dc3c14b2ca4b01a93361bb925c39a22a6fd8db) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockScanner.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestIncrementalBlockReports.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDeleteBlockPool.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestRefreshNamenodes.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeExit.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMultipleRegistrations.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestTriggerBlockReport.java Minor cleanup of BlockPoolManager#getAllNamenodeThreads --- Key: HDFS-7944 URL: https://issues.apache.org/jira/browse/HDFS-7944 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal 
Priority: Minor Fix For: 2.8.0 Attachments: HDFS-7944.01.patch, HDFS-7944.02.patch {{BlockPoolManager#getAllNamenodeThreads}} can avoid unnecessary list to array conversion and vice versa by returning an unmodifiable list. Since NN addition/removal is relatively rare we can just use a {{CopyOnWriteArrayList}} for concurrency.
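The pattern described in that last sentence can be sketched outside HDFS. {{PoolManagerSketch}} and the String stand-ins below are hypothetical; the point is only the combination of {{CopyOnWriteArrayList}} (cheap, consistent iteration; rare writes copy the backing array) with an unmodifiable view handed to callers:

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for BlockPoolManager; Strings stand in for the
// per-namenode service threads.
class PoolManagerSketch {
    // Writes (NN add/remove) are rare, so copy-on-write is cheap overall,
    // and readers iterate a stable snapshot without locking.
    private final CopyOnWriteArrayList<String> namenodeThreads =
        new CopyOnWriteArrayList<>();

    void add(String thread) {
        namenodeThreads.add(thread);
    }

    // Callers get a read-only view: no list<->array conversions, and no
    // way for them to mutate the manager's internal state.
    List<String> getAllNamenodeThreads() {
        return Collections.unmodifiableList(namenodeThreads);
    }

    public static void main(String[] args) {
        PoolManagerSketch m = new PoolManagerSketch();
        m.add("bpos-1");
        m.add("bpos-2");
        List<String> view = m.getAllNamenodeThreads();
        System.out.println(view.size()); // 2
        try {
            view.add("bpos-3"); // the view rejects mutation
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only");
        }
    }
}
```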
[jira] [Updated] (HDFS-7941) hsync() not working
[ https://issues.apache.org/jira/browse/HDFS-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sverre Bakke updated HDFS-7941: --- Description: When using SequenceFile.Writer and appending+syncing to a file repeatedly, the sync does not appear to work other than: - once after writing headers - when closing. Imagine the following test case: http://pastebin.com/Y9xysCRX This code would append a new record every second and then immediately sync it. One would also imagine that the file would grow for every append; however, this does not happen. After watching the behavior I have noticed that it only syncs the headers at the very beginning (providing a file of 164 bytes) and then never again until it's closed. This is despite hsync() being called after every append. Looking into the debug logs, this also claims the same behavior (executed the provided code example and grepped for sync): SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder. SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. 2015-03-17 15:55:14 DEBUG ProtobufRpcEngine:253 - Call: fsync took 11ms This was the only time the code ran fsync throughout the entire execution. This has been tested (with similar results) for the following deployments: - sequencefile with no compression - sequencefile with record compression - sequencefile with block compression - textfile with no compression was: When using SequenceFile.Writer and appending+syncing to file repeatedly, the sync does not appear to work other than: - once after writing headers - when closing. Imagine the following test case: http://pastebin.com/Y9xysCRX This code would append a new record every second and then immediately sync it. One would also imagine that the file would grow for every append, however, this does not happen. After watching the behavior I have noticed that it only syncs the headers at the very beginning (providing a file of 164 bytes) and then never again until its closed. This despite it is asked to hsync() after every append. Looking into the debug logs, this also claims the same behavior (executed the provided code example and grepped for sync): SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder. SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. 2015-03-17 15:55:14 DEBUG ProtobufRpcEngine:253 - Call: fsync took 11ms This was the only time the code ran fsync throughout the entire execution. hsync() not working --- Key: HDFS-7941 URL: https://issues.apache.org/jira/browse/HDFS-7941 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Environment: HDP 2.2 running on Redhat Reporter: Sverre Bakke When using SequenceFile.Writer and appending+syncing to a file repeatedly, the sync does not appear to work other than: - once after writing headers - when closing. Imagine the following test case: http://pastebin.com/Y9xysCRX This code would append a new record every second and then immediately sync it. One would also imagine that the file would grow for every append; however, this does not happen. After watching the behavior I have noticed that it only syncs the headers at the very beginning (providing a file of 164 bytes) and then never again until it's closed. This is despite hsync() being called after every append. Looking into the debug logs, this also claims the same behavior (executed the provided code example and grepped for sync): SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder. SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. 2015-03-17 15:55:14 DEBUG ProtobufRpcEngine:253 - Call: fsync took 11ms This was the only time the code ran fsync throughout the entire execution. This has been tested (with similar results) for the following deployments: - sequencefile with no compression - sequencefile with record compression - sequencefile with block compression - textfile with no compression
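One detail worth noting against the report above: a plain hsync() makes the data durable on the DataNodes but does not, by itself, update the file length recorded at the NameNode, so tools that read the NN-reported size can show a file that "never grows". The fragment below is an illustrative sketch, not the reporter's pastebin code; it assumes the stream handed back by HDFS is an HdfsDataOutputStream, whose hsync(EnumSet<SyncFlag>) overload with UPDATE_LENGTH also refreshes the visible length:

```java
import java.io.IOException;
import java.util.EnumSet;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream.SyncFlag;

// Sketch only: needs the hadoop-hdfs client jars on the classpath.
class HsyncSketch {
    static void durableSync(FSDataOutputStream out) throws IOException {
        if (out instanceof HdfsDataOutputStream) {
            // Flush replicas to disk AND update the length at the NameNode,
            // so the file visibly grows after every sync.
            ((HdfsDataOutputStream) out)
                .hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));
        } else {
            // Plain hsync: data is durable, but the NN-reported length
            // may look stale until the file is closed.
            out.hsync();
        }
    }
}
```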
[jira] [Updated] (HDFS-6945) BlockManager should remove a block from excessReplicateMap and decrement ExcessBlocks metric when the block is removed
[ https://issues.apache.org/jira/browse/HDFS-6945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-6945: Attachment: HDFS-6945-005.patch Thanks [~szetszwo] for the comment. Cleaned up the patch. BlockManager should remove a block from excessReplicateMap and decrement ExcessBlocks metric when the block is removed -- Key: HDFS-6945 URL: https://issues.apache.org/jira/browse/HDFS-6945 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Critical Labels: metrics Attachments: HDFS-6945-003.patch, HDFS-6945-004.patch, HDFS-6945-005.patch, HDFS-6945.2.patch, HDFS-6945.patch I'm seeing the ExcessBlocks metric increase to more than 300K in some clusters; however, there are no over-replicated blocks (confirmed by fsck). After further research, I noticed that when deleting a block, BlockManager does not remove the block from excessReplicateMap or decrement excessBlocksCount. Usually the metric is decremented when processing a block report; however, if the block has been deleted, BlockManager does not remove the block from excessReplicateMap or decrement the metric. That way the metric and excessReplicateMap can grow indefinitely (i.e. a memory leak can occur).
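The bookkeeping fix described above can be sketched with plain collections. Everything here is a simplified stand-in (real HDFS keys the map by datanode UUID and uses its own set types, and the actual patch lives in BlockManager); the sketch only shows the invariant being restored, namely that removing a block also purges it from every excess set and decrements the metric:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for BlockManager's excess-replica bookkeeping.
class ExcessSketch {
    // datanode -> ids of blocks that node holds in excess
    private final Map<String, Set<Long>> excessReplicateMap = new HashMap<>();
    private long excessBlocksCount = 0; // the ExcessBlocks metric

    void addExcess(String datanode, long blockId) {
        if (excessReplicateMap
                .computeIfAbsent(datanode, k -> new HashSet<>())
                .add(blockId)) {
            excessBlocksCount++;
        }
    }

    // The fix: on block removal, purge the block from every datanode's
    // excess set and decrement the metric, instead of waiting for a block
    // report that will never mention the deleted block.
    void removeBlock(long blockId) {
        Iterator<Map.Entry<String, Set<Long>>> it =
            excessReplicateMap.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Set<Long>> e = it.next();
            if (e.getValue().remove(blockId)) {
                excessBlocksCount--;
            }
            if (e.getValue().isEmpty()) {
                it.remove(); // drop empty per-node sets
            }
        }
    }

    long getExcessBlocksCount() {
        return excessBlocksCount;
    }
}
```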
[jira] [Updated] (HDFS-7786) Handle slow writers for DFSStripedOutputStream
[ https://issues.apache.org/jira/browse/HDFS-7786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo updated HDFS-7786: Summary: Handle slow writers for DFSStripedOutputStream (was: Handle slow writers for DFSOutputStream when there're multiple data streamers) Handle slow writers for DFSStripedOutputStream -- Key: HDFS-7786 URL: https://issues.apache.org/jira/browse/HDFS-7786 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Fix For: HDFS-7285 There're multiple data streamers in DFSOutputStream if it is used to write a striping layout file. These streamers may have different write speed, and some may write data very slowly. Some streamers may fail and exit. We need to consider these situations and give reliable handling.
[jira] [Updated] (HDFS-7786) Handle slow writers for DFSStripedOutputStream
[ https://issues.apache.org/jira/browse/HDFS-7786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo updated HDFS-7786: Description: The streamers in DFSStripedOutputStream may have different write speed. We need to consider and handle the situation when one or more writers begin to write slowly. (was: There're multiple data streamers in DFSOutputStream if it is used to write a striping layout file. These streamers may have different write speed, and some may write data very slowly. Some streamers may fail and exit. We need to consider these situations and give reliable handling. ) Handle slow writers for DFSStripedOutputStream -- Key: HDFS-7786 URL: https://issues.apache.org/jira/browse/HDFS-7786 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Fix For: HDFS-7285 The streamers in DFSStripedOutputStream may have different write speed. We need to consider and handle the situation when one or more writers begin to write slowly.
[jira] [Updated] (HDFS-7786) Handle slow writers for DFSStripedOutputStream
[ https://issues.apache.org/jira/browse/HDFS-7786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo updated HDFS-7786: Description: The streamers in DFSStripedOutputStream may have different write speed. We need to consider and handle the situation when one or more streamers begin to write slowly. (was: The streamers in DFSStripedOutputStream may have different write speed. We need to consider and handle the situation when one or more writers begin to write slowly.) Handle slow writers for DFSStripedOutputStream -- Key: HDFS-7786 URL: https://issues.apache.org/jira/browse/HDFS-7786 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Fix For: HDFS-7285 The streamers in DFSStripedOutputStream may have different write speed. We need to consider and handle the situation when one or more streamers begin to write slowly.
[jira] [Comment Edited] (HDFS-7991) Allow users to skip checkpoint when stopping NameNode
[ https://issues.apache.org/jira/browse/HDFS-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388796#comment-14388796 ] Allen Wittenauer edited comment on HDFS-7991 at 3/31/15 4:31 PM: - bq. (since the stop command only waits 5s) This is easily fixed by just increasing the timeout or adding other logic, such as asking if the NN is still alive, etc. But in any case, it occurred to me this morning that the current code just flat out won't work in practice. The problem is that HADOOP_OPTS has the NN's configuration inside it. So, for example, if a user sets the heap size to 64g, then dfsadmin is going to run with a 64g heap as well. Same thing with gc logs and any other custom JVM setting. The code absolutely must shell out another bin/hdfs process to get the proper HADOOP_OPTS setting. I suspect it will actually have to use a subshell plus parameter captures so that the environment is clean due to various {{export}} statements throughout the code and in a lot of users' *-env.sh files. was (Author: aw): bq. (since the stop command only waits 5s) This is easily fixed by just increasing the timeout or adding logic other logic such as asking if the NN is still alive, etc. But in any case, it occurred to me this morning that the current code just flat out won't work in practice. The problem is that HADOOP_OPTS has the NN's configuration inside it. So, for example, if a user sets the heap size to 64g, then dfsadmin is going to run with a 64g heap as well. Same thing with gc logs and any other custom JVM setting. The code absolutely must shell out another bin/hdfs process to get the proper HADOOP_OPTS setting. I suspect it will actually have to use a subshell plus captures parameters so that the environment is clean due to various {{export}} statements throughout the code and in a lot of user's *-env.sh files.
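The subshell point in the comment above can be demonstrated without Hadoop at all. HADOOP_OPTS is the real variable name; the heap values are made up for illustration:

```shell
# Why a fresh process/subshell is needed: the parent's exported HADOOP_OPTS
# (the NN's JVM settings) must not leak into the admin command, and nothing
# the child sets leaks back into the parent environment.
export HADOOP_OPTS="-Xmx64g"                             # pretend the NN env is loaded
child=$(HADOOP_OPTS="-Xmx1g" sh -c 'echo "$HADOOP_OPTS"') # child gets its own value
echo "child saw: $child"          # child saw: -Xmx1g
echo "parent kept: $HADOOP_OPTS"  # parent kept: -Xmx64g
```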
Allow users to skip checkpoint when stopping NameNode - Key: HDFS-7991 URL: https://issues.apache.org/jira/browse/HDFS-7991 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-7991.000.patch, HDFS-7991.001.patch, HDFS-7991.002.patch, HDFS-7991.003.patch This is a follow-up jira of HDFS-6353. HDFS-6353 adds the functionality to check if saving namespace is necessary before stopping namenode. As [~kihwal] pointed out in this [comment|https://issues.apache.org/jira/browse/HDFS-6353?focusedCommentId=14380898&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14380898], in a secured cluster this new functionality requires the user to be kinit'ed.