[jira] [Updated] (HDFS-7360) Test libhdfs3 against MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-7360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhanwei Wang updated HDFS-7360: --- Attachment: HDFS-7360.patch > Test libhdfs3 against MiniDFSCluster > > > Key: HDFS-7360 > URL: https://issues.apache.org/jira/browse/HDFS-7360 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Haohui Mai >Assignee: Zhanwei Wang >Priority: Critical > Attachments: HDFS-7360.patch > > > Currently the branch has enough code to interact with HDFS servers. We should > test the code against MiniDFSCluster to ensure the correctness of the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7589) Break the dependency between libnative_mini_dfs and libhdfs
[ https://issues.apache.org/jira/browse/HDFS-7589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhanwei Wang updated HDFS-7589: --- Attachment: HDFS-7589.patch > Break the dependency between libnative_mini_dfs and libhdfs > --- > > Key: HDFS-7589 > URL: https://issues.apache.org/jira/browse/HDFS-7589 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Zhanwei Wang >Assignee: Zhanwei Wang > Attachments: HDFS-7589.patch > > > Currently libnative_mini_dfs links with libhdfs to reuse some common code. > Other applications which want to use libnative_mini_dfs have to link to > libhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7589) Break the dependency between libnative_mini_dfs and libhdfs
[ https://issues.apache.org/jira/browse/HDFS-7589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhanwei Wang updated HDFS-7589: --- Attachment: (was: HDFS-7589.patch) > Break the dependency between libnative_mini_dfs and libhdfs > --- > > Key: HDFS-7589 > URL: https://issues.apache.org/jira/browse/HDFS-7589 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Zhanwei Wang >Assignee: Zhanwei Wang > Attachments: HDFS-7589.patch > > > Currently libnative_mini_dfs links with libhdfs to reuse some common code. > Other applications which want to use libnative_mini_dfs have to link to > libhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7501) TransactionsSinceLastCheckpoint can be negative on SBNs
[ https://issues.apache.org/jira/browse/HDFS-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267346#comment-14267346 ]

Harsh J commented on HDFS-7501:
-------------------------------

[~daryn] - Won't the metric still lag on the Standby even if we were to correct it during checkpoints? Is a laggy metric OK to display (better than negatives, but still)?

> TransactionsSinceLastCheckpoint can be negative on SBNs
> -------------------------------------------------------
>
>                 Key: HDFS-7501
>                 URL: https://issues.apache.org/jira/browse/HDFS-7501
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.5.0
>            Reporter: Harsh J
>            Assignee: Gautam Gopalakrishnan
>            Priority: Trivial
>         Attachments: HDFS-7501-2.patch, HDFS-7501.patch
>
>
> The metric TransactionsSinceLastCheckpoint is derived as FSEditLog.txid minus
> NNStorage.mostRecentCheckpointTxId.
> In Standby mode, the former does not increment beyond the loaded or
> last-when-active value, but the latter does change due to checkpoints done
> regularly in this mode. Thereby, the SBN will eventually end up showing
> negative values for TransactionsSinceLastCheckpoint.
> This is not an issue as the metric only makes sense to be monitored on the
> Active NameNode, but we should perhaps just show the value 0 by detecting if
> the NN is in SBN form, as allowing a negative number is confusing to view
> within a chart that tracks it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
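The issue description above suggests reporting 0 when the NameNode is in standby state instead of exposing a negative difference. A minimal sketch of that idea follows. The class name, method name, and the `isStandby` flag are hypothetical and chosen for illustration only; this is not the code in the attached patches.

```java
public class CheckpointMetricSketch {
    /**
     * Hypothetical helper. On an active NameNode the metric is the last
     * written transaction id minus the transaction id of the most recent
     * checkpoint. On a standby the first value stops advancing while the
     * second keeps growing, so the sketch reports 0 there.
     */
    static long transactionsSinceLastCheckpoint(long lastWrittenTxId,
                                                long mostRecentCheckpointTxId,
                                                boolean isStandby) {
        if (isStandby) {
            return 0L;
        }
        return lastWrittenTxId - mostRecentCheckpointTxId;
    }

    public static void main(String[] args) {
        // Active NameNode: edits have advanced past the last checkpoint.
        System.out.println(transactionsSinceLastCheckpoint(1500L, 1200L, false));
        // Standby NameNode: the raw difference would be -200.
        System.out.println(transactionsSinceLastCheckpoint(1000L, 1200L, true));
    }
}
```

As the comment above points out, a value fixed at 0 on the standby avoids negative numbers but does not make the metric meaningful there.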
[jira] [Assigned] (HDFS-7589) Break the dependency between libnative_mini_dfs and libhdfs
[ https://issues.apache.org/jira/browse/HDFS-7589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhanwei Wang reassigned HDFS-7589: -- Assignee: Zhanwei Wang > Break the dependency between libnative_mini_dfs and libhdfs > --- > > Key: HDFS-7589 > URL: https://issues.apache.org/jira/browse/HDFS-7589 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Zhanwei Wang >Assignee: Zhanwei Wang > Attachments: HDFS-7589.patch > > > Currently libnative_mini_dfs links with libhdfs to reuse some common code. > Other applications which want to use libnative_mini_dfs have to link to > libhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HDFS-7589) Break the dependency between libnative_mini_dfs and libhdfs
[ https://issues.apache.org/jira/browse/HDFS-7589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-7589 started by Zhanwei Wang. -- > Break the dependency between libnative_mini_dfs and libhdfs > --- > > Key: HDFS-7589 > URL: https://issues.apache.org/jira/browse/HDFS-7589 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Zhanwei Wang >Assignee: Zhanwei Wang > Attachments: HDFS-7589.patch > > > Currently libnative_mini_dfs links with libhdfs to reuse some common code. > Other applications which want to use libnative_mini_dfs have to link to > libhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7589) Break the dependency between libnative_mini_dfs and libhdfs
[ https://issues.apache.org/jira/browse/HDFS-7589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhanwei Wang updated HDFS-7589: --- Attachment: HDFS-7589.patch > Break the dependency between libnative_mini_dfs and libhdfs > --- > > Key: HDFS-7589 > URL: https://issues.apache.org/jira/browse/HDFS-7589 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Zhanwei Wang > Attachments: HDFS-7589.patch > > > Currently libnative_mini_dfs links with libhdfs to reuse some common code. > Other applications which want to use libnative_mini_dfs have to link to > libhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7589) Break the dependency between libnative_mini_dfs and libhdfs
Zhanwei Wang created HDFS-7589: -- Summary: Break the dependency between libnative_mini_dfs and libhdfs Key: HDFS-7589 URL: https://issues.apache.org/jira/browse/HDFS-7589 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Currently libnative_mini_dfs links with libhdfs to reuse some common code. Other applications which want to use libnative_mini_dfs have to link to libhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HDFS-7360) Test libhdfs3 against MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-7360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-7360 started by Zhanwei Wang. -- > Test libhdfs3 against MiniDFSCluster > > > Key: HDFS-7360 > URL: https://issues.apache.org/jira/browse/HDFS-7360 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Haohui Mai >Assignee: Zhanwei Wang >Priority: Critical > > Currently the branch has enough code to interact with HDFS servers. We should > test the code against MiniDFSCluster to ensure the correctness of the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7360) Test libhdfs3 against MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-7360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhanwei Wang reassigned HDFS-7360: -- Assignee: Zhanwei Wang > Test libhdfs3 against MiniDFSCluster > > > Key: HDFS-7360 > URL: https://issues.apache.org/jira/browse/HDFS-7360 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Haohui Mai >Assignee: Zhanwei Wang >Priority: Critical > > Currently the branch has enough code to interact with HDFS servers. We should > test the code against MiniDFSCluster to ensure the correctness of the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7565) NFS gateway UID overflow
[ https://issues.apache.org/jira/browse/HDFS-7565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267248#comment-14267248 ]

Yongjun Zhang commented on HDFS-7565:
-------------------------------------

Hi [~harisekhon],

I assume you applied your own fix of HDFS-7563 and then saw this problem, right?

What exactly does your static map file look like? (Would you please cat the file and paste it here?)

Would you please try "getent passwd hari", "getent passwd 10002", and "getent passwd <4B>" (where <4B> is the 4 billion number you are using) on the node that runs the NFS gateway, and share the results here?

Thanks.

> NFS gateway UID overflow
> ------------------------
>
>                 Key: HDFS-7565
>                 URL: https://issues.apache.org/jira/browse/HDFS-7565
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: nfs
>    Affects Versions: 2.6.0
>         Environment: HDP 2.2 (Apache Hadoop 2.6.0)
>            Reporter: Hari Sekhon
>            Assignee: Yongjun Zhang
>
> It appears that my Windows 7 workstation is passing a UID around 4 billion to
> the NFS gateway and the getUserName() method is being passed "-2", so it
> looks like the UID is an int and is overflowing:
> {code}security.ShellBasedIdMapping
> (ShellBasedIdMapping.java:getUserName(358)) - Can't find user name for uid
> -2. Use default user name nobody{code}
> Regards,
> Hari Sekhon
> http://www.linkedin.com/in/harisekon

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
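The overflow described in the report can be reproduced with plain Java arithmetic. The sketch below assumes the client sent 4294967294 (2^32 - 2), which is the unsigned 32-bit value that matches the "-2" in the log line; the report itself only says "around 4 billion". The class and method names are illustrative and are not part of the NFS gateway code.

```java
public class UidOverflowSketch {
    /** Narrowing a long to int keeps only the low 32 bits, read as signed. */
    static int asSignedInt(long uid) {
        return (int) uid;
    }

    /** Reinterprets the same 32 bits as an unsigned value held in a long. */
    static long asUnsigned(int uid) {
        return Integer.toUnsignedLong(uid);
    }

    public static void main(String[] args) {
        long clientUid = 4294967294L; // assumed example value, 2^32 - 2
        int stored = asSignedInt(clientUid);
        System.out.println(stored);             // prints -2
        System.out.println(asUnsigned(stored)); // prints 4294967294
    }
}
```

Holding the UID in a `long`, or converting with `Integer.toUnsignedLong` before any lookup, would keep the value in the unsigned range that the protocol uses.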
[jira] [Commented] (HDFS-5631) Expose interfaces required by FsDatasetSpi implementations
[ https://issues.apache.org/jira/browse/HDFS-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267244#comment-14267244 ] Hadoop QA commented on HDFS-5631: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625158/HDFS-5631.patch against trunk revision 788ee35. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9149//console This message is automatically generated. > Expose interfaces required by FsDatasetSpi implementations > -- > > Key: HDFS-5631 > URL: https://issues.apache.org/jira/browse/HDFS-5631 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: 3.0.0 >Reporter: David Powell >Assignee: David Powell >Priority: Minor > Attachments: HDFS-5631.patch, HDFS-5631.patch > > > This sub-task addresses section 4.1 of the document attached to HDFS-5194, > the exposure of interfaces needed by a FsDatasetSpi implementation. > Specifically it makes ChunkChecksum public and BlockMetadataHeader's > readHeader() and writeHeader() methods public. > The changes to BlockReaderUtil (and related classes) discussed by section > 4.1 are only needed if supporting short-circuit, and should be addressed > as part of an effort to provide such support rather than this JIRA. > To help ensure these changes are complete and are not regressed in the > future, tests that gauge the accessibility (though *not* behavior) > of interfaces needed by a FsDatasetSpi subclass are also included. > These take the form of a dummy FsDatasetSpi subclass -- a successful > compilation is effectively a pass. Trivial unit tests are included so > that there is something tangible to track. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5631) Expose interfaces required by FsDatasetSpi implementations
[ https://issues.apache.org/jira/browse/HDFS-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267241#comment-14267241 ]

Tsz Wo Nicholas Sze commented on HDFS-5631:
-------------------------------------------

Changing ChunkChecksum and BlockMetadataHeader's readHeader() and writeHeader() to public sounds good.

For the test, how would someone add new methods to FsDatasetSpi or change the existing methods? Are they supposed to update the test at the same time?

> Expose interfaces required by FsDatasetSpi implementations
> ----------------------------------------------------------
>
>                 Key: HDFS-5631
>                 URL: https://issues.apache.org/jira/browse/HDFS-5631
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>    Affects Versions: 3.0.0
>            Reporter: David Powell
>            Assignee: David Powell
>            Priority: Minor
>         Attachments: HDFS-5631.patch, HDFS-5631.patch
>
>
> This sub-task addresses section 4.1 of the document attached to HDFS-5194,
> the exposure of interfaces needed by a FsDatasetSpi implementation.
> Specifically it makes ChunkChecksum public and BlockMetadataHeader's
> readHeader() and writeHeader() methods public.
> The changes to BlockReaderUtil (and related classes) discussed by section
> 4.1 are only needed if supporting short-circuit, and should be addressed
> as part of an effort to provide such support rather than this JIRA.
> To help ensure these changes are complete and are not regressed in the
> future, tests that gauge the accessibility (though *not* behavior)
> of interfaces needed by a FsDatasetSpi subclass are also included.
> These take the form of a dummy FsDatasetSpi subclass -- a successful
> compilation is effectively a pass. Trivial unit tests are included so
> that there is something tangible to track.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HDFS-7466) Allow different values for dfs.datanode.balance.max.concurrent.moves per datanode
[ https://issues.apache.org/jira/browse/HDFS-7466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267239#comment-14267239 ]

Tsz Wo Nicholas Sze commented on HDFS-7466:
-------------------------------------------

Benoy, thanks for showing the use case. Approach 1 sounds good.

When should the mover/balancer contact the datanodes? How about contacting a datanode when dispatching a move, i.e. PendingMove.dispatch()? That way, the datanode queries run in parallel and are executed on demand.

> Allow different values for dfs.datanode.balance.max.concurrent.moves per
> datanode
> -------------------------------------------------------------------------
>
>                 Key: HDFS-7466
>                 URL: https://issues.apache.org/jira/browse/HDFS-7466
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: balancer & mover
>    Affects Versions: 2.6.0
>            Reporter: Benoy Antony
>            Assignee: Benoy Antony
>
> It is possible to configure different values for
> _dfs.datanode.balance.max.concurrent.moves_ per datanode. But the value will
> be used by balancer/mover which obtains the value from its own configuration.
> The correct approach will be to obtain the value from the datanode itself.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation
[ https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7587: -- Component/s: namenode > Edit log corruption can happen if append fails with a quota violation > - > > Key: HDFS-7587 > URL: https://issues.apache.org/jira/browse/HDFS-7587 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Kihwal Lee >Assignee: Daryn Sharp >Priority: Blocker > > We have seen a standby namenode crashing due to edit log corruption. It was > complaining that {{OP_CLOSE}} cannot be applied because the file is not > under-construction. > When a client was trying to append to the file, the remaining space quota was > very small. This caused a failure in {{prepareFileForWrite()}}, but after the > inode was already converted for writing and a lease added. Since these were > not undone when the quota violation was detected, the file was left in > under-construction with an active lease without edit logging {{OP_ADD}}. > A subsequent {{append()}} eventually caused a lease recovery after the soft > limit period. This resulted in {{commitBlockSynchronization()}}, which closed > the file with {{OP_CLOSE}} being logged. Since there was no corresponding > {{OP_ADD}}, edit replaying could not apply this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation
[ https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267227#comment-14267227 ]

Tsz Wo Nicholas Sze commented on HDFS-7587:
-------------------------------------------

> ... . Daryn Sharp has suggested that the quota check be done before
> converting inode/block. ...

Sounds good. All the checks (quota, permission, etc.) should be performed before any change to the namespace.

> Edit log corruption can happen if append fails with a quota violation
> ---------------------------------------------------------------------
>
>                 Key: HDFS-7587
>                 URL: https://issues.apache.org/jira/browse/HDFS-7587
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Kihwal Lee
>            Assignee: Daryn Sharp
>            Priority: Blocker
>
> We have seen a standby namenode crashing due to edit log corruption. It was
> complaining that {{OP_CLOSE}} cannot be applied because the file is not
> under-construction.
> When a client was trying to append to the file, the remaining space quota was
> very small. This caused a failure in {{prepareFileForWrite()}}, but after the
> inode was already converted for writing and a lease added. Since these were
> not undone when the quota violation was detected, the file was left in
> under-construction with an active lease without edit logging {{OP_ADD}}.
> A subsequent {{append()}} eventually caused a lease recovery after the soft
> limit period. This resulted in {{commitBlockSynchronization()}}, which closed
> the file with {{OP_CLOSE}} being logged. Since there was no corresponding
> {{OP_ADD}}, edit replaying could not apply this.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
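The ordering suggested in the comment above (run every check first, mutate only afterwards) can be illustrated with a toy model. Everything in this sketch is hypothetical: `ToyNamespace`, its fields, and `prepareForAppend` merely stand in for the real NameNode structures and are not HDFS APIs or the eventual fix.

```java
import java.util.HashSet;
import java.util.Set;

public class AppendCheckOrderSketch {
    /** Toy stand-in for namespace state; all names here are hypothetical. */
    static final class ToyNamespace {
        long remainingSpaceQuota;
        final Set<String> underConstruction = new HashSet<>();
        final Set<String> leases = new HashSet<>();

        ToyNamespace(long remainingSpaceQuota) {
            this.remainingSpaceQuota = remainingSpaceQuota;
        }

        /**
         * Performs the quota check before touching any state. If the check
         * fails, nothing has been changed, so there is nothing to undo.
         */
        void prepareForAppend(String path, long bytesNeeded) {
            if (bytesNeeded > remainingSpaceQuota) {
                throw new IllegalStateException("quota exceeded for " + path);
            }
            // All checks passed: only now mark the file and add the lease.
            underConstruction.add(path);
            leases.add(path);
            remainingSpaceQuota -= bytesNeeded;
        }
    }

    public static void main(String[] args) {
        ToyNamespace ns = new ToyNamespace(10L);
        try {
            ns.prepareForAppend("/foo", 128L);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        // The failed append left no under-construction marker and no lease.
        System.out.println(ns.underConstruction.isEmpty() && ns.leases.isEmpty());
    }
}
```

With the checks placed after the mutation instead, the rejected call would leave the file marked under construction with a lease, which is the state the issue description identifies as the source of the unmatched {{OP_CLOSE}}.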
[jira] [Commented] (HDFS-7467) Provide storage tier information for a directory via fsck
[ https://issues.apache.org/jira/browse/HDFS-7467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267222#comment-14267222 ]

Tsz Wo Nicholas Sze commented on HDFS-7467:
-------------------------------------------

> My recommendation is to display the policy along with the combination if
> possible. ...

It is a good idea. We should also consider a file's specified storage policy and its actual storage media. If a file does not satisfy the specified policy, fsck should show that information. E.g. if the specified storage policy of file foo is hot but all the replicas are stored in ARCHIVE, then it should not be counted as "frozen". It should be counted as "ARCHIVE:3" in order to indicate that it does not satisfy the specified policy.

> Provide storage tier information for a directory via fsck
> ---------------------------------------------------------
>
>                 Key: HDFS-7467
>                 URL: https://issues.apache.org/jira/browse/HDFS-7467
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: balancer & mover
>    Affects Versions: 2.6.0
>            Reporter: Benoy Antony
>            Assignee: Benoy Antony
>         Attachments: HDFS-7467.patch
>
>
> Currently _fsck_ provides information regarding blocks for a directory.
> It should be augmented to provide storage tier information (optionally).
> The sample report could be as follows :
> {code}
> Storage Tier Combination    # of blocks       % of blocks
> DISK:1,ARCHIVE:2                 340730          97.7393%
> ARCHIVE:3                          3928           1.1268%
> DISK:2,ARCHIVE:2                   3122           0.8956%
> DISK:2,ARCHIVE:1                    748           0.2146%
> DISK:1,ARCHIVE:3                     44           0.0126%
> DISK:3,ARCHIVE:2                     30           0.0086%
> DISK:3,ARCHIVE:1                      9           0.0026%
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
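The labelling rule proposed in the comment above can be sketched as a small lookup: a block is reported under a policy name only when its actual tier combination matches the policy specified for the file, and otherwise under the raw combination. The warm, cold, and frozen pairs below are taken from the sample report posted elsewhere in this thread; the "hot" entry (DISK:3) is an assumption, and the class and method names are hypothetical rather than part of fsck.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StorageTierLabelSketch {
    /** Policy name mapped to the tier combination expected for it. */
    static final Map<String, String> EXPECTED = new LinkedHashMap<>();
    static {
        EXPECTED.put("hot", "DISK:3"); // assumption, not stated in the thread
        EXPECTED.put("warm", "DISK:2,ARCHIVE:1");
        EXPECTED.put("cold", "DISK:1,ARCHIVE:2");
        EXPECTED.put("frozen", "ARCHIVE:3");
    }

    /**
     * Returns "policy(combination)" when the actual combination satisfies
     * the specified policy, and the bare combination when it does not.
     */
    static String label(String specifiedPolicy, String actualCombination) {
        String expected = EXPECTED.get(specifiedPolicy);
        if (actualCombination.equals(expected)) {
            return specifiedPolicy + "(" + actualCombination + ")";
        }
        return actualCombination;
    }

    public static void main(String[] args) {
        // File foo is specified as hot but every replica sits on ARCHIVE.
        System.out.println(label("hot", "ARCHIVE:3"));
        // A file specified as frozen with the same placement is compliant.
        System.out.println(label("frozen", "ARCHIVE:3"));
    }
}
```

Under this rule the first call yields the bare combination, which signals a policy violation, while the second yields the policy-qualified label.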
[jira] [Commented] (HDFS-7564) NFS gateway dynamically reload UID/GID mapping file /etc/nfs.map
[ https://issues.apache.org/jira/browse/HDFS-7564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267198#comment-14267198 ] Yongjun Zhang commented on HDFS-7564: - Many thanks Brandon! > NFS gateway dynamically reload UID/GID mapping file /etc/nfs.map > > > Key: HDFS-7564 > URL: https://issues.apache.org/jira/browse/HDFS-7564 > Project: Hadoop HDFS > Issue Type: Improvement > Components: nfs >Affects Versions: 2.6.0 > Environment: HDP 2.2 >Reporter: Hari Sekhon >Assignee: Yongjun Zhang >Priority: Minor > Fix For: 2.7.0 > > Attachments: HDFS-7564.001.patch, HDFS-7564.002.patch, > HDFS-7564.003.patch > > > Add dynamic reload of the NFS gateway UID/GID mappings file /etc/nfs.map > (default for static.id.mapping.file). > It seems that the mappings file is currently only read upon restart of the > NFS gateway which would cause any active clients NFS mount points to hang or > fail. > Regards, > Hari Sekhon > http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7585) TestEnhancedByteBufferAccess hard code the block size
[ https://issues.apache.org/jira/browse/HDFS-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267151#comment-14267151 ]

sam liu commented on HDFS-7585:
-------------------------------

Could someone please help review this patch? Thanks!

> TestEnhancedByteBufferAccess hard code the block size
> -----------------------------------------------------
>
>                 Key: HDFS-7585
>                 URL: https://issues.apache.org/jira/browse/HDFS-7585
>             Project: Hadoop HDFS
>          Issue Type: Test
>          Components: test
>    Affects Versions: 2.6.0
>            Reporter: sam liu
>            Assignee: sam liu
>            Priority: Blocker
>         Attachments: HDFS-7585.001.patch
>
>
> The test TestEnhancedByteBufferAccess hard-codes the block size, and it fails
> with exceptions on Power Linux.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HDFS-7564) NFS gateway dynamically reload UID/GID mapping file /etc/nfs.map
[ https://issues.apache.org/jira/browse/HDFS-7564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267058#comment-14267058 ] Hudson commented on HDFS-7564: -- FAILURE: Integrated in Hadoop-trunk-Commit #6820 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6820/]) HDFS-7564. NFS gateway dynamically reload UID/GID mapping file /etc/nfs.map. Contributed by Yongjun Zhang (brandonli: rev 788ee35e2bf0f3d445e03e6ea9bd02c40c8fdfe3) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedIdMapping.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestShellBasedIdMapping.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > NFS gateway dynamically reload UID/GID mapping file /etc/nfs.map > > > Key: HDFS-7564 > URL: https://issues.apache.org/jira/browse/HDFS-7564 > Project: Hadoop HDFS > Issue Type: Improvement > Components: nfs >Affects Versions: 2.6.0 > Environment: HDP 2.2 >Reporter: Hari Sekhon >Assignee: Yongjun Zhang >Priority: Minor > Fix For: 2.7.0 > > Attachments: HDFS-7564.001.patch, HDFS-7564.002.patch, > HDFS-7564.003.patch > > > Add dynamic reload of the NFS gateway UID/GID mappings file /etc/nfs.map > (default for static.id.mapping.file). > It seems that the mappings file is currently only read upon restart of the > NFS gateway which would cause any active clients NFS mount points to hang or > fail. > Regards, > Hari Sekhon > http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7564) NFS gateway dynamically reload UID/GID mapping file /etc/nfs.map
[ https://issues.apache.org/jira/browse/HDFS-7564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-7564: - Fix Version/s: 2.7.0 > NFS gateway dynamically reload UID/GID mapping file /etc/nfs.map > > > Key: HDFS-7564 > URL: https://issues.apache.org/jira/browse/HDFS-7564 > Project: Hadoop HDFS > Issue Type: Improvement > Components: nfs >Affects Versions: 2.6.0 > Environment: HDP 2.2 >Reporter: Hari Sekhon >Assignee: Yongjun Zhang >Priority: Minor > Fix For: 2.7.0 > > Attachments: HDFS-7564.001.patch, HDFS-7564.002.patch, > HDFS-7564.003.patch > > > Add dynamic reload of the NFS gateway UID/GID mappings file /etc/nfs.map > (default for static.id.mapping.file). > It seems that the mappings file is currently only read upon restart of the > NFS gateway which would cause any active clients NFS mount points to hang or > fail. > Regards, > Hari Sekhon > http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7564) NFS gateway dynamically reload UID/GID mapping file /etc/nfs.map
[ https://issues.apache.org/jira/browse/HDFS-7564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267052#comment-14267052 ]

Brandon Li commented on HDFS-7564:
----------------------------------

I've committed the patch. Thank you, [~yzhangal], for the contribution!

> NFS gateway dynamically reload UID/GID mapping file /etc/nfs.map
> ----------------------------------------------------------------
>
>                 Key: HDFS-7564
>                 URL: https://issues.apache.org/jira/browse/HDFS-7564
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: nfs
>    Affects Versions: 2.6.0
>         Environment: HDP 2.2
>            Reporter: Hari Sekhon
>            Assignee: Yongjun Zhang
>            Priority: Minor
>         Attachments: HDFS-7564.001.patch, HDFS-7564.002.patch,
> HDFS-7564.003.patch
>
>
> Add dynamic reload of the NFS gateway UID/GID mappings file /etc/nfs.map
> (default for static.id.mapping.file).
> It seems that the mappings file is currently only read upon restart of the
> NFS gateway which would cause any active clients NFS mount points to hang or
> fail.
> Regards,
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HDFS-7564) NFS gateway dynamically reload UID/GID mapping file /etc/nfs.map
[ https://issues.apache.org/jira/browse/HDFS-7564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-7564: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) > NFS gateway dynamically reload UID/GID mapping file /etc/nfs.map > > > Key: HDFS-7564 > URL: https://issues.apache.org/jira/browse/HDFS-7564 > Project: Hadoop HDFS > Issue Type: Improvement > Components: nfs >Affects Versions: 2.6.0 > Environment: HDP 2.2 >Reporter: Hari Sekhon >Assignee: Yongjun Zhang >Priority: Minor > Attachments: HDFS-7564.001.patch, HDFS-7564.002.patch, > HDFS-7564.003.patch > > > Add dynamic reload of the NFS gateway UID/GID mappings file /etc/nfs.map > (default for static.id.mapping.file). > It seems that the mappings file is currently only read upon restart of the > NFS gateway which would cause any active clients NFS mount points to hang or > fail. > Regards, > Hari Sekhon > http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7564) NFS gateway dynamically reload UID/GID mapping file /etc/nfs.map
[ https://issues.apache.org/jira/browse/HDFS-7564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267042#comment-14267042 ] Brandon Li commented on HDFS-7564: -- +1. I will commit the patch soon. > NFS gateway dynamically reload UID/GID mapping file /etc/nfs.map > > > Key: HDFS-7564 > URL: https://issues.apache.org/jira/browse/HDFS-7564 > Project: Hadoop HDFS > Issue Type: Improvement > Components: nfs >Affects Versions: 2.6.0 > Environment: HDP 2.2 >Reporter: Hari Sekhon >Assignee: Yongjun Zhang >Priority: Minor > Attachments: HDFS-7564.001.patch, HDFS-7564.002.patch, > HDFS-7564.003.patch > > > Add dynamic reload of the NFS gateway UID/GID mappings file /etc/nfs.map > (default for static.id.mapping.file). > It seems that the mappings file is currently only read upon restart of the > NFS gateway which would cause any active clients NFS mount points to hang or > fail. > Regards, > Hari Sekhon > http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7588) Improve the HDFS Web UI browser to allow chowning / chmoding, creating dirs and uploading files
Ravi Prakash created HDFS-7588:
----------------------------------

             Summary: Improve the HDFS Web UI browser to allow chowning / chmoding, creating dirs and uploading files
                 Key: HDFS-7588
                 URL: https://issues.apache.org/jira/browse/HDFS-7588
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Ravi Prakash


The new HTML5 web browser is neat, however it lacks a few features that might make it more useful:
1. chown
2. chmod
3. Uploading files
4. mkdir

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation
[ https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-7587: - Assignee: Daryn Sharp > Edit log corruption can happen if append fails with a quota violation > - > > Key: HDFS-7587 > URL: https://issues.apache.org/jira/browse/HDFS-7587 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Daryn Sharp >Priority: Blocker > > We have seen a standby namenode crashing due to edit log corruption. It was > complaining that {{OP_CLOSE}} cannot be applied because the file is not > under-construction. > When a client was trying to append to the file, the remaining space quota was > very small. This caused a failure in {{prepareFileForWrite()}}, but after the > inode was already converted for writing and a lease added. Since these were > not undone when the quota violation was detected, the file was left in > under-construction with an active lease without edit logging {{OP_ADD}}. > A subsequent {{append()}} eventually caused a lease recovery after the soft > limit period. This resulted in {{commitBlockSynchronization()}}, which closed > the file with {{OP_CLOSE}} being logged. Since there was no corresponding > {{OP_ADD}}, edit replaying could not apply this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-7467) Provide storage tier information for a directory via fsck
[ https://issues.apache.org/jira/browse/HDFS-7467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266871#comment-14266871 ]

Benoy Antony edited comment on HDFS-7467 at 1/6/15 10:22 PM:
-------------------------------------------------------------

1.
{quote}
Are all storage policies in fallback storage equivalent to other storage policies that this output can always be fully described by the percentages that Tsz has suggested?
{quote}
There is a possibility that some storage tier combination may not belong to a storage policy. My recommendation is to display the policy along with the combination if possible. If not, display the combination. Lowercase for policy name is intentional.

{code}
Storage Policy                  # of blocks       % of blocks
cold(DISK:1,ARCHIVE:2)               340730          97.7393%
frozen(ARCHIVE:3)                      3928           1.1268%
DISK:2,ARCHIVE:2                       3122           0.8956%
warm(DISK:2,ARCHIVE:1)                  748           0.2146%
DISK:1,ARCHIVE:3                         44           0.0126%
DISK:3,ARCHIVE:2                         30           0.0086%
DISK:3,ARCHIVE:1                          9           0.0026%
{code}

2.
{quote}
There should also be some warning messages as well in fsck for all files that are unable to meet the requested ideal for their storage policy and are using fallback storage, perhaps with a switch since that could become overly voluminous output.
{quote}
This is a nice feature. Will look into that.

> Provide storage tier information for a directory via fsck
> ---------------------------------------------------------
>
>                 Key: HDFS-7467
>                 URL: https://issues.apache.org/jira/browse/HDFS-7467
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: balancer & mover
>    Affects Versions: 2.6.0
>            Reporter: Benoy Antony
>            Assignee: Benoy Antony
>         Attachments: HDFS-7467.patch
>
>
> Currently _fsck_ provides information regarding blocks for a directory.
> It should be augmented to provide storage tier information (optionally).
> The sample report could be as follows :
> {code}
> Storage Tier Combination    # of blocks       % of blocks
> DISK:1,ARCHIVE:2                 340730          97.7393%
> ARCHIVE:3                          3928           1.1268%
> DISK:2,ARCHIVE:2                   3122           0.8956%
> DISK:2,ARCHIVE:1                    748           0.2146%
> DISK:1,ARCHIVE:3                     44           0.0126%
> DISK:3,ARCHIVE:2                     30           0.0086%
> DISK:3,ARCHIVE:1                      9           0.0026%
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HDFS-7467) Provide storage tier information for a directory via fsck
[ https://issues.apache.org/jira/browse/HDFS-7467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266871#comment-14266871 ]

Benoy Antony commented on HDFS-7467:
------------------------------------

1.
{quote}
Are all storage policies in fallback storage equivalent to other storage policies that this output can always be fully described by the percentages that Tsz has suggested?
{quote}
There is a possibility that some storage tier combination may not belong to a storage policy. My recommendation is to display the policy along with the combination if possible. If not, display the combination. Lowercase for policy name is intentional.

{code}
Storage Policy            # of blocks   % of blocks
cold (DISK:1,ARCHIVE:2)        340730      97.7393%
frozen (ARCHIVE:3)               3928       1.1268%
DISK:2,ARCHIVE:2                 3122       0.8956%
warm (DISK:2,ARCHIVE:1)           748       0.2146%
DISK:1,ARCHIVE:3                   44       0.0126%
DISK:3,ARCHIVE:2                   30       0.0086%
DISK:3,ARCHIVE:1                    9       0.0026%
{code}

2.
{quote}
There should also be some warning messages as well in fsck for all files that are unable to meet the requested ideal for their storage policy and are using fallback storage, perhaps with a switch since that could become overly volumous output.
{quote}
This is a nice feature. Will look into that.

> Provide storage tier information for a directory via fsck
> ---------------------------------------------------------
>
> Key: HDFS-7467
> URL: https://issues.apache.org/jira/browse/HDFS-7467
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: balancer & mover
> Affects Versions: 2.6.0
> Reporter: Benoy Antony
> Assignee: Benoy Antony
> Attachments: HDFS-7467.patch
>
>
> Currently _fsck_ provides information regarding blocks for a directory.
> It should be augmented to provide storage tier information (optionally).
> The sample report could be as follows:
> {code}
> Storage Tier Combination  # of blocks   % of blocks
> DISK:1,ARCHIVE:2               340730      97.7393%
> ARCHIVE:3                        3928       1.1268%
> DISK:2,ARCHIVE:2                 3122       0.8956%
> DISK:2,ARCHIVE:1                  748       0.2146%
> DISK:1,ARCHIVE:3                   44       0.0126%
> DISK:3,ARCHIVE:2                   30       0.0086%
> DISK:3,ARCHIVE:1                    9       0.0026%
> {code}
>

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
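
The labeling rule proposed in the comment (show the policy name with the combination when one matches, otherwise show the bare combination) can be sketched as follows. This is only an illustration and not code from HDFS-7467.patch: `LabelFor`, `FormatRow`, `TierCount` and the lookup table are hypothetical names, and the three policy mappings simply mirror the sample report rather than any shipped policy definitions.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Hypothetical lookup from a storage tier combination to a policy name.
// The entries copy the sample report in the comment; a real report would
// derive them from the configured storage policies.
static const std::map<std::string, std::string> kPolicyByCombination = {
    {"DISK:1,ARCHIVE:2", "cold"},
    {"ARCHIVE:3", "frozen"},
    {"DISK:2,ARCHIVE:1", "warm"},
};

// One row of the report before formatting (assumed shape).
struct TierCount {
    std::string combination;
    long blocks;
};

// Returns "policy(combination)" when the combination belongs to a known
// policy, and the bare combination otherwise.
std::string LabelFor(const std::string &combination) {
    auto it = kPolicyByCombination.find(combination);
    if (it == kPolicyByCombination.end()) {
        return combination;
    }
    return it->second + "(" + combination + ")";
}

// Formats one report row: label, block count, and share of all blocks.
std::string FormatRow(const TierCount &row, long totalBlocks) {
    double percent = totalBlocks > 0 ? 100.0 * row.blocks / totalBlocks : 0.0;
    char buf[128];
    std::snprintf(buf, sizeof(buf), "%-26s %10ld %12.4f%%",
                  LabelFor(row.combination).c_str(), row.blocks, percent);
    return buf;
}
```

The seven block counts in the sample add up to 348611, which is the total the percentages in the report are computed against.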
[jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema
[ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266816#comment-14266816 ]

Andrew Wang commented on HDFS-7337:
-----------------------------------

Hey Kai, thanks for getting us started here. I gave this a quick look, had a few comments:

* Could you generate normal plaintext diffs rather than a zip? We might also want to reorganize things into existing packages. The rawcoder stuff could go somewhere in hadoop-common for instance. We could move the block grouper classes into blockmanagement, etc.
* I see mixed tabs and spaces; we do spaces only in Hadoop.
* Since the LRC stuff is still up in the air, could we defer everything related to that to a later JIRA?
* In RSBlockGrouper, using ExtendedBlockId is overkill, since the bpid is the same for everything.

Configuration:
* The XML file approach seems potentially error-prone. IIUC, after a set of parameters is assigned to a schema name, the parameters should never be changed. We thus also need to keep the XML file in sync between the NN, DN, and client. The client part is especially troublesome. Are we planning to put this into the editlog/image down the road, like how we do storage policies?
* Also, I think we want to separate out the type of erasure coding from the implementation. The schema definition from the PDF encodes both together, e.g. JerasureRS. While it's not possible to change the RS part, the user might want to swap out Jerasure for ISAL, which should be allowed. This is sort of like how we did things for encryption; we define a CipherSuite (i.e. AES-CTR) and then the user can choose among the multiple pluggable implementations for that cipher.

BlockGroup:
* Zhe told me this is a placeholder class, but a few comments nonetheless.
* Can we just set the two fields in the constructor? They should also be final.
* Since the schema encodes the layout, does SubBlockGroup need to encode both data and parity? Do we even need SubBlockGroup? Seems like a single array and a schema (a concrete object, which also encodes the RS or LRC parameters) tells you the layout, which is sufficient. This will save some memory.

> Configurable and pluggable Erasure Codec and schema
> ---------------------------------------------------
>
> Key: HDFS-7337
> URL: https://issues.apache.org/jira/browse/HDFS-7337
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Zhe Zhang
> Assignee: Kai Zheng
> Attachments: HDFS-7337-prototype-v1.patch, HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, PluggableErasureCodec.pdf
>
>
> According to HDFS-7285 and the design, this JIRA considers supporting multiple
> erasure codecs via a pluggable approach. It allows defining and configuring
> multiple codec schemas with different coding algorithms and parameters. The
> resulting codec schemas can be specified via a command tool for
> different file folders. While designing and implementing such a pluggable framework,
> we will also implement a concrete default codec (Reed Solomon) to prove
> the framework is useful and workable. A separate JIRA could be opened for the
> RS codec implementation.
> Note HDFS-7353 will focus on the very low-level codec API and implementation
> to make concrete vendor libraries transparent to the upper layer. This JIRA
> focuses on the high-level parts that interact with configuration, schemas, etc.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
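
The separation suggested in the review (a fixed coding type recorded in the schema, with the implementation chosen separately, as with CipherSuite) might look roughly like the sketch below. None of these names come from the prototype patches: `CodecSchema`, `CoderRegistry`, `Register` and `Resolve` are hypothetical, and the implementation names are plain strings used for illustration.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// What would be persisted with a file: the coding type and its parameters.
// Hypothetical struct, not the schema class from the prototype patches.
struct CodecSchema {
    std::string name;   // schema name, for example "RS-6-3"
    std::string codec;  // coding type, for example "RS"; never changes
    int dataUnits;
    int parityUnits;
};

// Maps a coding type to the interchangeable implementations available on
// the local node. The registry is separate from the schema on purpose.
class CoderRegistry {
public:
    void Register(const std::string &codec, const std::string &impl) {
        impls_[codec].insert(impl);
    }

    // Returns the preferred implementation if it is registered for the
    // schema's coding type, and throws otherwise. Choosing a different
    // implementation never alters the schema itself.
    std::string Resolve(const CodecSchema &schema,
                        const std::string &preferredImpl) const {
        auto it = impls_.find(schema.codec);
        if (it == impls_.end() || it->second.count(preferredImpl) == 0) {
            throw std::invalid_argument("no implementation '" + preferredImpl +
                                        "' for codec " + schema.codec);
        }
        return preferredImpl;
    }

private:
    std::map<std::string, std::set<std::string>> impls_;
};
```

With this split, a node could register both "jerasure" and "isal" under "RS" and switch between them through local configuration, while the schema stored for existing files stays untouched.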
[jira] [Updated] (HDFS-6292) Display HDFS per user and per group usage on the webUI
[ https://issues.apache.org/jira/browse/HDFS-6292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-6292:
------------------------------
Attachment: HDFS-6292.01.patch

Ok! Here's the skeleton code that has come out of my attempt to add this functionality to the NameNode. DISCLAIMER: This patch is not ready and I'm uploading it only so that you folks can see what I'm thinking so far. I would request feedback on the following (and whatever else you think of):

1. Should HdfsUsageMetricsSource be thread safe? Should I just assume the FSN write lock is always held when calling into here?
2. I understand that we need to plug into a LOT of places to correctly update the stats. I have only plugged into 2-3 places (so obviously the usage will be incorrect if you venture out of those ops: create / delete / chown files+dirs and even these have wrinkles I need to smooth). I propose we do this all as another sub-task after the framework gets committed.
3. I still need to figure out how best to let this be configurable for any of the HDFS daemons: NameNode/Standby/SecondaryNamenode
4. Enable and disable this feature dynamically.

> Display HDFS per user and per group usage on the webUI
> ------------------------------------------------------
>
> Key: HDFS-6292
> URL: https://issues.apache.org/jira/browse/HDFS-6292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.4.0
> Reporter: Ravi Prakash
> Assignee: Ravi Prakash
> Attachments: HDFS-6292.01.patch, HDFS-6292.patch, HDFS-6292.png
>
>
> It would be nice to show HDFS usage per user and per group on a web ui.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7466) Allow different values for dfs.datanode.balance.max.concurrent.moves per datanode
[ https://issues.apache.org/jira/browse/HDFS-7466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266805#comment-14266805 ]

Benoy Antony commented on HDFS-7466:
------------------------------------

[~szetszwo], my use case is as follows:
We have a DISK tier and an ARCHIVE tier. The ARCHIVAL nodes don't have YARN containers running on them, and reads of the ARCHIVAL data are rare. The major activity on ARCHIVAL nodes happens when someone moves data between tiers. The DISK nodes have 12 drives whereas ARCHIVAL nodes have 60 drives. We would like to set dfs.datanode.balance.max.concurrent.moves on ARCHIVAL nodes to around 60. The DISK nodes use the default value of 5.

> Allow different values for dfs.datanode.balance.max.concurrent.moves per datanode
> ---------------------------------------------------------------------------------
>
> Key: HDFS-7466
> URL: https://issues.apache.org/jira/browse/HDFS-7466
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: balancer & mover
> Affects Versions: 2.6.0
> Reporter: Benoy Antony
> Assignee: Benoy Antony
>
> It is possible to configure different values for
> _dfs.datanode.balance.max.concurrent.moves_ per datanode. But the value will
> be used by balancer/mover which obtains the value from its own configuration.
> The correct approach will be to obtain the value from the datanode itself.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7572) TestLazyPersistFiles#testDnRestartWithSavedReplicas is flaky on Windows
[ https://issues.apache.org/jira/browse/HDFS-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266772#comment-14266772 ] Arpit Agarwal commented on HDFS-7572: - Thanks Chris for committing this! > TestLazyPersistFiles#testDnRestartWithSavedReplicas is flaky on Windows > --- > > Key: HDFS-7572 > URL: https://issues.apache.org/jira/browse/HDFS-7572 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.6.0 >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: 2.7.0 > > Attachments: HDFS-7572.001.patch > > > *Error Message* > Expected: is > but: was > *Stacktrace* > java.lang.AssertionError: > Expected: is > but: was > at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > at org.junit.Assert.assertThat(Assert.java:865) > at org.junit.Assert.assertThat(Assert.java:832) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LazyPersistTestCase.ensureFileReplicasOnStorageType(LazyPersistTestCase.java:129) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles.testDnRestartWithSavedReplicas(TestLazyPersistFiles.java:668) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7579) Improve log reporting during block report rpc failure
[ https://issues.apache.org/jira/browse/HDFS-7579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266709#comment-14266709 ] Charles Lamb commented on HDFS-7579: The test failure is a timeout and appears to be unrelated. I reran it myself on my local machine with the patch applied and it passed. > Improve log reporting during block report rpc failure > - > > Key: HDFS-7579 > URL: https://issues.apache.org/jira/browse/HDFS-7579 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.7.0 >Reporter: Charles Lamb >Assignee: Charles Lamb >Priority: Minor > Labels: supportability > Attachments: HDFS-7579.000.patch > > > During block reporting, if the block report RPC fails, for example because it > exceeded the max rpc len, we should still produce some sort of LOG.info > output to help with debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation
[ https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266694#comment-14266694 ] Kihwal Lee commented on HDFS-7587: -- This is a side-effect of HDFS-6423. [~daryn] has suggested that the quota check be done before converting inode/block. If something goes wrong, undoing the quota update is easier. > Edit log corruption can happen if append fails with a quota violation > - > > Key: HDFS-7587 > URL: https://issues.apache.org/jira/browse/HDFS-7587 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Priority: Blocker > > We have seen a standby namenode crashing due to edit log corruption. It was > complaining that {{OP_CLOSE}} cannot be applied because the file is not > under-construction. > When a client was trying to append to the file, the remaining space quota was > very small. This caused a failure in {{prepareFileForWrite()}}, but after the > inode was already converted for writing and a lease added. Since these were > not undone when the quota violation was detected, the file was left in > under-construction with an active lease without edit logging {{OP_ADD}}. > A subsequent {{append()}} eventually caused a lease recovery after the soft > limit period. This resulted in {{commitBlockSynchronization()}}, which closed > the file with {{OP_CLOSE}} being logged. Since there was no corresponding > {{OP_ADD}}, edit replaying could not apply this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
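
The ordering suggested in the comment (verify the quota before converting the inode and adding a lease) can be illustrated with a toy model. This is a sketch of the check-then-mutate idea only, not NameNode code: `FileState`, `Quota` and `PrepareForAppend` are hypothetical stand-ins and do not correspond to real HDFS classes or to {{prepareFileForWrite()}}.

```cpp
// Toy stand-in for the per-file state touched during append (assumed).
struct FileState {
    bool underConstruction = false;
    bool hasLease = false;
};

// Toy stand-in for the remaining space quota of the directory (assumed).
struct Quota {
    long remainingBytes;
};

// Check-then-mutate: every condition that can fail is tested before any
// state changes, so a quota violation leaves the file and quota untouched
// and there is nothing to undo.
bool PrepareForAppend(FileState &file, Quota &quota, long bytesNeeded) {
    if (bytesNeeded > quota.remainingBytes) {
        return false;  // rejected before any mutation
    }
    quota.remainingBytes -= bytesNeeded;
    file.underConstruction = true;
    file.hasLease = true;
    return true;
}
```

In the failure described in the issue, the equivalent of the last two assignments ran before the quota test, which is how the file ended up under construction with a lease but no logged {{OP_ADD}}.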
[jira] [Created] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation
Kihwal Lee created HDFS-7587: Summary: Edit log corruption can happen if append fails with a quota violation Key: HDFS-7587 URL: https://issues.apache.org/jira/browse/HDFS-7587 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Priority: Blocker We have seen a standby namenode crashing due to edit log corruption. It was complaining that {{OP_CLOSE}} cannot be applied because the file is not under-construction. When a client was trying to append to the file, the remaining space quota was very small. This caused a failure in {{prepareFileForWrite()}}, but after the inode was already converted for writing and a lease added. Since these were not undone when the quota violation was detected, the file was left in under-construction with an active lease without edit logging {{OP_ADD}}. A subsequent {{append()}} eventually caused a lease recovery after the soft limit period. This resulted in {{commitBlockSynchronization()}}, which closed the file with {{OP_CLOSE}} being logged. Since there was no corresponding {{OP_ADD}}, edit replaying could not apply this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7579) Improve log reporting during block report rpc failure
[ https://issues.apache.org/jira/browse/HDFS-7579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266641#comment-14266641 ] Hadoop QA commented on HDFS-7579: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690360/HDFS-7579.000.patch against trunk revision 4cd66f7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.qjournal.TestNNWithQJM Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9148//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9148//console This message is automatically generated. 
> Improve log reporting during block report rpc failure > - > > Key: HDFS-7579 > URL: https://issues.apache.org/jira/browse/HDFS-7579 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.7.0 >Reporter: Charles Lamb >Assignee: Charles Lamb >Priority: Minor > Labels: supportability > Attachments: HDFS-7579.000.patch > > > During block reporting, if the block report RPC fails, for example because it > exceeded the max rpc len, we should still produce some sort of LOG.info > output to help with debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7339) NameNode support for erasure coding block groups
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266566#comment-14266566 ]

Andrew Wang commented on HDFS-7339:
-----------------------------------

Thanks for working on this Zhe, here are some quick thoughts about the patch:

* Could make this into an INode Feature, like how we do ACLs and XAttrs. I think we can get rid of isStriped then too.
* Need to wire up getAdditionalBlockGroups. The handling of the previous block also needs to account for block groups.
* LocatedBlockGroup is also missing a bunch of functionality from LocatedBlock, which I think we need. Check around a bit in the client for how it uses LocatedBlock too; we will want comparable functionality for both erasure-coded and non-erasure-coded files.
* Would prefer to throw UnsupportedOperationException for stubbed methods, to be very clear.
* Since BlockGroupManager#chooseNewGroupTargets is called without any locks held, need to make sure it is threadsafe. Worth adding a comment?
* What's the interaction between the two SequentialBlockIDGenerator classes? Since they don't use the same count, there will be conflicts.
* Why do we have both BlockGroupInfo and BlockGroup? If we put BlockInfos rather than Blocks in BlockGroup, wouldn't it fill the need? Could move BlockGroup to the blockmanagement package then too.

> NameNode support for erasure coding block groups
> ------------------------------------------------
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Zhe Zhang
> Assignee: Zhe Zhang
> Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, Meta-striping.jpg, NN-stripping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they
> are formed in initial encoding and looked up in recoveries and conversions. A
> lightweight class {{BlockGroup}} is created to record the original and parity
> blocks in a coding group, as well as a pointer to the codec schema (pluggable
> codec schemas will be supported in HDFS-7337). With the striping layout, the
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently.
> Therefore we propose to extend a file’s inode to switch between _contiguous_
> and _striping_ modes, with the current mode recorded in a binary flag. An
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new
> {{ECManager}} component; the attached figure has an illustration of the
> architecture. As a simple example, when a {_Striping+EC_} file is created and
> written to, it will serve requests from the client to allocate new
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase,
> {{BlockGroups}} are allocated both in initial online encoding and in the
> conversion from replication to EC. {{ECManager}} also facilitates the lookup
> of {{BlockGroup}} information for block recovery work.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7018) Implement C interface for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266484#comment-14266484 ] Colin Patrick McCabe commented on HDFS-7018: Thanks, this looks a lot better. The CMake changes look good. {code} 64 static inline char *Strdup(const char *str) { 65 if (str == NULL) { 66 return NULL; 67 } 68 69 int len = strlen(str); 70 char *retval = new char[len + 1]; 71 memcpy(retval, str, len + 1); 72 return retval; 73 } {code} Please, let's just use the regular {{strdup}} provided by the system. Then we also don't have to worry about "array delete" versus regular delete either. {{struct hdfsFile_internal}}: we can simplify this a lot. Rather than having setters, just have a constructor that takes an {{InputStream}}, and another one that takes an {{OutputStream}}. We shouldn't need to alter the streams after the {{hdfsFile_internal}} object has been created. Using a "union" here is overly complicated, and not really saving any space. On a 64-bit machine the boolean you need to select which type the union pads the structure out to 16 bytes anyway. Just have a pointer to an input stream, and a pointer to an output stream, and the invariant that one of them is always {{null}}. {code} 166 class DefaultConfig { 167 public: 168 DefaultConfig() : reportError(false) { 169 const char *env = getenv("LIBHDFS3_CONF"); 170 std::string confPath = env ? env : ""; {code} We should be looking at CLASSPATH and searching all those directories for XML files, so that we can be compatible with the existing libhdfs code. Also, Hadoop configurations include multiple files, not just a single file. You can look at how I did it in the HADOOP-10388 branch, which has a working implementation of this. Alternately we could punt this to a follow-on JIRA. 
{code} 224 struct hdfsBuilder { 225 public: 226 hdfsBuilder(const Config &conf) : conf(conf), port(0) { 227 } 228 229 ~hdfsBuilder() { 230 } 231 232 public: {code} We don't need line 232. It's a bit confusing because I expected the line to say "private" {{PARAMETER_ASSERT}}: this isn't what people usually mean by an {{assert}}. Usually an {{assert}} is something that only takes effect in debug builds, and is used to guard against programmer mistakes. In contrast, this is validating a parameter passed in by the library user. I would prefer not to have this macro at all since I think we ought to actually provide a detailed error message explaining what is wrong. This macro will just fill in something like "invalid parameter" for EINVAL, which is not very informative. I also think it's confusing to have "return" statements in macros... maybe we can do this occasionally, but only for a VERY good reason or in a unit test. {code} 519 } catch (const std::bad_alloc &e) { 520 SetErrorMessage("Out of memory"); 521 errno = ENOMEM; 522 } catch (...) { 523 SetErrorMessage("Unknown Error"); 524 errno = EINTERNAL; 525 } {code} I see this repeated a lot in the code. Why can't we use {{CreateStatusFromException}} to figure out exactly what is wrong, and derive the errno and error message from the Status object? Since we're adopting the Google C\+\+ style, we will eventually remove the throw statements from other parts of the code, and then these "outer catch block" in the C API will be the only catch blocks left, and the only users of {{CreateStatusFromException}}, right? {{hdfs.h}}: it's problematic to add stuff to this file until the other implementations support it. We could get away with returning ENOTSUP from these functions. But I think we need to discuss what some of them do... I'm not familiar with the "get delegation token", "free delegation token" APIs and we need to discuss what they do and if we want them, etc. 
I think it's best to file a follow-on for that and leave it out for now. > Implement C interface for libhdfs3 > -- > > Key: HDFS-7018 > URL: https://issues.apache.org/jira/browse/HDFS-7018 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Zhanwei Wang >Assignee: Zhanwei Wang > Attachments: HDFS-7018-pnative.002.patch, > HDFS-7018-pnative.003.patch, HDFS-7018.patch > > > Implement C interface for libhdfs3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
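The review point about the repeated outer catch blocks can be illustrated with a single translation helper. This is a sketch, not the library's code: the thread does not show the signature of {{CreateStatusFromException}}, so {{Status}}, {{StatusFromCurrentException}}, {{LastError}} and {{CallAndTranslate}} are all hypothetical names, and {{EIO}} stands in for {{EINTERNAL}}, which comes from the library headers rather than {{<cerrno>}}.

```cpp
#include <cassert>
#include <cerrno>
#include <exception>
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical status type; the real Status class may look different.
struct Status {
    int code;             // errno-style code
    std::string message;  // human-readable detail
};

// Hypothetical stand-in for CreateStatusFromException. It must be called
// from inside a catch block: it rethrows the in-flight exception and maps it.
inline Status StatusFromCurrentException() {
    try {
        throw;
    } catch (const std::bad_alloc &) {
        return Status{ENOMEM, "Out of memory"};
    } catch (const std::invalid_argument &e) {
        return Status{EINVAL, e.what()};
    } catch (const std::exception &e) {
        return Status{EIO, e.what()};
    } catch (...) {
        return Status{EIO, "Unknown error"};
    }
}

// Per-thread error text (stand-in for SetErrorMessage).
inline std::string &LastError() {
    static thread_local std::string msg;
    return msg;
}

// One outer wrapper shared by every C API entry point: runs fn, and on any
// exception stores the message, sets errno and returns -1.
template <typename Fn>
int CallAndTranslate(Fn fn) {
    try {
        fn();
        return 0;
    } catch (...) {
        const Status s = StatusFromCurrentException();
        LastError() = s.message;
        errno = s.code;
        return -1;
    }
}
```

With a wrapper like this, parameter validation can throw {{std::invalid_argument}} with a specific message, so the caller sees more than a generic "invalid parameter".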
[jira] [Commented] (HDFS-7586) HFTP does not work when namenode bind on wildcard
[ https://issues.apache.org/jira/browse/HDFS-7586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266446#comment-14266446 ] Yongjun Zhang commented on HDFS-7586: - Hi [~bperroud], Thanks for reporting the issue. I once ran into the same issue myself, and found that the cause was an incorrect setting. As [~daryn] pointed out, "The rpc-address key should be an actual ip/host:port. There is a rpc-bind-host key that should be set to 0.0.0.0 for multihoming". The rpc-bind-host key was introduced by HDFS-5128, which is in at least 2.3.0, if not earlier. IIRC, my testing with 2.3 was successful after fixing the setting. Thanks. > HFTP does not work when namenode bind on wildcard > - > > Key: HDFS-7586 > URL: https://issues.apache.org/jira/browse/HDFS-7586 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.2.0, 2.3.0, 2.4.0 >Reporter: Benoit Perroud >Priority: Minor > Attachments: HDFS-7586-v0.1.txt > > > When wildcard binding for NameNode RPC is turned on (i.e. > dfs.namenode.rpc-address=0.0.0.0:8020), HFTP download is failing. > Call to http://namenode:50070/data/.. returns the header Location with > parameter nnaddr=0.0.0.0:8020, which is unlikely to ever succeed :) > The idea would be, if wildcard binding is enabled, to read the IP address > the request is actually connected to from the HttpServletRequest and return > this one. > WDYT? > How to reproduce: > 1. Turn on wildcard binding > {code}dfs.namenode.rpc-address=0.0.0.0:8020{code} > 2. Upload a file > {code}$ echo "123" | hdfs dfs -put - /tmp/randomFile.txt{code} > 3. Validate it's failing > {code} > $ hdfs dfs -cat hftp://namenode1/tmp/randomFile.txt > {code} > 4. 
Get more details via curl > {code} > $ curl -vv http://namenode1:50070/data/tmp/randomFile.txt?ugi=hdfs | grep > "Location:" > Location: > http://datanode003:50075/streamFile/tmp/randomFile.txt?ugi=hdfs&nnaddr=0.0.0.0:8020 > {code} > We can clearly see the 0.0.0.0 returned as the NN ip. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7018) Implement C interface for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266434#comment-14266434 ] Colin Patrick McCabe commented on HDFS-7018: Sorry, this has been on my queue to review for a while, but stuff kept coming up. I'll take a look today. > Implement C interface for libhdfs3 > -- > > Key: HDFS-7018 > URL: https://issues.apache.org/jira/browse/HDFS-7018 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Zhanwei Wang >Assignee: Zhanwei Wang > Attachments: HDFS-7018-pnative.002.patch, > HDFS-7018-pnative.003.patch, HDFS-7018.patch > > > Implement C interface for libhdfs3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7586) HFTP does not work when namenode bind on wildcard
[ https://issues.apache.org/jira/browse/HDFS-7586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoit Perroud updated HDFS-7586: - Affects Version/s: (was: 2.6.0) (was: 2.5.0) > HFTP does not work when namenode bind on wildcard > - > > Key: HDFS-7586 > URL: https://issues.apache.org/jira/browse/HDFS-7586 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.2.0, 2.3.0, 2.4.0 >Reporter: Benoit Perroud >Priority: Minor > Attachments: HDFS-7586-v0.1.txt > > > When wildcard binding for NameNode RPC is turned on (i.e. > dfs.namenode.rpc-address=0.0.0.0:8020), HFTP download is failing. > Call to http://namenode:50070/data/.. returns the header Location with > parameter nnaddr=0.0.0.0:8020, which is unlikely to ever succeed :) > The idea would be, if wildcard binding is enabled, to read the IP address > the request is actually connected to from the HttpServletRequest and return > this one. > WDYT? > How to reproduce: > 1. Turn on wildcard binding > {code}dfs.namenode.rpc-address=0.0.0.0:8020{code} > 2. Upload a file > {code}$ echo "123" | hdfs dfs -put - /tmp/randomFile.txt{code} > 3. Validate it's failing > {code} > $ hdfs dfs -cat hftp://namenode1/tmp/randomFile.txt > {code} > 4. Get more details via curl > {code} > $ curl -vv http://namenode1:50070/data/tmp/randomFile.txt?ugi=hdfs | grep > "Location:" > Location: > http://datanode003:50075/streamFile/tmp/randomFile.txt?ugi=hdfs&nnaddr=0.0.0.0:8020 > {code} > We can clearly see the 0.0.0.0 returned as the NN ip. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7586) HFTP does not work when namenode bind on wildcard
[ https://issues.apache.org/jira/browse/HDFS-7586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266360#comment-14266360 ] Benoit Perroud commented on HDFS-7586: -- Thanks for the pointer. You're right for >= 2.5.0. In < 2.5, the way to do it was to set 0.0.0.0:8020, which leads to the issue described here. Has multihoming been tested with HFTP too? > HFTP does not work when namenode bind on wildcard > - > > Key: HDFS-7586 > URL: https://issues.apache.org/jira/browse/HDFS-7586 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.2.0, 2.3.0, 2.4.0 >Reporter: Benoit Perroud >Priority: Minor > Attachments: HDFS-7586-v0.1.txt > > > When wildcard binding for NameNode RPC is turned on (i.e. > dfs.namenode.rpc-address=0.0.0.0:8020), HFTP download is failing. > Call to http://namenode:50070/data/.. returns the header Location with > parameter nnaddr=0.0.0.0:8020, which is unlikely to ever succeed :) > The idea would be, if wildcard binding is enabled, to read the IP address > the request is actually connected to from the HttpServletRequest and return > this one. > WDYT? > How to reproduce: > 1. Turn on wildcard binding > {code}dfs.namenode.rpc-address=0.0.0.0:8020{code} > 2. Upload a file > {code}$ echo "123" | hdfs dfs -put - /tmp/randomFile.txt{code} > 3. Validate it's failing > {code} > $ hdfs dfs -cat hftp://namenode1/tmp/randomFile.txt > {code} > 4. Get more details via curl > {code} > $ curl -vv http://namenode1:50070/data/tmp/randomFile.txt?ugi=hdfs | grep > "Location:" > Location: > http://datanode003:50075/streamFile/tmp/randomFile.txt?ugi=hdfs&nnaddr=0.0.0.0:8020 > {code} > We can clearly see the 0.0.0.0 returned as the NN ip. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7579) Improve log reporting during block report rpc failure
[ https://issues.apache.org/jira/browse/HDFS-7579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-7579: --- Labels: supportability (was: ) > Improve log reporting during block report rpc failure > - > > Key: HDFS-7579 > URL: https://issues.apache.org/jira/browse/HDFS-7579 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.7.0 >Reporter: Charles Lamb >Assignee: Charles Lamb >Priority: Minor > Labels: supportability > Attachments: HDFS-7579.000.patch > > > During block reporting, if the block report RPC fails, for example because it > exceeded the max rpc len, we should still produce some sort of LOG.info > output to help with debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7579) Improve log reporting during block report rpc failure
[ https://issues.apache.org/jira/browse/HDFS-7579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-7579: --- Status: Patch Available (was: Open) > Improve log reporting during block report rpc failure > - > > Key: HDFS-7579 > URL: https://issues.apache.org/jira/browse/HDFS-7579 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.7.0 >Reporter: Charles Lamb >Assignee: Charles Lamb >Priority: Minor > Labels: supportability > Attachments: HDFS-7579.000.patch > > > During block reporting, if the block report RPC fails, for example because it > exceeded the max rpc len, we should still produce some sort of LOG.info > output to help with debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7579) Improve log reporting during block report rpc failure
[ https://issues.apache.org/jira/browse/HDFS-7579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266331#comment-14266331 ] Charles Lamb commented on HDFS-7579: Also, since this is just a remodularization of the LOG.info call, I did not add a unit test. > Improve log reporting during block report rpc failure > - > > Key: HDFS-7579 > URL: https://issues.apache.org/jira/browse/HDFS-7579 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.7.0 >Reporter: Charles Lamb >Assignee: Charles Lamb >Priority: Minor > Attachments: HDFS-7579.000.patch > > > During block reporting, if the block report RPC fails, for example because it > exceeded the max rpc len, we should still produce some sort of LOG.info > output to help with debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7579) Improve log reporting during block report rpc failure
[ https://issues.apache.org/jira/browse/HDFS-7579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-7579: --- Attachment: HDFS-7579.000.patch The attached patch modifies BPServiceActor so that even if one or more of the block report RPCs fail, a LOG.info message will still be displayed. This will help diagnose cases where the RPC throws an exception. This patch also adds a toString() to ServerCommand so that the LOG.info message displays something reasonable for the commands that it received back rather than just Object.toString(). > Improve log reporting during block report rpc failure > - > > Key: HDFS-7579 > URL: https://issues.apache.org/jira/browse/HDFS-7579 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.7.0 >Reporter: Charles Lamb >Assignee: Charles Lamb >Priority: Minor > Attachments: HDFS-7579.000.patch > > > During block reporting, if the block report RPC fails, for example because it > exceeded the max rpc len, we should still produce some sort of LOG.info > output to help with debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7586) HFTP does not work when namenode bind on wildcard
[ https://issues.apache.org/jira/browse/HDFS-7586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266315#comment-14266315 ] Daryn Sharp commented on HDFS-7586: --- I think this is due to a misconfig. The rpc-address key should be an actual ip/host:port. There is a rpc-bind-host key that should be set to 0.0.0.0 for multihoming. For more details, see: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html > HFTP does not work when namenode bind on wildcard > - > > Key: HDFS-7586 > URL: https://issues.apache.org/jira/browse/HDFS-7586 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0 >Reporter: Benoit Perroud >Priority: Minor > Attachments: HDFS-7586-v0.1.txt > > > When wildcard binding for NameNode RPC is turned on (i.e. > dfs.namenode.rpc-address=0.0.0.0:8020), HFTP download is failing. > Call to http://namenode:50070/data/.. returns the header Location with > parameter nnaddr=0.0.0.0:8020, which is unlikely to ever succeed :) > The idea would be, if wildcard binding is enabled, to read the IP address > the request is actually connected to from the HttpServletRequest and return > this one. > WDYT? > How to reproduce: > 1. Turn on wildcard binding > {code}dfs.namenode.rpc-address=0.0.0.0:8020{code} > 2. Upload a file > {code}$ echo "123" | hdfs dfs -put - /tmp/randomFile.txt{code} > 3. Validate it's failing > {code} > $ hdfs dfs -cat hftp://namenode1/tmp/randomFile.txt > {code} > 4. Get more details via curl > {code} > $ curl -vv http://namenode1:50070/data/tmp/randomFile.txt?ugi=hdfs | grep > "Location:" > Location: > http://datanode003:50075/streamFile/tmp/randomFile.txt?ugi=hdfs&nnaddr=0.0.0.0:8020 > {code} > We can clearly see the 0.0.0.0 returned as the NN ip. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7572) TestLazyPersistFiles#testDnRestartWithSavedReplicas is flaky on Windows
[ https://issues.apache.org/jira/browse/HDFS-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266258#comment-14266258 ] Hudson commented on HDFS-7572: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2016 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2016/]) HDFS-7572. TestLazyPersistFiles#testDnRestartWithSavedReplicas is flaky on Windows. Contributed by Arpit Agarwal. (cnauroth: rev dfd2589bcb0e83f073eab30e32badcf2e9f75a62) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestLazyPersistFiles.java > TestLazyPersistFiles#testDnRestartWithSavedReplicas is flaky on Windows > --- > > Key: HDFS-7572 > URL: https://issues.apache.org/jira/browse/HDFS-7572 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.6.0 >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: 2.7.0 > > Attachments: HDFS-7572.001.patch > > > *Error Message* > Expected: is > but: was > *Stacktrace* > java.lang.AssertionError: > Expected: is > but: was > at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > at org.junit.Assert.assertThat(Assert.java:865) > at org.junit.Assert.assertThat(Assert.java:832) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LazyPersistTestCase.ensureFileReplicasOnStorageType(LazyPersistTestCase.java:129) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles.testDnRestartWithSavedReplicas(TestLazyPersistFiles.java:668) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7583) Fix findbug in TransferFsImage.java
[ https://issues.apache.org/jira/browse/HDFS-7583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266252#comment-14266252 ] Hudson commented on HDFS-7583: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2016 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2016/]) HDFS-7583. Fix findbug in TransferFsImage.java (Contributed by Vinayakumar B) (vinayakumarb: rev 4cd66f7fb280e53e2d398a62e922a8d68d150679) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/TransferFsImage.java > Fix findbug in TransferFsImage.java > --- > > Key: HDFS-7583 > URL: https://issues.apache.org/jira/browse/HDFS-7583 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Vinayakumar B >Assignee: Vinayakumar B >Priority: Minor > Fix For: 2.7.0 > > Attachments: HDFS-7583-001.patch, HDFS-7583-002.patch > > > Fix following findbug resulting in recent jenkins runs > {noformat}Exceptional return value of java.io.File.delete() ignored in > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.deleteTmpFiles(List) > Bug type RV_RETURN_VALUE_IGNORED_BAD_PRACTICE (click for details) > In class org.apache.hadoop.hdfs.server.namenode.TransferFsImage > In method > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.deleteTmpFiles(List) > Called method java.io.File.delete() > At TransferFsImage.java:[line 577]{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7574) Make cmake work in Windows Visual Studio 2010
[ https://issues.apache.org/jira/browse/HDFS-7574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266247#comment-14266247 ] Thanh Do commented on HDFS-7574: Hi [~cmccabe]. On Windows, the existing test (in {{CMakeTestCompileStrerror.cpp}}) won't work because {{strerror}} has a different signature there. Specifically, Windows does not have {{strerror_r(errnum, buf, buflen)}}. The equivalent is {{strerror_s(buf, buflen, errnum)}}, which takes its parameters in a different order. This makes the test fail, so {{STRERROR_R_RETURN_INT}} is always {{NO}}. A cleaner fix may be to put a few lines in {{CMakeTestCompileStrerror}}:
{code}
#ifdef _WIN32
#define strerror_r(errnum, buf, buflen) strerror_s((buf), (buflen), (errnum))
#endif
{code}
Thoughts? > Make cmake work in Windows Visual Studio 2010 > - > > Key: HDFS-7574 > URL: https://issues.apache.org/jira/browse/HDFS-7574 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client > Environment: Windows Visual Studio 2010 >Reporter: Thanh Do >Assignee: Thanh Do > Attachments: HDFS-7574-branch-HDFS-6994-1.patch > > > Cmake should be able to generate a solution file in Windows Visual Studio > 2010. This is the first step in a series of steps making libhdfs3 built > successfully in Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
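As a complement to the macro proposed above, the sketch below shows one way the {{strerror_r}} / {{strerror_s}} differences could be hidden behind a single function. It is an illustration only: {{ErrnoToString}} and {{PickMessage}} are hypothetical names, and the overloads pick the int-returning or pointer-returning variant at compile time instead of relying on a configure-time check such as {{STRERROR_R_RETURN_INT}}.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>

namespace {
// XSI strerror_r and Windows strerror_s return an int: 0 means buf was filled.
inline const char *PickMessage(int rc, const char *buf) {
    return rc == 0 ? buf : "Unknown error";
}
// GNU strerror_r returns the message pointer, which may or may not be buf.
inline const char *PickMessage(const char *msg, const char *) {
    return msg;
}
}  // namespace

// Hypothetical helper: thread-safe errno-to-text conversion on both platforms.
inline std::string ErrnoToString(int errnum) {
    char buf[256];
    buf[0] = '\0';
#ifdef _WIN32
    // Note the different parameter order on Windows.
    return PickMessage(strerror_s(buf, sizeof(buf), errnum), buf);
#else
    return PickMessage(strerror_r(errnum, buf, sizeof(buf)), buf);
#endif
}
```

The message text itself depends on the platform and locale, so callers should not compare it against fixed strings.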
[jira] [Commented] (HDFS-7583) Fix findbug in TransferFsImage.java
[ https://issues.apache.org/jira/browse/HDFS-7583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266209#comment-14266209 ] Hudson commented on HDFS-7583: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #66 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/66/]) HDFS-7583. Fix findbug in TransferFsImage.java (Contributed by Vinayakumar B) (vinayakumarb: rev 4cd66f7fb280e53e2d398a62e922a8d68d150679) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/TransferFsImage.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Fix findbug in TransferFsImage.java > --- > > Key: HDFS-7583 > URL: https://issues.apache.org/jira/browse/HDFS-7583 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Vinayakumar B >Assignee: Vinayakumar B >Priority: Minor > Fix For: 2.7.0 > > Attachments: HDFS-7583-001.patch, HDFS-7583-002.patch > > > Fix following findbug resulting in recent jenkins runs > {noformat}Exceptional return value of java.io.File.delete() ignored in > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.deleteTmpFiles(List) > Bug type RV_RETURN_VALUE_IGNORED_BAD_PRACTICE (click for details) > In class org.apache.hadoop.hdfs.server.namenode.TransferFsImage > In method > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.deleteTmpFiles(List) > Called method java.io.File.delete() > At TransferFsImage.java:[line 577]{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7572) TestLazyPersistFiles#testDnRestartWithSavedReplicas is flaky on Windows
[ https://issues.apache.org/jira/browse/HDFS-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266215#comment-14266215 ] Hudson commented on HDFS-7572: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #66 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/66/]) HDFS-7572. TestLazyPersistFiles#testDnRestartWithSavedReplicas is flaky on Windows. Contributed by Arpit Agarwal. (cnauroth: rev dfd2589bcb0e83f073eab30e32badcf2e9f75a62) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestLazyPersistFiles.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > TestLazyPersistFiles#testDnRestartWithSavedReplicas is flaky on Windows > --- > > Key: HDFS-7572 > URL: https://issues.apache.org/jira/browse/HDFS-7572 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.6.0 >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: 2.7.0 > > Attachments: HDFS-7572.001.patch > > > *Error Message* > Expected: is > but: was > *Stacktrace* > java.lang.AssertionError: > Expected: is > but: was > at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > at org.junit.Assert.assertThat(Assert.java:865) > at org.junit.Assert.assertThat(Assert.java:832) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LazyPersistTestCase.ensureFileReplicasOnStorageType(LazyPersistTestCase.java:129) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles.testDnRestartWithSavedReplicas(TestLazyPersistFiles.java:668) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7583) Fix findbug in TransferFsImage.java
[ https://issues.apache.org/jira/browse/HDFS-7583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266168#comment-14266168 ] Hudson commented on HDFS-7583: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #62 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/62/]) HDFS-7583. Fix findbug in TransferFsImage.java (Contributed by Vinayakumar B) (vinayakumarb: rev 4cd66f7fb280e53e2d398a62e922a8d68d150679) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/TransferFsImage.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Fix findbug in TransferFsImage.java > --- > > Key: HDFS-7583 > URL: https://issues.apache.org/jira/browse/HDFS-7583 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Vinayakumar B >Assignee: Vinayakumar B >Priority: Minor > Fix For: 2.7.0 > > Attachments: HDFS-7583-001.patch, HDFS-7583-002.patch > > > Fix following findbug resulting in recent jenkins runs > {noformat}Exceptional return value of java.io.File.delete() ignored in > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.deleteTmpFiles(List) > Bug type RV_RETURN_VALUE_IGNORED_BAD_PRACTICE (click for details) > In class org.apache.hadoop.hdfs.server.namenode.TransferFsImage > In method > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.deleteTmpFiles(List) > Called method java.io.File.delete() > At TransferFsImage.java:[line 577]{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7572) TestLazyPersistFiles#testDnRestartWithSavedReplicas is flaky on Windows
[ https://issues.apache.org/jira/browse/HDFS-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266174#comment-14266174 ] Hudson commented on HDFS-7572: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #62 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/62/]) HDFS-7572. TestLazyPersistFiles#testDnRestartWithSavedReplicas is flaky on Windows. Contributed by Arpit Agarwal. (cnauroth: rev dfd2589bcb0e83f073eab30e32badcf2e9f75a62) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestLazyPersistFiles.java > TestLazyPersistFiles#testDnRestartWithSavedReplicas is flaky on Windows > --- > > Key: HDFS-7572 > URL: https://issues.apache.org/jira/browse/HDFS-7572 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.6.0 >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: 2.7.0 > > Attachments: HDFS-7572.001.patch > > > *Error Message* > Expected: is > but: was > *Stacktrace* > java.lang.AssertionError: > Expected: is > but: was > at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > at org.junit.Assert.assertThat(Assert.java:865) > at org.junit.Assert.assertThat(Assert.java:832) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LazyPersistTestCase.ensureFileReplicasOnStorageType(LazyPersistTestCase.java:129) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles.testDnRestartWithSavedReplicas(TestLazyPersistFiles.java:668) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7501) TransactionsSinceLastCheckpoint can be negative on SBNs
[ https://issues.apache.org/jira/browse/HDFS-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266158#comment-14266158 ] Daryn Sharp commented on HDFS-7501: --- I don't agree with returning a hardcoded 0 on the standby. I'd like to see the correct metric returned on both active and standby. > TransactionsSinceLastCheckpoint can be negative on SBNs > --- > > Key: HDFS-7501 > URL: https://issues.apache.org/jira/browse/HDFS-7501 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.5.0 >Reporter: Harsh J >Assignee: Gautam Gopalakrishnan >Priority: Trivial > Attachments: HDFS-7501-2.patch, HDFS-7501.patch > > > The metric TransactionsSinceLastCheckpoint is derived as FSEditLog.txid minus > NNStorage.mostRecentCheckpointTxId. > In Standby mode, the former does not increment beyond the loaded or > last-when-active value, but the latter does change due to checkpoints done > regularly in this mode. Thereby, the SBN will eventually end up showing > negative values for TransactionsSinceLastCheckpoint. > This is not an issue as the metric only makes sense to be monitored on the > Active NameNode, but we should perhaps just show the value 0 by detecting if > the NN is in SBN form, as allowing a negative number is confusing to view > within a chart that tracks it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7583) Fix findbug in TransferFsImage.java
[ https://issues.apache.org/jira/browse/HDFS-7583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266145#comment-14266145 ] Hudson commented on HDFS-7583: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1997 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1997/]) HDFS-7583. Fix findbug in TransferFsImage.java (Contributed by Vinayakumar B) (vinayakumarb: rev 4cd66f7fb280e53e2d398a62e922a8d68d150679) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/TransferFsImage.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Fix findbug in TransferFsImage.java > --- > > Key: HDFS-7583 > URL: https://issues.apache.org/jira/browse/HDFS-7583 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Vinayakumar B >Assignee: Vinayakumar B >Priority: Minor > Fix For: 2.7.0 > > Attachments: HDFS-7583-001.patch, HDFS-7583-002.patch > > > Fix following findbug resulting in recent jenkins runs > {noformat}Exceptional return value of java.io.File.delete() ignored in > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.deleteTmpFiles(List) > Bug type RV_RETURN_VALUE_IGNORED_BAD_PRACTICE (click for details) > In class org.apache.hadoop.hdfs.server.namenode.TransferFsImage > In method > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.deleteTmpFiles(List) > Called method java.io.File.delete() > At TransferFsImage.java:[line 577]{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7572) TestLazyPersistFiles#testDnRestartWithSavedReplicas is flaky on Windows
[ https://issues.apache.org/jira/browse/HDFS-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266151#comment-14266151 ] Hudson commented on HDFS-7572: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1997 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1997/]) HDFS-7572. TestLazyPersistFiles#testDnRestartWithSavedReplicas is flaky on Windows. Contributed by Arpit Agarwal. (cnauroth: rev dfd2589bcb0e83f073eab30e32badcf2e9f75a62) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestLazyPersistFiles.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > TestLazyPersistFiles#testDnRestartWithSavedReplicas is flaky on Windows > --- > > Key: HDFS-7572 > URL: https://issues.apache.org/jira/browse/HDFS-7572 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.6.0 >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: 2.7.0 > > Attachments: HDFS-7572.001.patch > > > *Error Message* > Expected: is > but: was > *Stacktrace* > java.lang.AssertionError: > Expected: is > but: was > at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > at org.junit.Assert.assertThat(Assert.java:865) > at org.junit.Assert.assertThat(Assert.java:832) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LazyPersistTestCase.ensureFileReplicasOnStorageType(LazyPersistTestCase.java:129) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles.testDnRestartWithSavedReplicas(TestLazyPersistFiles.java:668) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7586) HFTP does not work when namenode bind on wildcard
[ https://issues.apache.org/jira/browse/HDFS-7586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoit Perroud updated HDFS-7586: - Attachment: HDFS-7586-v0.1.txt Draft patch. The idea is to read the address from the HttpServletRequest when the namenode URL is 0.0.0.0. As the MiniDFSCluster is hardcoded to bind to 127.0.0.1, it's not completely trivial to test. > HFTP does not work when namenode bind on wildcard > - > > Key: HDFS-7586 > URL: https://issues.apache.org/jira/browse/HDFS-7586 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0 >Reporter: Benoit Perroud >Priority: Minor > Attachments: HDFS-7586-v0.1.txt > > > When wildcard binding for NameNode RPC is turned on (i.e. > dfs.namenode.rpc-address=0.0.0.0:8020), HFTP download is failing. > Call to http://namenode:50070/data/.. returns the header Location with > parameter nnaddr=0.0.0.0:8020, which is unlikely to ever succeed :) > The idea would be, if wildcard binding is enabled, to read the IP address > the request is actually connected to from the HttpServletRequest and return > this one. > WDYT? > How to reproduce: > 1. Turn on wildcard binding > {code}dfs.namenode.rpc-address=0.0.0.0:8020{code} > 2. Upload a file > {code}$ echo "123" | hdfs dfs -put - /tmp/randomFile.txt{code} > 3. Validate it's failing > {code} > $ hdfs dfs -cat hftp://namenode1/tmp/randomFile.txt > {code} > 4. Get more details via curl > {code} > $ curl -vv http://namenode1:50070/data/tmp/randomFile.txt?ugi=hdfs | grep > "Location:" > Location: > http://datanode003:50075/streamFile/tmp/randomFile.txt?ugi=hdfs&nnaddr=0.0.0.0:8020 > {code} > We can clearly see the 0.0.0.0 returned as the NN ip. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
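The substitution idea in the draft patch can be summarized with a small language-neutral sketch. It is not the attached patch, which is Java and reads the address from the HttpServletRequest: the function name {{AdvertisedNnAddr}} and its parameters are hypothetical, and only the IPv4 wildcard is handled.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: choose the host:port to place in the nnaddr parameter.
// configuredHost   - host part of dfs.namenode.rpc-address
// port             - RPC port, e.g. 8020
// requestLocalHost - address the incoming HTTP request was actually received on
inline std::string AdvertisedNnAddr(const std::string &configuredHost,
                                    int port,
                                    const std::string &requestLocalHost) {
    // A wildcard bind address is not routable, so fall back to the address
    // the client demonstrably reached.
    const bool wildcard = (configuredHost == "0.0.0.0");
    const std::string &host = wildcard ? requestLocalHost : configuredHost;
    return host + ":" + std::to_string(port);
}
```

For example, a configured host of 0.0.0.0 with port 8020 and a request received on 10.0.0.5 would yield 10.0.0.5:8020, while a concrete configured host is passed through unchanged.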
[jira] [Created] (HDFS-7586) HFTP does not work when namenode bind on wildcard
Benoit Perroud created HDFS-7586: Summary: HFTP does not work when namenode bind on wildcard Key: HDFS-7586 URL: https://issues.apache.org/jira/browse/HDFS-7586 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0, 2.5.0, 2.4.0, 2.3.0, 2.2.0 Reporter: Benoit Perroud Priority: Minor When wildcard binding for NameNode RPC is turned on (i.e. dfs.namenode.rpc-address=0.0.0.0:8020), HFTP download is failing. Call to http://namenode:50070/data/.. returns the header Location with parameter nnaddr=0.0.0.0:8020, which is unlikely to ever succeed :) The idea would be, if wildcard binding is enabled, to read the IP address the request is actually connected to from the HttpServletRequest and return this one. WDYT? How to reproduce: 1. Turn on wildcard binding {code}dfs.namenode.rpc-address=0.0.0.0:8020{code} 2. Upload a file {code}$ echo "123" | hdfs dfs -put - /tmp/randomFile.txt{code} 3. Validate it's failing {code} $ hdfs dfs -cat hftp://namenode1/tmp/randomFile.txt {code} 4. Get more details via curl {code} $ curl -vv http://namenode1:50070/data/tmp/randomFile.txt?ugi=hdfs | grep "Location:" Location: http://datanode003:50075/streamFile/tmp/randomFile.txt?ugi=hdfs&nnaddr=0.0.0.0:8020 {code} We can clearly see the 0.0.0.0 returned as the NN ip. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7480) Namenodes loops on 'block does not belong to any file' after deleting many files
[ https://issues.apache.org/jira/browse/HDFS-7480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266045#comment-14266045 ] Frode Halvorsen commented on HDFS-7480: --- 2.6.1 is not out yet, but one thought: this fix might resolve the issue when namenodes are started with a lot of incoming information about 'loose' data blocks, but it probably won't resolve the issue that causes the namenodes to be killed by ZooKeeper when I delete a lot of files. At the moment of deletion, I don't think the logging is that problematic. The logging issue, I believe, is secondary. I believe that the active namenode gets busy calculating and distributing delete orders to datanodes when I drop 500,000 files at once, and that this is the cause of the ZooKeeper shutdown. When the namenode gets overloaded with calculating and distributing those delete orders, it doesn't keep up with responses to ZooKeeper, which then kills the namenode in order to fail over to NN2. > Namenodes loops on 'block does not belong to any file' after deleting many > files > > > Key: HDFS-7480 > URL: https://issues.apache.org/jira/browse/HDFS-7480 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.5.0 > Environment: CentOS - HDFS-HA (journal), zookeeper >Reporter: Frode Halvorsen > > A small cluster has 8 servers with 32 GB RAM. > Two are namenodes (HA-configured), six are datanodes (8x3 TB disks configured > with RAID as one 21 TB drive). > The cluster receives on average 400,000 small files each day. I started archiving > (HAR) each day as separate archives. After deleting the original files for > one month, the namenodes started acting up really badly. > When restarting those, both active and passive nodes seem to work OK for some > time, but then start to report a lot of blocks belonging to no files, and > the namenode just spins those messages in a massive loop. 
If the passive > node is first, it also influences the active node in such a way that it's no > longer possible to archive new files. If the active node also starts in this > loop, it suddenly dies without any error message. > The only way I'm able to get rid of the problem is to start decommissioning > nodes, watching the cluster closely to avoid downtime, and make sure every > datanode gets a 'clean' start. After all datanodes have been decommissioned (in > turns) and restarted with clean disks, the problem is gone. But if I then > delete a lot of files in a short time, the problem starts again... > The main problem (I think) is that the receiving and reporting of those > blocks takes so many resources that the namenodes are too busy to tell the > datanodes to delete those blocks. > If the active namenode starts on the loop, it does the 'right' thing by > telling the datanode to invalidate the block, but the number of blocks is so > massive that the namenode doesn't do anything else. Just now, I have about > 1200-1400 log entries per second in the passive node. > update: > Just got the active namenode in the loop - it logs 1000 lines per second. > 500 'BlockStateChange: BLOCK* processReport: blk_1080796332_7056241 on > x.x.x.x:50010 size 1742 does not belong to any file' > and > 500 ' BlockStateChange: BLOCK* InvalidateBlocks: add blk_1080796332_7056241 > to x.x.x.x:50010' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7572) TestLazyPersistFiles#testDnRestartWithSavedReplicas is flaky on Windows
[ https://issues.apache.org/jira/browse/HDFS-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266000#comment-14266000 ] Hudson commented on HDFS-7572: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #65 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/65/]) HDFS-7572. TestLazyPersistFiles#testDnRestartWithSavedReplicas is flaky on Windows. Contributed by Arpit Agarwal. (cnauroth: rev dfd2589bcb0e83f073eab30e32badcf2e9f75a62) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestLazyPersistFiles.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > TestLazyPersistFiles#testDnRestartWithSavedReplicas is flaky on Windows > --- > > Key: HDFS-7572 > URL: https://issues.apache.org/jira/browse/HDFS-7572 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.6.0 >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: 2.7.0 > > Attachments: HDFS-7572.001.patch > > > *Error Message* > Expected: is > but: was > *Stacktrace* > java.lang.AssertionError: > Expected: is > but: was > at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > at org.junit.Assert.assertThat(Assert.java:865) > at org.junit.Assert.assertThat(Assert.java:832) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LazyPersistTestCase.ensureFileReplicasOnStorageType(LazyPersistTestCase.java:129) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles.testDnRestartWithSavedReplicas(TestLazyPersistFiles.java:668) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7583) Fix findbug in TransferFsImage.java
[ https://issues.apache.org/jira/browse/HDFS-7583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265994#comment-14265994 ] Hudson commented on HDFS-7583: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #65 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/65/]) HDFS-7583. Fix findbug in TransferFsImage.java (Contributed by Vinayakumar B) (vinayakumarb: rev 4cd66f7fb280e53e2d398a62e922a8d68d150679) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/TransferFsImage.java > Fix findbug in TransferFsImage.java > --- > > Key: HDFS-7583 > URL: https://issues.apache.org/jira/browse/HDFS-7583 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Vinayakumar B >Assignee: Vinayakumar B >Priority: Minor > Fix For: 2.7.0 > > Attachments: HDFS-7583-001.patch, HDFS-7583-002.patch > > > Fix following findbug resulting in recent jenkins runs > {noformat}Exceptional return value of java.io.File.delete() ignored in > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.deleteTmpFiles(List) > Bug type RV_RETURN_VALUE_IGNORED_BAD_PRACTICE (click for details) > In class org.apache.hadoop.hdfs.server.namenode.TransferFsImage > In method > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.deleteTmpFiles(List) > Called method java.io.File.delete() > At TransferFsImage.java:[line 577]{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
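The FindBugs warning quoted above (RV_RETURN_VALUE_IGNORED_BAD_PRACTICE) is raised because `java.io.File.delete()` reports failure through its boolean return value rather than an exception. A minimal sketch of a delete loop that checks the result follows. It is not the committed HDFS-7583 change: the class `TmpFileCleaner` is a hypothetical stand-in, and it writes to `System.err` where real code would presumably use the project's logger.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a temp-file cleanup routine; not the actual TransferFsImage code. */
final class TmpFileCleaner {
    private TmpFileCleaner() {}

    /** Deletes each file and returns the ones that still exist afterwards. */
    static List<File> deleteTmpFiles(List<File> files) {
        List<File> failed = new ArrayList<>();
        if (files == null) {
            return failed;
        }
        for (File f : files) {
            // delete() returns false on failure; a file that is already gone is not an error.
            if (!f.delete() && f.exists()) {
                System.err.println("Unable to delete tmp file " + f.getAbsolutePath());
                failed.add(f);
            }
        }
        return failed;
    }
}
```

Returning the failed files is a design choice made for this sketch so that a caller can decide whether a leftover file is fatal.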
[jira] [Commented] (HDFS-7572) TestLazyPersistFiles#testDnRestartWithSavedReplicas is flaky on Windows
[ https://issues.apache.org/jira/browse/HDFS-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265985#comment-14265985 ] Hudson commented on HDFS-7572: -- FAILURE: Integrated in Hadoop-Yarn-trunk #799 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/799/]) HDFS-7572. TestLazyPersistFiles#testDnRestartWithSavedReplicas is flaky on Windows. Contributed by Arpit Agarwal. (cnauroth: rev dfd2589bcb0e83f073eab30e32badcf2e9f75a62) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestLazyPersistFiles.java > TestLazyPersistFiles#testDnRestartWithSavedReplicas is flaky on Windows > --- > > Key: HDFS-7572 > URL: https://issues.apache.org/jira/browse/HDFS-7572 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.6.0 >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: 2.7.0 > > Attachments: HDFS-7572.001.patch > > > *Error Message* > Expected: is > but: was > *Stacktrace* > java.lang.AssertionError: > Expected: is > but: was > at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > at org.junit.Assert.assertThat(Assert.java:865) > at org.junit.Assert.assertThat(Assert.java:832) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LazyPersistTestCase.ensureFileReplicasOnStorageType(LazyPersistTestCase.java:129) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles.testDnRestartWithSavedReplicas(TestLazyPersistFiles.java:668) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7583) Fix findbug in TransferFsImage.java
[ https://issues.apache.org/jira/browse/HDFS-7583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265979#comment-14265979 ] Hudson commented on HDFS-7583: -- FAILURE: Integrated in Hadoop-Yarn-trunk #799 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/799/]) HDFS-7583. Fix findbug in TransferFsImage.java (Contributed by Vinayakumar B) (vinayakumarb: rev 4cd66f7fb280e53e2d398a62e922a8d68d150679) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/TransferFsImage.java > Fix findbug in TransferFsImage.java > --- > > Key: HDFS-7583 > URL: https://issues.apache.org/jira/browse/HDFS-7583 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Vinayakumar B >Assignee: Vinayakumar B >Priority: Minor > Fix For: 2.7.0 > > Attachments: HDFS-7583-001.patch, HDFS-7583-002.patch > > > Fix following findbug resulting in recent jenkins runs > {noformat}Exceptional return value of java.io.File.delete() ignored in > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.deleteTmpFiles(List) > Bug type RV_RETURN_VALUE_IGNORED_BAD_PRACTICE (click for details) > In class org.apache.hadoop.hdfs.server.namenode.TransferFsImage > In method > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.deleteTmpFiles(List) > Called method java.io.File.delete() > At TransferFsImage.java:[line 577]{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7585) TestEnhancedByteBufferAccess hard code the block size
[ https://issues.apache.org/jira/browse/HDFS-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265963#comment-14265963 ] Hadoop QA commented on HDFS-7585: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690280/HDFS-7585.001.patch against trunk revision 4cd66f7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9147//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9147//console This message is automatically generated. > TestEnhancedByteBufferAccess hard code the block size > - > > Key: HDFS-7585 > URL: https://issues.apache.org/jira/browse/HDFS-7585 > Project: Hadoop HDFS > Issue Type: Test > Components: test >Affects Versions: 2.6.0 >Reporter: sam liu >Assignee: sam liu >Priority: Blocker > Attachments: HDFS-7585.001.patch > > > The test TestEnhancedByteBufferAccess hard code the block size, and it fails > with exceptions on power linux. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832
[ https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265903#comment-14265903 ] Lars Francke commented on HDFS-7575: I don't object at all, quite the opposite. Thanks for taking care of this. > NameNode not handling heartbeats properly after HDFS-2832 > - > > Key: HDFS-7575 > URL: https://issues.apache.org/jira/browse/HDFS-7575 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.4.0, 2.5.0, 2.6.0 >Reporter: Lars Francke >Assignee: Arpit Agarwal >Priority: Critical > > Before HDFS-2832 each DataNode would have a unique storageId which included > its IP address. Since HDFS-2832 the DataNodes have a unique storageId per > storage directory which is just a random UUID. > They send reports per storage directory in their heartbeats. This heartbeat > is processed on the NameNode in the > {{DatanodeDescriptor#updateHeartbeatState}} method. Pre HDFS-2832 this would > just store the information per Datanode. After the patch though each DataNode > can have multiple different storages so it's stored in a map keyed by the > storage Id. > This works fine for all clusters that have been installed post HDFS-2832 as > they get a UUID for their storage Id. So a DN with 8 drives has a map with 8 > different keys. On each Heartbeat the Map is searched and updated > ({{DatanodeStorageInfo storage = storageMap.get(s.getStorageID());}}): > {code:title=DatanodeStorageInfo} > void updateState(StorageReport r) { > capacity = r.getCapacity(); > dfsUsed = r.getDfsUsed(); > remaining = r.getRemaining(); > blockPoolUsed = r.getBlockPoolUsed(); > } > {code} > On clusters that were upgraded from a pre HDFS-2832 version though the > storage Id has not been rewritten (at least not on the four clusters I > checked) so each directory will have the exact same storageId. 
That means > there'll be only a single entry in the {{storageMap}} and it'll be > overwritten by a random {{StorageReport}} from the DataNode. This can be seen > in the {{updateState}} method above. It just assigns the capacity from the > received report; instead it should probably sum it up per received heartbeat. > The Balancer seems to be one of the only things that actually uses this > information so it now considers the utilization of a random drive per > DataNode for balancing purposes. > Things get even worse when a drive has been added or replaced as this will > now get a new storage Id so there'll be two entries in the storageMap. As new > drives are usually empty, this skews the Balancer's decision in a way that this > node will never be considered over-utilized. > Another problem is that old StorageReports are never removed from the > storageMap. So if I replace a drive and it gets a new storage Id the old one > will still be in place and used for all calculations by the Balancer until a > restart of the NameNode. > I can try providing a patch that does the following: > * Instead of using a Map I could just store the array we receive or instead > of storing an array sum up the values for reports with the same Id > * On each heartbeat clear the map (so we know we have up-to-date information) > Does that sound sensible? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
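The two proposals at the end of the HDFS-7575 description (sum reports that share a storage Id, and clear the map on every heartbeat) can be sketched as follows. This is only an illustration of the proposal, not the eventual fix: `SimpleReport`, `StorageTotals` and `HeartbeatState` are hypothetical, simplified stand-ins for the real `StorageReport`, `DatanodeStorageInfo` and `DatanodeDescriptor` classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified stand-in for a per-directory storage report. */
final class SimpleReport {
    final String storageId;
    final long capacity;
    final long dfsUsed;
    final long remaining;

    SimpleReport(String storageId, long capacity, long dfsUsed, long remaining) {
        this.storageId = storageId;
        this.capacity = capacity;
        this.dfsUsed = dfsUsed;
        this.remaining = remaining;
    }
}

/** Running totals for all reports that share one storage Id. */
final class StorageTotals {
    long capacity;
    long dfsUsed;
    long remaining;

    void add(SimpleReport r) {
        // Accumulate instead of assigning, so duplicate Ids do not overwrite each other.
        capacity += r.capacity;
        dfsUsed += r.dfsUsed;
        remaining += r.remaining;
    }
}

/** Hypothetical per-DataNode state rebuilt from scratch on every heartbeat. */
final class HeartbeatState {
    private final Map<String, StorageTotals> storageMap = new HashMap<>();

    void updateHeartbeat(List<SimpleReport> reports) {
        // Clearing first drops storages (e.g. replaced drives) that no longer report.
        storageMap.clear();
        for (SimpleReport r : reports) {
            storageMap.computeIfAbsent(r.storageId, id -> new StorageTotals()).add(r);
        }
    }

    int storageCount() {
        return storageMap.size();
    }

    long totalCapacity() {
        long sum = 0;
        for (StorageTotals t : storageMap.values()) {
            sum += t.capacity;
        }
        return sum;
    }

    long totalRemaining() {
        long sum = 0;
        for (StorageTotals t : storageMap.values()) {
            sum += t.remaining;
        }
        return sum;
    }
}
```

In this sketch an upgraded node whose directories all report the same Id ends up with one map entry holding the summed values, rather than the values of whichever report arrived last.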
[jira] [Commented] (HDFS-7564) NFS gateway dynamically reload UID/GID mapping file /etc/nfs.map
[ https://issues.apache.org/jira/browse/HDFS-7564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265858#comment-14265858 ] Hadoop QA commented on HDFS-7564: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690277/HDFS-7564.003.patch against trunk revision 4cd66f7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9146//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9146//console This message is automatically generated. > NFS gateway dynamically reload UID/GID mapping file /etc/nfs.map > > > Key: HDFS-7564 > URL: https://issues.apache.org/jira/browse/HDFS-7564 > Project: Hadoop HDFS > Issue Type: Improvement > Components: nfs >Affects Versions: 2.6.0 > Environment: HDP 2.2 >Reporter: Hari Sekhon >Assignee: Yongjun Zhang >Priority: Minor > Attachments: HDFS-7564.001.patch, HDFS-7564.002.patch, > HDFS-7564.003.patch > > > Add dynamic reload of the NFS gateway UID/GID mappings file /etc/nfs.map > (default for static.id.mapping.file). 
> It seems that the mappings file is currently only read upon restart of the > NFS gateway, which would cause any active clients' NFS mount points to hang or > fail. > Regards, > Hari Sekhon > http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
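One way the dynamic reload requested in HDFS-7564 could work is to compare the file's modification time on lookup and re-read it only when it has changed. The sketch below illustrates that approach and is not the attached patch: `ReloadingUidMap` is a hypothetical class, only `uid` lines are handled, and the assumed line format is `uid <remote> <local>` with `#` starting a comment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical UID map that re-reads its backing file when the file changes. */
final class ReloadingUidMap {
    private final Path file;
    private long loadedMtime = -1L; // modification time of the version currently loaded
    private Map<Integer, Integer> uidMap = Collections.emptyMap();

    ReloadingUidMap(String path) {
        this.file = Paths.get(path);
    }

    /** Parses "uid <remote> <local>" lines; comments, gid lines and malformed lines are skipped. */
    static Map<Integer, Integer> parseUidLines(List<String> lines) {
        Map<Integer, Integer> result = new HashMap<>();
        for (String raw : lines) {
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            String[] parts = line.split("\\s+");
            if (parts.length == 3 && parts[0].equals("uid")) {
                try {
                    result.put(Integer.parseInt(parts[1]), Integer.parseInt(parts[2]));
                } catch (NumberFormatException e) {
                    // Ignore a line whose ids are not plain integers.
                }
            }
        }
        return result;
    }

    /** Maps a remote UID to a local one, falling back to the remote UID itself. */
    synchronized int mapUid(int remoteUid) {
        reloadIfChanged();
        return uidMap.getOrDefault(remoteUid, remoteUid);
    }

    private void reloadIfChanged() {
        try {
            long mtime = Files.getLastModifiedTime(file).toMillis();
            if (mtime != loadedMtime) {
                uidMap = parseUidLines(Files.readAllLines(file));
                loadedMtime = mtime;
            }
        } catch (IOException e) {
            // File missing or unreadable: keep serving the mappings already loaded.
        }
    }
}
```

Checking the modification time on every lookup is the simplest option; a gateway under heavy load might prefer to check at a fixed interval instead, which this sketch does not attempt.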
[jira] [Updated] (HDFS-7585) TestEnhancedByteBufferAccess hard code the block size
[ https://issues.apache.org/jira/browse/HDFS-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam liu updated HDFS-7585: -- Status: Open (was: Patch Available) > TestEnhancedByteBufferAccess hard code the block size > - > > Key: HDFS-7585 > URL: https://issues.apache.org/jira/browse/HDFS-7585 > Project: Hadoop HDFS > Issue Type: Test > Components: test >Affects Versions: 2.6.0 >Reporter: sam liu >Assignee: sam liu >Priority: Blocker > Attachments: HDFS-7585.001.patch > > > The test TestEnhancedByteBufferAccess hard-codes the block size, and it fails > with exceptions on Power Linux. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7585) TestEnhancedByteBufferAccess hard code the block size
[ https://issues.apache.org/jira/browse/HDFS-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam liu updated HDFS-7585: -- Status: Patch Available (was: Open) > TestEnhancedByteBufferAccess hard code the block size > - > > Key: HDFS-7585 > URL: https://issues.apache.org/jira/browse/HDFS-7585 > Project: Hadoop HDFS > Issue Type: Test > Components: test >Affects Versions: 2.6.0 >Reporter: sam liu >Assignee: sam liu >Priority: Blocker > Attachments: HDFS-7585.001.patch > > > The test TestEnhancedByteBufferAccess hard code the block size, and it fails > with exceptions on power linux. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7585) TestEnhancedByteBufferAccess hard code the block size
[ https://issues.apache.org/jira/browse/HDFS-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam liu updated HDFS-7585: -- Attachment: HDFS-7585.001.patch > TestEnhancedByteBufferAccess hard code the block size > - > > Key: HDFS-7585 > URL: https://issues.apache.org/jira/browse/HDFS-7585 > Project: Hadoop HDFS > Issue Type: Test > Components: test >Affects Versions: 2.6.0 >Reporter: sam liu >Assignee: sam liu >Priority: Blocker > Attachments: HDFS-7585.001.patch > > > The test TestEnhancedByteBufferAccess hard code the block size, and it fails > with exceptions on power linux. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7585) TestEnhancedByteBufferAccess hard code the block size
[ https://issues.apache.org/jira/browse/HDFS-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam liu updated HDFS-7585: -- Status: Patch Available (was: Open) The solution is to remove the hard-coded block size and use the native OS page size instead. This way, the test can pass on both the x86 and Power platforms. > TestEnhancedByteBufferAccess hard code the block size > - > > Key: HDFS-7585 > URL: https://issues.apache.org/jira/browse/HDFS-7585 > Project: Hadoop HDFS > Issue Type: Test > Components: test >Affects Versions: 2.6.0 >Reporter: sam liu >Assignee: sam liu >Priority: Blocker > Attachments: HDFS-7585.001.patch > > > The test TestEnhancedByteBufferAccess hard code the block size, and it fails > with exceptions on power linux. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
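The idea in the comment on HDFS-7585, deriving the test block size from the OS page size rather than hard-coding it, can be sketched as follows. This is a minimal Python illustration, not the contents of HDFS-7585.001.patch; the helper names `block_size_for_test` and `is_page_aligned` are hypothetical. It assumes the test needs a block size that is a whole number of pages, and that the page size differs between platforms (typically 4096 bytes on x86 Linux, commonly 65536 bytes on Power Linux).

```python
import mmap

# Page size reported by the OS for this machine.
PAGE_SIZE = mmap.PAGESIZE


def block_size_for_test(pages_per_block=1, page_size=PAGE_SIZE):
    """Return a block size that is a whole number of OS pages (hypothetical helper)."""
    if pages_per_block < 1:
        raise ValueError("pages_per_block must be at least 1")
    return pages_per_block * page_size


def is_page_aligned(size, page_size=PAGE_SIZE):
    """Return True if size is a multiple of the page size."""
    return size % page_size == 0


if __name__ == "__main__":
    hard_coded = 4096  # a value that only suits platforms with 4 KB pages
    derived = block_size_for_test()
    print("page size:", PAGE_SIZE)
    print("hard-coded block size aligned:", is_page_aligned(hard_coded))
    print("derived block size aligned:", is_page_aligned(derived))
```

With a 64 KB page size, a hard-coded 4096-byte block is not page-aligned, while the derived value is aligned on any platform by construction.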