[jira] [Commented] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.
[ https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116636#comment-15116636 ] Hadoop QA commented on HDFS-8999: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s {color} | {color:red} HDFS-8999 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12784330/h8999_20160121c_branch-2.patch | | JIRA Issue | HDFS-8999 | | Powered by | Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/14240/console | This message was automatically generated. > Namenode need not wait for {{blockReceived}} for the last block before > completing a file. > - > > Key: HDFS-8999 > URL: https://issues.apache.org/jira/browse/HDFS-8999 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Jitendra Nath Pandey >Assignee: Tsz Wo Nicholas Sze > Attachments: h8999_20151228.patch, h8999_20160106.patch, > h8999_20160106b.patch, h8999_20160106c.patch, h8999_20160111.patch, > h8999_20160113.patch, h8999_20160114.patch, h8999_20160121.patch, > h8999_20160121b.patch, h8999_20160121c.patch, h8999_20160121c_branch-2.patch > > > This comes out of a discussion in HDFS-8763. Pasting [~jingzhao]'s comment > from the jira: > {quote} > ...whether we need to let NameNode wait for all the block_received msgs to > announce the replica is safe. Looking into the code, now we have ># NameNode knows the DataNodes involved when initially setting up the > writing pipeline ># If any DataNode fails during the writing, client bumps the GS and > finally reports all the DataNodes included in the new pipeline to NameNode > through the updatePipeline RPC. 
># When the client received the ack for the last packet of the block (and > before the client tries to close the file on NameNode), the replica has been > finalized in all the DataNodes. > Then in this case, when NameNode receives the close request from the client, > the NameNode already knows the latest replicas for the block. Currently the > checkReplication call only counts in all the replicas that NN has already > received the block_received msg, but based on the above #2 and #3, it may be > safe to also count in all the replicas in the > BlockUnderConstructionFeature#replicas? > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )
[ https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116748#comment-15116748 ] GAO Rui commented on HDFS-9494: --- [~szetszwo],[~rakeshr], thanks a lot for your advice. Regarding {{executor.shutdownNow()}}: if not all the tasks have completed, we might not reach the end of {{flushAllInternals()}}. But there is no harm in ensuring the executor shuts down, so I added {{executor.shutdownNow()}} as well. After checking the related code, it seems that we haven't set a timeout for {{waitForAckedSeqno()}}. Maybe we could consider setting a timeout for it in a separate jira. I have updated the 05 patch. Could you kindly review it? Thank you very much. > Parallel optimization of DFSStripedOutputStream#flushAllInternals( ) > > > Key: HDFS-9494 > URL: https://issues.apache.org/jira/browse/HDFS-9494 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: GAO Rui >Assignee: GAO Rui >Priority: Minor > Attachments: HDFS-9494-origin-trunk.00.patch, > HDFS-9494-origin-trunk.01.patch, HDFS-9494-origin-trunk.02.patch, > HDFS-9494-origin-trunk.03.patch, HDFS-9494-origin-trunk.04.patch, > HDFS-9494-origin-trunk.05.patch > > > Currently, in DFSStripedOutputStream#flushAllInternals( ), we trigger and > wait for flushInternal( ) in sequence. So the runtime flow is like: > {code} > Streamer0#flushInternal( ) > Streamer0#waitForAckedSeqno( ) > Streamer1#flushInternal( ) > Streamer1#waitForAckedSeqno( ) > … > Streamer8#flushInternal( ) > Streamer8#waitForAckedSeqno( ) > {code} > It could be better to trigger all the streamers to flushInternal( ) and > wait for all of them to return from waitForAckedSeqno( ), and then > flushAllInternals( ) returns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
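The parallel-flush idea discussed in HDFS-9494 can be sketched roughly as follows. This is only an illustrative sketch, not the actual patch: {{Streamer}} here is a hypothetical stand-in for DFSStripedOutputStream's internal streamers, and the real flushInternal()/waitForAckedSeqno() work is simulated by a counter. The sketch shows the two points under discussion: triggering all streamers in parallel and waiting for all of them, plus calling {{executor.shutdownNow()}} in a finally block so the executor is shut down even on failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelFlushSketch {
    // Hypothetical stand-in for DFSStripedOutputStream's streamers.
    interface Streamer {
        void flushInternal() throws InterruptedException;
    }

    // Trigger flushInternal() on every streamer in parallel and wait for
    // all of them before returning, mirroring the HDFS-9494 proposal.
    static void flushAllInternals(List<Streamer> streamers)
            throws InterruptedException, ExecutionException {
        ExecutorService executor =
                Executors.newFixedThreadPool(streamers.size());
        try {
            List<Future<Void>> futures = new ArrayList<>();
            for (Streamer s : streamers) {
                futures.add(executor.submit(() -> {
                    s.flushInternal();
                    return null;
                }));
            }
            for (Future<Void> f : futures) {
                f.get();  // wait for each flush; propagates any failure
            }
        } finally {
            // Ensure shutdown even if a flush failed or was interrupted.
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger flushed = new AtomicInteger();
        List<Streamer> streamers = new ArrayList<>();
        for (int i = 0; i < 9; i++) {  // 9 streamers as in a 6+3 EC layout
            streamers.add(flushed::incrementAndGet);
        }
        flushAllInternals(streamers);
        System.out.println("flushed=" + flushed.get());
    }
}
```

Note that {{shutdownNow()}} here is purely defensive; in the normal path every task has already completed by the time it runs.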
[jira] [Commented] (HDFS-8071) Redundant checkFileProgress() in PART II of getAdditionalBlock()
[ https://issues.apache.org/jira/browse/HDFS-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116846#comment-15116846 ] Hudson commented on HDFS-8071: -- FAILURE: Integrated in Hadoop-trunk-Commit #9186 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9186/]) HDFS-9690. ClientProtocol.addBlock is not idempotent after HDFS-8071. (szetszwo: rev 45c763ad6171bc7808c2ddcb9099a4215113da2a) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSClientRetries.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirWriteFileOp.java > Redundant checkFileProgress() in PART II of getAdditionalBlock() > > > Key: HDFS-8071 > URL: https://issues.apache.org/jira/browse/HDFS-8071 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.3-alpha >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko > Fix For: 2.7.0 > > Attachments: HDFS-8071-01.patch, HDFS-8071-02.patch, > HDFS-8071-branch-2.7.patch > > > {{FSN.getAdditionalBlock()}} consists of two parts I and II. Each part calls > {{analyzeFileState()}}, which among other things check replication of the > penultimate block via {{checkFileProgress()}}. See details in HDFS-4452. > Checking file progress in Part II is not necessary, because Part I already > assured the penultimate block is complete. It cannot change to incomplete, > unless the file is truncated, which is not allowed for files under > construction. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9690) addBlock is not idempotent
[ https://issues.apache.org/jira/browse/HDFS-9690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116847#comment-15116847 ] Hudson commented on HDFS-9690: -- FAILURE: Integrated in Hadoop-trunk-Commit #9186 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9186/]) HDFS-9690. ClientProtocol.addBlock is not idempotent after HDFS-8071. (szetszwo: rev 45c763ad6171bc7808c2ddcb9099a4215113da2a) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSClientRetries.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirWriteFileOp.java > addBlock is not idempotent > -- > > Key: HDFS-9690 > URL: https://issues.apache.org/jira/browse/HDFS-9690 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze > Attachments: h9690_20160124.patch, h9690_20160124b.patch, > h9690_20160124b_branch-2.7.patch > > > TestDFSClientRetries#testIdempotentAllocateBlockAndClose can illustrate the > bug. It failed in the following builds. > - > https://builds.apache.org/job/PreCommit-HDFS-Build/14188/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/ > - > https://builds.apache.org/job/PreCommit-HDFS-Build/14201/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/ > - > https://builds.apache.org/job/PreCommit-HDFS-Build/14202/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9704) terminate progress after namenode recover finished
Liao, Xiaoge created HDFS-9704: -- Summary: terminate progress after namenode recover finished Key: HDFS-9704 URL: https://issues.apache.org/jira/browse/HDFS-9704 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.3.0 Reporter: Liao, Xiaoge Priority: Minor terminate progress after namenode recover finished -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )
[ https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116767#comment-15116767 ] Hadoop QA commented on HDFS-9494: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 22s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 25s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc 
{color} | {color:green} 0m 34s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 5m 18s {color} | {color:red} hadoop-hdfs-project_hadoop-hdfs-client-jdk1.8.0_66 with JDK v1.8.0_66 generated 1 new + 13 unchanged - 1 fixed = 14 total (was 14) {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 5m 56s {color} | {color:red} hadoop-hdfs-project_hadoop-hdfs-client-jdk1.7.0_91 with JDK v1.7.0_91 generated 1 new + 13 unchanged - 1 fixed = 14 total (was 14) {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 19s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 37s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 33m 5s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7
[jira] [Updated] (HDFS-9503) Replace -namenode option with -fs for NNThroughputBenchmark
[ https://issues.apache.org/jira/browse/HDFS-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9503: Attachment: HDFS-9053.002.patch > Replace -namenode option with -fs for NNThroughputBenchmark > --- > > Key: HDFS-9503 > URL: https://issues.apache.org/jira/browse/HDFS-9503 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Konstantin Shvachko >Assignee: Mingliang Liu > Attachments: HDFS-9053.000.patch, HDFS-9053.001.patch, > HDFS-9053.002.patch > > > HDFS-7847 introduced a new option {{-namenode}}, which is intended to point > the benchmark to a remote NameNode. It should use a standard generic option > {{-fs}} instead, which is routinely used to specify NameNode URI in shell > commands. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9579) Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level
[ https://issues.apache.org/jira/browse/HDFS-9579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116671#comment-15116671 ] Sangjin Lee commented on HDFS-9579: --- I went over the existing version of the patch. First, I don't think using a {{HashMap}} for the bytes read per distance is thread safe. Note that one thread (the owner) will modify this map in {{incrementBytesReadByDistance()}} while any thread can read the values off the map via {{getBytesReadByDistance()}} and {{visitAll()}}, all unsynchronized. The problems could range from memory visibility issues to ConcurrentModificationException, and worse. We need to make this thread safe. Another reservation I have with using a map: I'm a little concerned about the memory implications. An additional map per {{StatisticsData}} can add up. Can we find a way to avoid using a map? I know it may sound ugly, but one other option is to use individual long (volatile) variables. That can also address the thread safety. Thoughts? Also, in NetworkTopology.java (lines 373-381) {{equals()}} and {{hashCode()}} are superfluous here as they do not modify the super behavior in any way. > Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level > - > > Key: HDFS-9579 > URL: https://issues.apache.org/jira/browse/HDFS-9579 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: HDFS-9579-2.patch, HDFS-9579-3.patch, HDFS-9579-4.patch, > HDFS-9579.patch, MR job counters.png > > > For cross DC distcp or other applications, it becomes useful to have insight > as to the traffic volume for each network distance to distinguish cross-DC > traffic, local-DC-remote-rack, etc. > FileSystem's existing {{bytesRead}} metrics tracks all the bytes read. To > provide additional metrics for each network distance, we can add additional > metrics to FileSystem level and have {{DFSInputStream}} update the value > based on the network distance between client and the datanode. 
> {{DFSClient}} will resolve client machine's network location as part of its > initialization. It doesn't need to resolve datanode's network location for > each read as {{DatanodeInfo}} already has the info. > There are existing HDFS specific metrics such as {{ReadStatistics}} and > {{DFSHedgedReadMetrics}}. But these metrics are only accessible via > {{DFSClient}} or {{DFSInputStream}}. Not something that application framework > such as MR and Tez can get to. That is the benefit of storing these new > metrics in FileSystem.Statistics. > This jira only includes metrics generation by HDFS. The consumption of these > metrics at MR and Tez will be tracked by separated jiras. > We can add similar metrics for HDFS write scenario later if it is necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
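The thread-safety alternative raised in the review above could look roughly like this. A minimal sketch under assumptions: the class and method names ({{BytesReadByDistance}}, {{incrementBytesReadByDistance}}, {{getBytesReadByDistance}}) mirror the jira discussion but the implementation is illustrative, using an {{AtomicLongArray}} (a close cousin of the "individual volatile longs" suggestion) instead of a {{HashMap}}, so one writer thread and any number of reader threads see consistent values without synchronization.

```java
import java.util.concurrent.atomic.AtomicLongArray;

public class BytesReadByDistance {
    // Network distances are small even integers in Hadoop's topology
    // (0 = same node, 2 = same rack, ...); cap the index for safety.
    private static final int MAX_DISTANCE = 8;

    // AtomicLongArray gives per-slot atomic updates and visibility,
    // avoiding the unsynchronized-HashMap hazards noted in the review.
    private final AtomicLongArray bytesRead =
            new AtomicLongArray(MAX_DISTANCE + 1);

    // Owner-thread path: called by the input stream after each read.
    public void incrementBytesReadByDistance(int distance, long bytes) {
        bytesRead.addAndGet(Math.min(distance, MAX_DISTANCE), bytes);
    }

    // Reader path: safe to call from any thread.
    public long getBytesReadByDistance(int distance) {
        return bytesRead.get(Math.min(distance, MAX_DISTANCE));
    }

    public static void main(String[] args) {
        BytesReadByDistance stats = new BytesReadByDistance();
        stats.incrementBytesReadByDistance(0, 100);  // node-local read
        stats.incrementBytesReadByDistance(2, 50);   // rack-local read
        System.out.println("local=" + stats.getBytesReadByDistance(0)
                + " rack=" + stats.getBytesReadByDistance(2));
    }
}
```

A fixed-size array also sidesteps the per-{{StatisticsData}} map-memory concern, since it costs a handful of longs per instance.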
[jira] [Commented] (HDFS-9701) DN may deadlock when hot-swapping under load
[ https://issues.apache.org/jira/browse/HDFS-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116830#comment-15116830 ] Xiao Chen commented on HDFS-9701: - Patch 2 fixes the checkstyle and findbugs issues, and adds some javadocs. For the failed tests, {{TestFsDatasetImpl}} and {{TestDataNodeHotSwapVolumes}} are related; the others seem not to be. - {{TestFsDatasetImpl}}: the original test was missing cleanup. Added. - {{TestDataNodeHotSwapVolumes}}: IIUC, we should hflush first in order for the {{BlockReceiver}} to hold a ref count. Then we can verify that the block reference is not removed because the block is not finalized, even if a reconfig task is launched. For this reason, I moved the barrier to fix the test. [~eddyxu] please correct me if I'm wrong. > DN may deadlock when hot-swapping under load > > > Key: HDFS-9701 > URL: https://issues.apache.org/jira/browse/HDFS-9701 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Xiao Chen >Assignee: Xiao Chen > Attachments: HDFS-9701.01.patch, HDFS-9701.02.patch > > > If the DN is under load (new blocks being written), a hot-swap task by {{hdfs > dfsadmin -reconfig}} may cause a dead lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9690) addBlock is not idempotent
[ https://issues.apache.org/jira/browse/HDFS-9690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-9690: -- Attachment: h9690_20160124b_branch-2.7.patch Thanks Vinay for reviewing and trying to commit the patch! I have just committed the patch down to 2.8. Here is a patch for 2.7. h9690_20160124b_branch-2.7.patch: for 2.7. > addBlock is not idempotent > -- > > Key: HDFS-9690 > URL: https://issues.apache.org/jira/browse/HDFS-9690 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze > Attachments: h9690_20160124.patch, h9690_20160124b.patch, > h9690_20160124b_branch-2.7.patch > > > TestDFSClientRetries#testIdempotentAllocateBlockAndClose can illustrate the > bug. It failed in the following builds. > - > https://builds.apache.org/job/PreCommit-HDFS-Build/14188/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/ > - > https://builds.apache.org/job/PreCommit-HDFS-Build/14201/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/ > - > https://builds.apache.org/job/PreCommit-HDFS-Build/14202/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.
[ https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116845#comment-15116845 ] Tsz Wo Nicholas Sze commented on HDFS-8999: --- I have committed this to trunk. Will leave this open for committing to branch-2. > Namenode need not wait for {{blockReceived}} for the last block before > completing a file. > - > > Key: HDFS-8999 > URL: https://issues.apache.org/jira/browse/HDFS-8999 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Jitendra Nath Pandey >Assignee: Tsz Wo Nicholas Sze > Attachments: h8999_20151228.patch, h8999_20160106.patch, > h8999_20160106b.patch, h8999_20160106c.patch, h8999_20160111.patch, > h8999_20160113.patch, h8999_20160114.patch, h8999_20160121.patch, > h8999_20160121b.patch, h8999_20160121c.patch, h8999_20160121c_branch-2.patch > > > This comes out of a discussion in HDFS-8763. Pasting [~jingzhao]'s comment > from the jira: > {quote} > ...whether we need to let NameNode wait for all the block_received msgs to > announce the replica is safe. Looking into the code, now we have ># NameNode knows the DataNodes involved when initially setting up the > writing pipeline ># If any DataNode fails during the writing, client bumps the GS and > finally reports all the DataNodes included in the new pipeline to NameNode > through the updatePipeline RPC. ># When the client received the ack for the last packet of the block (and > before the client tries to close the file on NameNode), the replica has been > finalized in all the DataNodes. > Then in this case, when NameNode receives the close request from the client, > the NameNode already knows the latest replicas for the block. Currently the > checkReplication call only counts in all the replicas that NN has already > received the block_received msg, but based on the above #2 and #3, it may be > safe to also count in all the replicas in the > BlockUnderConstructionFeature#replicas? 
> {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9663) Optimize some RPC call using lighter weight construct than DatanodeInfo
[ https://issues.apache.org/jira/browse/HDFS-9663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116634#comment-15116634 ] Colin Patrick McCabe commented on HDFS-9663: Is that stuff actually sent over the wire in every case? These fields are optional in the protobuf structures. {code} /** * The status of a Datanode */ message DatanodeInfoProto { required DatanodeIDProto id = 1; optional uint64 capacity = 2 [default = 0]; optional uint64 dfsUsed = 3 [default = 0]; optional uint64 remaining = 4 [default = 0]; optional uint64 blockPoolUsed = 5 [default = 0]; optional uint64 lastUpdate = 6 [default = 0]; optional uint32 xceiverCount = 7 [default = 0]; optional string location = 8; enum AdminState { NORMAL = 0; DECOMMISSION_INPROGRESS = 1; DECOMMISSIONED = 2; } optional AdminState adminState = 10 [default = NORMAL]; optional uint64 cacheCapacity = 11 [default = 0]; optional uint64 cacheUsed = 12 [default = 0]; optional uint64 lastUpdateMonotonic = 13 [default = 0]; optional string upgradeDomain = 14; } {code} I agree that it's messy that these fields are optional, but it's hard to see how to change it compatibly at this point. > Optimize some RPC call using lighter weight construct than DatanodeInfo > --- > > Key: HDFS-9663 > URL: https://issues.apache.org/jira/browse/HDFS-9663 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Kai Zheng >Assignee: Kai Zheng > > While working on HDFS-8430, when adding an RPC in DataTransferProtocol, it was > noticed that the very heavy construct, either {{DatanodeInfo}} or > {{DatanodeInfoWithStorage}}, is used to represent a datanode just for > connection most of the time. However, it's very fat and contains much more > information than needed. 
See how it's defined: > {code} > public class DatanodeInfo extends DatanodeID implements Node { > private long capacity; > private long dfsUsed; > private long remaining; > private long blockPoolUsed; > private long cacheCapacity; > private long cacheUsed; > private long lastUpdate; > private long lastUpdateMonotonic; > private int xceiverCount; > private String location = NetworkTopology.DEFAULT_RACK; > private String softwareVersion; > private List<String> dependentHostNames = new LinkedList<>(); > private String upgradeDomain; > ... > {code} > On the client and datanode sides, for RPC calls like > {{DataTransferProtocol#writeBlock}}, it looks like the information contained in > {{DatanodeID}} is almost enough. > I did a quick hack using a lightweight construct like > {{SimpleDatanodeInfo}} that simply extends DatanodeID (no other field added, > but whatever field is needed can just be added) and changed the > DataTransferProtocol#writeBlock call. I manually checked many relevant tests; it > did work fine. To see how much network traffic is saved, I did a simple test with code > in {{Sender}}: > {code} > private static void send(final DataOutputStream out, final Op opcode, > final Message proto) throws IOException { > LOG.trace("Sending DataTransferOp {}: {}", > proto.getClass().getSimpleName(), proto); > int before = out.size(); > op(out, opcode); > proto.writeDelimitedTo(out); > int after = out.size(); > System.out.println("X sent=" + (after - before)); > out.flush(); > } > {code} > Running the test {{TestWriteRead#testWriteAndRead}}, the change can save about > 100 bytes per call most of the time. The saving may not be so big because > only 3 datanodes are sent, but in situations like > {{BlockECRecoveryCommand}}, where there can be 6 + 3 datanodes as targets and > sources to send, the saving will be significant. > Hence, I suggest using a more lightweight construct to represent a datanode in RPC > calls when possible, or other ideas to avoid unnecessary wire data size. This > may make sense; as noted, there were some discussions in HDFS-8999 about saving > some datanode bandwidth. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
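The size argument above can be made concrete with a toy measurement. This is a hedged sketch, not real protobuf encoding (protobuf uses varints and omits unset optional fields, so actual numbers differ): it just serializes hypothetical id-only fields versus id plus the fixed-width usage/cache counters that a DatanodeInfo-like message carries, to show where the roughly 100 extra bytes per datanode come from.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class WireSizeSketch {
    // Hypothetical DatanodeID-like payload: just enough to connect.
    static byte[] writeIdOnly() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF("127.0.0.1");      // ip (illustrative value)
            out.writeUTF("dn-uuid-0001");   // datanode uuid (illustrative)
            out.writeInt(9866);             // transfer port (illustrative)
        }
        return buf.toByteArray();
    }

    // Hypothetical DatanodeInfo-like payload: id plus usage counters.
    static byte[] writeFullInfo() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.write(writeIdOnly());       // the id fields
            for (int i = 0; i < 8; i++) {
                out.writeLong(0L);          // capacity, dfsUsed, remaining,
                                            // blockPoolUsed, cache*, lastUpdate*
            }
            out.writeInt(0);                // xceiverCount
            out.writeUTF("/default-rack");  // location
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        int idOnly = writeIdOnly().length;
        int full = writeFullInfo().length;
        System.out.println("idOnly=" + idOnly + " full=" + full
                + " saved=" + (full - idOnly));
    }
}
```

With fixed-width fields the extra counters alone account for 80-plus bytes per datanode, which is consistent in magnitude with the ~100 bytes per call observed in the quick hack above.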
[jira] [Updated] (HDFS-9704) terminate progress after namenode recover finished
[ https://issues.apache.org/jira/browse/HDFS-9704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liao, Xiaoge updated HDFS-9704: --- Attachment: HDFS-9704.001.patch > terminate progress after namenode recover finished > -- > > Key: HDFS-9704 > URL: https://issues.apache.org/jira/browse/HDFS-9704 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.3.0 >Reporter: Liao, Xiaoge >Priority: Minor > Attachments: HDFS-9704.001.patch > > > terminate progress after namenode recover finished -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.
[ https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116762#comment-15116762 ] Hadoop QA commented on HDFS-8999: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 50s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 42s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 27s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 25s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 22s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | 
{color:green} javadoc {color} | {color:green} 2m 38s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 5s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 46s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 46s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 44s {color} | {color:red} hadoop-hdfs-project: patch generated 4 new + 1027 unchanged - 3 fixed = 1031 total (was 1030) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 58s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 52s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 48s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 11s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 79m 19s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 13s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 103m 46s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 44s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 230m 38s {color} | {color:black} {color} | \\
[jira] [Updated] (HDFS-9655) NN should start JVM pause monitor before loading fsimage
[ https://issues.apache.org/jira/browse/HDFS-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-9655: Labels: supportability (was: ) > NN should start JVM pause monitor before loading fsimage > > > Key: HDFS-9655 > URL: https://issues.apache.org/jira/browse/HDFS-9655 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: John Zhuge >Assignee: John Zhuge >Priority: Critical > Labels: supportability > Fix For: 2.8.0 > > Attachments: HDFS-9655.001.patch > > > We have seen many cases of NameNode startup either extremely slow or even > hung. Most of them were caused by insufficient heap size with regard to the > metadata size. Those cases were resolved by increasing the heap size. > However it did take support team some time to root cause. JVM pause warning > messages would greatly assist in such diagnosis, but NN starts JVM pause > monitor after fsimage/edits loading. > Propose to start JVM pause monitor before loading fsimage/edits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )
[ https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GAO Rui updated HDFS-9494: -- Status: In Progress (was: Patch Available) > Parallel optimization of DFSStripedOutputStream#flushAllInternals( ) > > > Key: HDFS-9494 > URL: https://issues.apache.org/jira/browse/HDFS-9494 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: GAO Rui >Assignee: GAO Rui >Priority: Minor > Attachments: HDFS-9494-origin-trunk.00.patch, > HDFS-9494-origin-trunk.01.patch, HDFS-9494-origin-trunk.02.patch, > HDFS-9494-origin-trunk.03.patch, HDFS-9494-origin-trunk.04.patch > > > Currently, in DFSStripedOutputStream#flushAllInternals( ), we trigger and > wait for flushInternal( ) in sequence. So the runtime flow is like: > {code} > Streamer0#flushInternal( ) > Streamer0#waitForAckedSeqno( ) > Streamer1#flushInternal( ) > Streamer1#waitForAckedSeqno( ) > … > Streamer8#flushInternal( ) > Streamer8#waitForAckedSeqno( ) > {code} > It could be better to trigger all the streamers to flushInternal( ) and > wait for all of them to return from waitForAckedSeqno( ), and then > flushAllInternals( ) returns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )
[ https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GAO Rui updated HDFS-9494: -- Status: Patch Available (was: In Progress) > Parallel optimization of DFSStripedOutputStream#flushAllInternals( ) > > > Key: HDFS-9494 > URL: https://issues.apache.org/jira/browse/HDFS-9494 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: GAO Rui >Assignee: GAO Rui >Priority: Minor > Attachments: HDFS-9494-origin-trunk.00.patch, > HDFS-9494-origin-trunk.01.patch, HDFS-9494-origin-trunk.02.patch, > HDFS-9494-origin-trunk.03.patch, HDFS-9494-origin-trunk.04.patch, > HDFS-9494-origin-trunk.05.patch > > > Currently, in DFSStripedOutputStream#flushAllInternals( ), we trigger and > wait for flushInternal( ) in sequence. So the runtime flow is like: > {code} > Streamer0#flushInternal( ) > Streamer0#waitForAckedSeqno( ) > Streamer1#flushInternal( ) > Streamer1#waitForAckedSeqno( ) > … > Streamer8#flushInternal( ) > Streamer8#waitForAckedSeqno( ) > {code} > It could be better to trigger all the streamers to flushInternal( ) and > wait for all of them to return from waitForAckedSeqno( ), and then > flushAllInternals( ) returns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )
[ https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GAO Rui updated HDFS-9494: -- Attachment: HDFS-9494-origin-trunk.05.patch > Parallel optimization of DFSStripedOutputStream#flushAllInternals( ) > > > Key: HDFS-9494 > URL: https://issues.apache.org/jira/browse/HDFS-9494 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: GAO Rui >Assignee: GAO Rui >Priority: Minor > Attachments: HDFS-9494-origin-trunk.00.patch, > HDFS-9494-origin-trunk.01.patch, HDFS-9494-origin-trunk.02.patch, > HDFS-9494-origin-trunk.03.patch, HDFS-9494-origin-trunk.04.patch, > HDFS-9494-origin-trunk.05.patch > > > Currently, in DFSStripedOutputStream#flushAllInternals( ), we trigger and > wait for flushInternal( ) in sequence. So the runtime flow is like: > {code} > Streamer0#flushInternal( ) > Streamer0#waitForAckedSeqno( ) > Streamer1#flushInternal( ) > Streamer1#waitForAckedSeqno( ) > … > Streamer8#flushInternal( ) > Streamer8#waitForAckedSeqno( ) > {code} > It could be better to trigger all the streamers to flushInternal( ) and > wait for all of them to return from waitForAckedSeqno( ), and then > flushAllInternals( ) returns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
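The proposed change can be sketched with plain java.util.concurrent primitives. This is an illustrative model only, not the actual DFSStripedOutputStream patch; the {{Streamer}} interface and class names below are assumptions. The idea is exactly the one described above: trigger every streamer's flushInternal( ) first, then wait for all acks, so total latency is bounded by the slowest streamer rather than the sum over all nine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative stand-in for the per-block streamers inside
// DFSStripedOutputStream (an RS-6-3 stripe uses 9 of them).
interface Streamer {
  void flushInternal() throws Exception;     // trigger the flush
  void waitForAckedSeqno() throws Exception; // wait for the ack
}

// A dummy streamer that just counts calls, for demonstration.
class CountingStreamer implements Streamer {
  static final AtomicInteger FLUSHES = new AtomicInteger();
  static final AtomicInteger ACKS = new AtomicInteger();
  public void flushInternal() { FLUSHES.incrementAndGet(); }
  public void waitForAckedSeqno() { ACKS.incrementAndGet(); }
}

class ParallelFlusher {
  // Flush all streamers concurrently instead of one after another.
  static void flushAllInternals(List<Streamer> streamers) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(streamers.size());
    try {
      List<Future<Void>> futures = new ArrayList<>();
      for (final Streamer s : streamers) {
        futures.add(pool.submit(new Callable<Void>() {
          public Void call() throws Exception {
            s.flushInternal();
            s.waitForAckedSeqno();
            return null;
          }
        }));
      }
      for (Future<Void> f : futures) {
        f.get(); // propagate any streamer failure to the caller
      }
    } finally {
      pool.shutdown();
    }
  }
}
```

With 9 streamers, the sequential version shown in the description pays nine ack waits back to back, while this version pays roughly one.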
[jira] [Commented] (HDFS-9690) addBlock is not idempotent
[ https://issues.apache.org/jira/browse/HDFS-9690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116851#comment-15116851 ] Hadoop QA commented on HDFS-9690: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} | {color:red} HDFS-9690 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12784370/h9690_20160124b_branch-2.7.patch | | JIRA Issue | HDFS-9690 | | Powered by | Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/14244/console | This message was automatically generated. > addBlock is not idempotent > -- > > Key: HDFS-9690 > URL: https://issues.apache.org/jira/browse/HDFS-9690 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze > Attachments: h9690_20160124.patch, h9690_20160124b.patch, > h9690_20160124b_branch-2.7.patch > > > TestDFSClientRetries#testIdempotentAllocateBlockAndClose can illustrate the > bug. It failed in the following builds. > - > https://builds.apache.org/job/PreCommit-HDFS-Build/14188/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/ > - > https://builds.apache.org/job/PreCommit-HDFS-Build/14201/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/ > - > https://builds.apache.org/job/PreCommit-HDFS-Build/14202/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7694) FSDataInputStream should support "unbuffer"
[ https://issues.apache.org/jira/browse/HDFS-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116622#comment-15116622 ] Colin Patrick McCabe commented on HDFS-7694: Hi, [~djp]. This change is compatible, since people are not expected to be subclassing {{FSDataInputStream}}. So it seems fine to backport to 2.6, if the maintainers of that branch think it will be useful there. > FSDataInputStream should support "unbuffer" > --- > > Key: HDFS-7694 > URL: https://issues.apache.org/jira/browse/HDFS-7694 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 2.7.0 > > Attachments: HDFS-7694.001.patch, HDFS-7694.002.patch, > HDFS-7694.003.patch, HDFS-7694.004.patch, HDFS-7694.005.patch > > > For applications that have many open HDFS (or other Hadoop filesystem) files, > it would be useful to have an API to clear readahead buffers and sockets. > This could be added to the existing APIs as an optional interface, in much > the same way as we added setReadahead / setDropBehind / etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9663) Optimize some RPC call using lighter weight construct than DatanodeInfo
[ https://issues.apache.org/jira/browse/HDFS-9663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116701#comment-15116701 ] Kai Zheng commented on HDFS-9663: - Thanks for your comments, Colin. bq. Is that stuff actually sent over the wire in every case? These fields are optional in the protobuf structures. I thought of this and checked it; these optional fields look like they are sent over the wire even when they're not actually needed. I will check again to make sure. bq. I agree that it's messy that these fields are optional, but it's hard to see how to change it compatibly at this point. Yes, right. Compatibility has to be considered. For protocols that have already been released, avoiding sending unnecessary fields may be the only option; for others introduced recently, like the one mentioned in the description, BlockECRecoveryCommand, and new protocols in the future, I thought we may be able to change them to use a lightweight structure like DatanodeID when possible. Sounds good? > Optimize some RPC call using lighter weight construct than DatanodeInfo > --- > > Key: HDFS-9663 > URL: https://issues.apache.org/jira/browse/HDFS-9663 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Kai Zheng >Assignee: Kai Zheng > > While working on HDFS-8430, when adding an RPC in DataTransferProtocol, it was > noticed that the very heavy construct {{DatanodeInfo}} or > {{DatanodeInfoWithStorage}} is used to represent a datanode, most of the time just for > the connection. However, it's very fat and contains much more > information than is needed.
See how it's defined:
> {code}
> public class DatanodeInfo extends DatanodeID implements Node {
>   private long capacity;
>   private long dfsUsed;
>   private long remaining;
>   private long blockPoolUsed;
>   private long cacheCapacity;
>   private long cacheUsed;
>   private long lastUpdate;
>   private long lastUpdateMonotonic;
>   private int xceiverCount;
>   private String location = NetworkTopology.DEFAULT_RACK;
>   private String softwareVersion;
>   private List<String> dependentHostNames = new LinkedList<>();
>   private String upgradeDomain;
>   ...
> {code}
> On the client and datanode sides, for RPC calls like {{DataTransferProtocol#writeBlock}}, it looks like the information contained in {{DatanodeID}} is almost enough.
> I did a quick hack using a lightweight construct like {{SimpleDatanodeInfo}} that simply extends DatanodeID (no other fields added, but whatever field is needed can simply be added) and changed the DataTransferProtocol#writeBlock call. I manually checked many relevant tests and it worked fine. To see how much network traffic is saved, I did a simple test with the following code in {{Sender}}:
> {code}
> private static void send(final DataOutputStream out, final Op opcode,
>     final Message proto) throws IOException {
>   LOG.trace("Sending DataTransferOp {}: {}",
>       proto.getClass().getSimpleName(), proto);
>   int before = out.size();
>   op(out, opcode);
>   proto.writeDelimitedTo(out);
>   int after = out.size();
>   System.out.println("X sent=" + (after - before));
>   out.flush();
> }
> {code}
> Ran the test {{TestWriteRead#testWriteAndRead}}; the change can save about 100 bytes per call most of the time. The saving may not be so big because only 3 datanodes are sent, but in situations like {{BlockECRecoveryCommand}}, where there can be 6 + 3 datanodes as targets and sources to send, the saving will be significant.
> Hence, I suggest using a more lightweight construct to represent a datanode in RPC calls when possible. Or other ideas to avoid unnecessary wire data size.
This > may make sense; as noted, there were some discussions in HDFS-8999 about saving > some datanode bandwidth. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
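The weight difference is easy to see with stand-in classes that mirror the fields quoted above. These are illustrative only, not the real org.apache.hadoop.hdfs.protocol types: everything declared in the {{DatanodeInfo}} stand-in is payload a writeBlock-style call never reads, while the {{SimpleDatanodeInfo}} variant adds nothing beyond the identity.

```java
// Stand-in for DatanodeID: identity fields only (illustrative names).
class DatanodeID {
  String ipAddr;
  String hostName;
  String datanodeUuid;
  int xferPort;
  int infoPort;
  int ipcPort;
}

// Stand-in for the heavy construct: usage, cache, and topology fields
// that a plain "connect and write" RPC does not need on the wire.
class DatanodeInfo extends DatanodeID {
  long capacity;
  long dfsUsed;
  long remaining;
  long blockPoolUsed;
  long cacheCapacity;
  long cacheUsed;
  long lastUpdate;
  long lastUpdateMonotonic;
  int xceiverCount;
  String location;
  String softwareVersion;
  String upgradeDomain;
}

// The proposed lightweight construct: extends DatanodeID and declares
// nothing extra, so only identity fields go over the wire.
class SimpleDatanodeInfo extends DatanodeID {
}
```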
[jira] [Updated] (HDFS-9701) DN may deadlock when hot-swapping under load
[ https://issues.apache.org/jira/browse/HDFS-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-9701: Attachment: HDFS-9701.02.patch > DN may deadlock when hot-swapping under load > > > Key: HDFS-9701 > URL: https://issues.apache.org/jira/browse/HDFS-9701 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Xiao Chen >Assignee: Xiao Chen > Attachments: HDFS-9701.01.patch, HDFS-9701.02.patch > > > If the DN is under load (new blocks being written), a hot-swap task by {{hdfs > dfsadmin -reconfig}} may cause a dead lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9579) Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level
[ https://issues.apache.org/jira/browse/HDFS-9579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116791#comment-15116791 ] Ming Ma commented on HDFS-9579: --- Thanks [~sjlee0]! Good point about the thread visibility issue. The reason I ended up using the map was to make the code general enough to support any network distance value without code change. However, given that the available network distance values don't change often, using individual long variables seems ok, as it addresses the issues you mentioned above. Using individual long variables, it could be something like below. Note that it assumes a tree-based topology, and it should cover the common scenarios. If we need to track more network distance values, we can update it later. In addition, this means bytesReadDistanceOfFour and bytesReadDistanceOfSix won't be used for small network topologies.
{noformat}
volatile long bytesReadLocalHost;
volatile long bytesReadDistanceOfTwo;  // local rack case.
volatile long bytesReadDistanceOfFour; // first-degree remote rack
volatile long bytesReadDistanceOfSix;  // second-degree remote rack
{noformat}
I will update the patch once we agree on the new approach. > Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level > - > > Key: HDFS-9579 > URL: https://issues.apache.org/jira/browse/HDFS-9579 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: HDFS-9579-2.patch, HDFS-9579-3.patch, HDFS-9579-4.patch, > HDFS-9579.patch, MR job counters.png > > > For cross-DC distcp or other applications, it becomes useful to have insight > as to the traffic volume for each network distance, to distinguish cross-DC > traffic, local-DC-remote-rack, etc. > FileSystem's existing {{bytesRead}} metric tracks all the bytes read.
To > provide additional metrics for each network distance, we can add additional > metrics at the FileSystem level and have {{DFSInputStream}} update the value > based on the network distance between the client and the datanode. > {{DFSClient}} will resolve the client machine's network location as part of its > initialization. It doesn't need to resolve the datanode's network location for > each read, as {{DatanodeInfo}} already has the info. > There are existing HDFS-specific metrics such as {{ReadStatistics}} and > {{DFSHedgedReadMetrics}}. But these metrics are only accessible via > {{DFSClient}} or {{DFSInputStream}}, not something that application frameworks > such as MR and Tez can get to. That is the benefit of storing these new > metrics in FileSystem.Statistics. > This jira only includes metrics generation by HDFS. The consumption of these > metrics in MR and Tez will be tracked by separate jiras. > We can add similar metrics for the HDFS write scenario later if necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
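Ming Ma's individual-counters variant could be sketched as a self-contained class. The class and method names here are assumptions, not the actual FileSystem.Statistics code; the field-to-distance mapping assumes the tree topology described above, where distances are 0, 2, 4, and 6.

```java
// Illustrative sketch of per-network-distance read counters using
// individual volatile longs instead of a map (assumed names, not the
// real FileSystem.Statistics implementation).
class ReadDistanceStatistics {
  volatile long bytesReadLocalHost;       // distance 0
  volatile long bytesReadDistanceOfTwo;   // distance 2: local rack
  volatile long bytesReadDistanceOfFour;  // distance 4: first-degree remote rack
  volatile long bytesReadDistanceOfSix;   // distance 6: second-degree remote rack

  // Called after each read with the topology distance between the
  // client and the datanode the bytes came from.
  void incrementBytesRead(int distance, long bytes) {
    switch (distance) {
      case 0:  bytesReadLocalHost += bytes; break;
      case 2:  bytesReadDistanceOfTwo += bytes; break;
      case 4:  bytesReadDistanceOfFour += bytes; break;
      default: bytesReadDistanceOfSix += bytes; break; // 6 or deeper
    }
  }
}
```

In FileSystem.Statistics the counters are kept per thread and aggregated on read, which sidesteps the non-atomic {{+=}} on a volatile; the sketch above only illustrates the field layout and the distance mapping.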
[jira] [Updated] (HDFS-6489) DFS Used space is not correct computed on frequent append operations
[ https://issues.apache.org/jira/browse/HDFS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bogdan Raducanu updated HDFS-6489: -- Attachment: HDFS6489.java > DFS Used space is not correct computed on frequent append operations > > > Key: HDFS-6489 > URL: https://issues.apache.org/jira/browse/HDFS-6489 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.2.0 >Reporter: stanley shi > Attachments: HDFS6489.java > > > The current implementation of the Datanode increases the DFS used space > on each block write operation. This is correct in most scenarios (creating a new file), > but sometimes it behaves incorrectly (appending small data to a large block). > For example, I have a file with only one block (say, 60M). Then I try to append to it > very frequently, but each time I append only 10 bytes. > Then on each append, dfs used will be increased by the length of the block (60M), > not the actual data length (10 bytes). > Consider a scenario where I use many clients to append concurrently to a large number > of files (1000+); assume the block size is 32M (half of the default value), then the dfs > used will be increased by 1000*32M = 32G on each append to the files, but actually I only > write 10K bytes. This will cause the datanode to report insufficient disk space on data write. > {quote}2014-06-04 15:27:34,719 INFO > org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock > BP-1649188734-10.37.7.142-1398844098971:blk_1073742834_45306 received > exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: > Insufficient space for appending to FinalizedReplica, blk_1073742834_45306, > FINALIZED{quote} > But the actual disk usage:
> {quote}
> [root@hdsh143 ~]# df -h
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda3        16G  2.9G   13G  20% /
> tmpfs           1.9G   72K  1.9G   1% /dev/shm
> /dev/sda1        97M   32M   61M  35% /boot
> {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Staffan Friberg updated HDFS-9260: -- Attachment: HDFS-9260.014.patch > Improve performance and GC friendliness of startup and FBRs > --- > > Key: HDFS-9260 > URL: https://issues.apache.org/jira/browse/HDFS-9260 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, performance >Affects Versions: 2.7.1 >Reporter: Staffan Friberg >Assignee: Staffan Friberg > Attachments: FBR processing.png, HDFS Block and Replica Management > 20151013.pdf, HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch, > HDFS-7435.004.patch, HDFS-7435.005.patch, HDFS-7435.006.patch, > HDFS-7435.007.patch, HDFS-9260.008.patch, HDFS-9260.009.patch, > HDFS-9260.010.patch, HDFS-9260.011.patch, HDFS-9260.012.patch, > HDFS-9260.013.patch, HDFS-9260.014.patch, HDFSBenchmarks.zip, > HDFSBenchmarks2.zip > > > This patch changes the data structures used for BlockInfos and Replicas to > keep them sorted. This allows faster and more GC-friendly handling of full > block reports. > Would like to hear people's feedback on this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6489) DFS Used space is not correct computed on frequent append operations
[ https://issues.apache.org/jira/browse/HDFS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bogdan Raducanu updated HDFS-6489: -- Affects Version/s: 2.7.1 > DFS Used space is not correct computed on frequent append operations > > > Key: HDFS-6489 > URL: https://issues.apache.org/jira/browse/HDFS-6489 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.2.0, 2.7.1 >Reporter: stanley shi > Attachments: HDFS6489.java > > > The current implementation of the Datanode increases the DFS used space > on each block write operation. This is correct in most scenarios (creating a new file), > but sometimes it behaves incorrectly (appending small data to a large block). > For example, I have a file with only one block (say, 60M). Then I try to append to it > very frequently, but each time I append only 10 bytes. > Then on each append, dfs used will be increased by the length of the block (60M), > not the actual data length (10 bytes). > Consider a scenario where I use many clients to append concurrently to a large number > of files (1000+); assume the block size is 32M (half of the default value), then the dfs > used will be increased by 1000*32M = 32G on each append to the files, but actually I only > write 10K bytes. This will cause the datanode to report insufficient disk space on data write. > {quote}2014-06-04 15:27:34,719 INFO > org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock > BP-1649188734-10.37.7.142-1398844098971:blk_1073742834_45306 received > exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: > Insufficient space for appending to FinalizedReplica, blk_1073742834_45306, > FINALIZED{quote} > But the actual disk usage:
> {quote}
> [root@hdsh143 ~]# df -h
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda3        16G  2.9G   13G  20% /
> tmpfs           1.9G   72K  1.9G   1% /dev/shm
> /dev/sda1        97M   32M   61M  35% /boot
> {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6489) DFS Used space is not correct computed on frequent append operations
[ https://issues.apache.org/jira/browse/HDFS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115206#comment-15115206 ] Bogdan Raducanu commented on HDFS-6489: --- I've recently hit this bug in 2.7.1. I attached repro code. The repro should fail with an 'all datanodes are bad' exception while the datanode log shows the "insufficient disk space" exception. While the program is running you can see the reported "Block pool used" increase by a lot. A minute or two after the failure, the "Block pool used" goes back down to normal. > DFS Used space is not correct computed on frequent append operations > > > Key: HDFS-6489 > URL: https://issues.apache.org/jira/browse/HDFS-6489 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.2.0, 2.7.1 >Reporter: stanley shi > Attachments: HDFS6489.java > > > The current implementation of the Datanode increases the DFS used space > on each block write operation. This is correct in most scenarios (creating a new file), > but sometimes it behaves incorrectly (appending small data to a large block). > For example, I have a file with only one block (say, 60M). Then I try to append to it > very frequently, but each time I append only 10 bytes. > Then on each append, dfs used will be increased by the length of the block (60M), > not the actual data length (10 bytes). > Consider a scenario where I use many clients to append concurrently to a large number > of files (1000+); assume the block size is 32M (half of the default value), then the dfs > used will be increased by 1000*32M = 32G on each append to the files, but actually I only > write 10K bytes. This will cause the datanode to report insufficient disk space on data write.
> {quote}2014-06-04 15:27:34,719 INFO > org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock > BP-1649188734-10.37.7.142-1398844098971:blk_1073742834_45306 received > exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: > Insufficient space for appending to FinalizedReplica, blk_1073742834_45306, > FINALIZED{quote} > But the actual disk usage:
> {quote}
> [root@hdsh143 ~]# df -h
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda3        16G  2.9G   13G  20% /
> tmpfs           1.9G   72K  1.9G   1% /dev/shm
> /dev/sda1        97M   32M   61M  35% /boot
> {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
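The blow-up described in the report is plain arithmetic. A toy model (not the actual DataNode accounting code; names are illustrative) contrasts the buggy behavior of re-charging the full block length on each append with charging only the appended bytes:

```java
// Toy model of the HDFS-6489 accounting bug.
class DfsUsedModel {
  // Buggy accounting: every append to a finalized block re-charges the
  // whole block length instead of just the appended bytes.
  static long buggyUsed(long blockLen, long appendBytes, int appends) {
    long used = blockLen;           // initial block write
    for (int i = 0; i < appends; i++) {
      used += blockLen;             // bug: block length, not appendBytes
    }
    return used;
  }

  // Correct accounting: only the appended bytes are added.
  static long correctUsed(long blockLen, long appendBytes, int appends) {
    return blockLen + appendBytes * (long) appends;
  }
}
```

With a 60M block and 1000 ten-byte appends, correct accounting adds about 10 KB on top of the block, while the buggy version reports roughly 60 GB used, which matches the premature "insufficient disk space" rejection seen in the log above.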
[jira] [Created] (HDFS-9695) HTTPFS - CHECKACCESS operation missing
Bert Hekman created HDFS-9695: - Summary: HTTPFS - CHECKACCESS operation missing Key: HDFS-9695 URL: https://issues.apache.org/jira/browse/HDFS-9695 Project: Hadoop HDFS Issue Type: Bug Reporter: Bert Hekman Hi, The CHECKACCESS operation seems to be missing in HTTPFS. I'm getting the following error: {code} QueryParamException: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.fs.http.client.HttpFSFileSystem.Operation.CHECKACCESS {code} A quick look into the org.apache.hadoop.fs.http.client.HttpFSFileSystem class reveals that CHECKACCESS is not defined at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-9689) Test o.a.h.hdfs.TestRenameWhileOpen fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115072#comment-15115072 ] Vinayakumar B edited comment on HDFS-9689 at 1/25/16 1:22 PM: -- One possible solution for Namenode restart tests is to implement {{BPOfferService#refreshNNList()}} in a way that supports changing namenode addresses in between, and to issue a DN#refreshNamenodes(Conf) call to all DNs after restart in MiniDfsCluster. This way, all tests which fail intermittently with this kind of problem would be fixed. After analyzing, the current MiniDfsCluster has no option to restart a Namenode on different ports. If this support is to be added, there are some points that need to be checked. 1. The FileSystem's URI will change in the non-HA case, so all FileSystem instances should be refreshed in tests after a restart of the Namenode. 2. The current restartDatanode(..) has keepPort defaulting to false, but this will work only if {{setupHostsFile}} is false. Otherwise, the DN will try to restart on the same port. This could result in some occasional test failures. 3. In case of restartDatanodes() or restartNamenodes(), assertions on URIs or DN names should be checked, as these may change with the changed port post restart. So, completely resolving the port-bind issues arising from restarting (Name|Data)nodes needs some effort :) What do you say, [~liuml07]? was (Author: vinayrpet): One possible solution for Namenode restart tests is to implement {{BPOfferService#refreshNNList()}} in a way that supports changing namenode addresses in between, and to issue a DN#refreshNamenodes(Conf) call to all DNs after restart in MiniDfsCluster. This way, all tests which fail intermittently with this kind of problem would be fixed. What do you say, [~liuml07]?
> Test o.a.h.hdfs.TestRenameWhileOpen fails intermittently > - > > Key: HDFS-9689 > URL: https://issues.apache.org/jira/browse/HDFS-9689 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.0.0, 2.8.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9689.000.patch > > > The test fails in recent builds, e.g. > https://builds.apache.org/job/PreCommit-HDFS-Build/14063/testReport/org.apache.hadoop.hdfs/TestRenameWhileOpen/ > and > https://builds.apache.org/job/PreCommit-HDFS-Build/14212/testReport/org.apache.hadoop.hdfs/TestRenameWhileOpen/testWhileOpenRenameToNonExistentDirectory/ > The *Error Message* is like: > {code} > Problem binding to [localhost:60690] java.net.BindException: Address already > in use; For more details see: http://wiki.apache.org/hadoop/BindException > {code} > and *Stacktrace* is: > {code} > java.net.BindException: Problem binding to [localhost:60690] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:463) > at sun.nio.ch.Net.bind(Net.java:455) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at org.apache.hadoop.ipc.Server.bind(Server.java:469) > at org.apache.hadoop.ipc.Server$Listener.(Server.java:695) > at org.apache.hadoop.ipc.Server.(Server.java:2464) > at org.apache.hadoop.ipc.RPC$Server.(RPC.java:958) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:535) > at > org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:510) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:800) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.(NameNodeRpcServer.java:392) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:743) > at > 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:685) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:884) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:863) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1581) > at > org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:1247) > at > org.apache.hadoop.hdfs.MiniDFSCluster.configureNameService(MiniDFSCluster.java:1016) > at > org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:891) > at > org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:823) > at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:482) > at > org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:441)
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115356#comment-15115356 ] Staffan Friberg commented on HDFS-9260: --- Fixed checkstyle on TreeSet. Should I convert the storages field to private? (The triplets field was protected.) > Improve performance and GC friendliness of startup and FBRs > --- > > Key: HDFS-9260 > URL: https://issues.apache.org/jira/browse/HDFS-9260 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, performance >Affects Versions: 2.7.1 >Reporter: Staffan Friberg >Assignee: Staffan Friberg > Attachments: FBR processing.png, HDFS Block and Replica Management > 20151013.pdf, HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch, > HDFS-7435.004.patch, HDFS-7435.005.patch, HDFS-7435.006.patch, > HDFS-7435.007.patch, HDFS-9260.008.patch, HDFS-9260.009.patch, > HDFS-9260.010.patch, HDFS-9260.011.patch, HDFS-9260.012.patch, > HDFS-9260.013.patch, HDFS-9260.014.patch, HDFSBenchmarks.zip, > HDFSBenchmarks2.zip > > > This patch changes the data structures used for BlockInfos and Replicas to > keep them sorted. This allows faster and more GC-friendly handling of full > block reports. > Would like to hear people's feedback on this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9694) Make existing DFSClient#getFileChecksum() work for striped blocks
Kai Zheng created HDFS-9694: --- Summary: Make existing DFSClient#getFileChecksum() work for striped blocks Key: HDFS-9694 URL: https://issues.apache.org/jira/browse/HDFS-9694 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng Fix For: 3.0.0 This is a sub-task of HDFS-8430 and will get the existing API {{FileSystem#getFileChecksum(path)}} work for striped files. It will also refactor existing codes and layout basic work for subsequent tasks like support of the new API proposed there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9693) Trim the user config of `dfs.ha.namenode.id`
[ https://issues.apache.org/jira/browse/HDFS-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115244#comment-15115244 ] Hadoop QA commented on HDFS-9693: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 54s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 23s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc 
{color} | {color:green} 2m 27s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 5s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 14s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 3s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 36s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 45s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 97m 4s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 96m 46s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 36s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 235m 51s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.TestMissingBlocksAlert | | | hadoop.hdfs.server.namenode.ha.TestEditLogTailer | | | hadoop.hdfs.security.TestDelegationTokenForProxyUser | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure | | | hadoop.hdfs.server.datanode.TestBlockReplacement | | | hadoop.hdfs.TestFileCreationDelete | | | hadoop.hdfs.server.namenode.ha.TestHAAppend | | | hadoop.hdfs.server.namenode.TestDecommissioningStatus | | | hadoop.hdfs.server.datanode.TestDirectoryScanner | | JDK v1.7.0_91 Failed junit
[jira] [Commented] (HDFS-8430) Erasure coding: compute file checksum for stripe files
[ https://issues.apache.org/jira/browse/HDFS-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115135#comment-15115135 ] Kai Zheng commented on HDFS-8430: - To break down, opened HDFS-9694 to make the existing API also works for striped files along with codes refactoring. > Erasure coding: compute file checksum for stripe files > -- > > Key: HDFS-8430 > URL: https://issues.apache.org/jira/browse/HDFS-8430 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7285 >Reporter: Walter Su >Assignee: Kai Zheng > Attachments: HDFS-8430-poc1.patch > > > HADOOP-3981 introduces a distributed file checksum algorithm. It's designed > for replicated block. > {{DFSClient.getFileChecksum()}} need some updates, so it can work for striped > block group. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9525) hadoop utilities need to support provided delegation tokens
[ https://issues.apache.org/jira/browse/HDFS-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115355#comment-15115355 ] Kihwal Lee commented on HDFS-9525: -- Is anyone reverting it or reworking on the fix? > hadoop utilities need to support provided delegation tokens > --- > > Key: HDFS-9525 > URL: https://issues.apache.org/jira/browse/HDFS-9525 > Project: Hadoop HDFS > Issue Type: New Feature > Components: security >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Assignee: HeeSoo Kim >Priority: Blocker > Fix For: 3.0.0 > > Attachments: HDFS-7984.001.patch, HDFS-7984.002.patch, > HDFS-7984.003.patch, HDFS-7984.004.patch, HDFS-7984.005.patch, > HDFS-7984.006.patch, HDFS-7984.007.patch, HDFS-7984.patch, > HDFS-9525.008.patch, HDFS-9525.009.patch, HDFS-9525.009.patch, > HDFS-9525.branch-2.008.patch, HDFS-9525.branch-2.009.patch > > > When using the webhdfs:// filesystem (especially from distcp), we need the > ability to inject a delegation token rather than webhdfs initialize its own. > This would allow for cross-authentication-zone file system accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9406) FSImage corruption after taking snapshot
[ https://issues.apache.org/jira/browse/HDFS-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115780#comment-15115780 ] Jing Zhao commented on HDFS-9406: - Thanks for reporting the issue, [~stanislav.an...@gmail.com]. The corrupted fsimage should also be useful for debugging. Could you please share the image if possible? > FSImage corruption after taking snapshot > > > Key: HDFS-9406 > URL: https://issues.apache.org/jira/browse/HDFS-9406 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 > Environment: CentOS 6 amd64, CDH 5.4.4-1 > 2xCPU: Intel(R) Xeon(R) CPU E5-2640 v3 > Memory: 32GB > Namenode blocks: ~700_000 blocks, no HA setup >Reporter: Stanislav Antic >Assignee: Yongjun Zhang > > FSImage corruption happened after HDFS snapshots were taken. Cluster was not > used > at that time. > When namenode restarts it reported NULL pointer exception: > {code} > 15/11/07 10:01:15 INFO namenode.FileJournalManager: Recovering unfinalized > segments in /tmp/fsimage_checker_5857/fsimage/current > 15/11/07 10:01:15 INFO namenode.FSImage: No edit log streams selected. > 15/11/07 10:01:18 INFO namenode.FSImageFormatPBINode: Loading 1370277 INodes. > 15/11/07 10:01:27 ERROR namenode.NameNode: Failed to start namenode. 
> java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.addChild(INodeDirectory.java:531) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:252) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:202) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:261) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:180) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:929) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:913) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:732) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:668) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1061) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:765) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:584) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:643) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:810) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:794) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1487) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1553) > 15/11/07 10:01:27 INFO util.ExitUtil: Exiting with status 1 > {code} > Corruption happened after "07.11.2015 00:15", and after that time blocks > ~9300 blocks were 
invalidated that shouldn't be. > After recovering FSimage I discovered that around ~9300 blocks were missing. > -I also attached log of namenode before and after corruption happened.- -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9262) Support reconfiguring dfs.datanode.lazywriter.interval.sec without DN restart
[ https://issues.apache.org/jira/browse/HDFS-9262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaobing Zhou updated HDFS-9262: Attachment: HDFS-9262-HDFS-9000.004.patch V004 fixed a large quantity of unit failures as a result of Reconfigurable implementation(originally throw UnsupportedOperationException) in SimulatedFSDataset and ExternalDatasetImpl. > Support reconfiguring dfs.datanode.lazywriter.interval.sec without DN restart > - > > Key: HDFS-9262 > URL: https://issues.apache.org/jira/browse/HDFS-9262 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: 2.7.0 >Reporter: Xiaobing Zhou >Assignee: Xiaobing Zhou > Attachments: HDFS-9262-HDFS-9000.002.patch, > HDFS-9262-HDFS-9000.003.patch, HDFS-9262-HDFS-9000.004.patch, > HDFS-9262.001.patch > > > This is to reconfigure > {code} > dfs.datanode.lazywriter.interval.sec > {code} > without restarting DN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9699) libhdfs++: Add appropriate catch blocks for ASIO operations that throw
James Clampffer created HDFS-9699: - Summary: libhdfs++: Add appropriate catch blocks for ASIO operations that throw Key: HDFS-9699 URL: https://issues.apache.org/jira/browse/HDFS-9699 Project: Hadoop HDFS Issue Type: Sub-task Reporter: James Clampffer Assignee: James Clampffer libhdfs++ doesn't create exceptions of its own but it should be able to gracefully handle exceptions thrown by libraries it uses, particularly asio. libhdfs++ should be able to catch most exceptions within reason either at the call site or in the code that spins up asio worker threads. Certain system exceptions like std::bad_alloc don't need to be caught because by that point the process is likely in a unrecoverable state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9672) o.a.h.hdfs.TestLeaseRecovery2 fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-9672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115862#comment-15115862 ] Jitendra Nath Pandey commented on HDFS-9672: +1 > o.a.h.hdfs.TestLeaseRecovery2 fails intermittently > -- > > Key: HDFS-9672 > URL: https://issues.apache.org/jira/browse/HDFS-9672 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9672.000.patch, HDFS-9672.001.patch > > > It fails in recent builds, see: > https://builds.apache.org/job/PreCommit-HDFS-Build/14177/testReport/org.apache.hadoop.hdfs/ > https://builds.apache.org/job/PreCommit-HDFS-Build/14147/testReport/org.apache.hadoop.hdfs/ > Failing test methods include: > * > org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart > * org.apache.hadoop.hdfs.TestLeaseRecovery2.testLeaseRecoverByAnotherUser > * org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecovery > * > org.apache.hadoop.hdfs.TestLeaseRecovery2.org.apache.hadoop.hdfs.TestLeaseRecovery2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )
[ https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115745#comment-15115745 ] Rakesh R commented on HDFS-9494: Thanks [~demongaorui] for the patch. I've a minor comment, please consider this also when preparing next patch. For every flushAllInternals(), it is creating {{ExecutorService executor = Executors.newFixedThreadPool(numAllBlocks);}}. Please do {{executor.shutdownNow();}} at the end of flushAllInternals() function. Otw there could be a chance of unnecessary {{Thread (pool-1-thread-1) (Running)}} reference leaving, right? > Parallel optimization of DFSStripedOutputStream#flushAllInternals( ) > > > Key: HDFS-9494 > URL: https://issues.apache.org/jira/browse/HDFS-9494 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: GAO Rui >Assignee: GAO Rui >Priority: Minor > Attachments: HDFS-9494-origin-trunk.00.patch, > HDFS-9494-origin-trunk.01.patch, HDFS-9494-origin-trunk.02.patch, > HDFS-9494-origin-trunk.03.patch, HDFS-9494-origin-trunk.04.patch > > > Currently, in DFSStripedOutputStream#flushAllInternals( ), we trigger and > wait for flushInternal( ) in sequence. So the runtime flow is like: > {code} > Streamer0#flushInternal( ) > Streamer0#waitForAckedSeqno( ) > Streamer1#flushInternal( ) > Streamer1#waitForAckedSeqno( ) > … > Streamer8#flushInternal( ) > Streamer8#waitForAckedSeqno( ) > {code} > It could be better to trigger all the streamers to flushInternal( ) and > wait for all of them to return from waitForAckedSeqno( ), and then > flushAllInternals( ) returns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9698) Long running Balancer should renew TGT
Zhe Zhang created HDFS-9698: --- Summary: Long running Balancer should renew TGT Key: HDFS-9698 URL: https://issues.apache.org/jira/browse/HDFS-9698 Project: Hadoop HDFS Issue Type: Bug Components: balancer & mover, security Affects Versions: 2.6.3 Reporter: Zhe Zhang Assignee: Zhe Zhang When the {{Balancer}} runs beyond the configured TGT lifetime, the current logic won't renew TGT. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9406) FSImage corruption after taking snapshot
[ https://issues.apache.org/jira/browse/HDFS-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115866#comment-15115866 ] Yongjun Zhang commented on HDFS-9406: - Thanks [~kihwal] and [~jingzhao]. Hi Jing, I have got a set of data from [~stanislav.an...@gmail.com] at our private channel, the issue can be reproduced with this set of data (Thanks Stanislav a million for that!). I have been debugging and had good understanding. I will talk with you and Stanislav privately about the data. While I tried to create a small testcase to reproduce the symptom here, I was not quite successful. However, I was able to create HDFS-9697 and have a proposed solution (not published yet). My study showed that HDFS-9406 has similar cause as HDFS-9697 but not exactly the same. I'm digging it a bit further, I might need help from you guys at some point. Thanks much. > FSImage corruption after taking snapshot > > > Key: HDFS-9406 > URL: https://issues.apache.org/jira/browse/HDFS-9406 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 > Environment: CentOS 6 amd64, CDH 5.4.4-1 > 2xCPU: Intel(R) Xeon(R) CPU E5-2640 v3 > Memory: 32GB > Namenode blocks: ~700_000 blocks, no HA setup >Reporter: Stanislav Antic >Assignee: Yongjun Zhang > > FSImage corruption happened after HDFS snapshots were taken. Cluster was not > used > at that time. > When namenode restarts it reported NULL pointer exception: > {code} > 15/11/07 10:01:15 INFO namenode.FileJournalManager: Recovering unfinalized > segments in /tmp/fsimage_checker_5857/fsimage/current > 15/11/07 10:01:15 INFO namenode.FSImage: No edit log streams selected. > 15/11/07 10:01:18 INFO namenode.FSImageFormatPBINode: Loading 1370277 INodes. > 15/11/07 10:01:27 ERROR namenode.NameNode: Failed to start namenode. 
> java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.addChild(INodeDirectory.java:531) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:252) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:202) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:261) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:180) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:929) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:913) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:732) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:668) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1061) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:765) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:584) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:643) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:810) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:794) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1487) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1553) > 15/11/07 10:01:27 INFO util.ExitUtil: Exiting with status 1 > {code} > Corruption happened after "07.11.2015 00:15", and after that time blocks > ~9300 blocks were 
invalidated that shouldn't be. > After recovering FSimage I discovered that around ~9300 blocks were missing. > -I also attached log of namenode before and after corruption happened.- -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115818#comment-15115818 ] Jing Zhao edited comment on HDFS-9696 at 1/25/16 7:32 PM: -- Currently I think HDFS-9406 and HDFS-9697 may both be caused by some lingering INode in the diff list. Both failed when loading INode from the inode Map. Compared with the logic for removing inodes from inode map, cleaning diff list is more complicated thus has higher chance to have bug. was (Author: jingzhao): Currently I think HDFS-9406 and HDFS-9697 may both be caused by some lingering INode in the diff list. Both failed when loading INode from the inode Map. Compared with the logic for removing inodes from inode map, cleaning diff list is more complicated thus has higher chance to fail. > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Yongjun Zhang >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. When the HDFS does not have any snapshot, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, it must haven been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115818#comment-15115818 ] Jing Zhao commented on HDFS-9696: - Currently I think HDFS-9406 and HDFS-9697 may both be caused by some lingering INode in the diff list. Both failed when loading INode from the inode Map. Compared with the logic for removing inodes from inode map, cleaning diff list is more complicated thus has higher chance to fail. > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Yongjun Zhang >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. When the HDFS does not have any snapshot, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, it must haven been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9330) Support reconfiguring dfs.datanode.duplicate.replica.deletion without DN restart
[ https://issues.apache.org/jira/browse/HDFS-9330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaobing Zhou updated HDFS-9330: Attachment: HDFS-9330-HDFS-9000.003.patch Similarly, V003 fixed a large quantity of unit failures as a result of Reconfigurable implementation(originally throw UnsupportedOperationException) in SimulatedFSDataset and ExternalDatasetImpl. > Support reconfiguring dfs.datanode.duplicate.replica.deletion without DN > restart > - > > Key: HDFS-9330 > URL: https://issues.apache.org/jira/browse/HDFS-9330 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Xiaobing Zhou >Assignee: Xiaobing Zhou > Attachments: HDFS-9330-HDFS-9000.002.patch, > HDFS-9330-HDFS-9000.003.patch, HDFS-9330.001.patch > > > This is to reconfigure > {code} > dfs.datanode.duplicate.replica.deletion > {code} > without restarting DN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9691) o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode#testCheckSafeMode fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-9691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115776#comment-15115776 ] Mingliang Liu commented on HDFS-9691: - The failing test is not related, and seems flaky which is tracked by [HDFS-9476]. > o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode#testCheckSafeMode > fails intermittently > - > > Key: HDFS-9691 > URL: https://issues.apache.org/jira/browse/HDFS-9691 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.0.0, 2.8.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9691.000.patch > > > It's a flaky test method and can rarely re-produce locally. We can see this > happened in recent build, e.g. > * > https://builds.apache.org/job/PreCommit-HDFS-Build/14225/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestBlockManagerSafeMode/testCheckSafeMode/ > * > https://builds.apache.org/job/PreCommit-HDFS-Build/14139/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestBlockManagerSafeMode/testCheckSafeMode/ > {code} > Error Message > expected: but was: > Stacktrace > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:165) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9698) Long running Balancer should renew TGT
[ https://issues.apache.org/jira/browse/HDFS-9698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-9698: Attachment: HDFS-9698.00.patch A similar fix as HADOOP-12559, but in the {{Balancer}}. Adding the renewal logic before each {{Balancer}} iteration because the dispatch runs multiple operations with NN within the iteration. > Long running Balancer should renew TGT > -- > > Key: HDFS-9698 > URL: https://issues.apache.org/jira/browse/HDFS-9698 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover, security >Affects Versions: 2.6.3 >Reporter: Zhe Zhang >Assignee: Zhe Zhang > Attachments: HDFS-9698.00.patch > > > When the {{Balancer}} runs beyond the configured TGT lifetime, the current > logic won't renew TGT. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9406) FSImage corruption after taking snapshot
[ https://issues.apache.org/jira/browse/HDFS-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115795#comment-15115795 ] Kihwal Lee commented on HDFS-9406: -- HDFS-9696 might be related. > FSImage corruption after taking snapshot > > > Key: HDFS-9406 > URL: https://issues.apache.org/jira/browse/HDFS-9406 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 > Environment: CentOS 6 amd64, CDH 5.4.4-1 > 2xCPU: Intel(R) Xeon(R) CPU E5-2640 v3 > Memory: 32GB > Namenode blocks: ~700_000 blocks, no HA setup >Reporter: Stanislav Antic >Assignee: Yongjun Zhang > > FSImage corruption happened after HDFS snapshots were taken. Cluster was not > used > at that time. > When namenode restarts it reported NULL pointer exception: > {code} > 15/11/07 10:01:15 INFO namenode.FileJournalManager: Recovering unfinalized > segments in /tmp/fsimage_checker_5857/fsimage/current > 15/11/07 10:01:15 INFO namenode.FSImage: No edit log streams selected. > 15/11/07 10:01:18 INFO namenode.FSImageFormatPBINode: Loading 1370277 INodes. > 15/11/07 10:01:27 ERROR namenode.NameNode: Failed to start namenode. 
> java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.addChild(INodeDirectory.java:531) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:252) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:202) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:261) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:180) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:929) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:913) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:732) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:668) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1061) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:765) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:584) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:643) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:810) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:794) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1487) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1553) > 15/11/07 10:01:27 INFO util.ExitUtil: Exiting with status 1 > {code} > Corruption happened after "07.11.2015 00:15", and after that time blocks > ~9300 blocks were 
invalidated that shouldn't be. > After recovering FSimage I discovered that around ~9300 blocks were missing. > -I also attached log of namenode before and after corruption happened.- -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115796#comment-15115796 ] Yongjun Zhang commented on HDFS-9696: - Thanks Kihwal. Yes, agree. While I have been investigating, I indeed planned to ask the snapshot developers for help at some point. > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Yongjun Zhang >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. When the HDFS does not have any snapshot, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, it must haven been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115884#comment-15115884 ] Yongjun Zhang commented on HDFS-9696: - And I have a solution for HDFS-9697, for the case I created. Yet to prove that it will work with all situations. > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Yongjun Zhang >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. When the HDFS does not have any snapshot, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, it must haven been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9094) Add command line option to ask NameNode reload configuration.
[ https://issues.apache.org/jira/browse/HDFS-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-9094: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.9.0 Status: Resolved (was: Patch Available) +1 for the v009 patch. I committed this for 2.9.0. Thanks for the contribution [~xiaobingo]. > Add command line option to ask NameNode reload configuration. > - > > Key: HDFS-9094 > URL: https://issues.apache.org/jira/browse/HDFS-9094 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.7.0 >Reporter: Xiaobing Zhou >Assignee: Xiaobing Zhou > Fix For: 2.9.0 > > Attachments: HDFS-9094-HDFS-9000.002.patch, > HDFS-9094-HDFS-9000.003.patch, HDFS-9094-HDFS-9000.004.patch, > HDFS-9094-HDFS-9000.005.patch, HDFS-9094-HDFS-9000.006.patch, > HDFS-9094-HDFS-9000.007.patch, HDFS-9094-HDFS-9000.008.patch, > HDFS-9094-HDFS-9000.009.patch, HDFS-9094.001.patch > > > This work is going to add DFS admin command that allows reloading NameNode > configuration. This is sibling work related to HDFS-6808. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9094) Add command line option to ask NameNode reload configuration.
[ https://issues.apache.org/jira/browse/HDFS-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115906#comment-15115906 ] Hudson commented on HDFS-9094: -- FAILURE: Integrated in Hadoop-trunk-Commit #9180 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9180/]) HDFS-9094. Add command line option to ask NameNode reload configuration. (arp: rev d62b4a4de75edb840df6634f49cb4beb74e3fb07) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ReconfigurationProtocolServerSideUtils.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ReconfigurationProtocol.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/NamenodeProtocols.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSUtilClient.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSAdmin.java > Add command line option to ask NameNode reload configuration. > - > > Key: HDFS-9094 > URL: https://issues.apache.org/jira/browse/HDFS-9094 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.7.0 >Reporter: Xiaobing Zhou >Assignee: Xiaobing Zhou > Fix For: 2.9.0 > > Attachments: HDFS-9094-HDFS-9000.002.patch, > HDFS-9094-HDFS-9000.003.patch, HDFS-9094-HDFS-9000.004.patch, > HDFS-9094-HDFS-9000.005.patch, HDFS-9094-HDFS-9000.006.patch, > HDFS-9094-HDFS-9000.007.patch, HDFS-9094-HDFS-9000.008.patch, > HDFS-9094-HDFS-9000.009.patch, HDFS-9094.001.patch > > > This work is going to add DFS admin command that allows reloading NameNode > configuration. This is sibling work related to HDFS-6808. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )
[ https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115531#comment-15115531 ] Tsz Wo Nicholas Sze commented on HDFS-9494: --- In the last for-loop, the finally-block will be executed multiple times (healthyStreamerCount). It may not be intended. I think it is better to wait until all tasks have been completed. Then, process the exceptions if the map is non-empty. > Parallel optimization of DFSStripedOutputStream#flushAllInternals( ) > > > Key: HDFS-9494 > URL: https://issues.apache.org/jira/browse/HDFS-9494 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: GAO Rui >Assignee: GAO Rui >Priority: Minor > Attachments: HDFS-9494-origin-trunk.00.patch, > HDFS-9494-origin-trunk.01.patch, HDFS-9494-origin-trunk.02.patch, > HDFS-9494-origin-trunk.03.patch, HDFS-9494-origin-trunk.04.patch > > > Currently, in DFSStripedOutputStream#flushAllInternals( ), we trigger and > wait for flushInternal( ) in sequence. So the runtime flow is like: > {code} > Streamer0#flushInternal( ) > Streamer0#waitForAckedSeqno( ) > Streamer1#flushInternal( ) > Streamer1#waitForAckedSeqno( ) > … > Streamer8#flushInternal( ) > Streamer8#waitForAckedSeqno( ) > {code} > It could be better to trigger all the streamers to flushInternal( ) and > wait for all of them to return from waitForAckedSeqno( ), and then > flushAllInternals( ) returns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
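The wait-for-all-then-report pattern suggested in the comment above can be sketched with plain {{java.util.concurrent}}. This is a hypothetical helper, not the actual DFSStripedOutputStream code: all flush tasks are triggered first, every one is awaited, and the collected exceptions are processed only once at the end, if the map is non-empty.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelFlush {
  // Trigger every flush task in parallel, wait until ALL of them have
  // completed, and only then hand back the collected failures, instead of
  // running per-streamer error handling inside the wait loop.
  static Map<Integer, Exception> flushAll(List<Callable<Void>> tasks)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
    List<Future<Void>> futures = new ArrayList<>();
    for (Callable<Void> t : tasks) {
      futures.add(pool.submit(t));           // trigger all flushes first
    }
    Map<Integer, Exception> failures = new LinkedHashMap<>();
    for (int i = 0; i < futures.size(); i++) {
      try {
        futures.get(i).get();                // wait for this task to finish
      } catch (ExecutionException e) {
        failures.put(i, (Exception) e.getCause());  // remember, don't rethrow yet
      }
    }
    pool.shutdown();
    return failures;  // caller processes exceptions once, if the map is non-empty
  }

  public static void main(String[] args) throws Exception {
    List<Callable<Void>> tasks = new ArrayList<>();
    tasks.add(() -> null);                                      // streamer 0 succeeds
    tasks.add(() -> { throw new IOException("streamer 1"); });  // streamer 1 fails
    System.out.println(flushAll(tasks).size());  // one recorded failure
  }
}
```

This avoids the issue noted above where a finally-block runs once per healthy streamer: failure handling happens exactly once, after all waits return.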
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115578#comment-15115578 ] Hadoop QA commented on HDFS-9260: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 19 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 43s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 55s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 47s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 24s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 2 new + 704 unchanged - 12 fixed = 706 total (was 716) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 51m 34s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 50m 34s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 128m 41s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.server.datanode.TestBlockScanner | | | hadoop.hdfs.server.datanode.TestDataNodeMetrics | | JDK v1.7.0_91 Failed junit tests | hadoop.hdfs.server.datanode.TestFsDatasetCache | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL |
[jira] [Commented] (HDFS-9118) Add logging system for libdhfs++
[ https://issues.apache.org/jira/browse/HDFS-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115421#comment-15115421 ] James Clampffer commented on HDFS-9118: --- I'm going to take a shot at this. Things I'm planning on picking up implicitly, in addition to the log message and level:
- id of the thread doing the logging
- stack address of the logging function (add a local variable and grab its address)
- line number, file name, function
> Add logging system for libdhfs++ > > > Key: HDFS-9118 > URL: https://issues.apache.org/jira/browse/HDFS-9118 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: James Clampffer > > With HDFS-9505, we've starting logging data from libhdfs++. Consumers of the > library are going to have their own logging infrastructure that we're going > to want to provide data to. > libhdfs++ should have a logging library that: > * Is overridable and can provide sufficient information to work well with > common C++ logging frameworks > * Has a rational default implementation > * Is performant -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9118) Add logging system for libdhfs++
[ https://issues.apache.org/jira/browse/HDFS-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115434#comment-15115434 ] Bob Hansen commented on HDFS-9118: -- Grabbing the stack address is a fairly expensive operation. I would limit it to opt-in in rare circumstances, and perhaps when logging an error. > Add logging system for libdhfs++ > > > Key: HDFS-9118 > URL: https://issues.apache.org/jira/browse/HDFS-9118 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: James Clampffer > > With HDFS-9505, we've starting logging data from libhdfs++. Consumers of the > library are going to have their own logging infrastructure that we're going > to want to provide data to. > libhdfs++ should have a logging library that: > * Is overridable and can provide sufficient information to work well with > common C++ logging frameworks > * Has a rational default implementation > * Is performant -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9607) Advance Hadoop Architecture (AHA) - HDFS Update (write-in-place)
[ https://issues.apache.org/jira/browse/HDFS-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115548#comment-15115548 ] Dinesh S. Atreya commented on HDFS-9607: Please see updated API from above below:
{code:title=FSWriteInPlaceStream.java|borderStyle=solid}
// (alternate names welcome) extends FSDataOutputStream

long getPos()
// Get the current position, note FSDataOutputStream already has it.

void seek(long desiredWritePos)
// Seek to the given position in file

int write(long position, byte[] writeBuffer, int readLength) throws IOException
// Write/Update bytes from writeBuffer up to previously read length
// at given position in file

int write(long position, int readLength, byte[] writeBuffer, int offset, int readLength) throws IOException
// Write/Update bytes from writeBuffer up to previously read length
// after seek in file starting at offset.

boolean canWrite(long position, byte[] writeBuffer, int readLength)
// Check whether Write/Update of bytes from writeBuffer up to
// previously read length at given position is possible inside file

boolean canWrite(long position, int readLength, byte[] writeBuffer, int offset, int readLength)
// Check whether Write/Update of bytes from writeBuffer up to
// previously read length after seek is possible inside file starting at offset.
{code}
> Advance Hadoop Architecture (AHA) - HDFS Update (write-in-place) > > > Key: HDFS-9607 > URL: https://issues.apache.org/jira/browse/HDFS-9607 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Dinesh S. Atreya > > Link to Umbrella JIRA > https://issues.apache.org/jira/browse/HADOOP-12620 > Provide capability to carry out in-place writes/updates. Only writes in-place > are supported where the existing length does not change. > For example, "Hello World" can be replaced by "Hello HDFS!" 
> See > https://issues.apache.org/jira/browse/HADOOP-12620?focusedCommentId=15046300=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15046300 > for more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9607) Advance Hadoop Architecture (AHA) - HDFS Update (write-in-place)
[ https://issues.apache.org/jira/browse/HDFS-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115577#comment-15115577 ] Dinesh S. Atreya commented on HDFS-9607: Alternatively, assuming {{getPos()}} and {{seek}} as given above are included:
{code:title=FSWriteInPlaceStream.java|borderStyle=solid}
// (alternate names welcome) extends FSDataOutputStream

void setReadLength(int length)
// Set the length that had been read earlier.

int getReadLength()
// Get the read length that has been set.

int write(long position, byte[] writeBuffer) throws IOException
// Write/Update bytes from writeBuffer up to previously read length
// at given position in file

int write(long position, byte[] writeBuffer, int offset) throws IOException
// Write/Update bytes from writeBuffer up to previously read length
// after seek in file starting at offset.

boolean canWrite(long position, byte[] writeBuffer)
// Check whether Write/Update of bytes from writeBuffer up to
// previously read length at given position is possible inside file

boolean canWrite(long position, byte[] writeBuffer, int offset)
// Check whether Write/Update of bytes from writeBuffer up to
// previously read length after seek is possible inside file starting at offset.
{code}
> Advance Hadoop Architecture (AHA) - HDFS Update (write-in-place) > > > Key: HDFS-9607 > URL: https://issues.apache.org/jira/browse/HDFS-9607 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Dinesh S. Atreya > > Link to Umbrella JIRA > https://issues.apache.org/jira/browse/HADOOP-12620 > Provide capability to carry out in-place writes/updates. Only writes in-place > are supported where the existing length does not change. > For example, "Hello World" can be replaced by "Hello HDFS!" > See > https://issues.apache.org/jira/browse/HADOOP-12620?focusedCommentId=15046300=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15046300 > for more details. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
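For reference, the fixed-length overwrite semantics proposed above ("Hello World" becoming "Hello HDFS!" without the file length changing) are what {{java.io.RandomAccessFile}} already provides for local files. A minimal local-filesystem illustration of that behavior, not HDFS code:

```java
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class InPlaceDemo {
  // Overwrite replacement.length() bytes at `position`, leaving the rest
  // of the file (and its total length) untouched.
  static String replaceInPlace(String content, long position, String replacement)
      throws Exception {
    Path p = Files.createTempFile("inplace", ".txt");
    Files.write(p, content.getBytes("US-ASCII"));
    try (RandomAccessFile raf = new RandomAccessFile(p.toFile(), "rw")) {
      raf.seek(position);                          // position the write cursor
      raf.write(replacement.getBytes("US-ASCII")); // in-place, same length
    }
    String result = new String(Files.readAllBytes(p), "US-ASCII");
    Files.delete(p);
    return result;
  }

  public static void main(String[] args) throws Exception {
    // "World" and "HDFS!" are both 5 bytes, so the file length is unchanged
    System.out.println(replaceInPlace("Hello World", 6, "HDFS!"));  // Hello HDFS!
  }
}
```

The proposed HDFS API would need the same invariant the canWrite checks above encode: the write must stay within the previously read span so no block boundaries move.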
[jira] [Created] (HDFS-9696) Garbage snapshot records lingering forever
Kihwal Lee created HDFS-9696: Summary: Garbage snapshot records lingering forever Key: HDFS-9696 URL: https://issues.apache.org/jira/browse/HDFS-9696 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.2 Reporter: Kihwal Lee Priority: Critical We have a cluster where the snapshot feature might have been tested years ago. The HDFS does not have any snapshots now, but I see filediff records persisted in its fsimage. Since it has been restarted many times and checkpointed over 100 times since then, the records must have been persisted and carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-9118) Add logging system for libdhfs++
[ https://issues.apache.org/jira/browse/HDFS-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Clampffer reassigned HDFS-9118: - Assignee: James Clampffer > Add logging system for libdhfs++ > > > Key: HDFS-9118 > URL: https://issues.apache.org/jira/browse/HDFS-9118 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: James Clampffer > > With HDFS-9505, we've starting logging data from libhdfs++. Consumers of the > library are going to have their own logging infrastructure that we're going > to want to provide data to. > libhdfs++ should have a logging library that: > * Is overridable and can provide sufficient information to work well with > common C++ logging frameworks > * Has a rational default implementation > * Is performant -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9607) Advance Hadoop Architecture (AHA) - HDFS Update (write-in-place)
[ https://issues.apache.org/jira/browse/HDFS-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115572#comment-15115572 ] Dinesh S. Atreya commented on HDFS-9607: Correction:
{code}
int write(long position, byte[] writeBuffer, int offset, int readLength) throws IOException
// Write/Update bytes from writeBuffer up to previously read length
// after seek in file starting at offset.
{code}
should replace
{code}
int write(long position, int readLength, byte[] writeBuffer, int offset, int readLength) throws IOException
// Write/Update bytes from writeBuffer up to previously read length
// after seek in file starting at offset.
{code}
> Advance Hadoop Architecture (AHA) - HDFS Update (write-in-place) > > > Key: HDFS-9607 > URL: https://issues.apache.org/jira/browse/HDFS-9607 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Dinesh S. Atreya > > Link to Umbrella JIRA > https://issues.apache.org/jira/browse/HADOOP-12620 > Provide capability to carry out in-place writes/updates. Only writes in-place > are supported where the existing length does not change. > For example, "Hello World" can be replaced by "Hello HDFS!" > See > https://issues.apache.org/jira/browse/HADOOP-12620?focusedCommentId=15046300=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15046300 > for more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9690) addBlock is not idempotent
[ https://issues.apache.org/jira/browse/HDFS-9690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115568#comment-15115568 ] Vinayakumar B commented on HDFS-9690: - Tried committing, but the patch doesn't apply to branch-2.7, as there is no FSDirWriteFileOp.java in branch-2.7. > addBlock is not idempotent > -- > > Key: HDFS-9690 > URL: https://issues.apache.org/jira/browse/HDFS-9690 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze > Attachments: h9690_20160124.patch, h9690_20160124b.patch > > > TestDFSClientRetries#testIdempotentAllocateBlockAndClose can illustrate the > bug. It failed in the following builds. > - > https://builds.apache.org/job/PreCommit-HDFS-Build/14188/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/ > - > https://builds.apache.org/job/PreCommit-HDFS-Build/14201/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/ > - > https://builds.apache.org/job/PreCommit-HDFS-Build/14202/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9684) DataNode stopped sending heartbeat after getting OutOfMemoryError form DataTransfer thread.
[ https://issues.apache.org/jira/browse/HDFS-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115523#comment-15115523 ] Kihwal Lee commented on HDFS-9684: -- That usually means the ulimit is reached. What is the max Xceiver limit in the datanode config? And what is the datanode user's limit on fork/clone? I.e. {{ulimit -u}}. On a rare occasion, the system can run out of PID. I think the default on most linux distros is 32K. You can raise it if that's causing the problem. > DataNode stopped sending heartbeat after getting OutOfMemoryError form > DataTransfer thread. > --- > > Key: HDFS-9684 > URL: https://issues.apache.org/jira/browse/HDFS-9684 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.1 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore >Priority: Blocker > Attachments: HDFS-9684.01.patch > > > {noformat} > java.lang.OutOfMemoryError: unable to create new native thread > at java.lang.Thread.start0(Native Method) > at java.lang.Thread.start(Thread.java:714) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlock(DataNode.java:1999) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlocks(DataNode.java:2008) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:657) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:615) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:857) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:671) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:823) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
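One way to keep the heartbeat thread alive through this failure mode is to catch the thread-creation {{OutOfMemoryError}} at the spawn site so the actor loop survives. This is a sketch of that idea only, not necessarily what the attached patch does:

```java
public class SafeSpawn {
  // Attempt to start a worker thread; if the JVM cannot create a new
  // native thread (ulimit/PID exhaustion), report failure instead of
  // letting the OutOfMemoryError propagate and kill the calling loop.
  static boolean tryStart(Runnable work) {
    try {
      Thread t = new Thread(work);
      t.setDaemon(true);
      t.start();
      return true;
    } catch (OutOfMemoryError oome) {
      // "unable to create new native thread": log and carry on, so that
      // e.g. heartbeats keep flowing while this one transfer is skipped
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(tryStart(() -> { }));  // true on a healthy system
  }
}
```

The underlying cause still needs the ulimit/Xceiver investigation described above; catching the error only contains the blast radius.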
[jira] [Updated] (HDFS-9690) addBlock is not idempotent
[ https://issues.apache.org/jira/browse/HDFS-9690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-9690: - Target Version/s: 2.7.3 > addBlock is not idempotent > -- > > Key: HDFS-9690 > URL: https://issues.apache.org/jira/browse/HDFS-9690 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze > Attachments: h9690_20160124.patch, h9690_20160124b.patch > > > TestDFSClientRetries#testIdempotentAllocateBlockAndClose can illustrate the > bug. It failed in the following builds. > - > https://builds.apache.org/job/PreCommit-HDFS-Build/14188/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/ > - > https://builds.apache.org/job/PreCommit-HDFS-Build/14201/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/ > - > https://builds.apache.org/job/PreCommit-HDFS-Build/14202/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang reassigned HDFS-9696: --- Assignee: Yongjun Zhang Hi Kihwal, Since I am working on it, do you mind if I assign it to myself? Thanks. > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Yongjun Zhang >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. When the HDFS does not have any snapshot, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, it must haven been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115754#comment-15115754 ] Kihwal Lee commented on HDFS-9696: -- bq. do you mind if I assign it to myself? I don't. But I noticed that none of the original snapshot feature developers are watching HDFS-9406. At some point, we should call them out. > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Yongjun Zhang >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. When the HDFS does not have any snapshot, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, it must haven been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9697) NN fails to restart due to corrupt fsimage caused by snapshot handling
[ https://issues.apache.org/jira/browse/HDFS-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-9697: Summary: NN fails to restart due to corrupt fsimage caused by snapshot handling (was: NN fails to restart due to corrupt fsimage) > NN fails to restart due to corrupt fsimage caused by snapshot handling > -- > > Key: HDFS-9697 > URL: https://issues.apache.org/jira/browse/HDFS-9697 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > > This is related to HDFS-9406, but not quite the same symptom. > {quote} > ERROR namenode.NameNode: Failed to start namenode. > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadINodeReference(FSImageFormatPBSnapshot.java:114) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadINodeReferenceSection(FSImageFormatPBSnapshot.java:105) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:258) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:180) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:929) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:913) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:732) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:668) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1062) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:766) > at > 
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:589) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:646) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:818) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:797) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1493) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1561) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115595#comment-15115595 ] Kihwal Lee commented on HDFS-9696: -- {code:xml} 0 ... 1638543008443-10action-data.seq 43108392-1302some_random_file ... {code} The file with inode number 43008443 exists. As it is shown, there is no snapshot that SnapshotManager is aware of and the snapshot ID of all filediff entries are -1. > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. When the HDFS does not have any snapshot, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, it must haven been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9697) NN fails to restart due to corrupt fsimage
Yongjun Zhang created HDFS-9697: --- Summary: NN fails to restart due to corrupt fsimage Key: HDFS-9697 URL: https://issues.apache.org/jira/browse/HDFS-9697 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Yongjun Zhang Assignee: Yongjun Zhang This is related to HDFS-9406, but not quite the same symptom. {quote} ERROR namenode.NameNode: Failed to start namenode. java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadINodeReference(FSImageFormatPBSnapshot.java:114) at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadINodeReferenceSection(FSImageFormatPBSnapshot.java:105) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:258) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:180) at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:929) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:913) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:732) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:668) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1062) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:766) at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:589) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:646) at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:818) at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:797) at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1493) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1561) {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115690#comment-15115690 ] Yongjun Zhang commented on HDFS-9696: - Hi [~kihwal], Thanks much for reporting this issue. I have been looking into HDFS-9406 and observed the same. I have made progress on HDFS-9406 and am still working on it. > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. When the HDFS does not have any snapshot, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, it must haven been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.
[ https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115710#comment-15115710 ] Jing Zhao commented on HDFS-8999: - Thanks for verifying the test failures, Nicholas! +1 committing the latest patch to trunk. Please see if you plan to commit it to branch-2 as well. > Namenode need not wait for {{blockReceived}} for the last block before > completing a file. > - > > Key: HDFS-8999 > URL: https://issues.apache.org/jira/browse/HDFS-8999 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Jitendra Nath Pandey >Assignee: Tsz Wo Nicholas Sze > Attachments: h8999_20151228.patch, h8999_20160106.patch, > h8999_20160106b.patch, h8999_20160106c.patch, h8999_20160111.patch, > h8999_20160113.patch, h8999_20160114.patch, h8999_20160121.patch, > h8999_20160121b.patch > > > This comes out of a discussion in HDFS-8763. Pasting [~jingzhao]'s comment > from the jira: > {quote} > ...whether we need to let NameNode wait for all the block_received msgs to > announce the replica is safe. Looking into the code, now we have ># NameNode knows the DataNodes involved when initially setting up the > writing pipeline ># If any DataNode fails during the writing, client bumps the GS and > finally reports all the DataNodes included in the new pipeline to NameNode > through the updatePipeline RPC. ># When the client received the ack for the last packet of the block (and > before the client tries to close the file on NameNode), the replica has been > finalized in all the DataNodes. > Then in this case, when NameNode receives the close request from the client, > the NameNode already knows the latest replicas for the block. Currently the > checkReplication call only counts in all the replicas that NN has already > received the block_received msg, but based on the above #2 and #3, it may be > safe to also count in all the replicas in the > BlockUnderConstructionFeature#replicas? 
> {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
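The relaxed completion check being discussed can be illustrated with a small sketch. This is a hypothetical illustration only — {{canCompleteLastBlock}}, the replica sets, and {{minReplication}} are illustrative stand-ins, not the actual NameNode API; the real change would live in the NN's checkReplication/complete-file path.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CompleteFileSketch {
    /**
     * Relaxed check: treat the last block as complete if the replicas the NN
     * has confirmed via blockReceived, together with the replicas recorded in
     * BlockUnderConstructionFeature#replicas, reach the minimum replication.
     */
    static boolean canCompleteLastBlock(Set<String> reportedReplicas,
                                        Set<String> pipelineReplicas,
                                        int minReplication) {
        Set<String> known = new HashSet<>(reportedReplicas);
        known.addAll(pipelineReplicas);  // safe per points #2/#3 in the quote
        return known.size() >= minReplication;
    }

    public static void main(String[] args) {
        // only dn1 has sent blockReceived, but the pipeline reported dn1..dn3
        Set<String> reported = new HashSet<>(Arrays.asList("dn1"));
        Set<String> pipeline = new HashSet<>(Arrays.asList("dn1", "dn2", "dn3"));
        System.out.println(canCompleteLastBlock(reported, pipeline, 3));
    }
}
```

The point of the sketch is only the union: the close request no longer has to wait for every blockReceived message, because the pipeline membership is already known to the NN.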
[jira] [Updated] (HDFS-9700) DFSClient and DFSOutputStream do not respect TCP_NODELAY config in two spots
[ https://issues.apache.org/jira/browse/HDFS-9700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Helmling updated HDFS-9700: Attachment: HDFS-9700_branch-2.7.patch The attached patch is against branch-2.7. For an HBase deployment on secure Hadoop, this reliably lowers our P95 write latencies from 40ms+ to ~2ms. I'm still working out how/if the same changes apply to trunk. > DFSClient and DFSOutputStream do not respect TCP_NODELAY config in two spots > > > Key: HDFS-9700 > URL: https://issues.apache.org/jira/browse/HDFS-9700 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.7.1, 2.6.3 >Reporter: Gary Helmling > Attachments: HDFS-9700_branch-2.7.patch > > > In {{DFSClient.connectToDN()}} and > {{DFSOutputStream.createSocketForPipeline()}}, we never call > {{setTcpNoDelay()}} on the constructed socket before sending. In both cases, > we should respect the value of ipc.client.tcpnodelay in the configuration. > While this applies whether security is enabled or not, it seems to have a > bigger impact on latency when security is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
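The fix described above amounts to applying the socket option before the first write. A minimal sketch, assuming the config value has already been read — {{tcpNoDelayFromConf()}} is a hypothetical stand-in for reading ipc.client.tcpnodelay from Hadoop's {{Configuration}}, not the actual patch:

```java
import java.io.IOException;
import java.net.Socket;

public class NoDelayExample {
    // hypothetical stand-in for conf.getBoolean("ipc.client.tcpnodelay", ...)
    static boolean tcpNoDelayFromConf() {
        return true;
    }

    /** Create a pipeline socket with TCP_NODELAY applied before any write. */
    static Socket createPipelineSocket() throws IOException {
        Socket sock = new Socket();                // not yet connected
        sock.setTcpNoDelay(tcpNoDelayFromConf());  // disable Nagle's algorithm
        return sock;
    }

    public static void main(String[] args) throws IOException {
        try (Socket sock = createPipelineSocket()) {
            System.out.println("TCP_NODELAY=" + sock.getTcpNoDelay());
        }
    }
}
```

The essential detail is ordering: {{setTcpNoDelay()}} must run on the socket before anything is sent, which is exactly what {{DFSClient.connectToDN()}} and {{DFSOutputStream.createSocketForPipeline()}} currently skip.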
[jira] [Commented] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size
[ https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116012#comment-15116012 ] Elliott Clark commented on HDFS-9669: - Ping? This is running in production and removes thousands of TCP resets. > TcpPeerServer should respect ipc.server.listen.queue.size > - > > Key: HDFS-9669 > URL: https://issues.apache.org/jira/browse/HDFS-9669 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HDFS-9669.0.patch, HDFS-9669.1.patch, HDFS-9669.1.patch > > > On periods of high traffic we are seeing: > {code} > 16/01/19 23:40:40 WARN hdfs.DFSClient: Connection failure: Failed to connect > to /10.138.178.47:50010 for file /MYPATH/MYFILE for block > BP-1935559084-10.138.112.27-1449689748174:blk_1080898601_7375294:java.io.IOException: > Connection reset by peer > java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.write0(Native Method) > at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) > at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) > at sun.nio.ch.IOUtil.write(IOUtil.java:65) > at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) > at > org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63) > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) > at > org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159) > at > org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117) > at > org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:109) > at java.io.DataOutputStream.writeInt(DataOutputStream.java:197) > {code} > At the time that this happens there are far fewer xceivers than configured. > On most JDKs this makes 50 the total backlog at any time. This > effectively means that any GC + busy time will result in TCP resets.
> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/net/ServerSocket.java#l370 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
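For reference, the backlog issue comes from the {{ServerSocket}} constructors falling back to a default backlog of 50 when none is passed (see the OpenJDK link above). A minimal sketch of binding with an explicit backlog — the value here is an illustrative stand-in for ipc.server.listen.queue.size, and the OS may still clamp it (e.g. to net.core.somaxconn):

```java
import java.io.IOException;
import java.net.ServerSocket;

public class BacklogExample {
    /** Bind to an ephemeral port with an explicit accept-queue backlog. */
    static ServerSocket bind(int backlog) throws IOException {
        // new ServerSocket(port) would use the JDK default backlog of 50;
        // the two-arg constructor lets a configured value take effect instead
        return new ServerSocket(0, backlog);
    }

    public static void main(String[] args) throws IOException {
        int backlog = 128;  // stand-in for ipc.server.listen.queue.size
        try (ServerSocket ss = bind(backlog)) {
            System.out.println("bound on port " + ss.getLocalPort()
                + " with requested backlog " + backlog);
        }
    }
}
```

With only the default backlog, any pause (GC, busy accept loop) overflows the 50-entry queue and the kernel resets new connections — which matches the stack trace above.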
[jira] [Commented] (HDFS-9118) Add logging system for libhdfs++
[ https://issues.apache.org/jira/browse/HDFS-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116065#comment-15116065 ] James Clampffer commented on HDFS-9118: --- Shouldn't getting the stack address of a local variable boil down (at least on x86) to just reading what's in ESP minus a constant offset? That should be doable in a few cycles, superscalar complications aside, unless I'm missing something. Either way, good idea on allowing things to opt in based on logging levels. I'll add that. > Add logging system for libhdfs++ > > > Key: HDFS-9118 > URL: https://issues.apache.org/jira/browse/HDFS-9118 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: James Clampffer > > With HDFS-9505, we've started logging data from libhdfs++. Consumers of the > library are going to have their own logging infrastructure that we're going > to want to provide data to. > libhdfs++ should have a logging library that: > * Is overridable and can provide sufficient information to work well with > common C++ logging frameworks > * Has a rational default implementation > * Is performant -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9700) DFSClient and DFSOutputStream do not respect TCP_NODELAY config in two spots
[ https://issues.apache.org/jira/browse/HDFS-9700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Helmling updated HDFS-9700: Attachment: HDFS-9700-v1.patch Attaching a patch for the same changes against trunk. > DFSClient and DFSOutputStream do not respect TCP_NODELAY config in two spots > > > Key: HDFS-9700 > URL: https://issues.apache.org/jira/browse/HDFS-9700 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.7.1, 2.6.3 >Reporter: Gary Helmling > Attachments: HDFS-9700-v1.patch, HDFS-9700_branch-2.7.patch > > > In {{DFSClient.connectToDN()}} and > {{DFSOutputStream.createSocketForPipeline()}}, we never call > {{setTcpNoDelay()}} on the constructed socket before sending. In both cases, > we should respect the value of ipc.client.tcpnodelay in the configuration. > While this applies whether security is enabled or not, it seems to have a > bigger impact on latency when security is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9700) DFSClient and DFSOutputStream do not respect TCP_NODELAY config in two spots
[ https://issues.apache.org/jira/browse/HDFS-9700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Helmling updated HDFS-9700: Status: Patch Available (was: Open) > DFSClient and DFSOutputStream do not respect TCP_NODELAY config in two spots > > > Key: HDFS-9700 > URL: https://issues.apache.org/jira/browse/HDFS-9700 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.6.3, 2.7.1 >Reporter: Gary Helmling > Attachments: HDFS-9700-v1.patch, HDFS-9700_branch-2.7.patch > > > In {{DFSClient.connectToDN()}} and > {{DFSOutputStream.createSocketForPipeline()}}, we never call > {{setTcpNoDelay()}} on the constructed socket before sending. In both cases, > we should respect the value of ipc.client.tcpnodelay in the configuration. > While this applies whether security is enabled or not, it seems to have a > bigger impact on latency when security is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9701) DN may deadlock when hot-swapping under load
Xiao Chen created HDFS-9701: --- Summary: DN may deadlock when hot-swapping under load Key: HDFS-9701 URL: https://issues.apache.org/jira/browse/HDFS-9701 Project: Hadoop HDFS Issue Type: Bug Reporter: Xiao Chen Assignee: Xiao Chen If the DN is under load (new blocks being written), a hot-swap task by {{hdfs dfsadmin -reconfig}} may cause a deadlock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9700) DFSClient and DFSOutputStream do not respect TCP_NODELAY config in two spots
[ https://issues.apache.org/jira/browse/HDFS-9700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116161#comment-15116161 ] Hadoop QA commented on HDFS-9700: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 47s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 51s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc 
{color} | {color:green} 0m 24s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 3s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 49s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 54s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 21m 33s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12784241/HDFS-9700-v1.patch | | JIRA Issue | HDFS-9700 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 4f1abf5f155b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh
[jira] [Commented] (HDFS-9701) DN may deadlock when hot-swapping under load
[ https://issues.apache.org/jira/browse/HDFS-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116149#comment-15116149 ] Xiao Chen commented on HDFS-9701: - Most notable jstacks: Reconfigure task: {noformat} "Reconfiguration Task" #459 daemon prio=5 os_prio=0 tid=0x7fc6913a6000 nid=0x5219 waiting on condition [0x7fc663cde000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.closeAndWait(FsVolumeImpl.java:251) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.removeVolume(FsVolumeList.java:322) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.removeVolume(FsVolumeList.java:363) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeVolumes(FsDatasetImpl.java:472) - locked <0xd6057410> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:718) - locked <0xd55a5950> (a org.apache.hadoop.hdfs.server.datanode.DataNode) at org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:684) at org.apache.hadoop.hdfs.server.datanode.DataNode.refreshVolumes(DataNode.java:648) - locked <0xd55a5950> (a org.apache.hadoop.hdfs.server.datanode.DataNode) at org.apache.hadoop.hdfs.server.datanode.DataNode.reconfigurePropertyImpl(DataNode.java:485) at org.apache.hadoop.conf.ReconfigurableBase$ReconfigurationThread.run(ReconfigurableBase.java:133) {noformat} Being written thread: {noformat} "PacketResponder: BP-284727513-10.64.40.36-1450767058747:blk_1073785044_44298, type=HAS_DOWNSTREAM_IN_PIPELINE" #462 daemon prio=5 os_prio=0 tid=0x7fc67c5c8000 nid=0x5268 waiting for monitor entry [0x7fc662ed2000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeBlock(FsDatasetImpl.java:1487) - waiting to lock 
<0xd6057410> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.finalizeBlock(BlockReceiver.java:1300) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1257) at java.lang.Thread.run(Thread.java:745) {noformat} The deadlock happens between a lock and a reference count waiting: # in {{BlockReceiver$PacketResponder#finalizeBlock}}, reference is increased after {{claimReplicaHandler}}. (Code [here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java#L1426]) # Reconfigure task locks on the {{FsDatasetImpl}} object # Reconfigure task calls all the way into {{FsVolumeImpl#closeAndWait}}, infinite loop waiting on reference count (Code [here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java#L249]) # {{BlockReceiver$PacketResponder#finalizeBlock}} waits on the {{FsDatasetImpl}} object's lock in step #2. Oops. > DN may deadlock when hot-swapping under load > > > Key: HDFS-9701 > URL: https://issues.apache.org/jira/browse/HDFS-9701 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Xiao Chen >Assignee: Xiao Chen > > If the DN is under load (new blocks being written), a hot-swap task by {{hdfs > dfsadmin -reconfig}} may cause a dead lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
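The four steps above can be modeled in a few lines of plain Java. This is a sketch, not DataNode code: {{datasetLock}} and {{refCount}} are stand-ins for the {{FsDatasetImpl}} monitor and the volume's reference count, and the wait loop is given a deadline only so the sketch terminates — the real {{closeAndWait}} loop spins forever.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DeadlockSketch {
    // stand-ins for the FsDatasetImpl monitor and FsVolumeImpl's reference count
    static final Object datasetLock = new Object();
    static final AtomicInteger refCount = new AtomicInteger();

    /**
     * Models steps 2-4: the reconfigure task holds the dataset lock and waits
     * on the reference count, while the PacketResponder that would release the
     * reference is blocked on that same lock. Returns true if the count never
     * drops before the deadline.
     */
    static boolean reconfigureWouldHang(long timeoutMs) throws InterruptedException {
        refCount.set(1);                          // step 1: a writer holds a replica handle
        Thread responder = new Thread(() -> {
            synchronized (datasetLock) {          // step 4: blocked behind the reconfig task
                refCount.decrementAndGet();       // would release the handle
            }
        });
        synchronized (datasetLock) {              // step 2: reconfig locks FsDatasetImpl
            responder.start();
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (refCount.get() > 0) {          // step 3: closeAndWait's wait loop
                if (System.currentTimeMillis() > deadline) {
                    return true;                  // the real loop never gives up
                }
                Thread.sleep(10);
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("would deadlock: " + reconfigureWouldHang(300));
    }
}
```

Since the decrement can only happen under the lock the waiter is holding, the count can never drop — exactly the cycle shown in the two jstacks.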
[jira] [Commented] (HDFS-9330) Support reconfiguring dfs.datanode.duplicate.replica.deletion without DN restart
[ https://issues.apache.org/jira/browse/HDFS-9330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116162#comment-15116162 ] Hadoop QA commented on HDFS-9330: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 4 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 53s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 47s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 0m 44s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 18s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 1 new + 284 unchanged - 10 fixed = 285 total (was 294) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 5s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 52m 13s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 50m 28s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 128m 28s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.tools.TestDFSAdmin | | | hadoop.hdfs.server.datanode.TestBlockScanner | | JDK v1.7.0_91 Failed junit tests | hadoop.hdfs.tools.TestDFSAdmin | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12784218/HDFS-9330-HDFS-9000.003.patch | | JIRA Issue | HDFS-9330 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 53bba4c69937 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3
[jira] [Created] (HDFS-9702) DiskBalancer : getVolumeMap implementation
Anu Engineer created HDFS-9702: -- Summary: DiskBalancer : getVolumeMap implementation Key: HDFS-9702 URL: https://issues.apache.org/jira/browse/HDFS-9702 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer & mover Affects Versions: HDFS-1312 Reporter: Anu Engineer Assignee: Anu Engineer Fix For: HDFS-1312 Add get volume map -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9702) DiskBalancer : getVolumeMap implementation
[ https://issues.apache.org/jira/browse/HDFS-9702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-9702: --- Attachment: HDFS-9702-HDFS-1312.001.patch Adding patch for code review. This is dependent on HDFS-9683 > DiskBalancer : getVolumeMap implementation > -- > > Key: HDFS-9702 > URL: https://issues.apache.org/jira/browse/HDFS-9702 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer & mover >Affects Versions: HDFS-1312 >Reporter: Anu Engineer >Assignee: Anu Engineer > Fix For: HDFS-1312 > > Attachments: HDFS-9702-HDFS-1312.001.patch > > > Add get volume map -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9703) DiskBalancer : getBandwidth implementation
[ https://issues.apache.org/jira/browse/HDFS-9703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-9703: --- Attachment: HDFS-9703-HDFS-1312.001.patch Adding patch for code review. This is dependent on HDFS-9702 > DiskBalancer : getBandwidth implementation > -- > > Key: HDFS-9703 > URL: https://issues.apache.org/jira/browse/HDFS-9703 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer & mover >Affects Versions: HDFS-1312 >Reporter: Anu Engineer >Assignee: Anu Engineer > Fix For: HDFS-1312 > > Attachments: HDFS-9703-HDFS-1312.001.patch > > > Add getBandwidth call -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9262) Support reconfiguring dfs.datanode.lazywriter.interval.sec without DN restart
[ https://issues.apache.org/jira/browse/HDFS-9262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116176#comment-15116176 ] Hadoop QA commented on HDFS-9262: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 4 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 2s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 58s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 0m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 22s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 1 new + 251 unchanged - 11 fixed = 252 total (was 262) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 15s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 17s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 57s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 155m 35s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.TestBlockStoragePolicy | | | hadoop.hdfs.server.datanode.TestBlockScanner | | | hadoop.hdfs.tools.TestDFSAdmin | | JDK v1.7.0_91 Failed junit tests | hadoop.hdfs.TestRenameWhileOpen | | | hadoop.hdfs.server.datanode.TestBlockScanner | | | hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation | | | hadoop.hdfs.tools.TestDFSAdminWithHA | | | hadoop.hdfs.tools.TestDFSAdmin | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL |
[jira] [Commented] (HDFS-9525) hadoop utilities need to support provided delegation tokens
[ https://issues.apache.org/jira/browse/HDFS-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116289#comment-15116289 ] Owen O'Malley commented on HDFS-9525: - [~daryn] I'm sorry, but I don't see what problem the patch introduced. It lets your webhdfs have a token even if your security is turned off as long as it was already in the UGI. Where is the problem? > hadoop utilities need to support provided delegation tokens > --- > > Key: HDFS-9525 > URL: https://issues.apache.org/jira/browse/HDFS-9525 > Project: Hadoop HDFS > Issue Type: New Feature > Components: security >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Assignee: HeeSoo Kim >Priority: Blocker > Fix For: 3.0.0 > > Attachments: HDFS-7984.001.patch, HDFS-7984.002.patch, > HDFS-7984.003.patch, HDFS-7984.004.patch, HDFS-7984.005.patch, > HDFS-7984.006.patch, HDFS-7984.007.patch, HDFS-7984.patch, > HDFS-9525.008.patch, HDFS-9525.009.patch, HDFS-9525.009.patch, > HDFS-9525.branch-2.008.patch, HDFS-9525.branch-2.009.patch > > > When using the webhdfs:// filesystem (especially from distcp), we need the > ability to inject a delegation token rather than webhdfs initialize its own. > This would allow for cross-authentication-zone file system accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9701) DN may deadlock when hot-swapping under load
[ https://issues.apache.org/jira/browse/HDFS-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-9701: Attachment: HDFS-9701.01.patch > DN may deadlock when hot-swapping under load > > > Key: HDFS-9701 > URL: https://issues.apache.org/jira/browse/HDFS-9701 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Xiao Chen >Assignee: Xiao Chen > Attachments: HDFS-9701.01.patch > > > If the DN is under load (new blocks being written), a hot-swap task by {{hdfs > dfsadmin -reconfig}} may cause a deadlock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9701) DN may deadlock when hot-swapping under load
[ https://issues.apache.org/jira/browse/HDFS-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-9701: Status: Patch Available (was: Open) > DN may deadlock when hot-swapping under load > > > Key: HDFS-9701 > URL: https://issues.apache.org/jira/browse/HDFS-9701 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Xiao Chen >Assignee: Xiao Chen > Attachments: HDFS-9701.01.patch > > > If the DN is under load (new blocks being written), a hot-swap task by {{hdfs > dfsadmin -reconfig}} may cause a deadlock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9701) DN may deadlock when hot-swapping under load
[ https://issues.apache.org/jira/browse/HDFS-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116165#comment-15116165 ] Xiao Chen commented on HDFS-9701: - Thanks a lot to [~eddyxu] for the offline discussion about the general ideas! To fix the problem, we have several ways: 1. Don't do an infinite wait in {{FsVolumeImpl#closeAndWait}}; wait outside of the lock scope 2. Use finer-grained locks on the volumes in FsDatasetImpl. I think option 1 is better since the change is smaller, and the infinite wait inside the lock seems a bit scary to me. Patch 1 attempts to solve the problem along option 1. - Moved the wait-for-close logic to outside of the {{FsDatasetImpl}}, into {{DataNode}}. - Had to add a new interface to {{FsDatasetSpi}} - Added methods along the call stack to allow the above - Added a new unit test in {{TestFsDatasetImpl}} that deadlocks before the patch and passes after - Had to modify the {{TestFsVolumeList}} to accommodate the change - Added more info into the log in {{BlockReceiver}} which I found useful when root-causing the problem. - Added a missing {{@Override}} in {{FsDatasetImpl}} > DN may deadlock when hot-swapping under load > > > Key: HDFS-9701 > URL: https://issues.apache.org/jira/browse/HDFS-9701 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Xiao Chen >Assignee: Xiao Chen > Attachments: HDFS-9701.01.patch > > > If the DN is under load (new blocks being written), a hot-swap task by {{hdfs > dfsadmin -reconfig}} may cause a deadlock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
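Option 1 above — waiting outside the lock scope — can be sketched as follows. The names are illustrative stand-ins for the {{FsDatasetImpl}} monitor and the volume reference count, not the actual patch:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class WaitOutsideLock {
    static final Object datasetLock = new Object();
    static final AtomicInteger refCount = new AtomicInteger();

    /** Option 1: detach the volume under the lock, wait for references outside it. */
    static void removeVolume() throws InterruptedException {
        synchronized (datasetLock) {
            // remove the volume from the dataset's volume list here
        }
        // the wait happens with the lock released, so a PacketResponder can
        // still enter finalizeBlock(), take the lock, and drop its handle
        while (refCount.get() > 0) {
            Thread.sleep(10);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        refCount.set(1);                   // one in-flight writer
        Thread writer = new Thread(() -> {
            synchronized (datasetLock) {   // no longer blocked by the waiter
                refCount.decrementAndGet();
            }
        });
        writer.start();
        removeVolume();                    // terminates once the writer finishes
        writer.join();
        System.out.println("volume removed, refCount=" + refCount.get());
    }
}
```

Because the wait loop no longer holds the dataset lock, the writer's release path can make progress and the loop terminates, breaking the cycle from the jstacks above.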
[jira] [Commented] (HDFS-9698) Long running Balancer should renew TGT
[ https://issues.apache.org/jira/browse/HDFS-9698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116184#comment-15116184 ] Hadoop QA commented on HDFS-9698: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 32s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc 
{color} | {color:green} 2m 6s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 53s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 59s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 19s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 1 new + 38 unchanged - 0 fixed = 39 total (was 38) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 59s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 10s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 7s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 182m 15s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.shortcircuit.TestShortCircuitCache | | | hadoop.hdfs.TestEncryptionZones | | | hadoop.hdfs.server.blockmanagement.TestBlockManager | | | hadoop.hdfs.TestLeaseRecovery2 | | | hadoop.hdfs.security.TestDelegationTokenForProxyUser | | | hadoop.hdfs.server.namenode.TestFSImageWithSnapshot | | | hadoop.hdfs.TestDFSClientRetries | | | hadoop.hdfs.server.datanode.TestBlockScanner | | |
[jira] [Created] (HDFS-9703) DiskBalancer : getBandwidth implementation
Anu Engineer created HDFS-9703: -- Summary: DiskBalancer : getBandwidth implementation Key: HDFS-9703 URL: https://issues.apache.org/jira/browse/HDFS-9703 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer & mover Affects Versions: HDFS-1312 Reporter: Anu Engineer Assignee: Anu Engineer Fix For: HDFS-1312 Add getBandwidth call -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9476) TestDFSUpgradeFromImage#testUpgradeFromRel1BBWImage occasionally fail
[ https://issues.apache.org/jira/browse/HDFS-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115775#comment-15115775 ] Mingliang Liu commented on HDFS-9476: - It happens in recent build as well, see UT log at: https://builds.apache.org/job/PreCommit-HDFS-Build/14230/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt > TestDFSUpgradeFromImage#testUpgradeFromRel1BBWImage occasionally fail > - > > Key: HDFS-9476 > URL: https://issues.apache.org/jira/browse/HDFS-9476 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Wei-Chiu Chuang > > This test occasionally fail. For example, the most recent one is: > https://builds.apache.org/job/Hadoop-Hdfs-trunk/2587/ > Error Message > {noformat} > Cannot obtain block length for > LocatedBlock{BP-1371507683-67.195.81.153-1448798439809:blk_7162739548153522810_1020; > getBlockSize()=1024; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:33080,DS-c5eaf2b4-2ee6-419d-a8a0-44a5df5ef9a1,DISK]]} > {noformat} > Stacktrace > {noformat} > java.io.IOException: Cannot obtain block length for > LocatedBlock{BP-1371507683-67.195.81.153-1448798439809:blk_7162739548153522810_1020; > getBlockSize()=1024; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:33080,DS-c5eaf2b4-2ee6-419d-a8a0-44a5df5ef9a1,DISK]]} > at > org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:399) > at > org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:343) > at > org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:275) > at org.apache.hadoop.hdfs.DFSInputStream.(DFSInputStream.java:265) > at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1046) > at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1011) > at > org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.dfsOpenFileWithRetries(TestDFSUpgradeFromImage.java:177) > at > 
org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.verifyDir(TestDFSUpgradeFromImage.java:213) > at > org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.verifyFileSystem(TestDFSUpgradeFromImage.java:228) > at > org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.upgradeAndVerify(TestDFSUpgradeFromImage.java:600) > at > org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.testUpgradeFromRel1BBWImage(TestDFSUpgradeFromImage.java:622) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9698) Long running Balancer should renew TGT
[ https://issues.apache.org/jira/browse/HDFS-9698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-9698: Status: Patch Available (was: Open) > Long running Balancer should renew TGT > -- > > Key: HDFS-9698 > URL: https://issues.apache.org/jira/browse/HDFS-9698 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover, security >Affects Versions: 2.6.3 >Reporter: Zhe Zhang >Assignee: Zhe Zhang > > When the {{Balancer}} runs beyond the configured TGT lifetime, the current > logic won't renew TGT. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115802#comment-15115802 ] Yongjun Zhang commented on HDFS-9696: - Ah, I intended to write the request message in my prior comment before reassigning, but just found that I accidentally reassigned together with the request message. Sorry about that. > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Yongjun Zhang >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. The HDFS does not have any snapshot now, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, they must have been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9689) Test o.a.h.hdfs.TestRenameWhileOpen fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115838#comment-15115838 ] Mingliang Liu commented on HDFS-9689: - Thanks [~iwasakims] and [~vinayrpet] for your insightful comments. {quote} Are other test processes still possible to bind the nn port between shutdownNameNode and createNameNode in MiniDFSCluster#restartNameNode? {quote} Yes. This is why I cancelled the patch. If the port changes in the process of restarting the NN, the NN will never leave safe mode, leading to a timeout exception. {quote} So, to completely resolve issues raising due to Port Bind issues, from restart (Name|Data)nodes needs some effort {quote} I totally agree with you. The to-do list you proposed seems to fix these kinds of errors fundamentally. Shall we address this in a separate jira? > Test o.a.h.hdfs.TestRenameWhileOpen fails intermittently > - > > Key: HDFS-9689 > URL: https://issues.apache.org/jira/browse/HDFS-9689 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.0.0, 2.8.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9689.000.patch > > > The test fails in recent builds, e.g. 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14063/testReport/org.apache.hadoop.hdfs/TestRenameWhileOpen/ > and > https://builds.apache.org/job/PreCommit-HDFS-Build/14212/testReport/org.apache.hadoop.hdfs/TestRenameWhileOpen/testWhileOpenRenameToNonExistentDirectory/ > The *Error Message* is like: > {code} > Problem binding to [localhost:60690] java.net.BindException: Address already > in use; For more details see: http://wiki.apache.org/hadoop/BindException > {code} > and *Stacktrace* is: > {code} > java.net.BindException: Problem binding to [localhost:60690] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:463) > at sun.nio.ch.Net.bind(Net.java:455) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at org.apache.hadoop.ipc.Server.bind(Server.java:469) > at org.apache.hadoop.ipc.Server$Listener.(Server.java:695) > at org.apache.hadoop.ipc.Server.(Server.java:2464) > at org.apache.hadoop.ipc.RPC$Server.(RPC.java:958) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:535) > at > org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:510) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:800) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.(NameNodeRpcServer.java:392) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:743) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:685) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:884) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:863) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1581) > at > 
org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:1247) > at > org.apache.hadoop.hdfs.MiniDFSCluster.configureNameService(MiniDFSCluster.java:1016) > at > org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:891) > at > org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:823) > at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:482) > at > org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:441) > at > org.apache.hadoop.hdfs.TestRenameWhileOpen.testWhileOpenRenameToNonExistentDirectory(TestRenameWhileOpen.java:332) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
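The bind failure above comes from restarting the NameNode on a previously recorded fixed port that another test process can grab in the meantime. A common mitigation, sketched below with a plain {{ServerSocket}} (illustrative only, not MiniDFSCluster code), is to bind to port 0 so the OS picks a free ephemeral port and then read back the actual port:

```java
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical sketch: bind to port 0 and let the OS assign a free port,
// then read back the port actually bound, instead of re-binding a
// remembered port that another process may have taken in the meantime.
public class EphemeralPortSketch {
    public static int bindAndReport() {
        try (ServerSocket server = new ServerSocket()) {
            server.bind(new InetSocketAddress("localhost", 0)); // 0 = OS picks
            return server.getLocalPort();
        } catch (java.io.IOException e) {
            return -1; // bind failed
        }
    }
}
```

The trade-off noted in the comments still applies: if a restarted NN comes up on a different port, DNs configured with the old port never register and the NN stays in safe mode, so ephemeral ports alone do not solve the restart case.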
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115879#comment-15115879 ] Yongjun Zhang commented on HDFS-9696: - Yes [~jingzhao], your analysis looks correct to me, per my study in HDFS-9406. Thanks. > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Yongjun Zhang >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. The HDFS does not have any snapshot now, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, they must have been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9700) DFSClient and DFSOutputStream do not respect TCP_NODELAY config in two spots
Gary Helmling created HDFS-9700: --- Summary: DFSClient and DFSOutputStream do not respect TCP_NODELAY config in two spots Key: HDFS-9700 URL: https://issues.apache.org/jira/browse/HDFS-9700 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.3, 2.7.1 Reporter: Gary Helmling In {{DFSClient.connectToDN()}} and {{DFSOutputStream.createSocketForPipeline()}}, we never call {{setTcpNoDelay()}} on the constructed socket before sending. In both cases, we should respect the value of ipc.client.tcpnodelay in the configuration. While this applies whether security is enabled or not, it seems to have a bigger impact on latency when security is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
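The fix described above amounts to reading the ipc.client.tcpnodelay setting and applying it when the socket is constructed. A hedged sketch of that shape, using a plain {{java.net.Socket}} to stand in for the actual {{DFSClient.connectToDN()}} / {{DFSOutputStream.createSocketForPipeline()}} code paths:

```java
import java.net.Socket;
import java.net.SocketException;

// Hypothetical sketch: apply the configured tcpnodelay value to a freshly
// constructed (not yet connected) socket before any data is sent.
// Disabling Nagle's algorithm stops the kernel from batching small writes,
// which matters for latency; the issue notes the effect is larger when
// security is enabled.
public class TcpNoDelaySketch {
    public static boolean applyNoDelay(Socket sock, boolean tcpNoDelay) {
        try {
            sock.setTcpNoDelay(tcpNoDelay); // honor ipc.client.tcpnodelay
            return sock.getTcpNoDelay();
        } catch (SocketException e) {
            return false;
        }
    }
}
```

Socket options set before {{connect()}} are retained and applied when the connection is established, so the call belongs right after the socket is constructed.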
[jira] [Commented] (HDFS-9672) o.a.h.hdfs.TestLeaseRecovery2 fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-9672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116453#comment-15116453 ] Mingliang Liu commented on HDFS-9672: - Thanks for the discussion, review and commit, [~jnp]! > o.a.h.hdfs.TestLeaseRecovery2 fails intermittently > -- > > Key: HDFS-9672 > URL: https://issues.apache.org/jira/browse/HDFS-9672 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 2.8.0 > > Attachments: HDFS-9672.000.patch, HDFS-9672.001.patch > > > It fails in recent builds, see: > https://builds.apache.org/job/PreCommit-HDFS-Build/14177/testReport/org.apache.hadoop.hdfs/ > https://builds.apache.org/job/PreCommit-HDFS-Build/14147/testReport/org.apache.hadoop.hdfs/ > Failing test methods include: > * > org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart > * org.apache.hadoop.hdfs.TestLeaseRecovery2.testLeaseRecoverByAnotherUser > * org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecovery > * > org.apache.hadoop.hdfs.TestLeaseRecovery2.org.apache.hadoop.hdfs.TestLeaseRecovery2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9701) DN may deadlock when hot-swapping under load
[ https://issues.apache.org/jira/browse/HDFS-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116466#comment-15116466 ] Hadoop QA commented on HDFS-9701: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 4 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 49s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 9s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 19s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 34s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 39s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 1m 4s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 5s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 27s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 2 new + 335 unchanged - 1 fixed = 337 total (was 336) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 49s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 38s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 33s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 85m 18s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 53s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 199m 57s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs-project/hadoop-hdfs | | | org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(Set, boolean) calls Thread.sleep() with a lock held At DataNode.java:a lock held At DataNode.java:[line 805] | | JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.qjournal.client.TestQuorumJournalManager | | | hadoop.hdfs.TestDFSClientRetries | | | hadoop.hdfs.TestRecoverStripedFile | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl | | |
[jira] [Commented] (HDFS-9579) Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level
[ https://issues.apache.org/jira/browse/HDFS-9579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116490#comment-15116490 ] Sangjin Lee commented on HDFS-9579: --- [~mingma], the patch no longer applies cleanly. Do you mind updating the patch? Thanks! > Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level > - > > Key: HDFS-9579 > URL: https://issues.apache.org/jira/browse/HDFS-9579 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: HDFS-9579-2.patch, HDFS-9579-3.patch, HDFS-9579-4.patch, > HDFS-9579.patch, MR job counters.png > > > For cross DC distcp or other applications, it becomes useful to have insight > as to the traffic volume for each network distance to distinguish cross-DC > traffic, local-DC-remote-rack, etc. > FileSystem's existing {{bytesRead}} metrics tracks all the bytes read. To > provide additional metrics for each network distance, we can add additional > metrics to FileSystem level and have {{DFSInputStream}} update the value > based on the network distance between client and the datanode. > {{DFSClient}} will resolve client machine's network location as part of its > initialization. It doesn't need to resolve datanode's network location for > each read as {{DatanodeInfo}} already has the info. > There are existing HDFS specific metrics such as {{ReadStatistics}} and > {{DFSHedgedReadMetrics}}. But these metrics are only accessible via > {{DFSClient}} or {{DFSInputStream}}. Not something that application framework > such as MR and Tez can get to. That is the benefit of storing these new > metrics in FileSystem.Statistics. > This jira only includes metrics generation by HDFS. The consumption of these > metrics at MR and Tez will be tracked by separated jiras. > We can add similar metrics for HDFS write scenario later if it is necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
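The proposal above boils down to a small set of per-distance byte counters kept alongside FileSystem.Statistics and bumped by {{DFSInputStream}} on each read. A self-contained sketch; the bucket layout and method names are illustrative assumptions, not the actual Statistics API:

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch: thread-safe bytes-read counters bucketed by the
// network distance between the client and the datanode served from.
public class BytesReadByDistance {
    // index 0 = local host, 1 = same rack, 2 = remote rack, 3 = remote DC
    private final AtomicLongArray bytesByDistance = new AtomicLongArray(4);

    public void addBytesRead(int distance, long bytes) {
        // clamp unknown/larger distances into the last bucket
        bytesByDistance.addAndGet(Math.min(distance, 3), bytes);
    }

    public long getBytesRead(int distance) {
        return bytesByDistance.get(distance);
    }
}
```

Because the client resolves its own network location once at {{DFSClient}} initialization and {{DatanodeInfo}} already carries the datanode's location, computing the bucket index per read adds no extra RPCs.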
[jira] [Commented] (HDFS-9664) TestRollingUpgrade.testRollback failed frequently
[ https://issues.apache.org/jira/browse/HDFS-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116320#comment-15116320 ] Mingliang Liu commented on HDFS-9664: - Perhaps there is a fundamental problem in the rolling upgrade. Otherwise, * If we don't really need to check the exit exception when shutting down the cluster, we may ignore this exception by building a cluster with the {{checkExitOnShutdown(false)}} option. * Meanwhile, restarting the NN/DN does not keep the RPC port. If the port is different, the NN may never leave safe mode because not enough DNs register (see [HDFS-9689]). Will this break the test? > TestRollingUpgrade.testRollback failed frequently > - > > Key: HDFS-9664 > URL: https://issues.apache.org/jira/browse/HDFS-9664 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yongjun Zhang > > Seen the following failure in the following jenkins test runs: > ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/2743/testReport > (2016-01-18 22:14:23) > ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/2742/testReport > (2016-01-18 17:52:58) > ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/2739/testReport > (2016-01-18 01:51:26) > ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/2738/testReport > (2016-01-17 21:56:17) > Failed test: org.apache.hadoop.hdfs.TestRollingUpgrade.testRollback > {quote} > Error Message > Test resulted in an unexpected exit > Stacktrace > java.lang.AssertionError: Test resulted in an unexpected exit > at > org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1895) > at > org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1882) > at > org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1875) > at > org.apache.hadoop.hdfs.TestRollingUpgrade.testRollback(TestRollingUpgrade.java:350) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9677) Rename generationStampV1/generationStampV2 to legacyGenerationStamp/generationStamp
[ https://issues.apache.org/jira/browse/HDFS-9677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116366#comment-15116366 ] Mingliang Liu commented on HDFS-9677: - The v1 patch basically renames {{generationStampV1}} => {{legacyGenerationStamp}} and {{generationStampV2}} => {{generationStamp}}. I think the renaming is reasonable as I don't see any loss of readability. Meanwhile, there are existing comments on the usages of the variables that elaborate the difference. We have other cases using "legacy" in class/variable names, e.g. {{BlockReaderLocalLegacy}}, {{legacyBlock}}. Another option is to change {{generationStampV1}} to {{generationStampRandom}} and {{generationStampV2}} to {{generationStampSequential}}. These are more literal names. However, they describe how each stamp is generated, not exactly what it is, because they're implementation specific. Suppose a new user plays with generation stamps for the very first time; she needs to know implementation details before she can tell which one is legacy or deprecated. Even with the current V1/V2 naming, we should not blame a new user who wonders whether a {{generationStampV3}} version exists. I'd be happy to refine the patch given further useful input. > Rename generationStampV1/generationStampV2 to > legacyGenerationStamp/generationStamp > --- > > Key: HDFS-9677 > URL: https://issues.apache.org/jira/browse/HDFS-9677 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Jing Zhao >Assignee: Mingliang Liu > Attachments: HDFS-9677.000.patch, HDFS-9677.001.patch > > > [comment|https://issues.apache.org/jira/browse/HDFS-9542?focusedCommentId=15110531=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15110531] > from [~drankye] in HDFS-9542: > {quote} > Just wonder if it's a good idea to rename: generationStampV1 => > legacyGenerationStamp; generationStampV2 => generationStamp, similar for > other variables, as we have legacy block and block. 
> {quote} > This jira plans to do this rename. -- This message was sent by Atlassian JIRA (v6.3.4#6332)