[jira] [Commented] (HDFS-9483) Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured WebHDFS.
[ https://issues.apache.org/jira/browse/HDFS-9483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205838#comment-15205838 ] Hadoop QA commented on HDFS-9483: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 37s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 10m 20s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:fbe3e86 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12776616/HDFS-9483.001.patch | | JIRA Issue | HDFS-9483 | | Optional Tests | asflicense mvnsite | | uname | Linux 8c9669470813 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e7ed05e | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/14887/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured > WebHDFS. > - > > Key: HDFS-9483 > URL: https://issues.apache.org/jira/browse/HDFS-9483 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: Chris Nauroth >Assignee: Surendra Singh Lilhore > Attachments: HDFS-9483.001.patch, HDFS-9483.patch > > > If WebHDFS is secured with SSL, then you can use "swebhdfs" as the scheme in > a URL to access it. The current documentation does not state this anywhere. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10185) TestHFlushInterrupted verifies interrupt state incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated HDFS-10185: - Resolution: Not A Problem Status: Resolved (was: Patch Available) Closed the JIRA, because this is not a problem. > TestHFlushInterrupted verifies interrupt state incorrectly > -- > > Key: HDFS-10185 > URL: https://issues.apache.org/jira/browse/HDFS-10185 > Project: Hadoop HDFS > Issue Type: Bug > Components: test > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Attachments: HDFS-10185.001.patch > > > In unit test {{TestHFlush#testHFlushInterrupted}}, there were some places > verifying the interrupt state incorrectly. As follows: > {code} > Thread.currentThread().interrupt(); > try { > stm.hflush(); > // If we made it past the hflush(), then that means that the ack made it back > // from the pipeline before we got to the wait() call. In that case we should > // still have interrupted status. > assertTrue(Thread.interrupted()); > } catch (InterruptedIOException ie) { > System.out.println("Got expected exception during flush"); > } > {code} > When stm performs the {{hflush}} operation, it throws an interrupted exception and > {{assertTrue(Thread.interrupted())}} is never executed. And if you put the check before the > {{hflush}}, {{Thread.interrupted()}} clears the interrupted state and the > expected exception will not be thrown. A similar problem also appears after > {{stm.close}}. > So we should use a way to read the state without clearing it, such as > {{Thread.currentThread().isInterrupted()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
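For reference, the distinction the reporter describes can be demonstrated in isolation. This is a minimal sketch; the class and method names are illustrative and not part of the attached patch:

```java
// Demonstrates why Thread.interrupted() is the wrong check when the flag
// must survive: it reports AND CLEARS the interrupt status, while
// isInterrupted() only reports it.
public class InterruptStateDemo {
    // Reports and clears the current thread's interrupt flag.
    static boolean checkAndClear() { return Thread.interrupted(); }

    // Reports the flag without clearing it.
    static boolean checkOnly() { return Thread.currentThread().isInterrupted(); }

    public static void main(String[] args) {
        Thread.currentThread().interrupt();
        // isInterrupted() can be called repeatedly; the flag survives.
        assert checkOnly();
        assert checkOnly();
        // interrupted() returns true once, then the flag is gone.
        assert checkAndClear();
        assert !checkAndClear();
        System.out.println("interrupt-state semantics verified");
    }
}
```

Run with `java -ea InterruptStateDemo`; the side effect of `Thread.interrupted()` is exactly why the placement of the assertion relative to `hflush()` matters in the test.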
[jira] [Commented] (HDFS-9959) add log when block removed from last live datanode
[ https://issues.apache.org/jira/browse/HDFS-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205802#comment-15205802 ] Hadoop QA commented on HDFS-9959: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 13s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 23s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 48s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc 
{color} | {color:green} 2m 51s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 10s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 0s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 28s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 3 new + 14 unchanged - 0 fixed = 17 total (was 14) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 42s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 36s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 49s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 22s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 33s {color} | {color:red} Patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 115m 15s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_74 Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestScrLazyPersistFiles | | | hadoop.hdfs.server.datanode.TestDataNodeUUID | | | hadoop.hdfs.security.TestDelegationTokenForProxyUser | | | hadoop.hdfs.TestFileAppend | | | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport | | | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes | | JDK v1.8.0_74 Timed out junit tests |
[jira] [Commented] (HDFS-7648) Verify that HDFS blocks are in the correct datanode directories
[ https://issues.apache.org/jira/browse/HDFS-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205769#comment-15205769 ] Rakesh R commented on HDFS-7648: Thanks [~szetszwo] for the reply. bq. Why it needs a separated sub-task but not just updating the patch here? bq. The two chunks of code above are duplicated. Please add a helper method to avoid the duplication. Thanks. The attached patches in this JIRA are quite old. Please ignore them, as the proposed changes have been implemented via HDFS-7819. I have raised this sub-task based on the [discussion|https://issues.apache.org/jira/browse/HDFS-7648?focusedCommentId=14332305=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14332305]. Also, please refer to [DirectoryScanner.java#L916|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DirectoryScanner.java#L916] to see the code changes. Initially the discussion was about identifying the misplaced items and fixing them automatically, but we couldn't reach a conclusion. Since we haven't fully implemented the original idea of fixing the misplaced blocks, I thought of keeping this JIRA open (one can refer to its comments for the background) and fixing smaller parts separately through sub-tasks. Should I delete the old attached patches from this JIRA to avoid any confusion? > Verify that HDFS blocks are in the correct datanode directories > --- > > Key: HDFS-7648 > URL: https://issues.apache.org/jira/browse/HDFS-7648 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Reporter: Tsz Wo Nicholas Sze > Assignee: Rakesh R > Attachments: HDFS-7648-3.patch, HDFS-7648-4.patch, HDFS-7648-5.patch, > HDFS-7648.patch, HDFS-7648.patch > > > HDFS-6482 changed the datanode layout to use the block ID to determine the directory > that stores the block. We should have some mechanism to verify it. Either > DirectoryScanner or block report generation could do the check. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-10190) Expose FSDatasetImpl lock metrics
John Zhuge created HDFS-10190: - Summary: Expose FSDatasetImpl lock metrics Key: HDFS-10190 URL: https://issues.apache.org/jira/browse/HDFS-10190 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.8.0 Reporter: John Zhuge Assignee: John Zhuge Expose FSDatasetImpl lock metrics: * Number of lock calls * Contention rate * Average wait time Locks of interest: * FsDatasetImpl intrinsic lock * FsDatasetImpl.statsLock -- This message was sent by Atlassian JIRA (v6.3.4#6332)
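The proposed metrics (lock calls, contention rate, average wait time) could be collected with a thin wrapper around the lock. The sketch below is a hypothetical illustration, not the actual FsDatasetImpl change; the class and metric names are assumptions:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Counts lock calls, contended acquisitions, and cumulative wait time.
public class InstrumentedLock {
    private final ReentrantLock lock = new ReentrantLock();
    private final AtomicLong lockCalls = new AtomicLong();      // number of lock calls
    private final AtomicLong contended = new AtomicLong();      // calls that had to wait
    private final AtomicLong totalWaitNanos = new AtomicLong(); // wait time of contended calls

    public void lock() {
        lockCalls.incrementAndGet();
        if (lock.tryLock()) {
            return; // uncontended: no wait time recorded
        }
        contended.incrementAndGet();
        long start = System.nanoTime();
        lock.lock();
        totalWaitNanos.addAndGet(System.nanoTime() - start);
    }

    public void unlock() { lock.unlock(); }

    public long lockCalls() { return lockCalls.get(); }

    public double contentionRate() {
        long calls = lockCalls.get();
        return calls == 0 ? 0.0 : (double) contended.get() / calls;
    }

    public double avgWaitMillis() {
        long c = contended.get();
        return c == 0 ? 0.0 : totalWaitNanos.get() / 1e6 / c;
    }
}
```

The `tryLock()` fast path keeps the uncontended case cheap, which matters for a lock as hot as the FsDatasetImpl intrinsic lock.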
[jira] [Commented] (HDFS-10185) TestHFlushInterrupted verifies interrupt state incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205713#comment-15205713 ] John Zhuge commented on HDFS-10185: --- Close it then. Thanks for attempting to fix this unit test though; it is definitely flaky. > TestHFlushInterrupted verifies interrupt state incorrectly > -- > > Key: HDFS-10185 > URL: https://issues.apache.org/jira/browse/HDFS-10185 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10185) TestHFlushInterrupted verifies interrupt state incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205707#comment-15205707 ] Lin Yiqun commented on HDFS-10185: -- {quote} However, do you think this comment is no longer true? {quote} The comment is true, and it means this issue is not exactly a problem. So shall I invalidate this JIRA, or do you have other comments, [~jzhuge]? > TestHFlushInterrupted verifies interrupt state incorrectly > -- > > Key: HDFS-10185 > URL: https://issues.apache.org/jira/browse/HDFS-10185 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9559) Add haadmin command to get HA state of all the namenodes
[ https://issues.apache.org/jira/browse/HDFS-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205701#comment-15205701 ] Surendra Singh Lilhore commented on HDFS-9559: -- Can someone please review this JIRA? This command is useful in a multi-NameNode cluster (HDFS-6440). > Add haadmin command to get HA state of all the namenodes > > > Key: HDFS-9559 > URL: https://issues.apache.org/jira/browse/HDFS-9559 > Project: Hadoop HDFS > Issue Type: Improvement > Components: tools > Affects Versions: 2.7.1 > Reporter: Surendra Singh Lilhore > Assignee: Surendra Singh Lilhore > Attachments: HDFS-9559.01.patch > > > Currently we have one command to get the state of a single namenode: > {code} > ./hdfs haadmin -getServiceState <serviceId> > {code} > It would be good to have a command that gives the state of all the namenodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9483) Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured WebHDFS.
[ https://issues.apache.org/jira/browse/HDFS-9483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205678#comment-15205678 ] Surendra Singh Lilhore commented on HDFS-9483: -- [~cnauroth], can you please review? > Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured > WebHDFS. > - > > Key: HDFS-9483 > URL: https://issues.apache.org/jira/browse/HDFS-9483 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units
[ https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205674#comment-15205674 ] Vinayakumar B commented on HDFS-9847: - bq. Can we avoid throwing an exception, but still log the precision loss to let users know? This seems to be better. > HDFS configuration without time unit name should accept friendly time units > --- > > Key: HDFS-9847 > URL: https://issues.apache.org/jira/browse/HDFS-9847 > Project: Hadoop HDFS > Issue Type: Sub-task > Affects Versions: 2.7.1 > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Attachments: HDFS-9847.001.patch, HDFS-9847.002.patch, > HDFS-9847.003.patch, HDFS-9847.004.patch, HDFS-9847.005.patch, > HDFS-9847.006.patch, timeduration-w-y.patch > > > HDFS-9821 talks about the issue of letting existing keys use friendly > units, e.g. 60s, 5m, 1d, 6w etc. But some configuration key names > contain a time unit name, like {{dfs.blockreport.intervalMsec}}, so we can make > some other configurations, which lack a time unit in their name, accept friendly > time units. The time unit {{seconds}} is frequently used in HDFS. We can > update these configurations first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
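The friendly-unit parsing under discussion can be sketched roughly as below. This is a simplified, hypothetical helper for illustration only; Hadoop's real entry point for this is `Configuration.getTimeDuration`, and this sketch is not the patch's code:

```java
// Parses values like "60s", "5m", "1d" into milliseconds; a bare number
// is treated as already being in milliseconds.
public class TimeDurationDemo {
    static long parseToMillis(String value) {
        String v = value.trim().toLowerCase();
        long mult;
        char unit = v.charAt(v.length() - 1);
        switch (unit) {
            case 's': mult = 1000L; break;
            case 'm': mult = 60_000L; break;
            case 'h': mult = 3_600_000L; break;
            case 'd': mult = 86_400_000L; break;
            default:  return Long.parseLong(v); // no unit suffix: plain millis
        }
        return Long.parseLong(v.substring(0, v.length() - 1)) * mult;
    }

    public static void main(String[] args) {
        System.out.println(parseToMillis("60s")); // 60000
        System.out.println(parseToMillis("5m"));  // 300000
    }
}
```

The precision-loss question in the thread arises when such a value is converted down to a coarser unit (e.g. "90s" requested in minutes): the remainder is silently dropped unless the conversion logs or throws.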
[jira] [Updated] (HDFS-2043) TestHFlush failing intermittently
[ https://issues.apache.org/jira/browse/HDFS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated HDFS-2043: Attachment: HDFS-2043.003.patch Updated the patch, adding a sleep in each retry attempt. Testing {{TestHFlush}} again; the patch should prove effective if the test passes again. Thanks for reviewing the patch. > TestHFlush failing intermittently > - > > Key: HDFS-2043 > URL: https://issues.apache.org/jira/browse/HDFS-2043 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: Aaron T. Myers > Assignee: Lin Yiqun > Attachments: HDFS-2043.002.patch, HDFS-2043.003.patch, HDFS.001.patch > > > I can't reproduce this failure reliably, but it seems like TestHFlush has > been failing intermittently, with the frequency increasing of late. > Note the following two pre-commit test runs from different JIRAs where > TestHFlush seems to have failed spuriously: > https://builds.apache.org/job/PreCommit-HDFS-Build/734//testReport/ > https://builds.apache.org/job/PreCommit-HDFS-Build/680//testReport/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
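The retry-with-sleep pattern described in the update can be sketched as follows; the attempt count and delay here are illustrative, not taken from the patch:

```java
// Polls a condition up to `attempts` times, sleeping between tries so the
// loop doesn't spin while waiting for an asynchronous event (e.g. an ack).
public class RetryWithSleep {
    interface Check { boolean ok() throws Exception; }

    static boolean retry(Check check, int attempts, long sleepMs) throws Exception {
        for (int i = 0; i < attempts; i++) {
            if (check.ok()) return true;
            Thread.sleep(sleepMs); // back off between attempts instead of a tight loop
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Condition becomes true on the third call.
        boolean ok = retry(() -> ++calls[0] >= 3, 5, 10L);
        System.out.println(ok + " after " + calls[0] + " attempts");
    }
}
```

For flaky timing-dependent tests, a bounded retry with sleep tolerates scheduling jitter without masking a genuine failure, since the loop still gives up after the attempt budget is exhausted.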
[jira] [Commented] (HDFS-9917) IBR accumulate more objects when SNN was down for sometime.
[ https://issues.apache.org/jira/browse/HDFS-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205658#comment-15205658 ] Brahma Reddy Battula commented on HDFS-9917: Uploaded the patch. Kindly review. Can we limit the number of IBRs to the standby, since the DN keeps accumulating IBRs and uses a lot of memory? > IBR accumulate more objects when SNN was down for sometime. > --- > > Key: HDFS-9917 > URL: https://issues.apache.org/jira/browse/HDFS-9917 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: Brahma Reddy Battula > Assignee: Brahma Reddy Battula > Attachments: HDFS-9917.patch > > > SNN was down for some time for various reasons. After restarting the SNN, it > became unresponsive because: > - 29 DNs were each sending IBRs of ~5 million entries (most of them delete IBRs), > whereas each datanode had only ~2.5 million blocks. > - GC can't reclaim these objects since all of them sit in the RPC queue. > To recover (to clear these objects), all the DNs were restarted one by > one. This issue happened in 2.4.1, where block report splitting was not > available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
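The cap being asked about could look roughly like the bounded buffer below. This is a hypothetical sketch: the class, the limit, and the idea of falling back to a later full block report for dropped entries are assumptions, not part of the attached patch:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Bounds the number of pending incremental block reports (IBRs) held in
// memory while the peer NameNode is down or slow.
public class BoundedIbrBuffer {
    private final int maxPending;
    private final Deque<String> pending = new ArrayDeque<>();
    private long dropped = 0; // dropped entries would be covered by a later full block report

    public BoundedIbrBuffer(int maxPending) { this.maxPending = maxPending; }

    public synchronized boolean offer(String ibr) {
        if (pending.size() >= maxPending) {
            dropped++;          // refuse to queue more; bounds memory growth
            return false;
        }
        pending.addLast(ibr);
        return true;
    }

    public synchronized int pendingCount() { return pending.size(); }
    public synchronized long droppedCount() { return dropped; }
}
```

The trade-off is that dropping IBRs delays the standby's view of those blocks until the next full block report, which is why an unbounded queue was used in the first place.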
[jira] [Updated] (HDFS-9917) IBR accumulate more objects when SNN was down for sometime.
[ https://issues.apache.org/jira/browse/HDFS-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-9917: --- Status: Patch Available (was: Open) > IBR accumulate more objects when SNN was down for sometime. > --- > > Key: HDFS-9917 > URL: https://issues.apache.org/jira/browse/HDFS-9917 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9917) IBR accumulate more objects when SNN was down for sometime.
[ https://issues.apache.org/jira/browse/HDFS-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-9917: --- Attachment: HDFS-9917.patch > IBR accumulate more objects when SNN was down for sometime. > --- > > Key: HDFS-9917 > URL: https://issues.apache.org/jira/browse/HDFS-9917 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units
[ https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205644#comment-15205644 ] Lin Yiqun commented on HDFS-9847: - {quote} Yes. Similar to the discussion on casting the result to int, the caller is responsible for precision. {quote} Can we avoid throwing an exception, but still log the precision loss to let users know? > HDFS configuration without time unit name should accept friendly time units > --- > > Key: HDFS-9847 > URL: https://issues.apache.org/jira/browse/HDFS-9847 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9951) Use string constants for XML tags in OfflineImageReconstructor
[ https://issues.apache.org/jira/browse/HDFS-9951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205621#comment-15205621 ] Lin Yiqun commented on HDFS-9951: - Thanks [~cmccabe] for the commit! > Use string constants for XML tags in OfflineImageReconstructor > -- > > Key: HDFS-9951 > URL: https://issues.apache.org/jira/browse/HDFS-9951 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Priority: Minor > Fix For: 2.8.0 > > Attachments: HDFS-9551.001.patch, HDFS-9551.002.patch, > HDFS-9551.003.patch, HDFS-9551.004.patch > > > The class {{OfflineImageReconstructor}} uses many {{SectionProcessors}} to > process XML files and load the subtree of the XML into a Node structure. But > in many places a node's key is removed by writing the string literal directly in the > method rather than defining it as a constant first. Like this: > {code} > Node expiration = directive.removeChild("expiration"); > {code} > We could improve this by defining the keys in Node and invoking them like this: > {code} > Node expiration = directive.removeChild(Node.CACHE_MANAGER_SECTION_EXPIRATION); > {code} > This will also make it easier to manage node key names in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9874) Long living DataXceiver threads cause volume shutdown to block.
[ https://issues.apache.org/jira/browse/HDFS-9874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205591#comment-15205591 ] Rushabh S Shah commented on HDFS-9874: -- Sure. Will update the patch shortly. > Long living DataXceiver threads cause volume shutdown to block. > --- > > Key: HDFS-9874 > URL: https://issues.apache.org/jira/browse/HDFS-9874 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah >Priority: Critical > Fix For: 2.7.3 > > Attachments: HDFS-9874-trunk-1.patch, HDFS-9874-trunk-2.patch, > HDFS-9874-trunk.patch > > > One of the failed volume shutdown took 3 days to complete. > Below are the relevant datanode logs while shutting down a volume (due to > disk failure) > {noformat} > 2016-02-21 10:12:55,333 [Thread-49277] WARN impl.FsDatasetImpl: Removing > failed volume volumeA/current: > org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not > writable: volumeA/current/BP-1788428031-nnIp-1351700107344/current/finalized > at > org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:194) > at > org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174) > at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:108) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:308) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:786) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:242) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:2011) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3145) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.access$800(DataNode.java:243) > at > 
org.apache.hadoop.hdfs.server.datanode.DataNode$7.run(DataNode.java:3178) > at java.lang.Thread.run(Thread.java:745) > 2016-02-21 10:12:55,334 [Thread-49277] INFO datanode.BlockScanner: Removing > scanner for volume volumeA (StorageID DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23) > 2016-02-21 10:12:55,334 [VolumeScannerThread(volumeA)] INFO > datanode.VolumeScanner: VolumeScanner(volumeA, > DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23) exiting. > 2016-02-21 10:12:55,335 [VolumeScannerThread(volumeA)] WARN > datanode.VolumeScanner: VolumeScanner(volumeA, > DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23): error saving > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl@4169ad8b. > java.io.FileNotFoundException: > volumeA/current/BP-1788428031-nnIp-1351700107344/scanner.cursor.tmp > (Read-only file system) > at java.io.FileOutputStream.open(Native Method) > at java.io.FileOutputStream.(FileOutputStream.java:213) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl.save(FsVolumeImpl.java:669) > at > org.apache.hadoop.hdfs.server.datanode.VolumeScanner.saveBlockIterator(VolumeScanner.java:314) > at > org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:633) > 2016-02-24 16:05:53,285 [Thread-49277] WARN impl.FsDatasetImpl: Failed to > delete old dfsUsed file in > volumeA/current/BP-1788428031-nnIp-1351700107344/current > 2016-02-24 16:05:53,286 [Thread-49277] WARN impl.FsDatasetImpl: Failed to > write dfsUsed to > volumeA/current/BP-1788428031-nnIp-1351700107344/current/dfsUsed > java.io.FileNotFoundException: > volumeA/current/BP-1788428031-nnIp-1351700107344/current/dfsUsed (Read-only > file system) > at java.io.FileOutputStream.open(Native Method) > at java.io.FileOutputStream.(FileOutputStream.java:213) > at java.io.FileOutputStream.(FileOutputStream.java:162) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.saveDfsUsed(BlockPoolSlice.java:247) > at > 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.shutdown(BlockPoolSlice.java:698) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:815) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.removeVolume(FsVolumeList.java:328) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:250) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:2011) > at
[jira] [Commented] (HDFS-9959) add log when block removed from last live datanode
[ https://issues.apache.org/jira/browse/HDFS-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205585#comment-15205585 ] Tsz Wo Nicholas Sze commented on HDFS-9959: --- For the new patch, please call init() in heartbeatCheck() but not in add(..). Otherwise, blocks will be added in other cases, such as when a block is deleted from the last datanode. > ... Which solution is better: ... Let's add only the first thousand blocks, i.e. check missing.size() before adding the block. We probably should print the log message in multiple lines, say 10 blocks per line. Thanks for the update. > add log when block removed from last live datanode > -- > > Key: HDFS-9959 > URL: https://issues.apache.org/jira/browse/HDFS-9959 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode > Reporter: yunjiong zhao > Assignee: yunjiong zhao > Priority: Minor > Attachments: HDFS-9959.1.patch, HDFS-9959.2.patch, HDFS-9959.patch > > > Adding logs like "BLOCK* No live nodes contain block blk_1073741825_1001, last > datanode to contain it is node: 127.0.0.1:65341" in BlockStateChange should help > identify which datanode should be fixed first to recover missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
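The two suggestions in the review comment (keep only the first thousand blocks; print ten blocks per log line) can be sketched like this. The class and method names are illustrative, not the patch's actual code:

```java
import java.util.ArrayList;
import java.util.List;

// Caps the tracked missing-block list and batches the log output.
public class MissingBlockLogger {
    static final int MAX_LOGGED = 1000;  // only record the first thousand blocks
    static final int PER_LINE = 10;      // print 10 blocks per log line

    // Check the size before adding, as the review comment suggests.
    static void logMissing(List<String> missing, String block) {
        if (missing.size() < MAX_LOGGED) {
            missing.add(block);
        }
    }

    // Splits the recorded blocks into lines of PER_LINE entries each.
    static List<String> format(List<String> missing) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < missing.size(); i += PER_LINE) {
            int end = Math.min(i + PER_LINE, missing.size());
            lines.add(String.join(" ", missing.subList(i, end)));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> missing = new ArrayList<>();
        for (int i = 0; i < 25; i++) logMissing(missing, "blk_" + i);
        format(missing).forEach(System.out::println); // 3 lines: 10, 10, 5 blocks
    }
}
```

Capping the list bounds NameNode memory during a mass-failure event, and multi-line batching keeps any single log line from becoming unreadably long.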
[jira] [Commented] (HDFS-9118) Add logging system for libhdfs++
[ https://issues.apache.org/jira/browse/HDFS-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205581#comment-15205581 ] Hadoop QA commented on HDFS-9118: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 7m 35s {color} | {color:red} Docker failed to build yetus/hadoop:0cf5e66. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12794615/HDFS-9118.HDFS-8707.003.patch | | JIRA Issue | HDFS-9118 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/14884/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > Add logging system for libdhfs++ > > > Key: HDFS-9118 > URL: https://issues.apache.org/jira/browse/HDFS-9118 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: James Clampffer > Attachments: HDFS-9118.HDFS-8707.000.patch, > HDFS-9118.HDFS-8707.001.patch, HDFS-9118.HDFS-8707.002.patch, > HDFS-9118.HDFS-8707.003.patch > > > With HDFS-9505, we've starting logging data from libhdfs++. Consumers of the > library are going to have their own logging infrastructure that we're going > to want to provide data to. > libhdfs++ should have a logging library that: > * Is overridable and can provide sufficient information to work well with > common C++ logging frameworks > * Has a rational default implementation > * Is performant -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10189) PacketResponder toString is built incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205469#comment-15205469 ] Hadoop QA commented on HDFS-10189: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 53s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc 
{color} | {color:green} 1m 46s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 55m 17s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 52m 33s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 132m 36s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_74 Failed junit tests | hadoop.hdfs.TestHFlush | | JDK v1.7.0_95 Failed junit tests | hadoop.hdfs.TestHFlush | | | hadoop.hdfs.TestDistributedFileSystem | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:fbe3e86 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12794590/HDFS-10189.patch | | JIRA Issue | HDFS-10189 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux db8a1a3624b4 3.13.0-36-lowlatency #63-Ubuntu SMP
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205403#comment-15205403 ] Devaraj Das commented on HDFS-3702: --- Just FYI - the balancer work is being tracked in HBASE-8549. > Add an option for NOT writing the blocks locally if there is a datanode on > the same box as the client > - > > Key: HDFS-3702 > URL: https://issues.apache.org/jira/browse/HDFS-3702 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.5.1 >Reporter: Nicolas Liochon >Assignee: Lei (Eddy) Xu >Priority: Minor > Labels: BB2015-05-TBR > Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, > HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, > HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, > HDFS-3702.008.patch, HDFS-3702_Design.pdf > > > This is useful for Write-Ahead-Logs: these files are writen for recovery > only, and are not read when there are no failures. > Taking HBase as an example, these files will be read only if the process that > wrote them (the 'HBase regionserver') dies. This will likely come from a > hardware failure, hence the corresponding datanode will be dead as well. So > we're writing 3 replicas, but in reality only 2 of them are really useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205395#comment-15205395 ] Arpit Agarwal commented on HDFS-3702: - If the region server has write permissions on /hbase/.logs, which I assume it does, it should be able to set policies on that directory. The ability for administrators to do so upfront would be a nice benefit but not a must. > Add an option for NOT writing the blocks locally if there is a datanode on > the same box as the client > - > > Key: HDFS-3702 > URL: https://issues.apache.org/jira/browse/HDFS-3702 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.5.1 >Reporter: Nicolas Liochon >Assignee: Lei (Eddy) Xu >Priority: Minor > Labels: BB2015-05-TBR > Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, > HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, > HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, > HDFS-3702.008.patch, HDFS-3702_Design.pdf > > > This is useful for Write-Ahead-Logs: these files are writen for recovery > only, and are not read when there are no failures. > Taking HBase as an example, these files will be read only if the process that > wrote them (the 'HBase regionserver') dies. This will likely come from a > hardware failure, hence the corresponding datanode will be dead as well. So > we're writing 3 replicas, but in reality only 2 of them are really useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9959) add log when block removed from last live datanode
[ https://issues.apache.org/jira/browse/HDFS-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yunjiong zhao updated HDFS-9959: Attachment: HDFS-9959.2.patch How about this one? In an extreme case, for example when all DataNodes are dead, there may be more than a hundred thousand missing blocks, and logging all of them might take a very long time. {code} NameNode.blockStateChangeLog.warn("After removed " + dn + ", no live nodes contain the following " + missing.size() + " blocks: " + missing); {code} Which solution is better: 1. add only the first thousand blocks and ignore the others, or 2. if more than a thousand blocks are missing, log only the first thousand? > add log when block removed from last live datanode > -- > > Key: HDFS-9959 > URL: https://issues.apache.org/jira/browse/HDFS-9959 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: yunjiong zhao >Assignee: yunjiong zhao >Priority: Minor > Attachments: HDFS-9959.1.patch, HDFS-9959.2.patch, HDFS-9959.patch > > > Adding logs like "BLOCK* No live nodes contain block blk_1073741825_1001, last > datanode contain it is node: 127.0.0.1:65341" in BlockStateChange should help > to identify which datanode should be fixed first to recover missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205380#comment-15205380 ] Nicolas Liochon commented on HDFS-3702: --- bq. The issue was opened in July 2012 so we're not holding our breath If we're not holding our breath, it's also because we put a hack in HBase (HBASE-6435). However, this hack is not perfect and does not help on the write path (we write and flush 3 times while two would provide the same level of safety), and we still try to do a recoverLease on a dead node when there is a server crash. bq. Yeah, vendors could ensure installers set the attribute. imho, it's not an optional behavior for HBase (compared to favoredNode, which was supposed to be a power-user configuration only): out of the box, HBase WALs should be written to 2 remote nodes by default, and never to the local node. So it would be much better to have the right behavior without requiring any extra work, scripts to run, or code to deploy on the hdfs namenode (it's too easy to mess things up). > Add an option for NOT writing the blocks locally if there is a datanode on > the same box as the client > - > > Key: HDFS-3702 > URL: https://issues.apache.org/jira/browse/HDFS-3702 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.5.1 >Reporter: Nicolas Liochon >Assignee: Lei (Eddy) Xu >Priority: Minor > Labels: BB2015-05-TBR > Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, > HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, > HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, > HDFS-3702.008.patch, HDFS-3702_Design.pdf > > > This is useful for Write-Ahead-Logs: these files are writen for recovery > only, and are not read when there are no failures. > Taking HBase as an example, these files will be read only if the process that > wrote them (the 'HBase regionserver') dies. 
This will likely come from a > hardware failure, hence the corresponding datanode will be dead as well. So > we're writing 3 replicas, but in reality only 2 of them are really useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10175) add per-operation stats to FileSystem.Statistics
[ https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-10175: - Status: Patch Available (was: In Progress) > add per-operation stats to FileSystem.Statistics > > > Key: HDFS-10175 > URL: https://issues.apache.org/jira/browse/HDFS-10175 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Ram Venkatesh >Assignee: Mingliang Liu > Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch > > > Currently FileSystem.Statistics exposes the following statistics: > BytesRead > BytesWritten > ReadOps > LargeReadOps > WriteOps > These are in-turn exposed as job counters by MapReduce and other frameworks. > There is logic within DfsClient to map operations to these counters that can > be confusing, for instance, mkdirs counts as a writeOp. > Proposed enhancement: > Add a statistic for each DfsClient operation including create, append, > createSymlink, delete, exists, mkdirs, rename and expose them as new > properties on the Statistics object. The operation-specific counters can be > used for analyzing the load imposed by a particular job on HDFS. > For example, we can use them to identify jobs that end up creating a large > number of files. > Once this information is available in the Statistics object, the app > frameworks like MapReduce can expose them as additional counters to be > aggregated and recorded as part of job summary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
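The per-operation counters proposed in HDFS-10175 could look roughly like the sketch below. The enum values and class name are illustrative only and are not the actual FileSystem.Statistics API; the real patch would hook these increments into DfsClient's RPC paths:

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class OpStatistics {
    /** Illustrative subset of the per-operation counters proposed above. */
    public enum Op { CREATE, APPEND, CREATE_SYMLINK, DELETE, EXISTS, MKDIRS, RENAME }

    private final Map<Op, AtomicLong> counters = new EnumMap<>(Op.class);

    public OpStatistics() {
        for (Op op : Op.values()) {
            counters.put(op, new AtomicLong());
        }
    }

    /** Called once per client operation of the given kind (thread-safe). */
    public void increment(Op op) {
        counters.get(op).incrementAndGet();
    }

    /** Read side, e.g. for exposing the value as a MapReduce job counter. */
    public long get(Op op) {
        return counters.get(op).get();
    }
}
```

Unlike the existing coarse ReadOps/WriteOps buckets (where, as noted above, mkdirs simply counts as a writeOp), each operation gets its own counter, so a job that creates a large number of files is directly visible.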
[jira] [Commented] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units
[ https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205369#comment-15205369 ] Chris Douglas commented on HDFS-9847: - bq. Sorry I didn't understand your proposal to handle compatibility in the last para, agreed with everything before that. For params without units we warn, which is consistent with our current strategy for deprecating config knobs. If the impl increases precision, then nothing distinguishes throw/nothrow. If we decrease precision, the nothrow implementation can decide what to do: discard lost precision (default), check the old precision and warn, or throw; if {{getTimeDuration}} throws, then old configs will throw (default) unless the impl checks the old precision and warns, or discards. w.r.t. compatibility, the only difference is the default for decreasing precision. The caller is explicitly saying they want the unit in seconds/hours/days, and following the semantics of {{TimeUnit}} seems preferable to an error. I see your point about our existing cases: we can't switch from {{some.config.seconds}} to {{some.config}} with a default in millis by aliasing the former to the latter. We need to first warn that {{some.config.seconds}} is deprecated in a release, then drop it in the next. Where {{some.config}} has no units in the key, we can't convert the API to use {{getTimeDuration}} and increase precision in the same release. To be fair, this isn't less flexibility than we have now: we don't specify {{some.config\[.seconds\]}} as a float. The opacity of discarded precision creating the possibility for misattribution (e.g., "increasing the hb interval from 1s to 1500ms improved stability" when it's noise) is also a concern, but I'd argue that's not a failure of the API. bq. To continue the example of heartbeat interval do you mean we continue to query it with a unit of seconds for now and discard precision without throwing? Yes. 
Similar to the discussion on casting the result to int, the caller is responsible for precision. All that said, while I think nothrow is preferable, I won't insist on it and am +0 on the latest patch. > HDFS configuration without time unit name should accept friendly time units > --- > > Key: HDFS-9847 > URL: https://issues.apache.org/jira/browse/HDFS-9847 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: HDFS-9847.001.patch, HDFS-9847.002.patch, > HDFS-9847.003.patch, HDFS-9847.004.patch, HDFS-9847.005.patch, > HDFS-9847.006.patch, timeduration-w-y.patch > > > HDFS-9821 discusses letting existing keys use friendly > units, e.g. 60s, 5m, 1d, 6w, etc. But some configuration key names > contain a time unit name, like {{dfs.blockreport.intervalMsec}}, so we can make > the other configurations, whose names carry no time unit, accept friendly > time units. The time unit {{seconds}} is frequently used in HDFS. We can > update these configurations first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
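The "friendly units" behavior debated here, including the precision question, can be sketched with a simplified parser. This mirrors the spirit of Configuration.getTimeDuration but is not the actual Hadoop implementation; the method name and the supported suffixes are assumptions of this sketch:

```java
import java.util.concurrent.TimeUnit;

public class TimeDurations {
    /**
     * Parses values like "60s", "5m", "1d", "1500ms" into targetUnit.
     * A bare number (no suffix) falls back to defaultUnit, matching the
     * compatibility story above for keys that previously had no units.
     * Simplified sketch only; not Configuration.getTimeDuration.
     */
    public static long parse(String value, TimeUnit defaultUnit, TimeUnit targetUnit) {
        String v = value.trim();
        TimeUnit unit = defaultUnit;
        int end = v.length();
        if (v.endsWith("ms")) {
            unit = TimeUnit.MILLISECONDS;
            end -= 2;
        } else {
            switch (Character.toLowerCase(v.charAt(end - 1))) {
                case 's': unit = TimeUnit.SECONDS; end--; break;
                case 'm': unit = TimeUnit.MINUTES; end--; break;
                case 'h': unit = TimeUnit.HOURS;   end--; break;
                case 'd': unit = TimeUnit.DAYS;    end--; break;
                default: break; // no suffix: use defaultUnit
            }
        }
        long raw = Long.parseLong(v.substring(0, end));
        // Converting to a coarser unit silently truncates, which is exactly
        // the throw-vs-nothrow precision question debated in the comments.
        return targetUnit.convert(raw, unit);
    }
}
```

For example, parsing "1500ms" when the caller asks for seconds yields 1, i.e. the lost 500ms is discarded rather than raising an error; this is the nothrow behavior the comment argues for.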
[jira] [Updated] (HDFS-10175) add per-operation stats to FileSystem.Statistics
[ https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-10175: - Attachment: HDFS-10175.001.patch > add per-operation stats to FileSystem.Statistics > > > Key: HDFS-10175 > URL: https://issues.apache.org/jira/browse/HDFS-10175 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Ram Venkatesh >Assignee: Mingliang Liu > Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch > > > Currently FileSystem.Statistics exposes the following statistics: > BytesRead > BytesWritten > ReadOps > LargeReadOps > WriteOps > These are in-turn exposed as job counters by MapReduce and other frameworks. > There is logic within DfsClient to map operations to these counters that can > be confusing, for instance, mkdirs counts as a writeOp. > Proposed enhancement: > Add a statistic for each DfsClient operation including create, append, > createSymlink, delete, exists, mkdirs, rename and expose them as new > properties on the Statistics object. The operation-specific counters can be > used for analyzing the load imposed by a particular job on HDFS. > For example, we can use them to identify jobs that end up creating a large > number of files. > Once this information is available in the Statistics object, the app > frameworks like MapReduce can expose them as additional counters to be > aggregated and recorded as part of job summary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205361#comment-15205361 ] stack commented on HDFS-3702: - bq. ...could you comment on the usability of providing node lists to this API? Usually nodes and NN can agree on what they call machines but we've all seen plenty of clusters where this is not so. Both HDFS and HBase have their own means of insulating themselves against dodgy named setups. These systems are not in alignment. bq. My impression was that tracking this in HBase was onerous, and is part of why favored nodes fell out of favor. No. It was never fully plumbed in HBase (it was plumbed into a balancer that no one used and would not swap into place because the default was featureful). Regards the FB experience, we need to get them to do us a post-mortem. > Add an option for NOT writing the blocks locally if there is a datanode on > the same box as the client > - > > Key: HDFS-3702 > URL: https://issues.apache.org/jira/browse/HDFS-3702 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.5.1 >Reporter: Nicolas Liochon >Assignee: Lei (Eddy) Xu >Priority: Minor > Labels: BB2015-05-TBR > Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, > HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, > HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, > HDFS-3702.008.patch, HDFS-3702_Design.pdf > > > This is useful for Write-Ahead-Logs: these files are writen for recovery > only, and are not read when there are no failures. > Taking HBase as an example, these files will be read only if the process that > wrote them (the 'HBase regionserver') dies. This will likely come from a > hardware failure, hence the corresponding datanode will be dead as well. So > we're writing 3 replicas, but in reality only 2 of them are really useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205343#comment-15205343 ] stack commented on HDFS-3702: - bq. Hi stack, the attribute could be set by an installer script or an API call at process startup [~arpitagarwal] Thanks. Yeah, vendors could ensure installers set the attribute. There is a significant set of installs where HBase shows up post-HDFS install and/or where HBase does not have sufficient permissions to set attributes on HDFS. I don't know the percentage. It would be just easier all around if it could be managed internally by HBase so there is no need to get scripts and/or operators involved. bq. ...so if you think HBase needs a solution now, ... Smile. The issue was opened in July 2012 so we're not holding our breath (smile). Would be cool if we could ask HDFS to not write local. Anyone doing WAL-on-HDFS will appreciate this in HDFS. Thanks [~arpitagarwal] > Add an option for NOT writing the blocks locally if there is a datanode on > the same box as the client > - > > Key: HDFS-3702 > URL: https://issues.apache.org/jira/browse/HDFS-3702 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.5.1 >Reporter: Nicolas Liochon >Assignee: Lei (Eddy) Xu >Priority: Minor > Labels: BB2015-05-TBR > Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, > HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, > HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, > HDFS-3702.008.patch, HDFS-3702_Design.pdf > > > This is useful for Write-Ahead-Logs: these files are writen for recovery > only, and are not read when there are no failures. > Taking HBase as an example, these files will be read only if the process that > wrote them (the 'HBase regionserver') dies. This will likely come from a > hardware failure, hence the corresponding datanode will be dead as well. So > we're writing 3 replicas, but in reality only 2 of them are really useful. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205337#comment-15205337 ] stack commented on HDFS-3702: - bq. How would DFSClient know which nodes are disfavored nodes? How could it enforce disfavored nodes? You postulated an application that wanted to 'distribute its files uniformly in a cluster.' I was just trying to suggest that users would prefer that HDFS would just do it for them. HDFS would know how to do it better being the arbiter of what is happening in the cluster. An application will do a poor job compared. 'distribute its files uniformly...' sounds like a good feature to implement with a block placement policy. bq. Since we already have favoredNodes, adding disfavoredNodes seems more natural than adding a flag. As noted above at 'stack added a comment - 12/Mar/16 15:20', favoredNodes is an unexercised feature that has actually been disavowed by the originators of the idea, FB, because it proved broken in practice. I'd suggest we not build more atop a feature-under-review as adding disfavoredNodes would (or at least until we hear of successful use of favoredNodes -- apparently our Y! are trying it). bq. In addition, the new FileSystem CreateFlag does not look clean to me since it is too specific to HDFS. How would other FileSystems such as LocalFileSystem implement it? The flag added by the attached patch is qualified throughout as a 'hint'. When set against LFS, it'll just be ignored. No harm done. The 'hint' didn't take. If we went your suggested route and added a disfavoredNodes route, things get a bit interesting when hbase, say, passes localhost. What'll happen? Does the user now have to check the FS implementation type before they select DFSClient method to call? I don't think you are objecting to the passing of flags on create, given this seems pretty standard fare in FSs. 
> Add an option for NOT writing the blocks locally if there is a datanode on > the same box as the client > - > > Key: HDFS-3702 > URL: https://issues.apache.org/jira/browse/HDFS-3702 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.5.1 >Reporter: Nicolas Liochon >Assignee: Lei (Eddy) Xu >Priority: Minor > Labels: BB2015-05-TBR > Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, > HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, > HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, > HDFS-3702.008.patch, HDFS-3702_Design.pdf > > > This is useful for Write-Ahead-Logs: these files are writen for recovery > only, and are not read when there are no failures. > Taking HBase as an example, these files will be read only if the process that > wrote them (the 'HBase regionserver') dies. This will likely come from a > hardware failure, hence the corresponding datanode will be dead as well. So > we're writing 3 replicas, but in reality only 2 of them are really useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
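The "no local write" hint debated in this thread boils down to excluding the client's own node when choosing replica targets. Below is an illustrative, non-Hadoop sketch of that selection; the class, method, and the exact-hostname comparison are assumptions (the real patch plumbs a flag through DFSClient into the block placement policy and must cope with hostname-resolution mismatches, as stack notes above):

```java
import java.util.ArrayList;
import java.util.List;

public class NoLocalWritePlacement {
    /**
     * Picks up to 'replicas' target nodes, skipping the client's own host
     * when the no-local-write hint is set. Illustrative only; not the
     * actual BlockPlacementPolicy API.
     */
    public static List<String> chooseTargets(List<String> liveNodes,
                                             String clientHost,
                                             boolean noLocalWrite,
                                             int replicas) {
        List<String> targets = new ArrayList<>();
        for (String node : liveNodes) {
            if (noLocalWrite && node.equals(clientHost)) {
                continue; // hint: never place a replica on the writer's box
            }
            targets.add(node);
            if (targets.size() == replicas) {
                break;
            }
        }
        // May return fewer than 'replicas' on a tiny cluster: the hint is
        // best-effort, matching the patch's treatment of the flag as a hint.
        return targets;
    }
}
```

This is why the flag is "qualified throughout as a 'hint'": on a filesystem with no notion of datanodes (e.g. LocalFileSystem), the hint simply has no effect.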
[jira] [Commented] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units
[ https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205306#comment-15205306 ] Arpit Agarwal commented on HDFS-9847: - Sorry, I didn't understand your proposal for handling compatibility in the last paragraph; I agreed with everything before that. To continue the example of the heartbeat interval, do you mean we continue to query it with a unit of seconds for now and discard precision without throwing? > HDFS configuration without time unit name should accept friendly time units > --- > > Key: HDFS-9847 > URL: https://issues.apache.org/jira/browse/HDFS-9847 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: HDFS-9847.001.patch, HDFS-9847.002.patch, > HDFS-9847.003.patch, HDFS-9847.004.patch, HDFS-9847.005.patch, > HDFS-9847.006.patch, timeduration-w-y.patch > > > HDFS-9821 discusses letting existing keys use friendly > units, e.g. 60s, 5m, 1d, 6w, etc. But some configuration key names > contain a time unit name, like {{dfs.blockreport.intervalMsec}}, so we can make > the other configurations, whose names carry no time unit, accept friendly > time units. The time unit {{seconds}} is frequently used in HDFS. We can > update these configurations first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205275#comment-15205275 ] Arpit Agarwal commented on HDFS-3702: - bq. No need for an admin operator to remember to set attributes on specific dirs (99% won't); Hi [~stack], the attribute could be set by an installer script or an API call at process startup. Since you agree pluggable policies are good to have eventually, CreateFlag becomes a stopgap. However this will take more time so if you think HBase needs a solution now, I'm -0. Thanks. AddBlockFlag should be tagged as {{@InterfaceAudience.Private}} if we proceed with the .008 patch. > Add an option for NOT writing the blocks locally if there is a datanode on > the same box as the client > - > > Key: HDFS-3702 > URL: https://issues.apache.org/jira/browse/HDFS-3702 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.5.1 >Reporter: Nicolas Liochon >Assignee: Lei (Eddy) Xu >Priority: Minor > Labels: BB2015-05-TBR > Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, > HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, > HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, > HDFS-3702.008.patch, HDFS-3702_Design.pdf > > > This is useful for Write-Ahead-Logs: these files are writen for recovery > only, and are not read when there are no failures. > Taking HBase as an example, these files will be read only if the process that > wrote them (the 'HBase regionserver') dies. This will likely come from a > hardware failure, hence the corresponding datanode will be dead as well. So > we're writing 3 replicas, but in reality only 2 of them are really useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10189) PacketResponder toString is built incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205233#comment-15205233 ] Colin Patrick McCabe commented on HDFS-10189: - +1 pending jenkins. Thanks. [~jpallas] > PacketResponder toString is built incorrectly > - > > Key: HDFS-10189 > URL: https://issues.apache.org/jira/browse/HDFS-10189 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.1 >Reporter: Joe Pallas >Assignee: Joe Pallas >Priority: Minor > Attachments: HDFS-10189.patch > > > The constructor for {{BlockReceiver.PacketResponder}} says > {code} > final StringBuilder b = new StringBuilder(getClass().getSimpleName()) > .append(": ").append(block).append(", type=").append(type); > if (type != PacketResponderType.HAS_DOWNSTREAM_IN_PIPELINE) { > b.append(", downstreams=").append(downstreams.length) > .append(":").append(Arrays.asList(downstreams)); > } > {code} > So it includes the list of downstreams only when it has no downstreams. The > {{if}} test should be for equality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
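The fix described in this issue flips the inequality to an equality. A minimal standalone illustration follows; the enum and field names mimic, but are not, the actual BlockReceiver.PacketResponder internals:

```java
import java.util.Arrays;

public class PacketResponderToString {
    public enum PacketResponderType {
        NON_PIPELINE, LAST_IN_PIPELINE, HAS_DOWNSTREAM_IN_PIPELINE
    }

    /** Builds the responder's string form, listing downstreams only when present. */
    public static String build(String block, PacketResponderType type,
                               String[] downstreams) {
        final StringBuilder b = new StringBuilder("PacketResponder")
            .append(": ").append(block).append(", type=").append(type);
        // The bug was "!=" here, so the downstream list was appended exactly
        // when there were no downstreams to list.
        if (type == PacketResponderType.HAS_DOWNSTREAM_IN_PIPELINE) {
            b.append(", downstreams=").append(downstreams.length)
                .append(":").append(Arrays.asList(downstreams));
        }
        return b.toString();
    }
}
```

With the equality test, a responder with a downstream pipeline reports its downstream count and addresses, while terminal responders omit the clause entirely.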
[jira] [Updated] (HDFS-9118) Add logging system for libhdfs++
[ https://issues.apache.org/jira/browse/HDFS-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Clampffer updated HDFS-9118: -- Attachment: HDFS-9118.HDFS-8707.003.patch New patch, incorporates a lot of [~bobhansen]'s suggestions.
- rebased onto head
- log macros to avoid creating LogMessage objects if they won't be logged
- add a separate public header for the C logging data structures and defines so that the implementation doesn't have to drag in the whole public API
- get rid of lock macros
- more helpful output when null pointers are passed to operator<<(const char*) and operator<<(std::string*)
- remove some garbage that slipped into the last patch (stuff commented out but not deleted)
> Add logging system for libhdfs++ > > > Key: HDFS-9118 > URL: https://issues.apache.org/jira/browse/HDFS-9118 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: James Clampffer > Attachments: HDFS-9118.HDFS-8707.000.patch, > HDFS-9118.HDFS-8707.001.patch, HDFS-9118.HDFS-8707.002.patch, > HDFS-9118.HDFS-8707.003.patch > > > With HDFS-9505, we've started logging data from libhdfs++. Consumers of the > library are going to have their own logging infrastructure that we're going > to want to provide data to. > libhdfs++ should have a logging library that: > * Is overridable and can provide sufficient information to work well with > common C++ logging frameworks > * Has a rational default implementation > * Is performant -- This message was sent by Atlassian JIRA (v6.3.4#6332)
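The "log macros to avoid creating LogMessage objects if they won't be logged" item is the usual lazy-construction guard. A minimal Java analogue (the real libhdfs++ code does this with C++ preprocessor macros; all names below are illustrative, not from the patch) defers building the message string until the level check has passed:

```java
import java.util.function.Supplier;

public class LazyLogger {
    private final int minLevel;  // messages below this level are discarded
    private int messagesBuilt;   // how many message strings were actually constructed

    public LazyLogger(int minLevel) { this.minLevel = minLevel; }

    // The Supplier is only invoked after the level check succeeds, so a disabled
    // log call never pays the cost of formatting the message. This is the same
    // effect the macros get by short-circuiting before constructing LogMessage.
    public void log(int level, Supplier<String> message) {
        if (level < minLevel) {
            return;              // dropped: message string never built
        }
        messagesBuilt++;
        System.out.println(message.get());
    }

    public int messagesBuilt() { return messagesBuilt; }
}
```

A call like `log(DEBUG, () -> expensiveDump())` then costs only an integer compare when DEBUG is disabled.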
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205215#comment-15205215 ] Tsz Wo Nicholas Sze commented on HDFS-3702: --- > For uniform distribution of files over a cluster, I think users would prefer > that DFSClient managed it for them (a new flag on CreateFlag?) ... How would DFSClient know which nodes are disfavored nodes? How could it enforce disfavored nodes? > ... disfavoredNodes seems like a more intrusive and roundabout route -- with > its overrides, possible builders, and global interpretation of 'localhost' > string -- to the clean flag this patch carries? I disagree. Since we already have favoredNodes, adding disfavoredNodes seems more natural than adding a flag. In addition, the new FileSystem CreateFlag does not look clean to me since it is too specific to HDFS. How would other FileSystems such as LocalFileSystem implement it?
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units
[ https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205191#comment-15205191 ] Chris Douglas commented on HDFS-9847: - Sorry, I overloaded "precision". By providing another parameter, it might provide semantics like "give me the value of this parameter in this unit, unless it sheds more than (x% | x | whatever) of the configured value". If the caller cares this much, they should ask for the var with high precision and make the decision to throw in that context. bq. Without the precision check Configuration silently changes values without the caller knowing about it. Isn't that the intent behind configuring with units? Throwing for loss of precision requires the operator to know the default precision (no improvement over the current approach, which embeds the unit in the config key), or go through at least one round of errors. To make this less painful for users, the caller should only use {{getTimeDuration}} with high precision, which gets us back to specifying everything but the string value in millis, internally. If {{getTimeDuration}} throws, then why is it better? I'd argue handling backwards compatibility is easier without throwing for loss of precision. This warns for vars without units, which we've decided is sufficient for deprecated config vars. New versions can increase precision changing only the timeunit param, and old client configs will still work. If it wants to decrease precision, then the caller can warn/throw for values that lose precision (whatever's appropriate for its context), and migrate to the new unit in the next version. 
> HDFS configuration without time unit name should accept friendly time units > --- > > Key: HDFS-9847 > URL: https://issues.apache.org/jira/browse/HDFS-9847 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: HDFS-9847.001.patch, HDFS-9847.002.patch, > HDFS-9847.003.patch, HDFS-9847.004.patch, HDFS-9847.005.patch, > HDFS-9847.006.patch, timeduration-w-y.patch > > > HDFS-9821 talks about letting existing keys use friendly units, e.g. 60s, 5m, 1d, 6w, etc. But some configuration key names contain a time unit name, like {{dfs.blockreport.intervalMsec}}, so we can make the other configurations, whose names carry no time unit, accept friendly time units. The time unit {{seconds}} is frequently used in HDFS; we can update those configurations first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
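The silent-truncation behavior being debated in the comment above is a property of java.util.concurrent.TimeUnit, which getTimeDuration delegates to. This standalone snippet (plain JDK, no Hadoop code) shows the value changing without any error:

```java
import java.util.concurrent.TimeUnit;

public class SilentTruncation {
    // Converting 1500 ms to seconds floors to 1: the extra 500 ms vanish silently,
    // which is what a caller reading "1500ms" at SECONDS precision observes today.
    static long asSeconds(long millis) {
        return TimeUnit.SECONDS.convert(millis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        System.out.println(asSeconds(1500)); // floors: half a second is dropped
    }
}
```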
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205142#comment-15205142 ] stack commented on HDFS-3702: - For uniform distribution of files over a cluster, I think users would prefer that DFSClient managed it for them (a new flag on CreateFlag?) rather than doing calculations to figure how to populate favoredNodes and disfavoredNodes using imperfect knowledge of the cluster, something the NN will always do better at. Unless you have other possible uses, disfavoredNodes seems like a more intrusive and roundabout route -- with its overrides, possible builders, and global interpretation of 'localhost' string -- to the clean flag this patch carries? What do you think [~szetszwo]? Thanks Nicolas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9874) Long living DataXceiver threads cause volume shutdown to block.
[ https://issues.apache.org/jira/browse/HDFS-9874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205137#comment-15205137 ] Wei-Chiu Chuang commented on HDFS-9874: --- Thanks for looking into it. Maybe the NPE is unrelated. I'm not able to make the test fail; it could be an intermittently flaky test. In any case, it would be great if you could improve the test diagnostics using {{GenericTestUtils#assertExceptionContains}}. This utility method prints the stack trace if the exception message doesn't match the expected value. > Long living DataXceiver threads cause volume shutdown to block. > --- > > Key: HDFS-9874 > URL: https://issues.apache.org/jira/browse/HDFS-9874 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah >Priority: Critical > Fix For: 2.7.3 > > Attachments: HDFS-9874-trunk-1.patch, HDFS-9874-trunk-2.patch, > HDFS-9874-trunk.patch > > > One of the failed volume shutdowns took 3 days to complete. 
> Below are the relevant datanode logs while shutting down a volume (due to > disk failure) > {noformat} > 2016-02-21 10:12:55,333 [Thread-49277] WARN impl.FsDatasetImpl: Removing > failed volume volumeA/current: > org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not > writable: volumeA/current/BP-1788428031-nnIp-1351700107344/current/finalized > at > org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:194) > at > org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174) > at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:108) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:308) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:786) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:242) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:2011) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3145) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.access$800(DataNode.java:243) > at > org.apache.hadoop.hdfs.server.datanode.DataNode$7.run(DataNode.java:3178) > at java.lang.Thread.run(Thread.java:745) > 2016-02-21 10:12:55,334 [Thread-49277] INFO datanode.BlockScanner: Removing > scanner for volume volumeA (StorageID DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23) > 2016-02-21 10:12:55,334 [VolumeScannerThread(volumeA)] INFO > datanode.VolumeScanner: VolumeScanner(volumeA, > DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23) exiting. > 2016-02-21 10:12:55,335 [VolumeScannerThread(volumeA)] WARN > datanode.VolumeScanner: VolumeScanner(volumeA, > DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23): error saving > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl@4169ad8b. 
> java.io.FileNotFoundException: > volumeA/current/BP-1788428031-nnIp-1351700107344/scanner.cursor.tmp > (Read-only file system) > at java.io.FileOutputStream.open(Native Method) > at java.io.FileOutputStream.<init>(FileOutputStream.java:213) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl.save(FsVolumeImpl.java:669) > at > org.apache.hadoop.hdfs.server.datanode.VolumeScanner.saveBlockIterator(VolumeScanner.java:314) > at > org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:633) > 2016-02-24 16:05:53,285 [Thread-49277] WARN impl.FsDatasetImpl: Failed to > delete old dfsUsed file in > volumeA/current/BP-1788428031-nnIp-1351700107344/current > 2016-02-24 16:05:53,286 [Thread-49277] WARN impl.FsDatasetImpl: Failed to > write dfsUsed to > volumeA/current/BP-1788428031-nnIp-1351700107344/current/dfsUsed > java.io.FileNotFoundException: > volumeA/current/BP-1788428031-nnIp-1351700107344/current/dfsUsed (Read-only > file system) > at java.io.FileOutputStream.open(Native Method) > at java.io.FileOutputStream.<init>(FileOutputStream.java:213) > at java.io.FileOutputStream.<init>(FileOutputStream.java:162) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.saveDfsUsed(BlockPoolSlice.java:247) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.shutdown(BlockPoolSlice.java:698) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:815) > at >
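For readers unfamiliar with the utility suggested in the comment above: GenericTestUtils#assertExceptionContains is essentially a contains-check on the exception message that, on mismatch, fails with the original throwable attached so its stack trace is visible. A simplified stand-in (not the Hadoop source) behaves like this:

```java
public class ExceptionAsserts {
    // Simplified analogue of Hadoop's GenericTestUtils#assertExceptionContains:
    // on mismatch, fail with the offending throwable chained in, so the test
    // report shows its full stack trace instead of just "expected X".
    static void assertExceptionContains(String expected, Throwable t) {
        String msg = t.getMessage();
        if (msg == null || !msg.contains(expected)) {
            throw new AssertionError(
                "Expected text \"" + expected + "\" in exception message, got: " + msg, t);
        }
    }
}
```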
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205136#comment-15205136 ] Tsz Wo Nicholas Sze commented on HDFS-3702: --- > ... we can also add another flag for fully random distribution. It seems not a good idea to keep adding flags. BTW, fully random distribution is not the same as uniform distribution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205111#comment-15205111 ] Andrew Wang commented on HDFS-3702: --- [~stack] from a downstream perspective, could you comment on the usability of providing node lists to this API? This always felt hacky to me, since ultimately the NN is the one who knows about the cluster state and DN names and block location constraints. My impression was that tracking this in HBase was onerous, and is part of why favored nodes fell out of favor. bq. For example, some application may want to distribute its files uniformly in a cluster The main reason for skew I've seen is the local writer case, which this patch attempts to address. It'll still bias to the local rack, but I doubt that'll be an issue in practice, and if it is we can also add another flag for fully random distribution. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units
[ https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204945#comment-15204945 ] Chris Douglas edited comment on HDFS-9847 at 3/21/16 8:53 PM: -- The current patch changes {{Configuration::getTimeDuration}} to throw when losing precision:
{noformat}
// Configuration.java
+long raw = Long.parseLong(timeStr);
+long converted = unit.convert(raw, vUnit.unit());
+if (vUnit.unit().convert(converted, unit) != raw) {
+  throw new HadoopIllegalArgumentException("Loss of precision converting "
+      + timeStr + vUnit.suffix() + " to " + unit);
 }
-return unit.convert(Long.parseLong(vStr), vUnit.unit());
+return converted;

// TestConfiguration
 Configuration conf = new Configuration(false);
 conf.setTimeDuration("test.time.a", 7L, SECONDS);
 assertEquals("7s", conf.get("test.time.a"));
-assertEquals(0L, conf.getTimeDuration("test.time.a", 30, MINUTES));
{noformat}
This changes the contract. In the current version, the caller determines the precision (as detailed in the javadoc). This is correct; the caller knows reasonable precision, and can perform checks (e.g., unexpected value of 0) in its context. The {{Configuration}} has no context, and providing the expected precision as an argument overengineers the interface. If I want to give a heartbeat interval in microseconds that's my right as a lunatic, but the caller should not throw because it only cares about the value in seconds. (aside) Why {{HadoopIllegalArgumentException}} instead of {{java.lang.IllegalArgumentException}}? Not why it is thrown here, but why does it exist? 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
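The quoted patch detects truncation by converting the value back and comparing with the original. That round-trip logic can be exercised with plain TimeUnit; this sketch (illustrative names, not the patch itself) reproduces it:

```java
import java.util.concurrent.TimeUnit;

public class RoundTripCheck {
    // Mirrors the patch's check: convert, convert back, and compare. If the
    // round trip does not recover the raw value, the requested unit cannot
    // represent the configured duration exactly, i.e. precision was lost.
    static boolean isLossless(long raw, TimeUnit configured, TimeUnit requested) {
        long converted = requested.convert(raw, configured);
        return configured.convert(converted, requested) == raw;
    }
}
```

The patch throws HadoopIllegalArgumentException exactly when this predicate is false.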
[jira] [Commented] (HDFS-9874) Long living DataXceiver threads cause volume shutdown to block.
[ https://issues.apache.org/jira/browse/HDFS-9874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205092#comment-15205092 ] Rushabh S Shah commented on HDFS-9874: -- The NPE is expected. {quote} at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1714) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdownBlockPool(FsDatasetImpl.java:2591) at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:1479) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:411) {quote} This is getting called while shutting down the cluster. This is expected since I have triggered only a part of the checkDiskError thread.
{code:title=DataNode.java|borderStyle=solid}
private void checkDiskError() {
  Set<File> unhealthyDataDirs = data.checkDataDir();
  if (unhealthyDataDirs != null && !unhealthyDataDirs.isEmpty()) {
    try {
      // Remove all unhealthy volumes from DataNode.
      removeVolumes(unhealthyDataDirs, false);
    } catch (IOException e) {
      LOG.warn("Error occurred when removing unhealthy storage dirs: "
          + e.getMessage(), e);
    }
    StringBuilder sb = new StringBuilder("DataNode failed volumes:");
    for (File dataDir : unhealthyDataDirs) {
      sb.append(dataDir.getAbsolutePath() + ";");
    }
    handleDiskError(sb.toString());
  }
}
{code}
I have only called the first line of the above function in the test case since I don't want the test case to wait for DataNode#checkDiskErrorInterval (which defaults to 5 seconds). That's why it will not execute removeVolumes(unhealthyDataDirs, false), hence the NPE. I am not able to reproduce the test failure on my local machine on JDK 7 or JDK 8. [~jojochuang]: Does it fail on your machine?
[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics
[ https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205086#comment-15205086 ] Mingliang Liu commented on HDFS-10175: -- Thanks for your comment, [~andrew.wang]. I was aware of the thread-local statistics data structure, and was in favor of following the same approach. The new operation map is still per-thread. The ConcurrentHashMap was used because, when aggregating, we have to make sure the map is not modified; its functionality is similar to the "volatile" keyword for the other primitive statistic data. Anyway, for the sake of performance I will revise the code and update the patch if ConcurrentHashMap turns out to be unnecessary. Before that, the next patch will first resolve the conflicts with trunk caused by [HDFS-9579]. > add per-operation stats to FileSystem.Statistics > > > Key: HDFS-10175 > URL: https://issues.apache.org/jira/browse/HDFS-10175 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Ram Venkatesh >Assignee: Mingliang Liu > Attachments: HDFS-10175.000.patch > > > Currently FileSystem.Statistics exposes the following statistics: > BytesRead > BytesWritten > ReadOps > LargeReadOps > WriteOps > These are in turn exposed as job counters by MapReduce and other frameworks. > There is logic within DfsClient to map operations to these counters that can > be confusing; for instance, mkdirs counts as a writeOp. > Proposed enhancement: > Add a statistic for each DfsClient operation including create, append, > createSymlink, delete, exists, mkdirs, rename and expose them as new > properties on the Statistics object. The operation-specific counters can be > used for analyzing the load imposed by a particular job on HDFS. > For example, we can use them to identify jobs that end up creating a large > number of files. 
> Once this information is available in the Statistics object, the app > frameworks like MapReduce can expose them as additional counters to be > aggregated and recorded as part of job summary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
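A minimal sketch of the design being discussed (per-thread counter maps plus a separate aggregation walk) in plain Java. This is illustrative, not the HDFS-10175 patch; ConcurrentHashMap is used for the per-thread maps precisely so the aggregator can read them while the owning thread keeps writing:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PerOpStatistics {
    // Every thread increments only its own map, so the hot path takes no lock.
    private final List<Map<String, Long>> allThreadMaps =
        Collections.synchronizedList(new ArrayList<>());
    private final ThreadLocal<Map<String, Long>> perThread =
        ThreadLocal.withInitial(() -> {
            // ConcurrentHashMap lets the aggregator read this map safely while
            // the owning thread continues to update it.
            Map<String, Long> m = new ConcurrentHashMap<>();
            allThreadMaps.add(m);
            return m;
        });

    public void record(String op) {
        perThread.get().merge(op, 1L, Long::sum);
    }

    // Sums the counters across every thread that has recorded an operation.
    public Map<String, Long> aggregate() {
        Map<String, Long> total = new HashMap<>();
        synchronized (allThreadMaps) {
            for (Map<String, Long> m : allThreadMaps) {
                m.forEach((op, n) -> total.merge(op, n, Long::sum));
            }
        }
        return total;
    }
}
```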
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205079#comment-15205079 ] Tsz Wo Nicholas Sze commented on HDFS-3702: --- > So, the suggestion is changing public signatures to add a new parameter (Or > adding a new override where there are already 6)? For compatibility reasons, we probably have to add a new override. For better usability, we may add a Builder. > For a client to make effective use of disfavoredNodes, they would have to > figure the exact name the NN is using and volunteer it in this > disfavoredNodes list? Or could they just write 'localhost' and let NN figure > it out? We should support 'localhost' in the API. DFSClient or NN may replace 'localhost' with the corresponding name. > Do you foresee any other use for this disfavoredNodes parameter other than > for the exclusion of 'localnode'? Yes, disfavoredNodes seems useful. For example, some application may want to distribute its files uniformly in a cluster. Then, it could specify the previously allocated DNs as the disfavoredNodes.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
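To make the override-versus-Builder trade-off concrete, here is a purely hypothetical shape for such a builder. None of these names exist in HDFS; this only illustrates how a disfavoredNodes parameter could be carried without yet another long create() overload:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch only: illustrates the Builder alternative discussed above.
public class CreateRequest {
    final String path;
    final List<String> favoredNodes;
    final List<String> disfavoredNodes;

    private CreateRequest(Builder b) {
        this.path = b.path;
        this.favoredNodes = Collections.unmodifiableList(b.favored);
        this.disfavoredNodes = Collections.unmodifiableList(b.disfavored);
    }

    public static class Builder {
        private final String path;
        private final List<String> favored = new ArrayList<>();
        private final List<String> disfavored = new ArrayList<>();

        public Builder(String path) { this.path = path; }
        public Builder favor(String node) { favored.add(node); return this; }
        // "localhost" would be resolved to the client's own datanode name by
        // the client or the NameNode, per the comment above.
        public Builder disfavor(String node) { disfavored.add(node); return this; }
        public CreateRequest build() { return new CreateRequest(this); }
    }
}
```

The Builder keeps existing call sites compiling while new options accumulate behind it, which is the usability argument made in the comment.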
[jira] [Commented] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units
[ https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205020#comment-15205020 ] Arpit Agarwal commented on HDFS-9847: - The precision check was my idea so I'll try to answer. bq. The Configuration has no context, and providing the expected precision as an argument overengineers the interface. {{getTimeDuration}} already takes a precision parameter, so this patch didn't add to the interface. Without the precision check, Configuration silently changes values without the caller knowing about it; e.g. this test passes today.
{code}
conf.set("xyz", "1500ms");
assertEquals(conf.getTimeDuration("xyz", 0, TimeUnit.SECONDS), 1);
{code}
The alternative is that the caller always passes TimeUnit as microseconds (or nanoseconds?), but handling backwards compatibility is harder. How will the caller differentiate {{1us}} from plain {{1}} without looking at the raw value? bq. Why HadoopIllegalArgumentException instead of java.lang.IllegalArgumentException? Added by HADOOP-6537; looks like the intent was to differentiate IllegalArgumentException thrown by Hadoop vs the JDK.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
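[Editor's note] A minimal sketch of the round-trip precision check Arpit describes. The class and method names here are illustrative only, not Hadoop's actual {{Configuration}} internals; the idea is that converting the parsed value back to its original unit detects silent truncation:

```java
import java.util.concurrent.TimeUnit;

public class TimeDurationCheck {
    // Illustrative sketch of the precision check: convert the raw value to
    // the caller's unit, then convert back. If the round trip does not
    // recover the original value, precision was silently lost.
    static long convertChecked(long raw, TimeUnit valueUnit, TimeUnit callerUnit) {
        long converted = callerUnit.convert(raw, valueUnit);
        if (valueUnit.convert(converted, callerUnit) != raw) {
            throw new IllegalArgumentException(
                "Loss of precision converting " + raw + " " + valueUnit
                + " to " + callerUnit);
        }
        return converted;
    }
}
```

With this check, Arpit's example of 1500ms read back in seconds throws instead of silently returning 1, while exact conversions like 2000ms to 2s pass through unchanged.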
[jira] [Updated] (HDFS-10189) PacketResponder toString is built incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Pallas updated HDFS-10189: -- Attachment: HDFS-10189.patch > PacketResponder toString is built incorrectly > - > > Key: HDFS-10189 > URL: https://issues.apache.org/jira/browse/HDFS-10189 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.1 >Reporter: Joe Pallas >Assignee: Joe Pallas >Priority: Minor > Attachments: HDFS-10189.patch > > > The constructor for {{BlockReceiver.PacketResponder}} says > {code} > final StringBuilder b = new StringBuilder(getClass().getSimpleName()) > .append(": ").append(block).append(", type=").append(type); > if (type != PacketResponderType.HAS_DOWNSTREAM_IN_PIPELINE) { > b.append(", downstreams=").append(downstreams.length) > .append(":").append(Arrays.asList(downstreams)); > } > {code} > So it includes the list of downstreams only when it has no downstreams. The > {{if}} test should be for equality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
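[Editor's note] The fix proposed in HDFS-10189 is to flip the inverted condition so downstream details are appended only when downstreams exist. A simplified, self-contained reconstruction (the class name and enum here are stand-ins for {{BlockReceiver.PacketResponder}}):

```java
import java.util.Arrays;

public class PacketResponderString {
    // Simplified stand-in for the real PacketResponderType enum.
    enum PacketResponderType { NON_PIPELINE, LAST_IN_PIPELINE, HAS_DOWNSTREAM_IN_PIPELINE }

    // Corrected construction: the downstream list is appended only when the
    // responder actually has downstreams (equality test, not !=).
    static String describe(String block, PacketResponderType type, String[] downstreams) {
        StringBuilder b = new StringBuilder("PacketResponder")
            .append(": ").append(block).append(", type=").append(type);
        if (type == PacketResponderType.HAS_DOWNSTREAM_IN_PIPELINE) {
            b.append(", downstreams=").append(downstreams.length)
                .append(":").append(Arrays.asList(downstreams));
        }
        return b.toString();
    }
}
```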
[jira] [Updated] (HDFS-10189) PacketResponder toString is built incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Pallas updated HDFS-10189: -- Assignee: Joe Pallas Status: Patch Available (was: Open) > PacketResponder toString is built incorrectly > - > > Key: HDFS-10189 > URL: https://issues.apache.org/jira/browse/HDFS-10189 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.1 >Reporter: Joe Pallas >Assignee: Joe Pallas >Priority: Minor > > The constructor for {{BlockReceiver.PacketResponder}} says > {code} > final StringBuilder b = new StringBuilder(getClass().getSimpleName()) > .append(": ").append(block).append(", type=").append(type); > if (type != PacketResponderType.HAS_DOWNSTREAM_IN_PIPELINE) { > b.append(", downstreams=").append(downstreams.length) > .append(":").append(Arrays.asList(downstreams)); > } > {code} > So it includes the list of downstreams only when it has no downstreams. The > {{if}} test should be for equality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204989#comment-15204989 ] stack commented on HDFS-3702: - [~szetszwo] So, the suggestion is changing public signatures to add a new parameter (or adding a new override where there are already 6)? For a client to make effective use of disfavoredNodes, they would have to figure out the exact name the NN is using and volunteer it in this disfavoredNodes list? Or could they just write 'localhost' and let the NN figure it out? Do you foresee any other use for this disfavoredNodes parameter other than for the exclusion of 'localnode'? Thanks Nicolas. > Add an option for NOT writing the blocks locally if there is a datanode on > the same box as the client > - > > Key: HDFS-3702 > URL: https://issues.apache.org/jira/browse/HDFS-3702 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.5.1 >Reporter: Nicolas Liochon >Assignee: Lei (Eddy) Xu >Priority: Minor > Labels: BB2015-05-TBR > Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, > HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, > HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, > HDFS-3702.008.patch, HDFS-3702_Design.pdf > > > This is useful for Write-Ahead-Logs: these files are written for recovery > only, and are not read when there are no failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9874) Long living DataXceiver threads cause volume shutdown to block.
[ https://issues.apache.org/jira/browse/HDFS-9874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204987#comment-15204987 ] Rushabh S Shah commented on HDFS-9874: -- [~jojochuang]: Thanks for reporting. Taking a look now. > Long living DataXceiver threads cause volume shutdown to block. > --- > > Key: HDFS-9874 > URL: https://issues.apache.org/jira/browse/HDFS-9874 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah >Priority: Critical > Fix For: 2.7.3 > > Attachments: HDFS-9874-trunk-1.patch, HDFS-9874-trunk-2.patch, > HDFS-9874-trunk.patch > > > One of the failed volume shutdowns took 3 days to complete. > Below are the relevant datanode logs while shutting down a volume (due to > disk failure) > {noformat} > 2016-02-21 10:12:55,333 [Thread-49277] WARN impl.FsDatasetImpl: Removing > failed volume volumeA/current: > org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not > writable: volumeA/current/BP-1788428031-nnIp-1351700107344/current/finalized > at > org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:194) > at > org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174) > at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:108) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:308) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:786) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:242) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:2011) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3145) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.access$800(DataNode.java:243) > at >
org.apache.hadoop.hdfs.server.datanode.DataNode$7.run(DataNode.java:3178) > at java.lang.Thread.run(Thread.java:745) > 2016-02-21 10:12:55,334 [Thread-49277] INFO datanode.BlockScanner: Removing > scanner for volume volumeA (StorageID DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23) > 2016-02-21 10:12:55,334 [VolumeScannerThread(volumeA)] INFO > datanode.VolumeScanner: VolumeScanner(volumeA, > DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23) exiting. > 2016-02-21 10:12:55,335 [VolumeScannerThread(volumeA)] WARN > datanode.VolumeScanner: VolumeScanner(volumeA, > DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23): error saving > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl@4169ad8b. > java.io.FileNotFoundException: > volumeA/current/BP-1788428031-nnIp-1351700107344/scanner.cursor.tmp > (Read-only file system) > at java.io.FileOutputStream.open(Native Method) > at java.io.FileOutputStream.(FileOutputStream.java:213) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl.save(FsVolumeImpl.java:669) > at > org.apache.hadoop.hdfs.server.datanode.VolumeScanner.saveBlockIterator(VolumeScanner.java:314) > at > org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:633) > 2016-02-24 16:05:53,285 [Thread-49277] WARN impl.FsDatasetImpl: Failed to > delete old dfsUsed file in > volumeA/current/BP-1788428031-nnIp-1351700107344/current > 2016-02-24 16:05:53,286 [Thread-49277] WARN impl.FsDatasetImpl: Failed to > write dfsUsed to > volumeA/current/BP-1788428031-nnIp-1351700107344/current/dfsUsed > java.io.FileNotFoundException: > volumeA/current/BP-1788428031-nnIp-1351700107344/current/dfsUsed (Read-only > file system) > at java.io.FileOutputStream.open(Native Method) > at java.io.FileOutputStream.(FileOutputStream.java:213) > at java.io.FileOutputStream.(FileOutputStream.java:162) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.saveDfsUsed(BlockPoolSlice.java:247) > at > 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.shutdown(BlockPoolSlice.java:698) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:815) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.removeVolume(FsVolumeList.java:328) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:250) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:2011)
[jira] [Commented] (HDFS-10187) Add a "list" refresh handler to list all registered refresh identifiers/handlers
[ https://issues.apache.org/jira/browse/HDFS-10187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204985#comment-15204985 ] Wei-Chiu Chuang commented on HDFS-10187: Test failures are all known bugs and unrelated. The checkstyle warning needs to be addressed, though. > Add a "list" refresh handler to list all registered refresh > identifiers/handlers > > > Key: HDFS-10187 > URL: https://issues.apache.org/jira/browse/HDFS-10187 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 2.7.2 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Labels: commandline, supportability > Attachments: HDFS-10187.001.patch > > > HADOOP-10376 added a new feature to register handlers for refreshing daemon > configurations, because name node properties can be reconfigured without > restarting the daemon. This can be a very useful generic interface, but so > far no real handlers have been registered using this interface. I added a new > 'list' handler to list all registered handlers. My plan is to add more > handlers in the future using this interface. > Another minor fix is to return a more explicit error message to the client if a > handler is not registered. (It is currently logged on the namenode side, but the > client side only gets a "Failed to get response." message without knowing why.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9874) Long living DataXceiver threads cause volume shutdown to block.
[ https://issues.apache.org/jira/browse/HDFS-9874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204980#comment-15204980 ] Wei-Chiu Chuang commented on HDFS-9874: --- It seems this patch is buggy. In a precommit job, this test threw an NPE: https://builds.apache.org/job/PreCommit-HDFS-Build/14881/testReport/org.apache.hadoop.hdfs.server.datanode.fsdataset.impl/TestFsDatasetImpl/testCleanShutdownOfVolume/ Exception in thread "DataNode: [[[DISK]file:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/3/dfs/data/data1/, [DISK]file:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/3/dfs/data/data2/]] heartbeating to localhost/127.0.0.1:39740" java.lang.NullPointerException at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1714) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdownBlockPool(FsDatasetImpl.java:2591) at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:1479) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:411) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:494) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:749) at java.lang.Thread.run(Thread.java:745) And the precommit record shows it has failed three times in a row. > Long living DataXceiver threads cause volume shutdown to block. > --- > > Key: HDFS-9874 > URL: https://issues.apache.org/jira/browse/HDFS-9874 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah >Priority: Critical > Fix For: 2.7.3 > > Attachments: HDFS-9874-trunk-1.patch, HDFS-9874-trunk-2.patch, > HDFS-9874-trunk.patch > > > One of the failed volume shutdowns took 3 days to complete.
> Below are the relevant datanode logs while shutting down a volume (due to > disk failure) > {noformat} > 2016-02-21 10:12:55,333 [Thread-49277] WARN impl.FsDatasetImpl: Removing > failed volume volumeA/current: > org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not > writable: volumeA/current/BP-1788428031-nnIp-1351700107344/current/finalized > at > org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:194) > at > org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174) > at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:108) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:308) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:786) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:242) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:2011) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3145) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.access$800(DataNode.java:243) > at > org.apache.hadoop.hdfs.server.datanode.DataNode$7.run(DataNode.java:3178) > at java.lang.Thread.run(Thread.java:745) > 2016-02-21 10:12:55,334 [Thread-49277] INFO datanode.BlockScanner: Removing > scanner for volume volumeA (StorageID DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23) > 2016-02-21 10:12:55,334 [VolumeScannerThread(volumeA)] INFO > datanode.VolumeScanner: VolumeScanner(volumeA, > DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23) exiting. > 2016-02-21 10:12:55,335 [VolumeScannerThread(volumeA)] WARN > datanode.VolumeScanner: VolumeScanner(volumeA, > DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23): error saving > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl@4169ad8b. 
> java.io.FileNotFoundException: > volumeA/current/BP-1788428031-nnIp-1351700107344/scanner.cursor.tmp > (Read-only file system) > at java.io.FileOutputStream.open(Native Method) > at java.io.FileOutputStream.(FileOutputStream.java:213) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl.save(FsVolumeImpl.java:669) > at > org.apache.hadoop.hdfs.server.datanode.VolumeScanner.saveBlockIterator(VolumeScanner.java:314) > at > org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:633) > 2016-02-24 16:05:53,285 [Thread-49277] WARN impl.FsDatasetImpl: Failed to > delete old dfsUsed file in >
[jira] [Commented] (HDFS-9405) Warmup NameNode EDEK caches in background thread
[ https://issues.apache.org/jira/browse/HDFS-9405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204977#comment-15204977 ] Xiao Chen commented on HDFS-9405: - Thanks Andrew and all for the review! (Sorry for that missed final) > Warmup NameNode EDEK caches in background thread > > > Key: HDFS-9405 > URL: https://issues.apache.org/jira/browse/HDFS-9405 > Project: Hadoop HDFS > Issue Type: Improvement > Components: encryption, namenode >Affects Versions: 2.7.1 >Reporter: Zhe Zhang >Assignee: Xiao Chen > Fix For: 2.8.0 > > Attachments: HDFS-9405.01.patch, HDFS-9405.02.patch, > HDFS-9405.03.patch, HDFS-9405.04.patch, HDFS-9405.05.patch, > HDFS-9405.06.patch, HDFS-9405.07.patch, HDFS-9405.08.patch, > HDFS-9405.09.patch, HDFS-9405.10.patch, HDFS-9405.11.patch, > HDFS-9405.12.patch, HDFS-9405.13.patch > > > {{generateEncryptedDataEncryptionKey}} involves a non-trivial I/O operation > to the key provider, which could be slow or cause timeout. It should be done > as a separate thread so as to return a proper error message to the RPC caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204981#comment-15204981 ] stack commented on HDFS-3702: - [~arpiagariu] You'd like to purge all of the CreateFlag parameters Arpit? CreateFlag seems to be how other filesystems do color on a particular creation, and this patch was able to make use of it and save changing a bunch of method signatures. Seems kinda useful? And seems like we could get more flags on CreateFlag down the road (ASYNC?). bq. What do you think of per-target block placement policies as proposed in this comment e.g. set a custom placement policy for /hbase/.logs/. Seems like a grand idea (then and now), being able to do it for a whole class of files based off their location in HDFS. Would this be instead of this patch's decoration on CreateFlag? I'd suggest not. I like this patch. It gets us what we want nicely. No need for an admin operator to remember to set attributes on specific dirs (99% won't); we can just do the code change in hbase (and rip out the hacks we have had in place for years now that have been our workaround in the absence of this patch). Thanks > Add an option for NOT writing the blocks locally if there is a datanode on > the same box as the client > - > > Key: HDFS-3702 > URL: https://issues.apache.org/jira/browse/HDFS-3702 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.5.1 >Reporter: Nicolas Liochon >Assignee: Lei (Eddy) Xu >Priority: Minor > Labels: BB2015-05-TBR > Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, > HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, > HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, > HDFS-3702.008.patch, HDFS-3702_Design.pdf > > > This is useful for Write-Ahead-Logs: these files are written for recovery > only, and are not read when there are no failures.
> Taking HBase as an example, these files will be read only if the process that > wrote them (the 'HBase regionserver') dies. This will likely come from a > hardware failure, hence the corresponding datanode will be dead as well. So > we're writing 3 replicas, but in reality only 2 of them are really useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-10189) PacketResponder toString is built incorrectly
Joe Pallas created HDFS-10189: - Summary: PacketResponder toString is built incorrectly Key: HDFS-10189 URL: https://issues.apache.org/jira/browse/HDFS-10189 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.1 Reporter: Joe Pallas Priority: Minor The constructor for {{BlockReceiver.PacketResponder}} says {code} final StringBuilder b = new StringBuilder(getClass().getSimpleName()) .append(": ").append(block).append(", type=").append(type); if (type != PacketResponderType.HAS_DOWNSTREAM_IN_PIPELINE) { b.append(", downstreams=").append(downstreams.length) .append(":").append(Arrays.asList(downstreams)); } {code} So it includes the list of downstreams only when it has no downstreams. The {{if}} test should be for equality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9951) Use string constants for XML tags in OfflineImageReconstructor
[ https://issues.apache.org/jira/browse/HDFS-9951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204956#comment-15204956 ] Hudson commented on HDFS-9951: -- FAILURE: Integrated in Hadoop-trunk-Commit #9483 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9483/]) HDFS-9951. Use string constants for XML tags in (cmccabe: rev 680716f31e120f4d3ee70b095e4db46c05b891d9) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/OfflineImageReconstructor.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/PBImageXmlWriter.java > Use string constants for XML tags in OfflineImageReconstructor > -- > > Key: HDFS-9951 > URL: https://issues.apache.org/jira/browse/HDFS-9951 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Lin Yiqun >Assignee: Lin Yiqun >Priority: Minor > Fix For: 2.8.0 > > Attachments: HDFS-9551.001.patch, HDFS-9551.002.patch, > HDFS-9551.003.patch, HDFS-9551.004.patch > > > The class {{OfflineImageReconstructor}} uses many {{SectionProcessors}} to > process XML files and load the subtree of the XML into a Node structure. But > there are lots of places where a node removes a key by writing the value > directly in methods rather than defining it first. Like this: > {code} > Node expiration = directive.removeChild("expiration"); > {code} > We could improve this by defining these keys in Node and then invoking them like this: > {code} > Node expiration=directive.removeChild(Node.CACHE_MANAGER_SECTION_EXPIRATION); > {code} > And this will make it easier to manage node key names in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
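[Editor's note] A minimal sketch of the constants-over-literals pattern the issue proposes. This {{Node}} class and the backing map are simplified stand-ins for the real structure in {{OfflineImageReconstructor}}; only the constant shown in the issue description is reproduced here:

```java
import java.util.HashMap;
import java.util.Map;

public class Node {
    // Tag constant modeled on the example in the issue description; the
    // real class would define one such constant per XML tag name.
    static final String CACHE_MANAGER_SECTION_EXPIRATION = "expiration";

    private final Map<String, Node> children = new HashMap<>();

    void addChild(String name, Node child) {
        children.put(name, child);
    }

    // Callers remove children by constant rather than a bare string
    // literal, so every tag name is defined in exactly one place.
    Node removeChild(String name) {
        return children.remove(name);
    }
}
```

The benefit is the one named in the description: a typo in a tag name becomes a compile error instead of a silent lookup miss, and renaming a tag touches a single definition.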
[jira] [Commented] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units
[ https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204945#comment-15204945 ] Chris Douglas commented on HDFS-9847: - The current patch changes {{Configuration::getTimeDuration}} to throw when losing precision: {noformat} // Configuration.java +long raw = Long.parseLong(timeStr); +long converted = unit.convert(raw, vUnit.unit()); +if (vUnit.unit().convert(converted, unit) != raw) { + throw new HadoopIllegalArgumentException("Loss of precision converting " + + timeStr + vUnit.suffix() + " to " + unit); } -return unit.convert(Long.parseLong(vStr), vUnit.unit()); +return converted; // TestConfiguration Configuration conf = new Configuration(false); conf.setTimeDuration("test.time.a", 7L, SECONDS); assertEquals("7s", conf.get("test.time.a")); -assertEquals(0L, conf.getTimeDuration("test.time.a", 30, MINUTES)); {noformat} This changes the contract. In the current version, the caller determines the precision (as detailed in the javadoc). This is correct; the caller knows the reasonable precision, and can perform checks (e.g., unexpected value of 0) in its context. The {{Configuration}} has no context, and providing the expected precision as an argument overengineers the interface. If I want to give a heartbeat interval in microseconds that's my right as a lunatic, but the caller should not throw because it only cares about the value in seconds. (aside) Why {{HadoopIllegalArgumentException}} instead of {{java.lang.IllegalArgumentException}}? Not why it is thrown here, but why does it exist?
> HDFS configuration without time unit name should accept friendly time units > --- > > Key: HDFS-9847 > URL: https://issues.apache.org/jira/browse/HDFS-9847 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: HDFS-9847.001.patch, HDFS-9847.002.patch, > HDFS-9847.003.patch, HDFS-9847.004.patch, HDFS-9847.005.patch, > HDFS-9847.006.patch, timeduration-w-y.patch > > > HDFS-9821 discusses letting existing keys use friendly > units, e.g. 60s, 5m, 1d, 6w, etc. But some configuration key names > contain a time unit name, like {{dfs.blockreport.intervalMsec}}, so we can make > other configurations whose names do not contain a time unit accept friendly > time units. The time unit {{seconds}} is frequently used in HDFS. We can > update these configurations first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
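[Editor's note] The silent value change Chris and Arpit are debating comes from {{TimeUnit.convert}} truncating toward zero; sub-unit remainders disappear without any error. A minimal demonstration:

```java
import java.util.concurrent.TimeUnit;

public class TruncationDemo {
    public static void main(String[] args) {
        // 1500ms read back in seconds: the 500ms remainder vanishes.
        long secs = TimeUnit.SECONDS.convert(1500, TimeUnit.MILLISECONDS);
        // 7s read back in minutes: the whole value collapses to zero.
        long mins = TimeUnit.MINUTES.convert(7, TimeUnit.SECONDS);
        System.out.println(secs + " " + mins); // prints "1 0"
    }
}
```

This is the behavior the patch's precision check is meant to surface, and the behavior the existing javadoc delegates to the caller.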
[jira] [Commented] (HDFS-10187) Add a "list" refresh handler to list all registered refresh identifiers/handlers
[ https://issues.apache.org/jira/browse/HDFS-10187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204949#comment-15204949 ] Hadoop QA commented on HDFS-10187: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 53s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 15s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 36s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 48s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 8s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | 
{color:green} javadoc {color} | {color:green} 3m 5s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 42s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 20s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 2s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 2s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 6s {color} | {color:red} root: patch generated 6 new + 230 unchanged - 5 fixed = 236 total (was 235) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 59s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 0s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 51s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 22s {color} | {color:red} hadoop-common in the patch failed with JDK v1.8.0_74. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 39s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 32s {color} | {color:red} hadoop-common in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 73m 22s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 33s {color} | {color:red} Patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 232m 46s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_74 Failed junit tests |
[jira] [Updated] (HDFS-9809) Abstract implementation-specific details from the datanode
[ https://issues.apache.org/jira/browse/HDFS-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virajith Jalaparti updated HDFS-9809: - Component/s: fs datanode > Abstract implementation-specific details from the datanode > -- > > Key: HDFS-9809 > URL: https://issues.apache.org/jira/browse/HDFS-9809 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode, fs >Reporter: Virajith Jalaparti >Assignee: Virajith Jalaparti > Attachments: HDFS-9809.001.patch > > > Multiple parts of the Datanode (FsVolumeSpi, ReplicaInfo, FSVolumeImpl etc.) > implicitly assume that blocks are stored in java.io.File(s) and that volumes > are divided into directories. We propose to abstract these details, which > would help in supporting other storages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9809) Abstract implementation-specific details from the datanode
[ https://issues.apache.org/jira/browse/HDFS-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204943#comment-15204943 ] Virajith Jalaparti commented on HDFS-9809: -- Hi [~zhz], Thanks for the comment. One concrete example where a different sub-class of {{ReplicaInfo}} might be used is Ozone (HDFS-7240) where the replica can be stored in the underlying key-value store. Similar to the {{DatasetSpi}}/{{FsDatasetSpi}}, we could have {{ReplicaInfo}}/{{FsReplicaInfo}}. We are still going over the changes proposed by HDFS-7240 and in the process of understanding if a different sub-class of {{ReplicaInfo}} makes sense there. We are currently working on the design document for HDFS-9806 and will post it soon (within the week). If it ends up being the case that this work is useful only in the context of HDFS-9806, we will make it a subtask. > Abstract implementation-specific details from the datanode > -- > > Key: HDFS-9809 > URL: https://issues.apache.org/jira/browse/HDFS-9809 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Virajith Jalaparti >Assignee: Virajith Jalaparti > Attachments: HDFS-9809.001.patch > > > Multiple parts of the Datanode (FsVolumeSpi, ReplicaInfo, FSVolumeImpl etc.) > implicitly assume that blocks are stored in java.io.File(s) and that volumes > are divided into directories. We propose to abstract these details, which > would help in supporting other storages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204925#comment-15204925 ] Tsz Wo Nicholas Sze commented on HDFS-3702: --- Please do not commit the patch yet. Some ideas below. In DistributedFileSystem, we already have create(..) and append(..) methods to support favoredNodes. How about we also add a new parameter disfavoredNodes? It supports a more general API -- we could set disfavoredNodes to one or more hosts. BPP can fall back to these nodes if the other nodes are unavailable. > Add an option for NOT writing the blocks locally if there is a datanode on > the same box as the client > - > > Key: HDFS-3702 > URL: https://issues.apache.org/jira/browse/HDFS-3702 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.5.1 >Reporter: Nicolas Liochon >Assignee: Lei (Eddy) Xu >Priority: Minor > Labels: BB2015-05-TBR > Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, > HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, > HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, > HDFS-3702.008.patch, HDFS-3702_Design.pdf > > > This is useful for Write-Ahead-Logs: these files are written for recovery > only, and are not read when there are no failures. > Taking HBase as an example, these files will be read only if the process that > wrote them (the 'HBase regionserver') dies. This will likely come from a > hardware failure, hence the corresponding datanode will be dead as well. So > we're writing 3 replicas, but in reality only 2 of them are really useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9951) Use string constants for XML tags in OfflineImageReconstructor
[ https://issues.apache.org/jira/browse/HDFS-9951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-9951: --- Resolution: Fixed Fix Version/s: 2.8.0 Target Version/s: 2.8.0 Tags: tools Status: Resolved (was: Patch Available) Committed to 2.8 > Use string constants for XML tags in OfflineImageReconstructor > -- > > Key: HDFS-9951 > URL: https://issues.apache.org/jira/browse/HDFS-9951 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Lin Yiqun >Assignee: Lin Yiqun >Priority: Minor > Fix For: 2.8.0 > > Attachments: HDFS-9551.001.patch, HDFS-9551.002.patch, > HDFS-9551.003.patch, HDFS-9551.004.patch > > > In class {{OfflineImageReconstructor}}, many {{SectionProcessors}} are used to > process XML files and load each subtree of the XML into a Node structure. But > in many places a node's child is removed by writing the key directly as a > string literal rather than defining it as a constant first. Like this: > {code} > Node expiration = directive.removeChild("expiration"); > {code} > We could improve this by defining the keys in Node and invoking them like this: > {code} > Node expiration=directive.removeChild(Node.CACHE_MANAGER_SECTION_EXPIRATION); > {code} > And it will make it easier to manage node key names in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
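The refactoring described in the issue can be sketched with a simplified stand-in for the Node class; this is illustrative only, not the actual OfflineImageReconstructor code.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the Node class described above: XML tag names are
// collected as constants on Node instead of being scattered through the
// section processors as string literals.
class Node {
  static final String CACHE_MANAGER_SECTION_EXPIRATION = "expiration";

  private final Map<String, Node> children = new HashMap<>();

  void addChild(String name, Node child) {
    children.put(name, child);
  }

  // Detaches and returns the named child, or null if absent.
  Node removeChild(String name) {
    return children.remove(name);
  }
}
```

Callers then write `directive.removeChild(Node.CACHE_MANAGER_SECTION_EXPIRATION)`, so if a tag name ever changes, only the constant needs editing.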
[jira] [Commented] (HDFS-9951) Use string constants for XML tags in OfflineImageReconstructor
[ https://issues.apache.org/jira/browse/HDFS-9951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204862#comment-15204862 ] Colin Patrick McCabe commented on HDFS-9951: Thanks for fixing the patch. +1 > Use string constants for XML tags in OfflineImageReconstructor > -- > > Key: HDFS-9951 > URL: https://issues.apache.org/jira/browse/HDFS-9951 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Lin Yiqun >Assignee: Lin Yiqun >Priority: Minor > Attachments: HDFS-9551.001.patch, HDFS-9551.002.patch, > HDFS-9551.003.patch, HDFS-9551.004.patch > > > In class {{OfflineImageReconstructor}}, many {{SectionProcessors}} are used to > process XML files and load each subtree of the XML into a Node structure. But > in many places a node's child is removed by writing the key directly as a > string literal rather than defining it as a constant first. Like this: > {code} > Node expiration = directive.removeChild("expiration"); > {code} > We could improve this by defining the keys in Node and invoking them like this: > {code} > Node expiration=directive.removeChild(Node.CACHE_MANAGER_SECTION_EXPIRATION); > {code} > And it will make it easier to manage node key names in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9405) Warmup NameNode EDEK caches in background thread
[ https://issues.apache.org/jira/browse/HDFS-9405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204863#comment-15204863 ] Hudson commented on HDFS-9405: -- FAILURE: Integrated in Hadoop-trunk-Commit #9482 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9482/]) HDFS-9405. Warmup NameNode EDEK caches in background thread. Contributed (wang: rev e3bb38d62567eafe57d16b78deeba1b71c58e41c) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/KMSClientProvider.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZones.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/ValueQueue.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EncryptionZoneManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZonesWithKMS.java * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/crypto/key/TestValueQueue.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirEncryptionZoneOp.java > Warmup NameNode EDEK caches in background thread > > > Key: HDFS-9405 > URL: https://issues.apache.org/jira/browse/HDFS-9405 > Project: Hadoop HDFS > Issue Type: Improvement > Components: encryption, namenode >Affects Versions: 2.7.1 >Reporter: Zhe Zhang >Assignee: Xiao Chen > Fix For: 2.8.0 > > Attachments: HDFS-9405.01.patch, HDFS-9405.02.patch, > HDFS-9405.03.patch, HDFS-9405.04.patch, HDFS-9405.05.patch, > HDFS-9405.06.patch, HDFS-9405.07.patch, HDFS-9405.08.patch, > HDFS-9405.09.patch, HDFS-9405.10.patch, HDFS-9405.11.patch, > HDFS-9405.12.patch, 
HDFS-9405.13.patch > > > {{generateEncryptedDataEncryptionKey}} involves a non-trivial I/O operation > to the key provider, which could be slow or cause timeout. It should be done > as a separate thread so as to return a proper error message to the RPC caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9405) Warmup NameNode EDEK caches in background thread
[ https://issues.apache.org/jira/browse/HDFS-9405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-9405: -- Resolution: Fixed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Pushed to trunk, branch-2, branch-2.8! Thanks Xiao for patch and also Arun for reviews! > Warmup NameNode EDEK caches in background thread > > > Key: HDFS-9405 > URL: https://issues.apache.org/jira/browse/HDFS-9405 > Project: Hadoop HDFS > Issue Type: Improvement > Components: encryption, namenode >Affects Versions: 2.7.1 >Reporter: Zhe Zhang >Assignee: Xiao Chen > Fix For: 2.8.0 > > Attachments: HDFS-9405.01.patch, HDFS-9405.02.patch, > HDFS-9405.03.patch, HDFS-9405.04.patch, HDFS-9405.05.patch, > HDFS-9405.06.patch, HDFS-9405.07.patch, HDFS-9405.08.patch, > HDFS-9405.09.patch, HDFS-9405.10.patch, HDFS-9405.11.patch, > HDFS-9405.12.patch, HDFS-9405.13.patch > > > {{generateEncryptedDataEncryptionKey}} involves a non-trivial I/O operation > to the key provider, which could be slow or cause timeout. It should be done > as a separate thread so as to return a proper error message to the RPC caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9405) Warmup NameNode EDEK caches in background thread
[ https://issues.apache.org/jira/browse/HDFS-9405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204847#comment-15204847 ] Andrew Wang commented on HDFS-9405: --- One teensy nit is that I think we could make initialDelay and retryInterval final, but that's not a blocker. Will commit shortly, thanks Xiao! > Warmup NameNode EDEK caches in background thread > > > Key: HDFS-9405 > URL: https://issues.apache.org/jira/browse/HDFS-9405 > Project: Hadoop HDFS > Issue Type: Improvement > Components: encryption, namenode >Affects Versions: 2.7.1 >Reporter: Zhe Zhang >Assignee: Xiao Chen > Attachments: HDFS-9405.01.patch, HDFS-9405.02.patch, > HDFS-9405.03.patch, HDFS-9405.04.patch, HDFS-9405.05.patch, > HDFS-9405.06.patch, HDFS-9405.07.patch, HDFS-9405.08.patch, > HDFS-9405.09.patch, HDFS-9405.10.patch, HDFS-9405.11.patch, > HDFS-9405.12.patch, HDFS-9405.13.patch > > > {{generateEncryptedDataEncryptionKey}} involves a non-trivial I/O operation > to the key provider, which could be slow or cause timeout. It should be done > as a separate thread so as to return a proper error message to the RPC caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9917) IBR accumulate more objects when SNN was down for sometime.
[ https://issues.apache.org/jira/browse/HDFS-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204843#comment-15204843 ] Tsz Wo Nicholas Sze commented on HDFS-9917: --- [~brahmareddy], your proposal on reRegister() sounds great, thanks. > IBR accumulate more objects when SNN was down for sometime. > --- > > Key: HDFS-9917 > URL: https://issues.apache.org/jira/browse/HDFS-9917 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > > SNN was down for some time for various reasons. After restarting the SNN, it > became unresponsive because: > - 29 DNs were each sending ~5 million IBRs (most of them delete IBRs), > whereas each datanode had only ~2.5 million blocks. > - GC could not reclaim these objects since all of them were still referenced > from the RPC queue. > To recover (i.e., to clear these objects), we restarted all the DNs one by > one. This issue happened in 2.4.1, where block report splitting was not > available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7648) Verify that HDFS blocks are in the correct datanode directories
[ https://issues.apache.org/jira/browse/HDFS-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204822#comment-15204822 ] Tsz Wo Nicholas Sze commented on HDFS-7648: --- > ... I've raised separate sub-task HDFS-10186 to improve logging part. ... Why does it need a separate sub-task rather than just updating the patch here? > I failed to understand it, could you please provide few more details. {code} +// Check whether the actual directory location of block file +// is block ID-based layout +File blockDir = DatanodeUtil.idToBlockDir(bpFinalizedDir, blockId); +File actualBlockDir = files[i].getParentFile(); +if (actualBlockDir.compareTo(blockDir) != 0) { + LOG.warn("Block: " + blockId + + " has to be upgraded to block ID-based layout"); +} report.add(new ScanInfo(blockId, null, files[i], vol)); } continue; @@ -646,6 +655,14 @@ public ScanInfoPerBlockPool call() throws Exception { break; } } +// Check whether the actual directory location of block file +// is block ID-based layout +File blockDir = DatanodeUtil.idToBlockDir(bpFinalizedDir, blockId); +File actualBlockDir = blockFile.getParentFile(); +if (actualBlockDir.compareTo(blockDir) != 0) { + LOG.warn("Block: " + blockId + + " has to be upgraded to block ID-based layout"); +} report.add(new ScanInfo(blockId, blockFile, metaFile, vol)); {code} The two chunks of code above are duplicated. Please add a helper method to avoid the duplication. Thanks. > Verify that HDFS blocks are in the correct datanode directories > --- > > Key: HDFS-7648 > URL: https://issues.apache.org/jira/browse/HDFS-7648 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tsz Wo Nicholas Sze >Assignee: Rakesh R > Attachments: HDFS-7648-3.patch, HDFS-7648-4.patch, HDFS-7648-5.patch, > HDFS-7648.patch, HDFS-7648.patch > > > HDFS-6482 changed datanode layout to use block ID to determine the directory > to store the block. We should have some mechanism to verify it.
Either > DirectoryScanner or block report generation could do the check. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
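The helper-method extraction requested in the review could look roughly like the sketch below. The class name is hypothetical, and idToBlockDir here is a simplified stand-in for DatanodeUtil.idToBlockDir (the real directory hashing may differ); it only illustrates collapsing the two duplicated layout checks into one call.

```java
import java.io.File;

// Sketch of a helper that both duplicated chunks could call: check whether a
// block file already sits in its block-ID-based directory. Pure path math,
// no disk I/O.
class LayoutCheckSketch {
  // Stand-in for DatanodeUtil.idToBlockDir: derive two subdirectory levels
  // from bits of the block ID (assumed scheme, for illustration only).
  static File idToBlockDir(File finalizedDir, long blockId) {
    int d1 = (int) ((blockId >> 16) & 0xff);
    int d2 = (int) ((blockId >> 8) & 0xff);
    return new File(finalizedDir, "subdir" + d1 + File.separator + "subdir" + d2);
  }

  // Returns true (the caller would log a warning) when the block file is not
  // yet in its ID-based layout location.
  static boolean needsLayoutUpgrade(File finalizedDir, long blockId, File blockFile) {
    File expected = idToBlockDir(finalizedDir, blockId);
    return !expected.equals(blockFile.getParentFile());
  }
}
```

Both call sites in the patch would then shrink to a single `needsLayoutUpgrade(...)` check before building the ScanInfo.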
[jira] [Commented] (HDFS-9616) libhdfs++ Add runtime hooks to allow a client application to add low level monitoring and tests.
[ https://issues.apache.org/jira/browse/HDFS-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204776#comment-15204776 ] Hadoop QA commented on HDFS-9616: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 38s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 59s {color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 39s {color} | {color:green} HDFS-8707 passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 32s {color} | {color:green} HDFS-8707 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 17s {color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 23s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 23s {color} | {color:green} the patch passed {color} | | 
{color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 22s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 9s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 6 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 4s {color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK v1.8.0_74. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 4s {color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 52m 51s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0cf5e66 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12794531/HDFS-9616.HDFS-8707.002.patch | | JIRA Issue | HDFS-9616 | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux 4d8b97f384b4 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | HDFS-8707 / 7751507 | | Default Java | 1.7.0_95 | | Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_74 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/14882/artifact/patchprocess/whitespace-eol.txt | | JDK v1.7.0_95 Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/14882/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: hadoop-hdfs-project/hadoop-hdfs-native-client | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/14882/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > libhdfs++ Add runtime hooks to allow a client application to add low level > monitoring and tests. >
[jira] [Updated] (HDFS-9579) Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level
[ https://issues.apache.org/jira/browse/HDFS-9579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated HDFS-9579: -- Fix Version/s: 2.9.0 Committed the branch-2 patch onto branch-2. I am not committing it to branch-2.8 because I understand 2.8 needs to be stabilized and I don't think it is critical this makes 2.8.0. Do let me know if you feel strongly that this should be in 2.8.0. > Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level > - > > Key: HDFS-9579 > URL: https://issues.apache.org/jira/browse/HDFS-9579 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 3.0.0, 2.9.0 > > Attachments: HDFS-9579-10.patch, HDFS-9579-2.patch, > HDFS-9579-3.patch, HDFS-9579-4.patch, HDFS-9579-5.patch, HDFS-9579-6.patch, > HDFS-9579-7.patch, HDFS-9579-8.patch, HDFS-9579-9.patch, > HDFS-9579-branch-2.patch, HDFS-9579.patch, MR job counters.png > > > For cross DC distcp or other applications, it becomes useful to have insight > as to the traffic volume for each network distance to distinguish cross-DC > traffic, local-DC-remote-rack, etc. > FileSystem's existing {{bytesRead}} metrics tracks all the bytes read. To > provide additional metrics for each network distance, we can add additional > metrics to FileSystem level and have {{DFSInputStream}} update the value > based on the network distance between client and the datanode. > {{DFSClient}} will resolve client machine's network location as part of its > initialization. It doesn't need to resolve datanode's network location for > each read as {{DatanodeInfo}} already has the info. > There are existing HDFS specific metrics such as {{ReadStatistics}} and > {{DFSHedgedReadMetrics}}. But these metrics are only accessible via > {{DFSClient}} or {{DFSInputStream}}. Not something that application framework > such as MR and Tez can get to. That is the benefit of storing these new > metrics in FileSystem.Statistics. 
> This jira only includes metrics generation by HDFS. The consumption of these > metrics at MR and Tez will be tracked by separated jiras. > We can add similar metrics for HDFS write scenario later if it is necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
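The per-network-distance byte counters described above can be sketched as follows; the class name and bucket layout are illustrative, not Hadoop's FileSystem.Statistics implementation.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Sketch of distance-bucketed read counters: the input stream adds each
// read's length to a bucket indexed by the client/datanode network distance
// (0 = local node, 2 = same rack, 4 = other rack, and so on in a tree
// topology). Counters are atomic since many streams update them concurrently.
class DistanceStatsSketch {
  private static final int MAX_DISTANCE = 8;  // assumed cap on topology depth
  private final AtomicLongArray bytesReadByDistance =
      new AtomicLongArray(MAX_DISTANCE + 1);

  void incrementBytesRead(int distance, long bytes) {
    // Clamp unexpected distances into the last bucket rather than failing.
    bytesReadByDistance.addAndGet(Math.min(distance, MAX_DISTANCE), bytes);
  }

  long getBytesRead(int distance) {
    return bytesReadByDistance.get(distance);
  }
}
```

A framework like MR or Tez could then expose each bucket as a job counter, e.g. cross-rack bytes versus local bytes.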
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204734#comment-15204734 ] Lei (Eddy) Xu commented on HDFS-3702: - Hey guys, based on [~stack] and [~andrew.wang]'s +1s, I will commit this by end of day if there is no further comment. > Add an option for NOT writing the blocks locally if there is a datanode on > the same box as the client > - > > Key: HDFS-3702 > URL: https://issues.apache.org/jira/browse/HDFS-3702 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.5.1 >Reporter: Nicolas Liochon >Assignee: Lei (Eddy) Xu >Priority: Minor > Labels: BB2015-05-TBR > Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, > HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, > HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, > HDFS-3702.008.patch, HDFS-3702_Design.pdf > > > This is useful for Write-Ahead-Logs: these files are written for recovery > only, and are not read when there are no failures. > Taking HBase as an example, these files will be read only if the process that > wrote them (the 'HBase regionserver') dies. This will likely come from a > hardware failure, hence the corresponding datanode will be dead as well. So > we're writing 3 replicas, but in reality only 2 of them are really useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9579) Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level
[ https://issues.apache.org/jira/browse/HDFS-9579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204679#comment-15204679 ] Ming Ma commented on HDFS-9579: --- The branch-2 version also applies to branch-2.8. Unless others ask for it, skip the work for branch-2.7 and branch-2.6 given this is a feature enhancement and requires extra effort. > Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level > - > > Key: HDFS-9579 > URL: https://issues.apache.org/jira/browse/HDFS-9579 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 3.0.0 > > Attachments: HDFS-9579-10.patch, HDFS-9579-2.patch, > HDFS-9579-3.patch, HDFS-9579-4.patch, HDFS-9579-5.patch, HDFS-9579-6.patch, > HDFS-9579-7.patch, HDFS-9579-8.patch, HDFS-9579-9.patch, > HDFS-9579-branch-2.patch, HDFS-9579.patch, MR job counters.png > > > For cross DC distcp or other applications, it becomes useful to have insight > as to the traffic volume for each network distance to distinguish cross-DC > traffic, local-DC-remote-rack, etc. > FileSystem's existing {{bytesRead}} metrics tracks all the bytes read. To > provide additional metrics for each network distance, we can add additional > metrics to FileSystem level and have {{DFSInputStream}} update the value > based on the network distance between client and the datanode. > {{DFSClient}} will resolve client machine's network location as part of its > initialization. It doesn't need to resolve datanode's network location for > each read as {{DatanodeInfo}} already has the info. > There are existing HDFS specific metrics such as {{ReadStatistics}} and > {{DFSHedgedReadMetrics}}. But these metrics are only accessible via > {{DFSClient}} or {{DFSInputStream}}. Not something that application framework > such as MR and Tez can get to. That is the benefit of storing these new > metrics in FileSystem.Statistics. > This jira only includes metrics generation by HDFS. 
The consumption of these > metrics at MR and Tez will be tracked by separated jiras. > We can add similar metrics for HDFS write scenario later if it is necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10184) Introduce unit tests framework for HDFS UI
[ https://issues.apache.org/jira/browse/HDFS-10184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204670#comment-15204670 ] Haohui Mai commented on HDFS-10184: --- Unless we move to Java 8 this is a non-starter. The UI is heavily driven by HTML5. Rhino, the JavaScript engine in Java 7, is at least 10x slower than Nashorn and node.js. I have written a JavaScript version of dfsadmin using Rhino and the performance is very unsatisfactory. Given the scope of the tests I'm not fully convinced it is a good idea. I believe that the new Web UI in YARN is adopting npm / node.js -- these tools will be integrated with the current Jenkins workflow so the integration should not be an issue. > Introduce unit tests framework for HDFS UI > -- > > Key: HDFS-10184 > URL: https://issues.apache.org/jira/browse/HDFS-10184 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Haohui Mai > > The current HDFS UI is based on HTML5 and it does not have unit tests yet. > Occasionally things break and we can't catch it. We should investigate and > introduce unit test frameworks such as Mocha for the UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204661#comment-15204661 ] Lei (Eddy) Xu commented on HDFS-3702: - [~arpitagarwal] and [~szetszwo], thanks for these useful suggestions. I had a "{{per-block}} block placement hint" and "putting the local node into the excludedNodes" in patches 001 and 002, respectively. But because of the performance concerns and the fallback capability mentioned in the previous comments, I changed the patch to the current solution, which re-uses the fallback code in BPP. > Add an option for NOT writing the blocks locally if there is a datanode on > the same box as the client > - > > Key: HDFS-3702 > URL: https://issues.apache.org/jira/browse/HDFS-3702 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.5.1 >Reporter: Nicolas Liochon >Assignee: Lei (Eddy) Xu >Priority: Minor > Labels: BB2015-05-TBR > Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, > HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, > HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, > HDFS-3702.008.patch, HDFS-3702_Design.pdf > > > This is useful for Write-Ahead-Logs: these files are written for recovery > only, and are not read when there are no failures. > Taking HBase as an example, these files will be read only if the process that > wrote them (the 'HBase regionserver') dies. This will likely come from a > hardware failure, hence the corresponding datanode will be dead as well. So > we're writing 3 replicas, but in reality only 2 of them are really useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-2043) TestHFlush failing intermittently
[ https://issues.apache.org/jira/browse/HDFS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204615#comment-15204615 ] Hadoop QA commented on HDFS-2043: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 11s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 7s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 59s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 0m 56s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 25s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 18s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 52s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 46s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 199m 40s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_74 Failed junit tests | hadoop.hdfs.security.TestDelegationTokenForProxyUser | | | hadoop.hdfs.server.namenode.TestEditLog | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery | | | hadoop.hdfs.server.datanode.TestDirectoryScanner | | JDK v1.7.0_95 Failed junit tests | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl | | | hadoop.metrics2.sink.TestRollingFileSystemSinkWithSecureHdfs | | | hadoop.hdfs.server.namenode.TestFileTruncate | | | hadoop.hdfs.TestEncryptionZones | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:fbe3e86 | | JIRA Patch URL |
[jira] [Commented] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units
[ https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204613#comment-15204613 ] Arpit Agarwal commented on HDFS-9847: - Thanks [~linyiqun]. I am +1 for the latest patch with a minor issue - the following line should also include the name of the configuration setting. {code} 1660 throw new HadoopIllegalArgumentException("Loss of precision converting " 1661 + timeStr + vUnit.suffix() + " to " + unit); {code} I will hold off committing for now in case [~chris.douglas] or [~ste...@apache.org] have comments. Steve - I think the only open question was whether to add week/year suffixes. I don't have an opinion either way. > HDFS configuration without time unit name should accept friendly time units > --- > > Key: HDFS-9847 > URL: https://issues.apache.org/jira/browse/HDFS-9847 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: HDFS-9847.001.patch, HDFS-9847.002.patch, > HDFS-9847.003.patch, HDFS-9847.004.patch, HDFS-9847.005.patch, > HDFS-9847.006.patch, timeduration-w-y.patch > > > HDFS-9821 discusses letting existing keys accept friendly units, e.g. 60s, > 5m, 1d, 6w, etc. But some configuration key names contain a time unit name, > like {{dfs.blockreport.intervalMsec}}, so we can make the other configurations > whose names carry no time unit accept friendly time units. The time unit > {{seconds}} is frequently used in HDFS. We can update these configurations > first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
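The friendly-suffix parsing discussed in this issue can be sketched as below. This is a simplified illustration in the spirit of the patch, not Hadoop's actual Configuration.getTimeDuration implementation; the class and method names are hypothetical, and only the s/m/h/d suffixes from the examples are handled.

```java
import java.util.concurrent.TimeUnit;

// Sketch of suffix-aware duration parsing: "60s", "5m", "1d" are converted
// to a target unit, while a bare number falls back to a default unit.
class TimeSuffixSketch {
  static long parse(String value, TimeUnit defaultUnit, TimeUnit target) {
    TimeUnit unit;
    switch (value.charAt(value.length() - 1)) {
      case 's': unit = TimeUnit.SECONDS; break;
      case 'm': unit = TimeUnit.MINUTES; break;
      case 'h': unit = TimeUnit.HOURS;   break;
      case 'd': unit = TimeUnit.DAYS;    break;
      default:
        // No recognized suffix: interpret the whole string in the default unit.
        return target.convert(Long.parseLong(value), defaultUnit);
    }
    String digits = value.substring(0, value.length() - 1);
    return target.convert(Long.parseLong(digits), unit);
  }
}
```

A key such as a seconds-based interval could then accept "5m" and "300" interchangeably; detecting lossy conversions (the precision check Arpit quotes above) would be an additional step on top of this.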
[jira] [Commented] (HDFS-9959) add log when block removed from last live datanode
[ https://issues.apache.org/jira/browse/HDFS-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204604#comment-15204604 ] yunjiong zhao commented on HDFS-9959: - +1 for this. > add log when block removed from last live datanode > -- > > Key: HDFS-9959 > URL: https://issues.apache.org/jira/browse/HDFS-9959 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: yunjiong zhao >Assignee: yunjiong zhao >Priority: Minor > Attachments: HDFS-9959.1.patch, HDFS-9959.patch > > > Adding logs like "BLOCK* No live nodes contain block blk_1073741825_1001, last > datanode contain it is node: 127.0.0.1:65341" to BlockStateChange should help > identify which datanode should be fixed first to recover missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9579) Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level
[ https://issues.apache.org/jira/browse/HDFS-9579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-9579: -- Attachment: HDFS-9579-branch-2.patch Thank you [~sjlee0], [~liuml07] and [~cmccabe]! Here is the patch for branch-2, which passed all tests under hadoop-common-project and hadoop-hdfs-project locally. > Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level > - > > Key: HDFS-9579 > URL: https://issues.apache.org/jira/browse/HDFS-9579 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 3.0.0 > > Attachments: HDFS-9579-10.patch, HDFS-9579-2.patch, > HDFS-9579-3.patch, HDFS-9579-4.patch, HDFS-9579-5.patch, HDFS-9579-6.patch, > HDFS-9579-7.patch, HDFS-9579-8.patch, HDFS-9579-9.patch, > HDFS-9579-branch-2.patch, HDFS-9579.patch, MR job counters.png > > > For cross-DC distcp or other applications, it becomes useful to have insight > into the traffic volume at each network distance, to distinguish cross-DC > traffic, local-DC-remote-rack traffic, etc. > FileSystem's existing {{bytesRead}} metric tracks all the bytes read. To > provide additional metrics for each network distance, we can add additional > metrics at the FileSystem level and have {{DFSInputStream}} update the value > based on the network distance between the client and the datanode. > {{DFSClient}} will resolve the client machine's network location as part of its > initialization. It doesn't need to resolve the datanode's network location for > each read, as {{DatanodeInfo}} already has the info. > There are existing HDFS-specific metrics such as {{ReadStatistics}} and > {{DFSHedgedReadMetrics}}. But these metrics are only accessible via > {{DFSClient}} or {{DFSInputStream}}, not something that application frameworks > such as MR and Tez can get to. That is the benefit of storing these new > metrics in FileSystem.Statistics. > This jira only includes metrics generation by HDFS. 
The consumption of these > metrics at MR and Tez will be tracked by separate jiras. > We can add similar metrics for the HDFS write scenario later if necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
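A minimal standalone sketch of the per-distance counters described above; the real fields would live in FileSystem.Statistics and be updated by {{DFSInputStream}}, so the class and method names here are hypothetical, and the slot layout assumes the usual NetworkTopology weights (0 local, 2 same rack, 4 same data center, 6 remote data center):

```java
import java.util.concurrent.atomic.AtomicLongArray;

public class DistanceStats {
    // One slot per even distance: 0 (local), 2 (same rack),
    // 4 (same data center), 6 (remote data center).
    private final AtomicLongArray bytesByDistance = new AtomicLongArray(4);

    private int slot(int distance) {
        return Math.min(distance / 2, bytesByDistance.length() - 1);
    }

    /** Called per read completion with the client-to-datanode distance. */
    public void incrementBytesRead(int distance, long bytes) {
        bytesByDistance.addAndGet(slot(distance), bytes);
    }

    public long getBytesReadAtDistance(int distance) {
        return bytesByDistance.get(slot(distance));
    }

    /** The sum over all distances equals the classic bytesRead counter. */
    public long getTotalBytesRead() {
        long total = 0;
        for (int i = 0; i < bytesByDistance.length(); i++) {
            total += bytesByDistance.get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        DistanceStats stats = new DistanceStats();
        stats.incrementBytesRead(0, 1024);   // local read
        stats.incrementBytesRead(6, 4096);   // cross-DC read
        System.out.println(stats.getTotalBytesRead()); // prints 5120
    }
}
```

Atomic counters keep the per-read update cheap and thread-safe, which matters since many streams may update one Statistics instance.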
[jira] [Updated] (HDFS-9616) libhdfs++ Add runtime hooks to allow a client application to add low level monitoring and tests.
[ https://issues.apache.org/jira/browse/HDFS-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bob Hansen updated HDFS-9616: - Assignee: Bob Hansen (was: James Clampffer) > libhdfs++ Add runtime hooks to allow a client application to add low level > monitoring and tests. > > > Key: HDFS-9616 > URL: https://issues.apache.org/jira/browse/HDFS-9616 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: Bob Hansen > Attachments: HDFS-9616.HDFS-8707.002.patch > > > It would be nice to have a set of callable objects and corresponding event > hooks in useful places that can be set by a client application at runtime. > This is intended to provide a scalable mechanism for implementing counters > (#retries, #namenode requests) or application specific testing e.g. simulate > a dropped connection when the test system running the client application > requests. > Current implementation plan is a struct full of callbacks (std::functions) > owned by the FileSystemImpl. A callback could be set (or left as a no-op) > and when the code hits the corresponding event it will be invoked with a > reference to the object (for context) and each method argument by reference. > The callback returns a bool: true to continue execution or false to bail out > of the calling method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
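The callback-struct plan described above can be illustrated with the following sketch, written in Java only for illustration (the actual libhdfs++ code is C++, with {{std::function}} members on a struct owned by FileSystemImpl); all names here are hypothetical:

```java
import java.util.function.BiPredicate;

/** Each hook defaults to a no-op that returns true ("continue");
 *  a monitoring or test layer installs its own callback, and a
 *  false return makes the calling method bail out. */
class FileSystemEvents {
    /** (context, count) -> keep going? */
    BiPredicate<String, Long> onRetry = (ctx, attempt) -> true;
    BiPredicate<String, Long> onNameNodeRequest = (ctx, count) -> true;
}

public class HookSketch {
    private final FileSystemEvents events = new FileSystemEvents();
    private long retries = 0;

    FileSystemEvents events() {
        return events;
    }

    /** A method with an event hook: the hook sees the retry count and can
     *  force a simulated failure (e.g. a dropped connection) by returning false. */
    boolean retryOnce() {
        retries++;
        return events.onRetry.test("retryOnce", retries);
    }

    public static void main(String[] args) {
        HookSketch fs = new HookSketch();
        System.out.println(fs.retryOnce());          // default no-op: true
        // A test installs a callback that counts and then simulates failure.
        fs.events().onRetry = (ctx, attempt) -> attempt < 3;
        System.out.println(fs.retryOnce());          // attempt 2: true
        System.out.println(fs.retryOnce());          // attempt 3: false, bail out
    }
}
```

The no-op defaults keep the common path branch-cheap, while tests get a scalable injection point without recompiling the library.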
[jira] [Updated] (HDFS-9616) libhdfs++ Add runtime hooks to allow a client application to add low level monitoring and tests.
[ https://issues.apache.org/jira/browse/HDFS-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bob Hansen updated HDFS-9616: - Attachment: HDFS-9616.HDFS-8707.002.patch Added callbacks that provide context (cluster, file, and event) and a channel back to the libhdfspp code (currently can simulate errors if doing a debug compile). > libhdfs++ Add runtime hooks to allow a client application to add low level > monitoring and tests. > > > Key: HDFS-9616 > URL: https://issues.apache.org/jira/browse/HDFS-9616 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: James Clampffer > Attachments: HDFS-9616.HDFS-8707.002.patch > > > It would be nice to have a set of callable objects and corresponding event > hooks in useful places that can be set by a client application at runtime. > This is intended to provide a scalable mechanism for implementing counters > (#retries, #namenode requests) or application specific testing e.g. simulate > a dropped connection when the test system running the client application > requests. > Current implementation plan is a struct full of callbacks (std::functions) > owned by the FileSystemImpl. A callback could be set (or left as a no-op) > and when the code hits the corresponding event it will be invoked with a > reference to the object (for context) and each method argument by reference. > The callback returns a bool: true to continue execution or false to bail out > of the calling method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9616) libhdfs++ Add runtime hooks to allow a client application to add low level monitoring and tests.
[ https://issues.apache.org/jira/browse/HDFS-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bob Hansen updated HDFS-9616: - Status: Patch Available (was: Open) > libhdfs++ Add runtime hooks to allow a client application to add low level > monitoring and tests. > > > Key: HDFS-9616 > URL: https://issues.apache.org/jira/browse/HDFS-9616 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: James Clampffer > Attachments: HDFS-9616.HDFS-8707.002.patch > > > It would be nice to have a set of callable objects and corresponding event > hooks in useful places that can be set by a client application at runtime. > This is intended to provide a scalable mechanism for implementing counters > (#retries, #namenode requests) or application specific testing e.g. simulate > a dropped connection when the test system running the client application > requests. > Current implementation plan is a struct full of callbacks (std::functions) > owned by the FileSystemImpl. A callback could be set (or left as a no-op) > and when the code hits the corresponding event it will be invoked with a > reference to the object (for context) and each method argument by reference. > The callback returns a bool: true to continue execution or false to bail out > of the calling method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10183) Prevent race condition during class initialization
[ https://issues.apache.org/jira/browse/HDFS-10183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204495#comment-15204495 ] Sangjin Lee commented on HDFS-10183: I agree that JLS makes it clear that a memory barrier is required (by the JVM) and is expected from the user standpoint. This is something we should be able to rely on safely, or we have a bigger problem. And I don't think there is anything special about {{ThreadLocal}}. I think it is a good idea to make these static variables final for a semantic reason and possibly to work around a JVM bug. However, for the record, we should be able to rely on any initial values of (non-final) static fields in general. I'm +1 on the patch with that note. > Prevent race condition during class initialization > -- > > Key: HDFS-10183 > URL: https://issues.apache.org/jira/browse/HDFS-10183 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs >Affects Versions: 2.9.0 >Reporter: Pavel Avgustinov >Assignee: Pavel Avgustinov >Priority: Minor > Fix For: 2.9.0 > > Attachments: HADOOP-12944.1.patch, HDFS-10183.2.patch > > > In HADOOP-11969, [~busbey] tracked down a non-deterministic > {{NullPointerException}} to an oddity in the Java memory model: When multiple > threads trigger the loading of a class at the same time, one of them wins and > creates the {{java.lang.Class}} instance; the others block during this > initialization, but once it is complete they may obtain a reference to the > {{Class}} which has non-{{final}} fields still containing their default (i.e. > {{null}}) values. This leads to runtime failures that are hard to debug or > diagnose. > HADOOP-11969 observed that {{ThreadLocal}} fields, by their very nature, are > very likely to be accessed from multiple threads, and thus the problem is > particularly severe there. Consequently, the patch removed all occurrences of > the issue in the code base. 
> Unfortunately, since then HDFS-7964 has [reverted one of the fixes during a > refactoring|https://github.com/apache/hadoop/commit/2151716832ad14932dd65b1a4e47e64d8d6cd767#diff-0c2e9f7f9e685f38d1a11373b627cfa6R151], > and introduced a [new instance of the > problem|https://github.com/apache/hadoop/commit/2151716832ad14932dd65b1a4e47e64d8d6cd767#diff-6334d0df7d9aefbccd12b21bb7603169R43]. > The attached patch addresses the issue by adding the missing {{final}} > modifier in these two cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
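The fix pattern is simply the {{final}} modifier, sketched here on a hypothetical holder class rather than the actual classes touched by the patch:

```java
public class HolderSketch {
    // Before the fix (racy under the JVM behavior suspected in HADOOP-11969):
    // a thread that lost the class-loading race could observe this field
    // as null despite the class-initialization barrier.
    // static ThreadLocal<StringBuilder> BUFFER =
    //     ThreadLocal.withInitial(StringBuilder::new);

    // After the fix: final adds the JMM's final-field publication
    // guarantee on top of the class-initialization semantics.
    static final ThreadLocal<StringBuilder> BUFFER =
        ThreadLocal.withInitial(StringBuilder::new);

    public static void main(String[] args) throws InterruptedException {
        // Several threads racing to touch the class should all see a
        // fully initialized field.
        Thread[] threads = new Thread[8];
        final boolean[] sawNull = {false};
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                if (HolderSketch.BUFFER.get() == null) {
                    sawNull[0] = true;
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(sawNull[0] ? "saw null" : "ok"); // prints ok
    }
}
```

Note the original race is non-deterministic, so a passing run does not prove the bug's absence; the point is only that the final field makes the safe outcome guaranteed by the spec rather than by luck.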
[jira] [Updated] (HDFS-10187) Add a "list" refresh handler to list all registered refresh identifiers/handlers
[ https://issues.apache.org/jira/browse/HDFS-10187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-10187: --- Status: Patch Available (was: Open) > Add a "list" refresh handler to list all registered refresh > identifiers/handlers > > > Key: HDFS-10187 > URL: https://issues.apache.org/jira/browse/HDFS-10187 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 2.7.2 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Labels: commandline, supportability > Attachments: HDFS-10187.001.patch > > > HADOOP-10376 added a new feature to register handlers for refreshing daemon > configurations, because name node properties can be reconfigured without > restarting the daemon. This can be a very useful generic interface, but so > far no real handlers have been registered using this interface. I added a new > 'list' handler to list all registered handlers. My plan is to add more > handlers in the future using this interface. > Another minor fix is to return a more explicit error message to the client if a > handler is not registered. (It is currently logged at the namenode side, but the > client side only gets a "Failed to get response." message without knowing why) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10185) TestHFlushInterrupted verifies interrupt state incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204427#comment-15204427 ] John Zhuge commented on HDFS-10185: --- bq. Yes, my new code is used to check if {{Thread.currentThread().interrupt()}} works. Because the original code {{assertTrue(Thread.interrupted())}} of checking this is not executed. Sorry I don't understand, hasn't the unit test of Java Thread package already tested it? If you think the code {{assertTrue(Thread.interrupted())}} can never be reached, use {{Assert.fail}}: {code} Thread.currentThread().interrupt(); try { stm.hflush(); Assert.fail("Not interrupted as expected"); } catch (InterruptedIOException ie) { System.out.println("Got expected exception during flush"); } {code} However, do you think this comment is no longer true? {quote} // If we made it past the hflush(), then that means that the ack made it back // from the pipeline before we got to the wait() call. In that case we should // still have interrupted status. {quote} > TestHFlushInterrupted verifies interrupt state incorrectly > -- > > Key: HDFS-10185 > URL: https://issues.apache.org/jira/browse/HDFS-10185 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: HDFS-10185.001.patch > > > In unit test {{TestHFlush#testHFlushInterrupted}}, there were some places > verifying interrupt state incorrectly. As follow: > {code} > Thread.currentThread().interrupt(); > try { > stm.hflush(); > // If we made it past the hflush(), then that means that the ack made > it back >// from the pipeline before we got to the wait() call. In that case we > should > // still have interrupted status. 
> assertTrue(Thread.interrupted()); > } catch (InterruptedIOException ie) { > System.out.println("Got expected exception during flush"); > } > {code} > When stm does the {{hflush}} operation, it will throw an interrupted exception and > the {{assertTrue(Thread.interrupted())}} will not be executed. And if you put > this before the {{hflush}}, this method will clear the interrupted state and the > expected exception will not be thrown. A similar problem also appears later > in stm.close. > So we should use a way to read the state without clearing it, like > {{Thread.currentThread().isInterrupted()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
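The distinction the issue turns on is easy to demonstrate: {{Thread.interrupted()}} reads and clears the current thread's interrupt flag, while {{Thread.currentThread().isInterrupted()}} only reads it:

```java
public class InterruptFlagDemo {
    public static void main(String[] args) {
        Thread.currentThread().interrupt();
        // Static Thread.interrupted() reads AND CLEARS the flag.
        System.out.println(Thread.interrupted());                    // true
        System.out.println(Thread.interrupted());                    // false: already cleared

        Thread.currentThread().interrupt();
        // Instance isInterrupted() only reads the flag.
        System.out.println(Thread.currentThread().isInterrupted());  // true
        System.out.println(Thread.currentThread().isInterrupted());  // still true
        Thread.interrupted();  // clean up the flag before exiting
    }
}
```

This is why moving {{assertTrue(Thread.interrupted())}} before the {{hflush()}} would break the test: the assertion itself would wipe the state the test is trying to exercise.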
[jira] [Created] (HDFS-10188) libhdfs++: Implement debug allocators
James Clampffer created HDFS-10188: -- Summary: libhdfs++: Implement debug allocators Key: HDFS-10188 URL: https://issues.apache.org/jira/browse/HDFS-10188 Project: Hadoop HDFS Issue Type: Sub-task Reporter: James Clampffer Assignee: James Clampffer I propose implementing a set of memory new/delete pairs with additional checking to detect double deletes, reads after delete, and writes after delete, to help debug resource ownership issues and prevent new ones from entering the library. One of the most common issues we hit is use-after-free bugs. The continuation pattern makes these really tricky to debug because by the time a SIGSEGV is raised the context of what caused the error is long gone. The plan is to add allocators that can be turned on to do the following, in order of runtime cost. 1: no-op, forward through to default new/delete 2: memset freed memory to 0 3: implement operator new with mmap, lock that region of memory once it's been deleted; obviously this can't be left to run forever because the memory is never unmapped. This should also put some groundwork in place for implementing specialized allocators for tiny objects that we churn through, like std::string. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10187) Add a "list" refresh handler to list all registered refresh identifiers/handlers
[ https://issues.apache.org/jira/browse/HDFS-10187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-10187: --- Labels: commandline supportability (was: supportability) > Add a "list" refresh handler to list all registered refresh > identifiers/handlers > > > Key: HDFS-10187 > URL: https://issues.apache.org/jira/browse/HDFS-10187 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 2.7.2 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Labels: commandline, supportability > Attachments: HDFS-10187.001.patch > > > HADOOP-10376 added a new feature to register handlers for refreshing daemon > configurations, because name node properties can be reconfigured without > restarting the daemon. This can be a very useful generic interface, but so > far no real handlers have been registered using this interface. I added a new > 'list' handler to list all registered handlers. My plan is to add more > handlers in the future using this interface. > Another minor fix is to return a more explicit error message to the client if a > handler is not registered. (It is currently logged at the namenode side, but the > client side only gets a "Failed to get response." message without knowing why) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10187) Add a "list" refresh handler to list all registered refresh identifiers/handlers
[ https://issues.apache.org/jira/browse/HDFS-10187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-10187: --- Attachment: HDFS-10187.001.patch Rev01: added the list refresh handler, a test case to verify it is added by default, command line help/usage, and updated documentation. > Add a "list" refresh handler to list all registered refresh > identifiers/handlers > > > Key: HDFS-10187 > URL: https://issues.apache.org/jira/browse/HDFS-10187 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 2.7.2 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Labels: supportability > Attachments: HDFS-10187.001.patch > > > HADOOP-10376 added a new feature to register handlers for refreshing daemon > configurations, because name node properties can be reconfigured without > restarting the daemon. This can be a very useful generic interface, but so > far no real handlers have been registered using this interface. I added a new > 'list' handler to list all registered handlers. My plan is to add more > handlers in the future using this interface. > Another minor fix is to return a more explicit error message to the client if a > handler is not registered. (It is currently logged at the namenode side, but the > client side only gets a "Failed to get response." message without knowing why) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-10187) Add a "list" refresh handler to list all registered refresh identifiers/handlers
Wei-Chiu Chuang created HDFS-10187: -- Summary: Add a "list" refresh handler to list all registered refresh identifiers/handlers Key: HDFS-10187 URL: https://issues.apache.org/jira/browse/HDFS-10187 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 2.7.2 Reporter: Wei-Chiu Chuang Assignee: Wei-Chiu Chuang HADOOP-10376 added a new feature to register handlers for refreshing daemon configurations, because name node properties can be reconfigured without restarting the daemon. This can be a very useful generic interface, but so far no real handlers have been registered using this interface. I added a new 'list' handler to list all registered handlers. My plan is to add more handlers in the future using this interface. Another minor fix is to return a more explicit error message to the client if a handler is not registered. (It is currently logged at the namenode side, but the client side only gets a "Failed to get response." message without knowing why) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10183) Prevent race condition during class initialization
[ https://issues.apache.org/jira/browse/HDFS-10183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204345#comment-15204345 ] Daryn Sharp commented on HDFS-10183: My reading of the JLS and the footnotes pretty clearly indicates that memory barriers are required. I suspect HADOOP-11969 discovered a (hopefully fixed) JVM bug, so this patch is probably cosmetic but certainly doesn't hurt anything. > Prevent race condition during class initialization > -- > > Key: HDFS-10183 > URL: https://issues.apache.org/jira/browse/HDFS-10183 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs >Affects Versions: 2.9.0 >Reporter: Pavel Avgustinov >Assignee: Pavel Avgustinov >Priority: Minor > Fix For: 2.9.0 > > Attachments: HADOOP-12944.1.patch, HDFS-10183.2.patch > > > In HADOOP-11969, [~busbey] tracked down a non-deterministic > {{NullPointerException}} to an oddity in the Java memory model: When multiple > threads trigger the loading of a class at the same time, one of them wins and > creates the {{java.lang.Class}} instance; the others block during this > initialization, but once it is complete they may obtain a reference to the > {{Class}} which has non-{{final}} fields still containing their default (i.e. > {{null}}) values. This leads to runtime failures that are hard to debug or > diagnose. > HADOOP-11969 observed that {{ThreadLocal}} fields, by their very nature, are > very likely to be accessed from multiple threads, and thus the problem is > particularly severe there. Consequently, the patch removed all occurrences of > the issue in the code base. > Unfortunately, since then HDFS-7964 has [reverted one of the fixes during a > refactoring|https://github.com/apache/hadoop/commit/2151716832ad14932dd65b1a4e47e64d8d6cd767#diff-0c2e9f7f9e685f38d1a11373b627cfa6R151], > and introduced a [new instance of the > problem|https://github.com/apache/hadoop/commit/2151716832ad14932dd65b1a4e47e64d8d6cd767#diff-6334d0df7d9aefbccd12b21bb7603169R43]. 
> The attached patch addresses the issue by adding the missing {{final}} > modifier in these two cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-2043) TestHFlush failing intermittently
[ https://issues.apache.org/jira/browse/HDFS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated HDFS-2043: Attachment: HDFS-2043.002.patch > TestHFlush failing intermittently > - > > Key: HDFS-2043 > URL: https://issues.apache.org/jira/browse/HDFS-2043 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Aaron T. Myers >Assignee: Lin Yiqun > Attachments: HDFS-2043.002.patch, HDFS.001.patch > > > I can't reproduce this failure reliably, but it seems like TestHFlush has > been failing intermittently, with the frequency increasing of late. > Note the following two pre-commit test runs from different JIRAs where > TestHFlush seems to have failed spuriously: > https://builds.apache.org/job/PreCommit-HDFS-Build/734//testReport/ > https://builds.apache.org/job/PreCommit-HDFS-Build/680//testReport/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-2043) TestHFlush failing intermittently
[ https://issues.apache.org/jira/browse/HDFS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204126#comment-15204126 ] Lin Yiqun commented on HDFS-2043: - As [~kihwal] pointed out, it seems to be an actual race. The two exception logs in HDFS-10181 also indicate that other operations were executed on the stream while {{DataStreamer.run}} was running. Updated the patch to catch the potential exceptions in stream write operations and add some retry attempts. Assigned this jira to myself; pending jenkins. > TestHFlush failing intermittently > - > > Key: HDFS-2043 > URL: https://issues.apache.org/jira/browse/HDFS-2043 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Aaron T. Myers > Attachments: HDFS-2043.002.patch, HDFS.001.patch > > > I can't reproduce this failure reliably, but it seems like TestHFlush has > been failing intermittently, with the frequency increasing of late. > Note the following two pre-commit test runs from different JIRAs where > TestHFlush seems to have failed spuriously: > https://builds.apache.org/job/PreCommit-HDFS-Build/734//testReport/ > https://builds.apache.org/job/PreCommit-HDFS-Build/680//testReport/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
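The catch-and-retry approach the comment describes can be sketched generically; the helper and its names below are hypothetical, not code from the patch:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetrySketch {

    /** Run op, retrying on IOException up to maxAttempts times;
     *  rethrow the last failure if every attempt fails. */
    static <T> T withRetries(Callable<T> op, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e; // transient failure: try again
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Fails twice, then succeeds: mimics a write racing with another
        // stream operation that eventually goes through.
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new IOException("transient");
            }
            return "flushed";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts"); // flushed after 3 attempts
    }
}
```

Bounding the attempts matters in a test: an unbounded retry loop would turn a real regression into a hang instead of a failure.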
[jira] [Updated] (HDFS-2043) TestHFlush failing intermittently
[ https://issues.apache.org/jira/browse/HDFS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated HDFS-2043: Status: Patch Available (was: Open) > TestHFlush failing intermittently > - > > Key: HDFS-2043 > URL: https://issues.apache.org/jira/browse/HDFS-2043 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Aaron T. Myers >Assignee: Lin Yiqun > Attachments: HDFS-2043.002.patch, HDFS.001.patch > > > I can't reproduce this failure reliably, but it seems like TestHFlush has > been failing intermittently, with the frequency increasing of late. > Note the following two pre-commit test runs from different JIRAs where > TestHFlush seems to have failed spuriously: > https://builds.apache.org/job/PreCommit-HDFS-Build/734//testReport/ > https://builds.apache.org/job/PreCommit-HDFS-Build/680//testReport/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-2043) TestHFlush failing intermittently
[ https://issues.apache.org/jira/browse/HDFS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun reassigned HDFS-2043: --- Assignee: Lin Yiqun > TestHFlush failing intermittently > - > > Key: HDFS-2043 > URL: https://issues.apache.org/jira/browse/HDFS-2043 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Aaron T. Myers >Assignee: Lin Yiqun > Attachments: HDFS-2043.002.patch, HDFS.001.patch > > > I can't reproduce this failure reliably, but it seems like TestHFlush has > been failing intermittently, with the frequency increasing of late. > Note the following two pre-commit test runs from different JIRAs where > TestHFlush seems to have failed spuriously: > https://builds.apache.org/job/PreCommit-HDFS-Build/734//testReport/ > https://builds.apache.org/job/PreCommit-HDFS-Build/680//testReport/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10184) Introduce unit tests framework for HDFS UI
[ https://issues.apache.org/jira/browse/HDFS-10184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204086#comment-15204086 ] Steve Loughran commented on HDFS-10184: --- can you see if you can get away with HtmlUnit first? That uses the JVM's own JS engine and so doesn't need a browser; it'll be able to run under Jenkins. As it's designed to run under JUnit, it'll be maintainable by the current set of java developers, without anyone having to learn node.js > Introduce unit tests framework for HDFS UI > -- > > Key: HDFS-10184 > URL: https://issues.apache.org/jira/browse/HDFS-10184 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Haohui Mai > > The current HDFS UI is based on HTML5 and it does not have unit tests yet. > Occasionally things break and we can't catch it. We should investigate and > introduce unit test frameworks such as Mocha for the UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10186) DirectoryScanner: Improve logs by adding full path of both actual and expected block directories
[ https://issues.apache.org/jira/browse/HDFS-10186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203998#comment-15203998 ] Kai Zheng commented on HDFS-10186: -- The patch LGTM. Thanks [~rakeshr] for the nice work. > DirectoryScanner: Improve logs by adding full path of both actual and > expected block directories > > > Key: HDFS-10186 > URL: https://issues.apache.org/jira/browse/HDFS-10186 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Minor > Attachments: HDFS-10186-001.patch > > > As per the > [discussion|https://issues.apache.org/jira/browse/HDFS-7648?focusedCommentId=15195908=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15195908], > this jira is to improve directory scanner log by adding the wrong and > correct directory path so that admins can take necessary actions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10185) TestHFlushInterrupted verifies interrupt state incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203975#comment-15203975 ] Lin Yiqun commented on HDFS-10185: -- The failure of unit test {{hadoop.hdfs.TestHFlush}} is not related to the change in this patch. I tested it locally and the result is good. > TestHFlushInterrupted verifies interrupt state incorrectly > -- > > Key: HDFS-10185 > URL: https://issues.apache.org/jira/browse/HDFS-10185 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: HDFS-10185.001.patch > > > In unit test {{TestHFlush#testHFlushInterrupted}}, there are some places > that verify interrupt state incorrectly. As follows: > {code} > Thread.currentThread().interrupt(); > try { > stm.hflush(); > // If we made it past the hflush(), then that means that the ack made > it back >// from the pipeline before we got to the wait() call. In that case we > should > // still have interrupted status. > assertTrue(Thread.interrupted()); > } catch (InterruptedIOException ie) { > System.out.println("Got expected exception during flush"); > } > {code} > When stm does the {{hflush}} operation, it will throw an interrupted exception and > the {{assertTrue(Thread.interrupted())}} will not be executed. And if you put > this before the {{hflush}}, this method will clear the interrupted state and the > expected exception will not be thrown. A similar problem also appears later > in stm.close. > So we should use a way to read the state without clearing it, like > {{Thread.currentThread().isInterrupted()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9555) LazyPersistFileScrubber should still sleep if there are errors in the clear progress
[ https://issues.apache.org/jira/browse/HDFS-9555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203881#comment-15203881 ] Hadoop QA commented on HDFS-9555: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 9s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc 
{color} | {color:green} 1m 51s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 53s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 33s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 55m 18s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 143m 14s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_74 Failed junit tests | hadoop.hdfs.TestRollingUpgrade | | | hadoop.hdfs.shortcircuit.TestShortCircuitCache | | JDK v1.7.0_95 Failed junit tests | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby | | | hadoop.hdfs.TestHFlush | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:fbe3e86 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12777436/9555-v1.patch | | JIRA Issue | HDFS-9555 | | Optional Tests | asflicense
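The behavior named in the issue title, keep sleeping between scan passes even when a pass throws, follows a common pattern for periodic background threads. The following is a generic, hypothetical sketch of that loop shape, not Hadoop's actual LazyPersistFileScrubber code; the class and method names are illustrative only:

```java
// Hypothetical periodic-worker loop: sleep between passes even when a
// pass fails. Without the catch, one error would end (or busy-spin) the
// loop instead of retrying on the next interval.
public class PeriodicScrubber implements Runnable {
    private final long intervalMs;
    private volatile boolean running = true;

    public PeriodicScrubber(long intervalMs) { this.intervalMs = intervalMs; }

    /** One scan pass; may throw on errors. Illustrative placeholder. */
    protected void scrub() throws Exception { }

    public void stop() { running = false; }

    @Override
    public void run() {
        while (running) {
            try {
                scrub();
            } catch (Exception e) {
                // Log and fall through: the sleep below must still happen,
                // otherwise an error skips the pause between passes.
                System.err.println("scrub pass failed: " + e);
            }
            try {
                Thread.sleep(intervalMs);   // sleep on success *and* failure
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                running = false;
            }
        }
    }
}
```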
[jira] [Commented] (HDFS-6489) DFS Used space is not correctly computed on frequent append operations
[ https://issues.apache.org/jira/browse/HDFS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203879#comment-15203879 ] Weiwei Yang commented on HDFS-6489: --- This issue happened to us as well. We have some clients that continually append data to HDFS; this causes the dfs usage to jump to a very high value and produces insufficient-disk-space errors just like the ones described here. It looks like HDFS refreshes the space usage at a fixed interval (property fs.du.interval, default 10 minutes), which means the dfs usage can stay inaccurate and cause many operations to fail for up to 10 minutes (even after the client has closed its streams). We really should have this fixed. > DFS Used space is not correctly computed on frequent append operations > > > Key: HDFS-6489 > URL: https://issues.apache.org/jira/browse/HDFS-6489 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.2.0, 2.7.1 >Reporter: stanley shi > Attachments: HDFS6489.java > > > The current implementation of the Datanode will increase the DFS used space > on each block write operation. This is correct in most scenarios (creating a new > file), but sometimes it behaves incorrectly (appending small amounts of data to a large > block). > For example, I have a file with only one block (say, 60M). Then I try to > append to it very frequently, but each time I append only 10 bytes. > Then on each append, dfs used will be increased by the length of the > block (60M), not the actual data length (10 bytes). > Consider a scenario where I use many clients to append concurrently to a large > number of files (1000+); assume the block size is 32M (half of the default > value), then the dfs used will be increased by 1000*32M = 32G on each append to > the files, but actually I only write 10K bytes; this will cause the datanode > to report insufficient disk space on data write. 
> {quote}2014-06-04 15:27:34,719 INFO > org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock > BP-1649188734-10.37.7.142-1398844098971:blk_1073742834_45306 received > exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: > Insufficient space for appending to FinalizedReplica, blk_1073742834_45306, > FINALIZED{quote} > But the actual disk usage: > {quote} > [root@hdsh143 ~]# df -h > FilesystemSize Used Avail Use% Mounted on > /dev/sda3 16G 2.9G 13G 20% / > tmpfs 1.9G 72K 1.9G 1% /dev/shm > /dev/sda1 97M 32M 61M 35% /boot > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
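The accounting error described above can be illustrated in a few lines: on append, the datanode charges the full finalized-block length to dfs used again, instead of only the newly written bytes. A minimal, hypothetical simulation of that arithmetic (not Hadoop's actual datanode code):

```java
// Hypothetical simulation of the dfs-used accounting described in this
// issue; this is not Hadoop's actual FsDataset code.
public class DfsUsedSim {
    static final long BLOCK_LEN = 60L * 1024 * 1024; // 60M finalized block

    // Reported behavior: each append charges the whole block length again.
    static long buggyUsage(long appends) {
        long dfsUsed = BLOCK_LEN;            // initial block write
        for (long i = 0; i < appends; i++) {
            dfsUsed += BLOCK_LEN;            // +60M per append, regardless of size
        }
        return dfsUsed;
    }

    // Expected behavior: each append charges only the bytes actually written.
    static long expectedUsage(long appends, long bytesPerAppend) {
        return BLOCK_LEN + appends * bytesPerAppend;
    }

    public static void main(String[] args) {
        long appends = 1000, bytes = 10;
        System.out.println("buggy:    " + buggyUsage(appends) + " bytes");
        System.out.println("expected: " + expectedUsage(appends, bytes) + " bytes");
    }
}
```

After 1000 ten-byte appends the buggy accounting reports roughly 60 GB used while only about 60 MB of real data exists, matching the DiskOutOfSpaceException versus the df -h output quoted above.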
[jira] [Commented] (HDFS-10186) DirectoryScanner: Improve logs by adding full path of both actual and expected block directories
[ https://issues.apache.org/jira/browse/HDFS-10186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203867#comment-15203867 ] Hadoop QA commented on HDFS-10186: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 34s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 53s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc 
{color} | {color:green} 1m 43s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 12s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 56m 50s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 142m 45s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_74 Failed junit tests | hadoop.hdfs.server.namenode.TestEditLog | | | hadoop.hdfs.TestHFlush | | | hadoop.hdfs.TestGetFileChecksum | | JDK v1.7.0_95 Failed junit tests | hadoop.hdfs.TestHFlush | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:fbe3e86 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12794472/HDFS-10186-001.patch | | JIRA Issue | HDFS-10186 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux
[jira] [Commented] (HDFS-10185) TestHFlushInterrupted verifies interrupt state incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203857#comment-15203857 ] Hadoop QA commented on HDFS-10185: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 56s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 0m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 7s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 45s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 68m 16s {color} | {color:green} hadoop-hdfs in the patch passed with JDK v1.7.0_95. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 163m 38s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_74 Failed junit tests | hadoop.hdfs.TestHFlush | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:fbe3e86 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12794463/HDFS-10185.001.patch | | JIRA Issue | HDFS-10185 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux def2ab8f11bf 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / ed1e23f | | Default Java