[jira] [Commented] (HDFS-7878) API - expose a unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724038#comment-16724038 ] Adam Antal commented on HDFS-7878: -- We'd like to have this commit in branch-3.0 because of HDFS-14159. I saw that it has been committed but soon reverted. [~chris.douglas], could you please explain why it has been reverted? Does this break something? Anyways, the cherry-pick is clean for me and the build succeeds. If a full testsuite is also necessary we can file another jira for that. > API - expose a unique file identifier > - > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Chris Douglas >Priority: Major > Labels: BB2015-05-TBR > Fix For: 3.1.0 > > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, > HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, > HDFS-7878.12.patch, HDFS-7878.13.patch, HDFS-7878.14.patch, > HDFS-7878.15.patch, HDFS-7878.16.patch, HDFS-7878.17.patch, > HDFS-7878.18.patch, HDFS-7878.19.patch, HDFS-7878.20.patch, > HDFS-7878.21.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose a unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227560#comment-16227560 ] Chris Douglas commented on HDFS-7878: - Reverted from 3.0.0, will target 3.1.0. > API - expose a unique file identifier > - > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Chris Douglas > Labels: BB2015-05-TBR > Fix For: 3.1.0 > > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, > HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, > HDFS-7878.12.patch, HDFS-7878.13.patch, HDFS-7878.14.patch, > HDFS-7878.15.patch, HDFS-7878.16.patch, HDFS-7878.17.patch, > HDFS-7878.18.patch, HDFS-7878.19.patch, HDFS-7878.20.patch, > HDFS-7878.21.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose a unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227160#comment-16227160 ] Sergey Shelukhin commented on HDFS-7878: Thank you for all the work getting this in! I thought it wouldn't actually ever happen :) > API - expose a unique file identifier > - > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Chris Douglas > Labels: BB2015-05-TBR > Fix For: 3.0.0 > > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, > HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, > HDFS-7878.12.patch, HDFS-7878.13.patch, HDFS-7878.14.patch, > HDFS-7878.15.patch, HDFS-7878.16.patch, HDFS-7878.17.patch, > HDFS-7878.18.patch, HDFS-7878.19.patch, HDFS-7878.20.patch, > HDFS-7878.21.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose a unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227128#comment-16227128 ] Hudson commented on HDFS-7878: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13168 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13168/]) HDFS-7878. API - expose a unique file identifier. (cdouglas: rev d015e0bbd5416943cb4875274e67b7077c00e54b) * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FilterFileSystem.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUtil.java * (edit) hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md * (add) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawPathHandle.java * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Options.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSUtilClient.java * (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractOpenTest.java * (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/ContractOptions.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/ContractTestUtils.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java * (add) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsPathHandle.java * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/HarFileSystem.java * (add) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/PathHandle.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/resources/contract/hdfs.xml * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/hdfs.proto * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileStatusSerialization.java * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java > API - expose a unique file identifier > - > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Chris Douglas > Labels: BB2015-05-TBR > Fix For: 3.0.0 > > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, > HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, > HDFS-7878.12.patch, HDFS-7878.13.patch, HDFS-7878.14.patch, > HDFS-7878.15.patch, HDFS-7878.16.patch, HDFS-7878.17.patch, > HDFS-7878.18.patch, HDFS-7878.19.patch, HDFS-7878.20.patch, > HDFS-7878.21.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225910#comment-16225910 ] Sean Mackrory commented on HDFS-7878: - No other feedback from me - +1 on the bits I understand, like Steve :) > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, > HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, > HDFS-7878.12.patch, HDFS-7878.13.patch, HDFS-7878.14.patch, > HDFS-7878.15.patch, HDFS-7878.16.patch, HDFS-7878.17.patch, > HDFS-7878.18.patch, HDFS-7878.19.patch, HDFS-7878.20.patch, > HDFS-7878.21.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225895#comment-16225895 ] Anu Engineer commented on HDFS-7878: Thanks for the heads up. Please go ahead. > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, > HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, > HDFS-7878.12.patch, HDFS-7878.13.patch, HDFS-7878.14.patch, > HDFS-7878.15.patch, HDFS-7878.16.patch, HDFS-7878.17.patch, > HDFS-7878.18.patch, HDFS-7878.19.patch, HDFS-7878.20.patch, > HDFS-7878.21.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225867#comment-16225867 ] Chris Douglas commented on HDFS-7878: - Thanks [~elgoiri] and [~ste...@apache.org]. If there's no other feedback (/cc [~mackrorysd], [~andrew.wang], [~anu], [~eddyxu]) I'll commit this tomorrow. Followup JIRAs should be easier to review. > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, > HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, > HDFS-7878.12.patch, HDFS-7878.13.patch, HDFS-7878.14.patch, > HDFS-7878.15.patch, HDFS-7878.16.patch, HDFS-7878.17.patch, > HDFS-7878.18.patch, HDFS-7878.19.patch, HDFS-7878.20.patch, > HDFS-7878.21.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225724#comment-16225724 ] Steve Loughran commented on HDFS-7878: -- Bits I Understand are +1 from me > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, > HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, > HDFS-7878.12.patch, HDFS-7878.13.patch, HDFS-7878.14.patch, > HDFS-7878.15.patch, HDFS-7878.16.patch, HDFS-7878.17.patch, > HDFS-7878.18.patch, HDFS-7878.19.patch, HDFS-7878.20.patch, > HDFS-7878.21.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225394#comment-16225394 ] Íñigo Goiri commented on HDFS-7878: --- [~chris.douglas], thanks for pushing this. V21 LGTM. For the QA comment: * HADOOP-14990 tracks the issues with the license. * The unit tests seem spurious. * I'd like to get rid of the javadoc warnings but I don't see a clean way. Regarding the patch itself: * The contract looks good * The contract tests cover the use case * The HDFS implementation seems correct * I haven't been able to go through the HDFS test for the contract in the logs but I assume that QA ran those and they don't fail so it looks good. [~chris.douglas], those tests run, right? * The documentation is clear and filed HDFS-12729 to track the other parts that are not clearly documented. +1 overall > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, > HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, > HDFS-7878.12.patch, HDFS-7878.13.patch, HDFS-7878.14.patch, > HDFS-7878.15.patch, HDFS-7878.16.patch, HDFS-7878.17.patch, > HDFS-7878.18.patch, HDFS-7878.19.patch, HDFS-7878.20.patch, > HDFS-7878.21.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225318#comment-16225318 ] Chris Douglas commented on HDFS-7878: - Folks, can this get a definitive review? I think v21 addresses all the feedback and it is ready to commit. If there are other fixes- missing annotations, clearer docs, etc.- I'm happy to complete a checklist before commit, but this needs to converge. > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, > HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, > HDFS-7878.12.patch, HDFS-7878.13.patch, HDFS-7878.14.patch, > HDFS-7878.15.patch, HDFS-7878.16.patch, HDFS-7878.17.patch, > HDFS-7878.18.patch, HDFS-7878.19.patch, HDFS-7878.20.patch, > HDFS-7878.21.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223313#comment-16223313 ] Hadoop QA commented on HDFS-7878: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 45s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 27s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 39s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 15m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 20s{color} | {color:green} root: The patch generated 0 new + 425 unchanged - 2 fixed = 425 total (was 427) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 0s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 56s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 1m 3s{color} | {color:red} hadoop-common-project_hadoop-common generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 10s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 33s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 42m 26s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 37s{color} | {color:red} The patch generated 3 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}155m 26s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestMaintenanceState | | | hadoop.hdfs.TestCrcCorruption | | | hadoop.hdfs.TestSmallBlock | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure200 | | | hadoop.hdfs.TestLeaseRecovery2 | | |
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223105#comment-16223105 ] Hadoop QA commented on HDFS-7878: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 19m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 41s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 48s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 18m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 18m 15s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 35s{color} | {color:orange} root: The patch generated 1 new + 424 unchanged - 2 fixed = 425 total (was 426) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 55s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 32s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 1m 14s{color} | {color:red} hadoop-common-project_hadoop-common generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 55s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 20s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green}109m 45s{color} | {color:green} hadoop-hdfs in the patch passed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 33s{color} | {color:red} The patch generated 3 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}230m 33s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-7878 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12894438/HDFS-7878.20.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1646#comment-1646 ] Hadoop QA commented on HDFS-7878: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 19m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 20s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 37s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 18m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 18m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 53s{color} | {color:green} root: The patch generated 0 new + 424 unchanged - 2 fixed = 424 total (was 426) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 14s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 8m 19s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 1m 18s{color} | {color:red} hadoop-common-project_hadoop-common generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 12m 15s{color} | {color:red} hadoop-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 8s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}138m 52s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 54s{color} | {color:red} The patch generated 3 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}280m 56s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.ha.TestZKFailoverController | | | hadoop.hdfs.server.datanode.TestDataNodeUUID | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | | | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks | | | hadoop.hdfs.server.namenode.TestReencryption | | |
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220898#comment-16220898 ] Sean Mackrory commented on HDFS-7878: - {quote}Wire compatibility with HDFS{quote} How do commented out lines ensure wire compatibility? It would make sense if these were obsolete fields and we didn't want to reuse obsolete number in case older messages get misinterpreted, but then we should be reusing. Nevertheless, it appears we're not doing that in the latest patch anymore. I do think this resolves my previous concerns with the patch. In testCrossSerializationProto and testJavaSerialization we're removing assertions that the PathHandle to what should be the same file should be identical. Isn't that still true, and should be? > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, > HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, > HDFS-7878.12.patch, HDFS-7878.13.patch, HDFS-7878.14.patch, > HDFS-7878.15.patch, HDFS-7878.16.patch, HDFS-7878.17.patch, > HDFS-7878.18.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219673#comment-16219673 ] Íñigo Goiri commented on HDFS-7878: --- I'm going through the HDFS code itself and it might be good to document the .reserved folder structure. Probably also adding a unit test for the {{DFSUtilClient#makePathFromFileId}} more for documenting than for actual testing. > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, > HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, > HDFS-7878.12.patch, HDFS-7878.13.patch, HDFS-7878.14.patch, > HDFS-7878.15.patch, HDFS-7878.16.patch, HDFS-7878.17.patch, > HDFS-7878.18.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219549#comment-16219549 ] Íñigo Goiri commented on HDFS-7878: --- Thanks [~chris.douglas] for this. I went over the patch and it looks good. I personally think that the best reference for this are the new tests in {{AbstractContractOpenTest}}. I would almost reference it in the docu. Following this line of thought, I would add more comments to those tests explaining the details. {{testOpenFileByExact}} does a pretty good job (it could explain a little more the failure case at the end). I would try to follow the same line in all the new tests and extend it a little. BTW, not sure what is the policy on expanding the imports, for example: {code} import static org.apache.hadoop.fs.contract.ContractTestUtils.*; {code} > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, > HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, > HDFS-7878.12.patch, HDFS-7878.13.patch, HDFS-7878.14.patch, > HDFS-7878.15.patch, HDFS-7878.16.patch, HDFS-7878.17.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218511#comment-16218511 ] Steve Loughran commented on HDFS-7878: -- filesystem.md LGTM, +1 for that bit, Same for the API AbstractContractOpenTest: L241, L260. if you use LambdaTestUtils.intercept() it automatically creates the exception message including a toString() value of the lambda exp. I could imagine adding something similar for all the skip-if-unsupported ops, eg. {code} static T evalOrSkip(String op, Callable eval) throws Exception { try { return eval.call() } catch (UnsupportedException ex) { skip("Unsupported feature: " + op) } } {code} Then you could go {code} PathHandle fd = evalOrSkip("exact", () -> fd.getFileSystem().getPathHandle(stat, HandleOpt.exact())); {code} ...etc w.r.t HDFS changes, I'm not qualified to comment on the implementation. Sorry. > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, > HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, > HDFS-7878.12.patch, HDFS-7878.13.patch, HDFS-7878.14.patch, > HDFS-7878.15.patch, HDFS-7878.16.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216140#comment-16216140 ] Chris Douglas commented on HDFS-7878: - The javadoc fails to generate a {{\@link}} to a vararg parameter, both with the ellipsis and array syntax. Not sure how to fix that. Test failures are due to timeouts and port in use errors. This is ready for review, if anyone has cycles for it. > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, > HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, > HDFS-7878.12.patch, HDFS-7878.13.patch, HDFS-7878.14.patch, > HDFS-7878.15.patch, HDFS-7878.16.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213707#comment-16213707 ] Hadoop QA commented on HDFS-7878: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 30s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 50s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 25s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 12m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 11s{color} | {color:green} root: The patch generated 0 new + 346 unchanged - 2 fixed = 346 total (was 348) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 8m 50s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 41s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 57s{color} | {color:red} hadoop-common-project_hadoop-common generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 14s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 18s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 88m 20s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}187m 1s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport | | | hadoop.hdfs.server.federation.metrics.TestFederationMetrics | | | hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:ca8ddc6 | | JIRA Issue | HDFS-7878 | |
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213510#comment-16213510 ] Hadoop QA commented on HDFS-7878: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m 8s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 40s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 50s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 43s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 16m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 16m 13s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 33s{color} | {color:orange} root: The patch generated 8 new + 346 unchanged - 2 fixed = 354 total (was 348) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 58s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 1m 7s{color} | {color:red} hadoop-common-project_hadoop-common generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 54s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 48s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}127m 59s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 38s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}247m 54s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | | | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport | | | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA | | | hadoop.hdfs.TestReadStripedFileWithMissingBlocks | | Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:ca8ddc6 | | JIRA Issue |
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210392#comment-16210392 ] Chris Douglas commented on HDFS-7878: - Failure is due to HADOOP-14961 > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, > HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, > HDFS-7878.12.patch, HDFS-7878.13.patch, HDFS-7878.14.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210116#comment-16210116 ] Hadoop QA commented on HDFS-7878: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 2m 50s{color} | {color:red} Docker failed to build yetus/hadoop:0de40f0. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-7878 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12892907/HDFS-7878.14.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/21739/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, > HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, > HDFS-7878.12.patch, HDFS-7878.13.patch, HDFS-7878.14.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16182986#comment-16182986 ] Andrew Wang commented on HDFS-7878: --- Beta is our compatibility freeze for 3.0.0. I don't know the history of 2.1.0-beta to GA, but breaking compatibility between beta and GA violates the user expectations of a beta. > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, > HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, > HDFS-7878.12.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16182381#comment-16182381 ] Steve Loughran commented on HDFS-7878: -- I think you can change APIs after a beta...it's only on the final release that the guidelines kick in. Topic of many arguments in the past, including even wire formats. (0.20.204 etc) > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, > HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, > HDFS-7878.12.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16181721#comment-16181721 ] Sean Mackrory commented on HDFS-7878: - {quote}Neither API has been part of a release, yet. {quote} But it is included in a branch that's being prepared for a release. Unless we can still squeeze this in to beta-1 (and we may indeed be a bit late for that - [~andrew.wang]?), I would assume this is headed for 3.1 or maybe 3.0 - and either way it would need to keep a constructor with the same prototype as what currently exists in branch-3.0. Right? On an unrelated note, I second your thoughts that it should be open(FileHandle) and not open(FileStatus). But other than that, I'd +1 the rest of the patch. > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, > HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, > HDFS-7878.12.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16181641#comment-16181641 ] Chris Douglas commented on HDFS-7878: - bq. The following is already annotated as Public and Stable in branch-3.0, but is removed in this patch bq. Can we make sure a function with that prototype is added back? LocatedFileStatus is a similar situation Neither API has been part of a release, yet. This was supposed to be part of 3.0.0-beta1 (and several earlier releases), but reviewers haven't been able to free up cycles for it. bq. Could someone also enlighten me as to the purpose of the commented out lines in FSProtos. Wire compatibility with HDFS > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, > HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, > HDFS-7878.12.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16181634#comment-16181634 ] Sean Mackrory commented on HDFS-7878: - Looks like this patch is violating some of compatibility guarantees. The following is already annotated as Public and Stable in branch-3.0, but is removed in this patch: {code} public FileStatus(long, boolean, int, long, long, long, FsPermission, String, String, Path, Path, boolean, boolean, boolean) {code} Can we make sure a function with that prototype is added back? LocatedFileStatus is a similar situation, although it's "Evolving": {code} public LocatedFileStatus(long length, boolean isdir, int, long, long, long, FsPermission, String, String, Path, Path, boolean, boolean, boolean, BlockLocation[] locations) {code} Could someone also enlighten me as to the purpose of the commented out lines in FSProtos.proto? I thought it was odd that we were replacing "alias = 13", but I have no idea why that line is there in the first place. > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, > HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, > HDFS-7878.12.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172208#comment-16172208 ] Chris Douglas commented on HDFS-7878: - Any other feedback on the patch? As an API for open-exact, one possible implementation could use the {{Options}} pattern used in FileContext and SequenceFile i.e., {code:java} PathHandle FileStatus::getPathHandle(Options... opts); {code} which would imply {{open(PathHandle)}} instead of {{open(FileStatus)}}. A few folks have raised the idea that ignoring some fields in the {{FileStatus}} instance could be confusing (if the file were renamed, permissions/modfication time changed, etc.). Using {{PathHandle}} explicitly would make that clearer, and the Options API is one, generic way to surface different FileSystem capabilities. However, it would mean that serializing a generic {{FileStatus}} object may not be sufficient to implement the contract, unless its serialized form supports all possible Options. This could be handled by a method on FileSystem i.e., {code:java} PathHandle FileSystem::getPathHandle(FileStatus status, Options... opts) {code} Which is cleaner in some respects, particularly if guaranteeing an option requires an RPC to get/set state in the FileSystem. It also implies that the {{PathHandle}} is sufficient to serialize across processes. It does nothing to prevent crossing {{PathHandle}} instances across FileSystems, unless the FileSystem serialized a guard on each instance. This is all shuffling around a few APIs; the functionality is similar. Setting a default of open-exact is probably what most users expect, and what most FileSystems (S3, WASB) will implement. [~sershe], could you be more explicit about the use in Hive? Do you need open-by-inodeID to resolve to any version of the file? It's worth mentioning that this spec is incomplete. Even if the open includes guards, the stream is still subject to whatever the FIleSystem supports. So a consistent open could still see stale/updated state. /cc [~anu], [~andrew.wang], [~ste...@apache.org] > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, > HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, > HDFS-7878.12.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160598#comment-16160598 ] Hadoop QA commented on HDFS-7878: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 8 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 39s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 24s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 8s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 51s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 13m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m 37s{color} | {color:green} root generated 0 new + 1281 unchanged - 2 fixed = 1281 total (was 1283) {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 27s{color} | {color:orange} root: The patch generated 1 new + 446 unchanged - 4 fixed = 447 total (was 450) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 54s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 7s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 37s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}110m 19s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 15s{color} | {color:green} hadoop-hdfs-httpfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}210m 29s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure110 | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure180 | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure100 | | | hadoop.hdfs.server.blockmanagement.TestBlockManager | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure090 | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150 | | | hadoop.hdfs.TestReadStripedFileWithMissingBlocks | | | hadoop.hdfs.server.namenode.TestReencryptionWithKMS | | |
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160256#comment-16160256 ] Hadoop QA commented on HDFS-7878: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 8 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 31s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 16s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 47s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 29s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 10m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 50s{color} | {color:green} root generated 0 new + 1281 unchanged - 2 fixed = 1281 total (was 1283) {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 5s{color} | {color:orange} root: The patch generated 6 new + 446 unchanged - 4 fixed = 452 total (was 450) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 35s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 21s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 96m 6s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 41s{color} | {color:green} hadoop-hdfs-httpfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}181m 45s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure130 | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | | | hadoop.hdfs.TestFileStatusSerialization | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure020 | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure | | | hadoop.hdfs.server.blockmanagement.TestBlockManager | | | hadoop.hdfs.protocol.datatransfer.sasl.TestSaslDataTransfer | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure030 | |
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155096#comment-16155096 ] Hadoop QA commented on HDFS-7878: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 26s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 10m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 4s{color} | {color:green} root: The patch generated 0 new + 409 unchanged - 1 fixed = 409 total (was 410) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 26s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 11s{color} | {color:red} hadoop-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 21s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}100m 2s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 4m 8s{color} | {color:red} hadoop-hdfs-httpfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}193m 16s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.net.TestDNS | | | hadoop.metrics2.sink.TestFileSink | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure100 | | | hadoop.hdfs.TestClientProtocolForPipelineRecovery | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure020 | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure090 | | | hadoop.hdfs.TestLeaseRecoveryStriped | | | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | | | hadoop.hdfs.TestFileCorruption | | |
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16152542#comment-16152542 ] Steve Loughran commented on HDFS-7878: -- Coming together nicely h4. RawPathHandle * L108: any limit on buffer size here? I'm just worrying about malicious DoS against client/server RAM h4. filesystem.md L104: review spelling; maybe make a MUST throw, it's a new feature after all. L179: back-quote all types in the para L720: I've never come across the word "prenominate" before, looks like the US version of "aforementioned". Need something globally understood, e.g. "declared previously". L729: {code} result = FSDataInputStream(0, FS.Files'[p']) {code} h4. AbstractContractOpenTest L191: We should have {{ContractTestUtils}} call for rename(), if not, time to add one. L192: {{ContractTestUtils.assertPathDoesNotExist}} HDFS code is for HDFS team to review. Do make sure it can handle the condition: new API call without PathHandle bytes, with a test creating {{HdfsPathHandle}} for this. > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, > HDFS-7878.09.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151793#comment-16151793 ] Hadoop QA commented on HDFS-7878: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 45s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 46s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 10m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 46s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 38s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 21s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 87m 59s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 41s{color} | {color:green} hadoop-hdfs-httpfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s{color} | {color:green} hadoop-aliyun in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}176m 9s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 | | | hadoop.hdfs.TestLeaseRecoveryStriped | | | hadoop.hdfs.TestClientProtocolForPipelineRecovery | | | hadoop.hdfs.TestFileCreationClient | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure000 | | Timed out junit tests | org.apache.hadoop.hdfs.TestWriteReadStripedFile | | | org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:71bbb86 | | JIRA Issue | HDFS-7878 | | JIRA
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142582#comment-16142582 ] Hadoop QA commented on HDFS-7878: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 30s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 19s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 10m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 41s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 59s{color} | {color:orange} root: The patch generated 2 new + 382 unchanged - 0 fixed = 384 total (was 382) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 14s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 2s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 22s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 15s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 89m 25s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 44s{color} | {color:green} hadoop-hdfs-httpfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}171m 57s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.hdfs.TestFileAppendRestart | | | hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy | | | hadoop.hdfs.TestReadStripedFileWithDecoding | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure010 | | | hadoop.hdfs.TestWriteReadStripedFile | | | hadoop.hdfs.server.namenode.TestReencryption | | | hadoop.hdfs.TestLeaseRecoveryStriped | | | hadoop.hdfs.TestClientProtocolForPipelineRecovery | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestSpaceReservation | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | HDFS-7878 | | JIRA Patch URL |
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141239#comment-16141239 ] Hadoop QA commented on HDFS-7878: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 31s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 29s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 11m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 23s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 4s{color} | {color:orange} root: The patch generated 5 new + 300 unchanged - 0 fixed = 305 total (was 300) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 5m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 57s{color} | {color:red} hadoop-common-project/hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 47s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 31s{color} | {color:red} hadoop-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 30s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 81m 48s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 39s{color} | {color:green} hadoop-hdfs-httpfs in the patch passed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 30s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}176m 16s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-common-project/hadoop-common | | | Class org.apache.hadoop.fs.RawPathHandle defines non-transient non-serializable instance field fd In RawPathHandle.java:instance field fd In RawPathHandle.java | | Failed junit tests | hadoop.fs.TestHarFileSystem | | | hadoop.fs.TestFilterFileSystem | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure010 | | | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 | | Timed out junit tests | org.apache.hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | HDFS-7878 | | JIRA Patch
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16140437#comment-16140437 ] Chris Douglas commented on HDFS-7878: - bq. There's now a builder API for open/create There isn't a builder API for open, and I couldn't find an open JIRA/patch for it. > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16140397#comment-16140397 ] Andrew Wang commented on HDFS-7878: --- We still have an outstanding question for Sergey that affects the PathHandle vs. FileStatus choice: whether these IDs need to be passed cross-process. A FileStatus will be significantly bigger in this case. DirectFS sounds like overkill, I'd hope we can address this with an incremental improvement. > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16140300#comment-16140300 ] Chris Douglas commented on HDFS-7878: - bq. the "better" calls. Fine by me. If this JIRA only added an option to the builder for {{open}}, that would satisfy most of our use cases. bq. I'm a bit confused as to what a PathHandle to an as yet uncreated path would be? I was thinking we could expose the inode ID for the target directory. HDFS doesn't supply this now, but it could. The FS implementations over object stores might have trouble supporting it. bq. doing this in FileContext would be better Maybe? This came up during the [discussion|https://issues.apache.org/jira/browse/HADOOP-12910?focusedCommentId=15189492=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15189492] of async renames. We keep suggesting that we push functionality and users to that API, but it's not happening. I like the idea of making {{FileContext}} into an "advanced" API that allows clients to reason about consistency, visibility, performance, and durability. But for the near and foreseeable future, most of us need the "open by file ID" API. bq. I know I've been doing things underneath FileSystem, but that doesn't mean we should do more in terms of evolving that API. I won't ask for consistency, but I will ask for a short path through this particular issue. HDFS-6984 took a very long detour while I incorporated all the feedback. > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16140008#comment-16140008 ] Steve Loughran commented on HDFS-7878: -- # FS APIs? Pull in [~sanjay.radia] # There's now a builder API for open/create, that's the only one that should be added for a path handle # similarly, the full rename with opts, none of the other "utility" renames; the list calls should always return an RemoteIterator. That is, the "better" calls. # though there, I'm a bit confused as to what a PathHandle to an as yet uncreated path would be? # Overall, and I think Sanjay will be aligned with me here, doing this in FileContext would be better. I know I've been doing things underneath FileSystem, but that doesn't mean we should do more in terms of evolving that API. > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139332#comment-16139332 ] Chris Douglas commented on HDFS-7878: - [~sershe] I'm finishing a patch, and will post it this week. This took a long detour through HDFS-6984. We seem to be down to whether we should work directly with {{PathHandle}} instances or with {{FileStatus}} instances. While {{FileSystem}} has been trending toward {{FileStatus}}-based APIs for a [long time|https://issues.apache.org/jira/browse/HADOOP-6198?focusedCommentId=12744611=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12744611], I agree 1) we don't yet have consensus on this approach and 2) that trend has stalled as {{FileSystem}} evolution accommodates new features, rather than improving base functionality. Nonetheless, I'd like to make a case for it, here. Repeating discussion of its (in)efficiency discussed [elsewhere|https://issues.apache.org/jira/browse/HDFS-6984?focusedCommentId=15755358=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15755358], the overhead of passing {{FileStatus}} instead of a {{PathHandle}} is rarely significant, and solvable in cases managing thousands or millions of handles are in memory. [~sershe], can you comment on whether this overhead would be significant for your use case? Mostly, I want to avoid a proliferation of similar APIs in {{FileSystem}}. We need something for open. Let's solve that, and see whether it's useful. *Alternatively*, we could create a new API designed around {{PathHandle}} instances. By way of example, we could use have an API like: {code:java} public class FileSystem /* blah blah */ { /* blah blah */ public DirectFS direct() { /* ... */ } } interface DirectFS { FSDataInputStream open(PathHandle file /* opts */); FSDataOutputStream create(PathHandle parent, Path child /* opts */); FSDataOutputStream append(PathHandle file /* opts */); boolean rename(Path src, PathHandle dst /* opts */); boolean rename(PathHandle src, Path dst /* opts */); boolean rename(PathHandle src, PathHandle dst /* opts */); RemoteIterator listFileStatus(PathHandle dir /* opts */); /* globFileStatus, ACLs, xattr, etc. */ } {code} {{DirectFS}} calls would follow the {{FileSystem}} specification, with the additional requirement that every {{PathHandle}} resolves to the entity extracted from a {{FileStatus}} from that {{FileSystem}}, to the extent that is enforceable on that {{FileSystem}}. Among the opts, {{FileSystem}} implementations could add specific criteria that change how strictly the "same entity" constraint is enforced. This would be an alternative way to encode the quasi-read-committed semantics in HADOOP-12077. It would also admit cases like the one that motivated this JIRA i.e., "I don't care where this file is in the namespace, just open it". In the fullness of time, _maybe_ we would implement this. However, all the use cases we have now are much, much simpler. We need a consistent handle to read files, we have some ideas how to implement this for other storage systems like S3/Azure. If the {{PathHandle}} API gets no traction, at least we have a reasonable fallback so {{FileSystem::open(FileStatus, ...)}} can dispatch to {{FileSystem::open(Path, ...)}}. [~andrew.wang] and [~ste...@apache.org], please, please let me know if this is OK with you soon, so we can finish this. > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138848#comment-16138848 ] Sergey Shelukhin commented on HDFS-7878: I'd like to attempt to resurrect this. However from reading the comments above I don't sense a consensus here... At this point, I really don't care at all about the class and method structure anymore as long as there's a usable API. Can someone please summarize the above discussion or guide one of the patch versions thru? It reads to me like every approach taken w.r.t. the classes/etc. was rejected by at least one person so I'm not sure which one to take. > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100856#comment-16100856 ] Hadoop QA commented on HDFS-7878: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} HDFS-7878 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-7878 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12826530/HDFS-7878.06.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/20412/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963711#comment-15963711 ] Hadoop QA commented on HDFS-7878: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s{color} | {color:red} HDFS-7878 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-7878 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12826530/HDFS-7878.06.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/19036/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515911#comment-15515911 ] Steve Loughran commented on HDFS-7878: -- I see what you are getting at. No objections from me then. > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515098#comment-15515098 ] Anu Engineer commented on HDFS-7878: Yes, I think we should use {{BikeShed}} class instead of long, given that we might want to inherit from it. I also love the fact that {{BikeShed}} is an abstract class, so I can inherit all wonderfully *colored* {{BikeSheds}}. On a more serious note : I am {{+1}} on version 6 of this patch. This patch seems to have addressed all substantive comments we have had on this issue. Since this is going into Alpha-2, why don't we commit this and then discuss renaming {{InodeId}} to {{BikeShed}} or whatever it is going to be named in a separate Jira. > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514987#comment-15514987 ] Chris Douglas commented on HDFS-7878: - bq. I'd highlight YARN app submission as a key use case; it currently uses timestamps and gets very confused if you overwrite something, even if its contents are unchanged YARN {{LocalResourceProto}} could replace some of its metadata with a {{FileStatusProto}} from HDFS-6984, if it were available. That should include this identifier. bq. The other way to expose it would be from a byte[] of version info An opaque {{byte[]}} is awkward with multiple args. A caller would need to pass a {{Map}}, arrays of {{Path}} and {{byte[]}}, or a composite type (kind of redundant, given {{FileStatus}} exists). We could add a {{FileSystem#getHandle(FileStatus)}} API to return an opaque, serializable type that's the minimal set of bytes to refer to that entity. I suppose we could mandate {{BikeShed#getPath()}} so it's not too annoying, but as a subset, we're saving a handful of bytes over {{FileStatus}}. Particularly after HDFS-6984, if someone wants to store a few million of them efficiently, there are better methods for that. That said, I agree that {{FileStatusProto}} should represent the {{BikeShed}} as bytes. bq. Additionally, InodeId looks implementation-specific to me, which makes this API not useful to or be supported natively by other backend To be clear, this supports using {{FileStatus}} in {{FileSystem}} APIs, rather than {{InodeId}} e.g., {{FileSystem#open(InodeId)}}? Do you agree that we should use a type for {{BikeShed}}, not just a long? > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514658#comment-15514658 ] Anu Engineer commented on HDFS-7878: [~chris.douglas] +1 on {{BikeShed}} :) > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514629#comment-15514629 ] Lei (Eddy) Xu commented on HDFS-7878: - [~chris.douglas] Thanks a lot for working on it. I'd prefer to use {{FileStatus}} with {{file id}} instead a new {{FileHandler}} or {{InodeId}}. As it is more familiar with the users, and if people want to use {{FileId}} alone to save memory (e.g., using in cache), they have the choice of using {{FileStatus#getFileId()}}. I think it'd be easier to make this API be used by downstream projects. Additionally, {{InodeId}} looks implementation-specific to me, which makes this API not useful to or be supported natively by other backend (i.e., Azure or S3)?. One additional point is that {{stat(2)}} returns inode ({{stat.st_inode}}) as well, so it should not be too surprised for the end user. And it might be worthwhile to take this chance to finally change {{FileStatus}} to be serializable for Protobuf (HDFS-6984) from Hadoop 3 and onward. Thanks. > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15512866#comment-15512866 ] Steve Loughran commented on HDFS-7878: -- I see: if I do getFileStatus, I get back enough info that I can open off that, guaranteed that the version of the file opened is from the status check, else I'd see an error. And this is done with a new field, so things like checksums/etags can be used off object stores, a version number if someone every hooked VAXFS up to Hadoop, etc I also see why this implies that FileStatus needs to be serializable: I want to be able to submit something to a server and be confident that is what gets picked up. I'd highlight YARN app submission as a key use case; it currently uses timestamps and gets very confused if you overwrite something, even if its contents are unchanged. The other way to expose it would be from a byte[] of version info, so anything can marshall file version info as {{(path, bytes[])}}; open(FileStatus status) would just be mapped to {{open(Path, status.versionInfo}}. I think that could be more flexible in terms of passing data around, especially as you could extend the protobuf in things like AM launch context, that is, in {{LocalResourceProto}}. Have you raised this with the YARN team? > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15512247#comment-15512247 ] Chris Douglas commented on HDFS-7878: - bq. is this somehow going to change end user APIs so a path isn't enough to refer to things? No, I'll try to summarize. This JIRA proposes an API for {{FileSystem}} that exposes HDFS open-by-inode. Not every implementation can enforce HDFS semantics precisely, but many can support an "open with verification" API that improves on the TOCTOU races common to most applications. These would mostly be new APIs. While v05 proposed a {{long}} as the fileId, implementations other than HDFS use different metadata (mostly strings; matching the {{FileStatus}} metadata is often possible). If the {{FileHandle}} were exposed as a type, then implementations could (opaquely) embed metadata in {{FileStatus}} for "consistent" operations. v06 added a {{InodeId}} type (renamed {{FileHandle}} or {{PathHandle}} or {{BikeShed}}). These metadata can be encoded as a new field in {{FileStatus}}, which would be API compatible, but serialized instances would not work across major versions. While this seems like a reasonable jump to make in 3.x, it could cause some pain. This aspect is discussed in HDFS-6984. In this JIRA, we're discussing user-facing APIs for consistently opening a file, a use case [~sershe] needs in Hive and HDFS-9806 needs for correctness. As [~cmccabe] pointed out, we also want to consider consistent handling of directories and symlinks as we define this API. In favor of augmenting {{FileStatus}}: it's simple, it's probably what most users expect, and it's serializable. It would be a natural, unsurprising API for create/rename/delete/listFileStatus. That said, it's also significantly larger than an 8 byte fileId, but implementations can detect crossed streams (i.e., requesting a fileId from the wrong {{FileSystem}}; {{ViewFS}} can use it for demux). In favor of {{open(FileHandle)}}, we could add {{FileSystem#createHandle(FileStatus)}} that _may_ use an RPC to generate a serializable instance. These could be the minimal, serializable metadata to refer to that inode. ping [~fabbri], [~eddyxu] > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510055#comment-15510055 ] Steve Loughran commented on HDFS-7878: -- proposal wise, I'm somewhat confused. is this somehow going to change end user APIs so a path isn't enough to refer to things? > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508091#comment-15508091 ] Chris Douglas commented on HDFS-7878: - bq. Any changes to FileStatus has to be done so that external filesystems (e.g. google cloud storage) which subclass FileStatus don't break. I know, given the pain Guice causes us it'd be retaliation, but the GCS team aren't the guice team, and would upset users. Of course. On HDFS-6984, I'm anxious changing the {{FileStatus}} serialization between 2.x and 3.x. Distcp is fine- that dependency is only intra-job- but if someone has a {{SequenceFile}} full of {{FileStatus}} objects in their clusters, those would become unreadable without a 2.x common jar. Whether it's worth it or not, we can discuss in HDFS-6984. bq. FileStatus is part of the public FS API, documented in FileSystem.md. You're proposing changing it, aren't you? Which means you get to update the doc and the tests in AbstractContractGetFileStatusTest Happily. Do you have feedback on this proposal, outside of its requirement to update the FS specification? > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507647#comment-15507647 ] Steve Loughran commented on HDFS-7878: -- hmmm. * S3a adds an {{S3AFileStatus}} with a new field "isEmptyDir". I've contemplated making it a subclass of {{LocatedFileStatus}}. so lists operations returning those wouldn't need any new objects. * grepping the Hadoop production code, FileStatus.getPath is often used as the input to FS operations (open, delete, rename). But not so often for create(), and rename destinations. * Any changes to FileStatus has to be done so that external filesystems (e.g. google cloud storage) which subclass FileStatus don't break. I know, given the pain Guice causes us it'd be retaliation, but the GCS team aren't the guice team, and would upset users. Now, the bad news. FileStatus is part of the public FS API, documented in {{FileSystem.md}}. You're proposing changing it, aren't you? Which means you get to update the doc and the tests in {{AbstractContractGetFileStatusTest}}. And now that I know that JIRA is planning to change the file, I'm going to be expecting to see that handles. > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504298#comment-15504298 ] Chris Douglas commented on HDFS-7878: - bq. How about using name service id ("dfs.nameservice.id") here? Tracing through {{DFSClient}} init is quite a journey. Isn't the nameservice ID passed as the URI for the client? Is setting it as in v06 insufficient? The {{FileSystem}} implementations I looked at often return nonce information as a String. [S3a|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html] could check ETag or version ID in the response (both Strings), [Azure|https://msdn.microsoft.com/en-us/library/azure/dd179371.aspx] blobs could use Etag or DateTime (easy to encode), and HDFS needs to convert the inodeId to a Path segment, anyway. That said: Strings in the JVM aren't an efficient representation (particularly of HDFS inodes), some implementations may not return sufficient metadata to generate a {{FileHandle}} from {{FileStatus}} (requiring an RPC, so {{FileStatus}} would need a back-pointer to its {{FileSystem}}), and requiring {{toString()}} serialization is regrettable. We could add {{byte[] FileSystem::createFileHandle(FileStatus)}} (or similar variants), with the contract that these are the minimum set of bytes for a [comparably configured] {{FileSystem}} instance to address exactly that inode. This seems redundant with all the existing serialization, and most of the APIs would be awkward (e.g., {{open(Path p, byte[] nonce)}} ?). [~sershe], is it important to maintain {{open(FileHandle)}} independent of the {{FileStatus}} instance? If Hive were to serialize the {{FileStatus}} instance (with {{FileHandle}}) instead of just the path/inode, then it could use this API. Adding other {{FileSystem}} operations accepting {{FileStatus}} also has the virtue of reusing the most-commonly used {{getFileStatus}} and {{listFileStatus}}, rather than another set of APIs managing {{FileHandle}}. Put more directly, if we were to add directory listing (from a dirent, similar to {{ftw/nftw}}), delete, rename, etc. we would probably not want to add these for an opaque {{FileHandle}} reference (to which the caller would have to retain a map). Many applications perform checks on ownership, last-modification time, and other metadata in race with {{FileSystem}} operations; AFAIK it's comparatively rare that users would prefer that their operation to apply to whatever entity is referenced by a {{Path}} at that moment. Point being: even if the {{FileStatus}}-oriented APIs were used thoughtlessly, I doubt users would be surprised at the semantics. Instead of adding {{FileHandle}} as a {{Writable}}, perhaps this should take HDFS-6984 as a prerequisite. I'm not sure why {{HdfsFileStatus}} doesn't extend {{FileStatus}}. Is that something we could change in 3.x? Aside: Java has already taken {{FileDescriptor}} and recently {{Path}}; should this be a {{PathHandle}}? Ping [~ste...@apache.org], [~cnauroth] > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468143#comment-15468143 ] Jing Zhao commented on HDFS-7878: - {{clusterID}} can be a candidate, but it can only be retrieved from either VERSION file or the NamespaceInfo. It is hard for a dfsclient to map a clusterId to a specific NameNode. How about using name service id ("dfs.nameservice.id") here? > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456969#comment-15456969 ] Chris Douglas commented on HDFS-7878: - [~jingzhao] Good idea on {{FileHandle}}; I prefer that to {{InodeId}}, though it's potentially confusing if we want to allow handles to directories, too (e.g., if a process wanted to walk a subtree, ignoring parent renames). bq. one question is how HdfsInodeId#nnId handles HA setup and different FileSystem schemes Good point. For HDFS, I don't think we care which protocol they use or which NN the client contacts. Is {{clusterID}} the right identifier? > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456737#comment-15456737 ] Jing Zhao commented on HDFS-7878: - So the InodeId is actually a file handler? Maybe in that sense to directly have a {{open(FileHandler)}} is OK? For the patch, one question is how {{HdfsInodeId#nnId}} handles HA setup and different FileSystem schemes? For the same HDFS file, looks like the nnId URI can be derived from: # name service ID in HA or federation # the detailed host:port of the current active NN # URI with webhdfs scheme # URI with hdfs scheme Should we treat them as the same HdfsInodeId? Or maybe we always use name service id for nnId? > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456213#comment-15456213 ] Chris Douglas commented on HDFS-7878: - bq. I meant when callers are in different processes/nodes, e.g. Hive generating splits for remote tasks. We can work around it by getting status and verifying file ID, then opening from status. However fileId-based open would be more convenient (and save a call, potentially - at least from the caller perspective, not sure if it's same difference from the NN calls perspective)... I'm missing something. The fileid/InodeId is a field in FileStatus. FileStatus-open uses inodeid-open directly; there's no second call... Oh; because v06 doesn't include the InodeId as part of the Writable contract. True. Would passing a full FileStatus across processes be too heavy? > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456174#comment-15456174 ] Sergey Shelukhin commented on HDFS-7878: I meant when callers are in different processes/nodes, e.g. Hive generating splits for remote tasks. We can work around it by getting status and verifying file ID, then opening from status. However fileId-based open would be more convenient (and save a call, potentially - at least from the caller perspective, not sure if it's same difference from the NN calls perspective)... > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15454081#comment-15454081 ] Hadoop QA commented on HDFS-7878: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 35s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 20s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 58s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 36s{color} | {color:orange} root: The patch generated 8 new + 328 unchanged - 0 fixed = 336 total (was 328) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 11s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 0s{color} | {color:red} hadoop-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 0s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 78m 10s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 35s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}141m 50s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.fs.TestHarFileSystem | | | hadoop.fs.TestFilterFileSystem | | | hadoop.hdfs.server.datanode.TestDataNodeLifeline | | | hadoop.hdfs.web.TestWebHDFS | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Issue | HDFS-7878 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12826530/HDFS-7878.06.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 22ec00c32c73 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 6f4b0d3 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/16600/artifact/patchprocess/diff-checkstyle-root.txt | | unit |
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453857#comment-15453857 ] Chris Douglas commented on HDFS-7878: - bq. One idea behind open(long/InodeId) is to be able to open files consistently; e.g. for partial caching (one needs to be sure that the cached data and the data read from FS are for the same file, guarding against overwrites). HDFS-9806 needs this API for exactly that reason. We need to open paths consistently, and at least know when an external reference diverges from state cached in local storage. bq. File ID is easy to propagate between different readers for this purpose, but it seems that FileStatus would be rather inconvenient. It forces the caller who is dealing with the FS to get the status by name first (which also only works if the name is known; in our case we do know the name) and verify that fileId is consistent. FileStatus is more state to pass around, but does it change the calling pattern? Some reader needs to obtain the internal reference. v06 was attempting to split the difference between related use cases: # Obtain a reference to the file, open by reference irrespective of changes in the hierarchical namespace (e.g., HDFS {{open("/.reserved/.inode/blahblah")}}) # Obtain a reference to the file, verify TOCTOU (e.g., list a directory, return a only if it's the same entity referenced in the first operation) # Versioning (e.g., refuse to open a reference if the entity returned is less recent than the metadata version) And so on. Would you prefer the {{open(InodeID)}} style API? Exposing the {{InodeId}} type to the user seemed unfortunate, but the difference in semantics is clearer to the caller. > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453828#comment-15453828 ] Sergey Shelukhin commented on HDFS-7878: One idea behind open(long/InodeId) is to be able to open files consistently; e.g. for partial caching (one needs to be sure that the cached data and the data read from FS are for the same file, guarding against overwrites). File ID is easy to propagate between different readers for this purpose, but it seems that FileStatus would be rather inconvenient. It forces the caller to get the status by name first (which also only works if the name is known; in our case we do know the name) and verify that fileId is consistent. Is it possible to keep both APIs? > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453830#comment-15453830 ] Sergey Shelukhin commented on HDFS-7878: Thanks for picking this up by the way! > API - expose an unique file identifier > -- > > Key: HDFS-7878 > URL: https://issues.apache.org/jira/browse/HDFS-7878 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: BB2015-05-TBR > Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, > HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, > HDFS-7878.06.patch, HDFS-7878.patch > > > See HDFS-487. > Even though that is resolved as duplicate, the ID is actually not exposed by > the JIRA it supposedly duplicates. > INode ID for the file should be easy to expose; alternatively ID could be > derived from block IDs, to account for appends... > This is useful e.g. for cache key by file, to make sure cache stays correct > when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492596#comment-14492596 ] Colin Patrick McCabe commented on HDFS-7878: Symlinks and directories both have a unique file ID just the same as files. Maybe inodeID is a better name than fileID? Typically, you never actually get a FileInfo for a symlink itself unless you call getFileLinkInfo. If you simply call getFileInfo, you get the FileInfo for the file the symlink points to, not for the symlink itself. API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14490353#comment-14490353 ] Tsz Wo Nicholas Sze commented on HDFS-7878: --- ... I can add null check to get and throw UnsupportedOperationException for cleaner error ... That's a good idea. Then, we won't need hasFileId() since users can catch the UnsupportedOperationException. It is what we do for the other operations (e.g. concat, truncate in FileSystem). API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14490354#comment-14490354 ] Tsz Wo Nicholas Sze commented on HDFS-7878: --- BTW, what happens if we call getFileID for directories and symlinks? API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484342#comment-14484342 ] Hadoop QA commented on HDFS-7878: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723710/HDFS-7878.05.patch against trunk revision d27e924. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestWebHDFS Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/10195//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/10195//console This message is automatically generated. API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382570#comment-14382570 ] Colin Patrick McCabe commented on HDFS-7878: I agree with [~jingzhao] that having a nullable field is better than having a subclass. If we start going down the subclass route we will have 2^N classes for each combination of nullable fields. Offhand, I can think of at least 5 fields which ought to be nullable (symlink, block_replication, blocksize, access_time, fileId). Keep in mind that block_replication, blocksize, and access_time are not relevant to directories, and the symlink field is only relevant to symlinks. I think it's better to just have fileId always there as a Long and set to null when it's not relevant. We can optimize the memory consumption more later with things like bitfields, cramming things into longs, etc. I also think that in hadoop 3, we should make FileStatus use protobuf to serialize itself so that we don't have these problems with adding new fields. But I think there is already a JIRA for that. API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14376519#comment-14376519 ] Sergey Shelukhin commented on HDFS-7878: [~cmccabe] are you ok with the latest patch? API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370296#comment-14370296 ] Hadoop QA commented on HDFS-7878: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705669/HDFS-7878.04.patch against trunk revision 61a4c7f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA org.apache.hadoop.tracing.TestTracing Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9982//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9982//console This message is automatically generated. API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14369753#comment-14369753 ] Jing Zhao commented on HDFS-7878: - To me to have a new class extending FileStatus is a little bit too heavy: we will have too many xxxFileStatus classes. But I do agree with Sergey that to have a null field is error prone and not clean. Besides we must make sure this new field is not serialized or deserialized in FileStatus#write or FileStatus#read, because this can cause compatibility issue easily. API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.03.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14369660#comment-14369660 ] Sergey Shelukhin commented on HDFS-7878: 1) Sure 2) That seems error prone; the code operating on FileStatus will have to know whether fileId is supposed to be there (if it's not, it's ok to be null, if it's supposed to be there and is null, it's a bug); serialization will lose data.Writables don't work well with optional fields... API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.03.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368360#comment-14368360 ] Colin Patrick McCabe commented on HDFS-7878: Thanks, [~sershe]. I like version 3 a lot more. [~jingzhao], did you get a chance to take a look? {code} + /** + * Opens an FSDataInputStream with the indicated file ID. + * @param fileId the file ID to open + * @param bufferSize the size of the buffer to be used. + */ + public FSDataInputStream open(long fileId, final int bufferSize) throws IOException { +return open(DFSUtil.makeInodePath(fileId), bufferSize); + } {code} Maybe we can save this one for another change? I think we have too many overloads of {{FileSystem#open}} already... there must be at least a dozen, and it's not clear to users how each of them are different, unless they check the javadoc closely. Plus they can already open {{/.reserved/.inodes/XYZ}} to get the same functionality (or even more functionality since we can specify things like the buffer size...) {code} public final FileStatus makeQualified(URI defaultUri, Path path) { -return new FileStatus(getLen(), isDir(), getReplication(), +return new DFSFileStatus(getLen(), isDir(), getReplication(), {code} I think we should just add a Long field (which can be null) to FileStatus. We don't have to (de)serialize it in FileStatus#write or FileStatus#read, since it's an optional field. API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.03.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366400#comment-14366400 ] Hadoop QA commented on HDFS-7878: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705180/HDFS-7878.03.patch against trunk revision 968425e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9944//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9944//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9944//console This message is automatically generated. API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.03.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14364104#comment-14364104 ] Sergey Shelukhin commented on HDFS-7878: Note: confusingly, HdfsFileStatus is not FileStatus, that's an internal class [~cmccabe] can you clarify what you mean by two RPCs? The API makes one RPC. Two RPCs will only happen if the user calls both get ID and get status, which is not necessary. API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14364389#comment-14364389 ] Colin Patrick McCabe commented on HDFS-7878: Hi [~jingzhao], I understand that the current patch only makes one RPC. My concern is that in order to get both the file ID and the other information about the file, two RPCs will be needed when using this API. It's just a bad API from an efficiency and concurrency point of view. It would be like if you needed 5 separate RPCs to get the length, access time, modification time, group, and user. There is a reason why FileStatus combines together all these fields into one class, and that same logic should lead us to including a way of getting file ID out of FileStatus. I understand that HdfsFileStatus already has {{HdfsFileStatus#getFileId}}. But this functionality should not be HDFS-specific. There are other filesystems that have file IDs. (Plus, if casting to HdfsFileStatus was adequate, there would be no need for this JIRA at all, since this cast is already possible.) Why not just add an accessor function to {{FileStatus}} with a default implementation like I suggested earlier? Adding a new function to a stable interface is a compatible change (and I note, that adding a new function to FileSystem is also adding a new function to a stable class). API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14364067#comment-14364067 ] Jing Zhao commented on HDFS-7878: - bq. Colin wrote: if the client makes two different calls to getFileStatus... since we're doing 2x the RPCs to the NameNode that we need to... The current patch only makes one getFileStatus RPC. bq. Colin wrote: Then FileStatus objects returned from HDFS (and any other filesystem that has user-visible inode IDs) can return the inode ID This FileStatus object returned from HDFS is called HdfsFileStatus... bq. Colin wrote: We do 1/2 the RPCs of the current patch, put 1/2 the load on the NN, and don't open up another race condition. Again, the current patch only makes ONE single RPC, and your proposed approach is exactly what the current patch is doing except we already have HdfsFileStatus containing file ID information. Instead of changing or extending a public and stable interface like FileStatus, the easiest way is to only keep this API inside DistributedFileSystem and simply returning the file id contained inside of HdfsFileStatus. API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14363709#comment-14363709 ] Sergey Shelukhin commented on HDFS-7878: That is approximately what was done in the first version of the patch (see attached) and then replaced in favor of new API... can you guys ([~cmccabe] and [~jingzhao]) decide which way is better :) I can add the open-by-id method then... API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361303#comment-14361303 ] Colin Patrick McCabe commented on HDFS-7878: bq. Jing wrote: Could you please add more details here? Note that the getFileId API in the current patch only calls getFileStatus and returns the inode id field contained in the HdfsFileStatus. Or you mean the client is making both calls separately? Then why the subclass approach can solve this? My point is that if the client makes two different calls to getFileStatus, the file status could change in between. So we could end up with the ID of one file and the other details of another file. This is also inefficient, clearly, since we're doing 2x the RPCs to the NameNode that we need to. And since the NN is the hardest part of HDFS to scale (it hasn't been scaled horizontally) this is another concern. bq. If you call getFileStatus and open currently, you can have the same problem - status from one file, open from different file. Sure, and we ought to fix this too, by making it possible for the client to get {{FileStatus}} from a {{DFSInputStream}}. It would be as easy as just having a method inside DFSInputStream that called {{open(/.reserved/.inodes/inode-id-of-file)}}. bq. Sergey wrote: ID allows to overcome this by getting ID first, then using ID-based path. Of course if ID is obtained separately there's no guarantee but there's no way to overcome this. It seems like there is a very easy way to overcome this... just add an abstract function inside {{FileStatus}} that either throws {{OperationNotSupported}} or returns the inode ID. Then FileStatus objects returned from HDFS (and any other function that has user-visible inode IDs) can return the inode ID, and the default implementation can be throwing {{OperationNotSupported}}. We do 1/2 the RPCs of the current patch, put 1/2 the load on the NN, and don't open up another race condition. What do you think? API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357646#comment-14357646 ] Sergey Shelukhin commented on HDFS-7878: [~cmccabe] ping? API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353579#comment-14353579 ] Sergey Shelukhin commented on HDFS-7878: ping? API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353616#comment-14353616 ] Jing Zhao commented on HDFS-7878: - We also need a full javadoc for the new API. other than that looks good to me. API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353776#comment-14353776 ] Jing Zhao commented on HDFS-7878: - Thanks for the comments, Colin. bq. What if the file is deleted and re-created in between calling getFileStatus and calling getFileId? Could you please add more details here? Note that the {{getFileId}} API in the current patch only calls {{getFileStatus}} and returns the inode id field contained in the HdfsFileStatus. Or you mean the client is making both calls separately? Then why the subclass approach can solve this? API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353755#comment-14353755 ] Colin Patrick McCabe commented on HDFS-7878: TOCTOU defined here: http://en.wikipedia.org/wiki/Time_of_check_to_time_of_use API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353753#comment-14353753 ] Colin Patrick McCabe commented on HDFS-7878: Thanks for tackling this! One big concern here, though: I'm concerned that having a separate getFileId call will create a lot of TOCTOU (time-of-check, time-of-use) race conditions. What if the file is deleted and re-created in between calling getFileStatus and calling getFileId? Then the client ends up caching the wrong file block locations (or other file data). -1 until we can figure this out. Sorry for the negativity. Why can't we just put this into the file info somewhere? I don't think the subclass approach is a bad one. To avoid casting, we could also have an accessor in the superclass that returns 0 (or throws an exception) when the ID is not available. API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353801#comment-14353801 ] Sergey Shelukhin commented on HDFS-7878: in particular I wonder why getFileStatus API is privileged and has to be consistent with getFileId. If you call getFileStatus and open currently, you can have the same problem - status from one file, open from different file. ID allows to overcome this by getting ID *first*, then using ID-based path. Of course if ID is obtained separately there's no guarantee but there's no way to overcome this. I don't care either way about subclass or method approach. API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353762#comment-14353762 ] Hadoop QA commented on HDFS-7878: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703507/HDFS-7878.02.patch against trunk revision d6e05c5. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9803//console This message is automatically generated. API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353766#comment-14353766 ] Sergey Shelukhin commented on HDFS-7878: For file opening case, cannot the file be opened by getting ID first, then using the ID-based path as indicated above? API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353806#comment-14353806 ] Jing Zhao commented on HDFS-7878: - bq. ID allows to overcome this by getting ID first totally agree. API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14349859#comment-14349859 ] Sergey Shelukhin commented on HDFS-7878: [~jingzhao] actually getFileId already normalizes the file path (responding to gtalk discussion) So this is ready for +1 ;) API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347653#comment-14347653 ] Sergey Shelukhin commented on HDFS-7878: [~jingzhao] can you please review? API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Attachments: HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347654#comment-14347654 ] Jing Zhao commented on HDFS-7878: - Instead of defining a new DFSFileStatus class, can we define a new {{getFileId}} API inside of DistributedFileSystem and use HdfsFileStatus there? API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347938#comment-14347938 ] Hadoop QA commented on HDFS-7878: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702631/HDFS-7878.01.patch against trunk revision c66c3ac. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestFileTruncate The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDecommission org.apache.hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9737//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9737//console This message is automatically generated. API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347850#comment-14347850 ] Hadoop QA commented on HDFS-7878: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702620/HDFS-7878.patch against trunk revision ed70fa1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestWebHDFS Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9735//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9735//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9735//console This message is automatically generated. API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347889#comment-14347889 ] Sergey Shelukhin commented on HDFS-7878: This QA was for old patch... API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345880#comment-14345880 ] Jing Zhao commented on HDFS-7878: - bq. dfs.getClient().getFileInfo(path) yes, then you can use /.reserved/fileId to visit that file. Unfortunately both this API and HdfsFileStatus are now private inside of HDFS. API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345824#comment-14345824 ] Sergey Shelukhin commented on HDFS-7878: HdfsFileStatus is not inherited from FileStatus. Do you mean dfs.getClient().getFileInfo(path)? API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345775#comment-14345775 ] Suresh Srinivas commented on HDFS-7878: --- [~sershe], FileSystem#getFileStatus returns FileStatus. When using HDFS, this can be cast to HDFSFileStatus. This has HDFSFIleStatus#fileId. Is this what you are looking for? API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345930#comment-14345930 ] Sergey Shelukhin commented on HDFS-7878: Can you make it public? Or better yet add API to FileSustem. API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)