[jira] [Commented] (HDFS-6045) A single RPC API: FileStatus[] getFileStatus(Path f) to get status of all path components.
[ https://issues.apache.org/jira/browse/HDFS-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919196#comment-13919196 ]

Steve Loughran commented on HDFS-6045:
--------------------------------------

Can hit some scale limits very fast if a directory is both deep and wide - and if the scan is done in a sync block in HDFS, the cost of the scan is visible to all.

Oddly enough, object stores may handle this better than inode filesystems, as they effectively do deep scans of a simulated hierarchical filesystem - though there's usually a limit on the number of entries returned, so this operation would need multiple HTTP round trips.

A single RPC API: FileStatus[] getFileStatus(Path f) to get status of all path components.
--------------------------------------

Key: HDFS-6045
URL: https://issues.apache.org/jira/browse/HDFS-6045
Project: Hadoop HDFS
Issue Type: New Feature
Components: hdfs-client, namenode
Reporter: Gera Shegalov
Assignee: Gera Shegalov

This comes up in YARN-1771/MAPREDUCE-4907 on the server/client side of PUBLIC Distributed Cache. The deeper the path, the more beneficial the feature is.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6045) A single RPC API: FileStatus[] getFileStatus(Path f) to get status of all path components.
[ https://issues.apache.org/jira/browse/HDFS-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918688#comment-13918688 ]

Gera Shegalov commented on HDFS-6045:
-------------------------------------

bq. [~chris.douglas] mentioned on YARN-1771: Symlinks might be awkward to support, but that discussion is for a separate ticket.

Do you have a JIRA ref?

For simplicity, I think the semantics should be to return an array corresponding to a fully resolved path that does not contain any symlinks. If f=/a/b/c/d/e/f, b is a symlink b -> /tmp/x, and e is a symlink e -> y, then the returned array will correspond to /tmp/x/c/d/y/f.
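The resolution rule in the example above (b -> /tmp/x, e -> y) can be illustrated with a small, self-contained sketch. It is not HDFS code: the class `SymlinkSketch`, the method `resolve`, and the in-memory map of symlinks are all hypothetical, and the sketch assumes a single namespace, no chained links, no loops, and no ".." in targets.

```java
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Hadoop.
final class SymlinkSketch {
    private SymlinkSketch() {}

    /**
     * Walks an absolute path one component at a time. The map is keyed by the
     * already-resolved location of each symlink and holds its target.
     * An absolute target replaces everything resolved so far; a relative
     * target replaces only the link's own name.
     */
    static String resolve(String absolutePath, Map<String, String> symlinks) {
        String resolved = "";
        for (String name : absolutePath.split("/")) {
            if (name.isEmpty()) {
                continue; // skip the empty piece before the leading slash
            }
            String candidate = resolved + "/" + name;
            String target = symlinks.get(candidate);
            if (target == null) {
                resolved = candidate;               // ordinary file or directory
            } else if (target.startsWith("/")) {
                resolved = target;                  // absolute symlink target
            } else {
                resolved = resolved + "/" + target; // relative symlink target
            }
        }
        return resolved.isEmpty() ? "/" : resolved;
    }
}
```

With the map `{"/a/b" -> "/tmp/x", "/tmp/x/c/d/e" -> "y"}`, calling `SymlinkSketch.resolve("/a/b/c/d/e/f", map)` yields `/tmp/x/c/d/y/f`, matching the path in the comment.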
[jira] [Commented] (HDFS-6045) A single RPC API: FileStatus[] getFileStatus(Path f) to get status of all path components.
[ https://issues.apache.org/jira/browse/HDFS-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918880#comment-13918880 ]

Andrew Wang commented on HDFS-6045:
-----------------------------------

Hi Gera,

Right now there's no easy way to resolve symlinks on the server side, since the FileSystem mount table is a client-side concept and symlinks can span multiple filesystems. Symlink resolution currently requires an RPC per link in the path. There are still unresolved issues related to HDFS symlinks, which is why we've disabled them for the time being.

Anyway, symlink performance issues aside, I assume the goal here is a {{FileStatus[] getFileStatusComponents}} that, given {{/a/b}}, returns the FileStatus for {{/}}, {{/a}}, {{/a/b}}, and so on? This sounds okay to me if you want to implement it.
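To make the intended result shape concrete, here is a minimal sketch of which paths such a call would cover for a given input. It works on plain strings rather than Hadoop's `Path` and `FileStatus` types, and the class `PathComponents` and method `ancestorsAndSelf` are hypothetical names used only for illustration. Without the proposed API, a client would need one `getFileStatus` RPC for each entry in the returned list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Hadoop API.
final class PathComponents {
    private PathComponents() {}

    /**
     * Returns "/" followed by each successively longer prefix of an
     * absolute path, ending with the path itself.
     */
    static List<String> ancestorsAndSelf(String absolutePath) {
        if (absolutePath == null || !absolutePath.startsWith("/")) {
            throw new IllegalArgumentException("expected an absolute path: " + absolutePath);
        }
        List<String> result = new ArrayList<>();
        result.add("/");
        StringBuilder prefix = new StringBuilder();
        for (String name : absolutePath.split("/")) {
            if (name.isEmpty()) {
                continue; // skip leading or repeated slashes
            }
            prefix.append('/').append(name);
            result.add(prefix.toString());
        }
        return result;
    }
}
```

For `/a/b` this produces `/`, `/a`, `/a/b`, so the array returned by the proposed call would hold one status per entry, in that order.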
[jira] [Commented] (HDFS-6045) A single RPC API: FileStatus[] getFileStatus(Path f) to get status of all path components.
[ https://issues.apache.org/jira/browse/HDFS-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918900#comment-13918900 ]

Gera Shegalov commented on HDFS-6045:
-------------------------------------

Hi Andrew, yes, I missed the fact that symlinks can target absolute URIs on different filesystems. You captured the goal for this JIRA correctly.