[jira] [Commented] (HDFS-6045) A single RPC API: FileStatus[] getFileStatus(Path f) to get status of all path components.

2014-03-04 Thread Steve Loughran (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919196#comment-13919196 ]

Steve Loughran commented on HDFS-6045:
--

This can hit scale limits very fast if a directory is both deep and wide, and if 
the scan is done inside a synchronized block in HDFS, the cost of the scan is 
visible to all other clients.

Oddly enough, object stores may handle this better than inode filesystems, as 
they effectively do deep scans of a simulated hierarchical filesystem, though 
there's usually a limit on the number of entries returned per request, so this 
operation would take multiple HTTP round trips.
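A minimal sketch of why such a listing takes several round trips. It assumes a hypothetical in-memory object store (PagedListingSketch) whose list call returns at most maxKeysPerPage keys per request together with a truncation flag and resumes after a marker key. Real stores name and size these limits differently. Keys sharing a prefix are contiguous in sorted order, which is what lets a flat key space stand in for a deep directory tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical object store: a flat sorted key set that only simulates a directory tree.
class PagedListingSketch {
    static final class Page {
        final List<String> keys;
        final boolean truncated; // true if more matching keys remain after this page
        Page(List<String> keys, boolean truncated) { this.keys = keys; this.truncated = truncated; }
    }

    private final TreeSet<String> keys = new TreeSet<>();
    private final int maxKeysPerPage;
    int roundTrips = 0; // number of simulated HTTP list requests issued

    PagedListingSketch(int maxKeysPerPage) { this.maxKeysPerPage = maxKeysPerPage; }

    void put(String key) { keys.add(key); }

    // One simulated request: up to maxKeysPerPage keys under prefix, starting after marker.
    Page listPage(String prefix, String marker) {
        roundTrips++;
        List<String> out = new ArrayList<>();
        boolean truncated = false;
        // Keys sharing a prefix are contiguous in sorted order, so stop at the first miss.
        for (String k : keys.tailSet(marker == null ? prefix : marker, marker == null)) {
            if (!k.startsWith(prefix)) break;
            if (out.size() == maxKeysPerPage) { truncated = true; break; }
            out.add(k);
        }
        return new Page(out, truncated);
    }

    // A "deep scan" of everything under prefix: keep following the marker while truncated.
    List<String> listAll(String prefix) {
        List<String> all = new ArrayList<>();
        String marker = null;
        Page page;
        do {
            page = listPage(prefix, marker);
            all.addAll(page.keys);
            if (!page.keys.isEmpty()) marker = page.keys.get(page.keys.size() - 1);
        } while (page.truncated);
        return all;
    }

    public static void main(String[] args) {
        PagedListingSketch store = new PagedListingSketch(1000);
        for (int i = 0; i < 2500; i++) {
            store.put(String.format("data/dir%02d/file%04d", i / 100, i));
        }
        store.put("other/x"); // outside the prefix, never returned
        List<String> all = store.listAll("data/");
        System.out.println(all.size() + " keys in " + store.roundTrips + " round trips");
    }
}
```

Run as-is, it lists 2,500 keys under data/ with a 1,000-key page limit in three requests. The number of requests grows with the number of entries under the prefix, not with the depth of the simulated tree.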

 A single RPC API: FileStatus[] getFileStatus(Path f) to get status of all 
 path components.
 --

 Key: HDFS-6045
 URL: https://issues.apache.org/jira/browse/HDFS-6045
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client, namenode
Reporter: Gera Shegalov
Assignee: Gera Shegalov

 This comes up in YARN-1771/MAPREDUCE-4907 on the server and client sides of the 
 PUBLIC Distributed Cache. The deeper the path, the more beneficial the feature.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6045) A single RPC API: FileStatus[] getFileStatus(Path f) to get status of all path components.

2014-03-03 Thread Gera Shegalov (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918688#comment-13918688 ]

Gera Shegalov commented on HDFS-6045:
-

bq. [~chris.douglas] mentioned on YARN-1771: "Symlinks might be awkward to 
support, but that discussion is for a separate ticket. Do you have a JIRA ref?"

For simplicity, I think the call should return an array corresponding to the 
fully resolved path, which does not contain any symlinks.

If f=/a/b/c/d/e/f, where b is a symlink b->/tmp/x and e is a symlink e->y, 
then the returned array will correspond to /tmp/x/c/d/y/f.
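The resolution rule above can be sketched as follows, assuming a hypothetical in-memory map from each symlink's absolute location to its target (not an HDFS API). The sketch also assumes link targets contain no ".", "..", or symlinked intermediate directories and never point at another filesystem. Note that e is keyed at /tmp/x/c/d/e, where it actually lives once b has been followed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SymlinkResolutionSketch {
    static final int MAX_HOPS = 32; // arbitrary guard against link cycles

    // links: hypothetical map from a symlink's absolute (already resolved) path to its target.
    // Assumes targets contain no ".", "..", or symlinked intermediate directories.
    static String resolve(String path, Map<String, String> links) {
        List<String> resolved = new ArrayList<>(); // components of the resolved path so far
        int hops = 0;
        for (String name : path.split("/")) {
            if (name.isEmpty()) continue;
            resolved.add(name);
            String target;
            while ((target = links.get(toPath(resolved))) != null) {
                if (++hops > MAX_HOPS) throw new IllegalStateException("too many symlinks in " + path);
                resolved.remove(resolved.size() - 1);         // drop the link component itself
                if (target.startsWith("/")) resolved.clear(); // absolute targets restart at the root
                for (String t : target.split("/")) {
                    if (!t.isEmpty()) resolved.add(t);        // relative targets hang off the link's parent
                }
            }
        }
        return toPath(resolved);
    }

    static String toPath(List<String> components) {
        return "/" + String.join("/", components);
    }

    public static void main(String[] args) {
        Map<String, String> links = new HashMap<>();
        links.put("/a/b", "/tmp/x");    // b -> /tmp/x (absolute)
        links.put("/tmp/x/c/d/e", "y"); // e -> y (relative), located where b leads
        System.out.println(resolve("/a/b/c/d/e/f", links));
    }
}
```

Run as-is, it prints /tmp/x/c/d/y/f. Under the semantics proposed here, the returned FileStatus array would follow the components of that resolved path rather than those of f.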



[jira] [Commented] (HDFS-6045) A single RPC API: FileStatus[] getFileStatus(Path f) to get status of all path components.

2014-03-03 Thread Andrew Wang (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918880#comment-13918880 ]

Andrew Wang commented on HDFS-6045:
---

Hi Gera,

Right now there's no easy way to resolve symlinks on the server side, since the 
FileSystem mount table is a client-side concept, and symlinks can span multiple 
filesystems. Symlink resolution right now requires an RPC per link in the path. 
There are still unresolved issues related to HDFS symlinks, which is why we've 
disabled them for the time being.
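A minimal sketch of the distinction drawn here. It assumes a hypothetical rule (resolvableOnServer) and made-up NameNode hosts nn1 and nn2: a server could only follow a link whose target is scheme-less or names the server's own scheme and authority. Targets on another cluster, on file://, or behind a viewfs:// mount table (which exists only in the client's configuration) would have to be handed back to the client, costing another RPC per link.

```java
import java.net.URI;
import java.util.Objects;

class LinkTargetSketch {
    // Hypothetical check: can the server that owns fsUri follow this link target by itself?
    // Scheme-less targets ("/tmp/x", "y") stay in the same namespace; anything carrying
    // another scheme or authority can only be resolved by the client.
    static boolean resolvableOnServer(URI fsUri, URI target) {
        if (target.getScheme() == null) return true;
        return target.getScheme().equalsIgnoreCase(fsUri.getScheme())
                && Objects.equals(target.getAuthority(), fsUri.getAuthority());
    }

    public static void main(String[] args) {
        URI fs = URI.create("hdfs://nn1:8020"); // hypothetical NameNode address
        String[] targets = {"/tmp/x", "hdfs://nn1:8020/tmp/x", "hdfs://nn2:8020/tmp/x",
                            "file:///tmp/x", "viewfs://cluster/tmp/x"};
        for (String t : targets) {
            String where = resolvableOnServer(fs, URI.create(t)) ? "server" : "client";
            System.out.println(t + " -> " + where);
        }
    }
}
```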

Anyway, symlink performance issues aside, I assume the goal here is a 
{{FileStatus[] getFileStatusComponents}} that, given {{/a/b}}, returns the 
FileStatus for /, /a, and /a/b? This sounds okay to me if you want to 
implement it.
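A sketch of these semantics and of the RPC savings claimed in the issue description. It assumes a hypothetical FakeNameNode whose method calls each count as one RPC, and it returns strings in place of Hadoop FileStatus objects. getFileStatusComponents is only the name proposed in this thread, not an existing Hadoop method.

```java
import java.util.ArrayList;
import java.util.List;

class PathComponentsSketch {
    // Every prefix of an absolute path, root first: "/a/b" -> ["/", "/a", "/a/b"].
    static List<String> componentPaths(String absPath) {
        List<String> out = new ArrayList<>();
        out.add("/");
        StringBuilder prefix = new StringBuilder();
        for (String name : absPath.split("/")) {
            if (name.isEmpty()) continue;
            prefix.append('/').append(name);
            out.add(prefix.toString());
        }
        return out;
    }

    // Hypothetical stand-in for the NameNode; every method call counts as one RPC.
    // Statuses are plain strings here instead of Hadoop FileStatus objects.
    static final class FakeNameNode {
        int rpcs = 0;

        String getFileStatus(String path) {
            rpcs++;
            return "status(" + path + ")";
        }

        // The proposed batched call: one RPC returns a status per path component.
        List<String> getFileStatusComponents(String path) {
            rpcs++;
            List<String> out = new ArrayList<>();
            for (String p : componentPaths(path)) out.add("status(" + p + ")");
            return out;
        }
    }

    public static void main(String[] args) {
        String path = "/a/b/c/d/e/f";

        FakeNameNode today = new FakeNameNode();
        for (String p : componentPaths(path)) today.getFileStatus(p); // one RPC per component

        FakeNameNode batched = new FakeNameNode();
        batched.getFileStatusComponents(path);

        System.out.println("per-component: " + today.rpcs + " RPCs, batched: " + batched.rpcs + " RPC");
    }
}
```

For /a/b/c/d/e/f the component-by-component walk costs seven RPCs (the root plus six components) against one for the batched call, so the saving grows linearly with path depth.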



[jira] [Commented] (HDFS-6045) A single RPC API: FileStatus[] getFileStatus(Path f) to get status of all path components.

2014-03-03 Thread Gera Shegalov (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918900#comment-13918900 ]

Gera Shegalov commented on HDFS-6045:
-

Hi Andrew, yes, I missed the fact that symlinks can target absolute URIs on 
different filesystems. You captured the goal of this JIRA correctly.


