[jira] [Commented] (HADOOP-9780) Filesystem and FileContext methods that follow symlinks should return unresolved paths

2013-10-08 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789795#comment-13789795
 ] 

Colin Patrick McCabe commented on HADOOP-9780:
--

I filed HADOOP-10034 to discuss optimizations for same-filesystem symlinks.  On 
this JIRA, we can make FS and FC return unresolved paths rather than resolved 
ones.  I think the consensus that we're coming to is that we would like to do 
this.

bq. Some amount of mapping configuration support needs to remain though to 
support plugging in alternative filesystem implementations that ship outside 
the Hadoop code

The HDFS NN just needs to know "is this scheme hdfs or not?"  No configuration 
is needed.  Anyway, we should probably discuss this on HADOOP-10034.

> Filesystem and FileContext methods that follow symlinks should return 
> unresolved paths
> --
>
> Key: HADOOP-9780
> URL: https://issues.apache.org/jira/browse/HADOOP-9780
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Colin Patrick McCabe
>Priority: Minor
>
> Currently, when you follow a symlink, you get back the resolved path, with 
> all symlinks removed.  For compatibility reasons, we might want to have the 
> returned path be an unresolved path.
> Example: if you have:
> {code}
> /a -> b
> /b
> /b/c
> {code}
> {{getFileStatus("/a/c")}} will return a {{FileStatus}} object with a {{Path}} 
> of {{"/b/c"}}.
> If we returned the unresolved path, that would be {{"/a/c"}}
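
As an illustration of the example in the description, here is a minimal sketch in plain Java. It is a toy model only: the class {{SymlinkPathDemo}}, its link table, and its methods are hypothetical and do not use Hadoop's {{FileSystem}} or {{FileContext}} APIs. It handles a single level of links and shows which path would be reported under each policy.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical names, not Hadoop's FileSystem or FileContext API.
class SymlinkPathDemo {
    // Symlink table for the example namespace: /a -> b (a relative target).
    static final Map<String, String> LINKS = new HashMap<>();
    static {
        LINKS.put("/a", "b");
    }

    // Walk the path one component at a time, substituting link targets.
    // Chained links (a target that is itself a link) are not handled here.
    static String resolve(String path) {
        String current = "";
        for (String part : path.substring(1).split("/")) {
            String candidate = current + "/" + part;
            String target = LINKS.get(candidate);
            if (target == null) {
                current = candidate;              // ordinary component
            } else if (target.startsWith("/")) {
                current = target;                 // absolute link target
            } else {
                current = current + "/" + target; // relative to the link's parent
            }
        }
        return current;
    }

    // The path that the returned status object would carry under each policy.
    static String reportedPath(String requested, boolean returnUnresolved) {
        return returnUnresolved ? requested : resolve(requested);
    }

    public static void main(String[] args) {
        System.out.println(reportedPath("/a/c", false)); // current behavior
        System.out.println(reportedPath("/a/c", true));  // proposed behavior
    }
}
```

In this model the first call reports {{/b/c}} and the second reports {{/a/c}}, matching the two outcomes described above.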



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HADOOP-9780) Filesystem and FileContext methods that follow symlinks should return unresolved paths

2013-10-08 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789481#comment-13789481
 ] 

Chris Nauroth commented on HADOOP-9780:
---

bq. There is no client context-sensitivity unless different clients are using 
inconsistent DNS servers.

In the current code, there could still be an issue of client 
context-sensitivity if different clients map the schemes to entirely different 
filesystem implementations in their configuration.  Colin has suggested that we 
drop support for this kind of remapping, perhaps by freezing the definitions of 
the schemes that ship in hadoop-common, like hdfs and viewfs.  (Some amount of 
mapping configuration support needs to remain though to support plugging in 
alternative filesystem implementations that ship outside the Hadoop code.)



[jira] [Commented] (HADOOP-9780) Filesystem and FileContext methods that follow symlinks should return unresolved paths

2013-10-08 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789464#comment-13789464
 ] 

Sanjay Radia commented on HADOOP-9780:
--

Hit the "add" button accidentally. 
If the issue is RPC costs and SS resolution optimization, then let's fix the JIRA 
title and comments. I suggest that we do that, since the comments do not seem to 
match the title and description.



[jira] [Commented] (HADOOP-9780) Filesystem and FileContext methods that follow symlinks should return unresolved paths

2013-10-08 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789462#comment-13789462
 ] 

Sanjay Radia commented on HADOOP-9780:
--

The description and comments seem to be talking about two different things.
# The description says that following the symlink breaks compatibility. (I 
disagree - symlinks are supposed to be transparent for the most part.)
# The comments discuss two things - RPC costs and client context-sensitivity.
 ** If the symlink is /relative with no scheme, then the NN has the option of 
resolving it to reduce the RPCs. We should do this. This was discussed during 
the symlink design and was marked as a future optimization - we should have filed 
a JIRA for that at that time.
 ** If the symlink is fully qualified with a scheme, then it needs to be 
resolved on the client side; an optimization can be done by the NN if the scheme 
and authority match those of the NN. The NN cannot resolve a fully qualified 
name that does not match its own scheme and authority, due to security.
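
The two cases above can be sketched as a small decision function. This is only an illustration under stated assumptions: {{LinkResolutionPolicy}}, {{Resolver}}, and {{whoResolves}} are hypothetical names, and the NameNode address {{hdfs://nn1:8020}} used in the note below is made up. Only {{java.net.URI}} from the JDK is used.

```java
import java.net.URI;

// Hypothetical sketch of the rule described above; not NameNode code.
class LinkResolutionPolicy {
    enum Resolver { NAMENODE, CLIENT }

    // Decide who may resolve a symlink target, given the NameNode's own URI.
    static Resolver whoResolves(String linkTarget, URI nameNodeUri) {
        URI target = URI.create(linkTarget);
        if (target.getScheme() == null) {
            // No scheme: the target lives in the same namespace, so the
            // NameNode has the option of resolving it and saving RPCs.
            return Resolver.NAMENODE;
        }
        boolean sameScheme =
            target.getScheme().equalsIgnoreCase(nameNodeUri.getScheme());
        boolean sameAuthority = target.getAuthority() != null
            && target.getAuthority().equalsIgnoreCase(nameNodeUri.getAuthority());
        // Fully qualified: only a match on both scheme and authority lets
        // the NameNode resolve it; everything else goes back to the client.
        return (sameScheme && sameAuthority) ? Resolver.NAMENODE : Resolver.CLIENT;
    }
}
```

With an assumed NameNode URI of {{hdfs://nn1:8020}}, a target of {{/b/c}} or {{hdfs://nn1:8020/b/c}} falls to the NameNode, while {{hdfs://nn2:8020/b/c}} or {{viewfs://cluster/b}} falls to the client.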

There is no client context-sensitivity unless different clients are using 
inconsistent DNS servers.



[jira] [Commented] (HADOOP-9780) Filesystem and FileContext methods that follow symlinks should return unresolved paths

2013-10-08 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789419#comment-13789419
 ] 

Colin Patrick McCabe commented on HADOOP-9780:
--

The whole "remapping schemes" thing seems really questionable, based on this 
and the other discussions we've had...  if there is no use case for that 
feature, I think we should drop it.



[jira] [Commented] (HADOOP-9780) Filesystem and FileContext methods that follow symlinks should return unresolved paths

2013-10-08 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789372#comment-13789372
 ] 

Chris Nauroth commented on HADOOP-9780:
---

bq. If the scheme of the absolute symlink is viewfs://, it's a cross-filesystem 
symlink, and we throw it back to the client. If it's hdfs://, it's ours and we 
handle it.

Is the proposal to add scheme-checking logic on the namenode side?  The 
challenge here is that the client can freely remap schemes to filesystems, so 
the hdfs scheme isn't necessarily {{org.apache.hadoop.fs.Hdfs}}, and the viewfs 
scheme isn't necessarily {{org.apache.hadoop.fs.viewfs.ViewFs}}.  Perhaps it's 
uncommon for clients to remap schemes, but it's possible.

It seems that a cross-filesystem symlink target stored with a specific scheme 
in HDFS does not have a single absolute meaning.  Instead, the meaning is 
context-sensitive depending on the client.  Client A may resolve it differently 
from client B.  Perhaps this could be handled by passing some kind of "link 
resolution context" in the RPCs, which would represent the client's view of the 
schemes?
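
To make the context-sensitivity concrete, here is a minimal sketch in which plain maps stand in for two clients' configurations. The key {{fs.AbstractFileSystem.hdfs.impl}} is assumed here to be the setting that maps the hdfs scheme to an implementation for {{FileContext}}; the class {{com.example.CustomFs}}, the host {{nn1}}, and everything in {{SchemeRemapDemo}} are hypothetical.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Toy model only: plain maps stand in for each client's configuration.
class SchemeRemapDemo {
    // Client A keeps the stock mapping for the hdfs scheme.
    static Map<String, String> clientA() {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.AbstractFileSystem.hdfs.impl", "org.apache.hadoop.fs.Hdfs");
        return conf;
    }

    // Client B remaps the same scheme to a hypothetical custom class.
    static Map<String, String> clientB() {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.AbstractFileSystem.hdfs.impl", "com.example.CustomFs");
        return conf;
    }

    // Look up which implementation a client would use for a link target.
    static String implFor(Map<String, String> conf, String linkTarget) {
        String scheme = URI.create(linkTarget).getScheme();
        return conf.get("fs.AbstractFileSystem." + scheme + ".impl");
    }

    public static void main(String[] args) {
        String target = "hdfs://nn1/b/c"; // the same stored link target
        System.out.println(implFor(clientA(), target));
        System.out.println(implFor(clientB(), target));
    }
}
```

The same stored target is handed to a different implementation by each client, which is the situation a scheme check on the NameNode cannot see.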



[jira] [Commented] (HADOOP-9780) Filesystem and FileContext methods that follow symlinks should return unresolved paths

2013-10-08 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789284#comment-13789284
 ] 

Colin Patrick McCabe commented on HADOOP-9780:
--

bq. I think server-side resolution is problematic. How would this work with a 
chroot or viewfs?

The same way it works now.  If the scheme of the absolute symlink is 
{{viewfs://}}, it's a cross-filesystem symlink, and we throw it back to the 
client.  If it's {{hdfs://}}, it's ours and we handle it.



[jira] [Commented] (HADOOP-9780) Filesystem and FileContext methods that follow symlinks should return unresolved paths

2013-10-07 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788671#comment-13788671
 ] 

Andrew Wang commented on HADOOP-9780:
-

I think server-side resolution is problematic. How would this work with a 
chroot or viewfs? You don't know the mount table on the server side, so you 
might escape the chroot or resolve an absolute path that should have gone to a 
different filesystem.



[jira] [Commented] (HADOOP-9780) Filesystem and FileContext methods that follow symlinks should return unresolved paths

2013-10-07 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788653#comment-13788653
 ] 

Colin Patrick McCabe commented on HADOOP-9780:
--

So, the problem we have right now with returning unresolved paths is 
that each time the server encounters a symlink, it throws an exception, which 
triggers the client to do another RPC (well, actually two RPCs, due to a 
current implementation quirk; see HDFS-5293).  Returning unresolved paths 
would cause the client to keep redoing these path resolution RPCs over and 
over.  This doesn't scale: basically, it multiplies the load on the NameNode by 
at least 3x and possibly more, depending on the number of links.

To avoid this, I think we should resolve as much as possible of the symlink on 
the NameNode.  The NameNode already knows which inodes are symlinks, and it 
knows what they point to.  If what they point to is on the local NameNode 
(which should be the common case), we should just resolve it then and there and 
keep going, rather than doing the "please make another RPC to me" dance.

Obviously, this doesn't help in the case of cross-namespace symlinks.  However, 
it does help a lot in the extremely common case of links to things on the same 
NameNode.

In a way, this is similar to how {{LocalFileSystem}} already operates.  When 
you try to read a local file, it resolves as many symlinks as it can without 
throwing {{UnresolvedLinkException}}, unless a symlink is dangling.  There's no 
reason to ask the client for help if you don't need the help.
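
The difference in RPC counts can be sketched with a toy simulation. Everything here is hypothetical ({{RpcCountDemo}}, {{LinkFound}}, the link table); it is not NameNode or client code, it only handles absolute link targets on one server, and it does not model the extra RPC per link from HDFS-5293.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical classes, not Hadoop code.
class RpcCountDemo {
    // Symlink table with absolute targets, all on the same toy server.
    static final Map<String, String> LINKS = new HashMap<>();
    static {
        LINKS.put("/a", "/b");
        LINKS.put("/b/c", "/d");
    }

    static int rpcCount = 0;

    // Hypothetical stand-in for the "retry with this path" exception.
    static class LinkFound extends Exception {
        final String rewritten;
        LinkFound(String rewritten) { this.rewritten = rewritten; }
    }

    // Return the path with its first symlink substituted, or null if none.
    static String substituteFirstLink(String path) {
        String[] parts = path.substring(1).split("/");
        String prefix = "";
        for (int i = 0; i < parts.length; i++) {
            prefix = prefix + "/" + parts[i];
            String target = LINKS.get(prefix);
            if (target != null) {
                StringBuilder sb = new StringBuilder(target);
                for (int j = i + 1; j < parts.length; j++) {
                    sb.append("/").append(parts[j]);
                }
                return sb.toString();
            }
        }
        return null;
    }

    // One simulated RPC. The server either resolves links itself or
    // throws at the first link so the client has to call again.
    static String serverLookup(String path, boolean resolveOnServer)
            throws LinkFound {
        rpcCount++;
        String current = path;
        while (true) {
            String rewritten = substituteFirstLink(current);
            if (rewritten == null) {
                return current;
            }
            if (!resolveOnServer) {
                throw new LinkFound(rewritten);
            }
            current = rewritten;
        }
    }

    // Client loop: retry with the rewritten path until no link remains.
    static String clientLookup(String path, boolean resolveOnServer) {
        String current = path;
        while (true) {
            try {
                return serverLookup(current, resolveOnServer);
            } catch (LinkFound e) {
                current = e.rewritten;
            }
        }
    }

    public static void main(String[] args) {
        rpcCount = 0;
        clientLookup("/a/c/e", false);
        System.out.println("client-driven RPCs: " + rpcCount);
        rpcCount = 0;
        clientLookup("/a/c/e", true);
        System.out.println("server-side RPCs: " + rpcCount);
    }
}
```

For the path {{/a/c/e}}, which crosses two links in this table, the client-driven loop makes three simulated RPCs and the server-side variant makes one.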
