On Apr 11, 2013, at 5:33 AM, Ling Kun wrote:

> Dear all,
>    I am a little confused about the URI, Home Directory and Working 
> Directory in FileSystem.java or HDFS.
> 
>   I have listed my understanding of these concepts; could someone please 
> point out whether I am correct?  Thanks.
> 
>    The Home directory: This is usually a directory for a specific Hadoop 
> user, so it is a user-specific path. In HDFS, it is like 
> hdfs://NameNode:port/user/USERNAME.

Correct.

>    The URI: Is this the root of the distributed filesystem? For HDFS, it is 
> just hdfs://NameNode:port/, and each file/directory in the distributed 
> filesystem is just a file or subdirectory under this path.

Generally correct.  However, I'd strongly suggest avoiding the use of URIs 
directly.  It's better to obtain your filesystems via path.getFileSystem(conf) 
- it will extract the URI for the filesystem automatically.  See below for the 
correct definition of a Path.

>    The working directory: I am a little confused about this variable. At a 
> given time, there exists only one instance of the filesystem class, and the 
> working dir is private state of the FS. While a job is running, Hadoop 
> switches among several dirs, such as the shared system dir, home dir, or 
> input/output dir, and the working dir is modified each time it switches.

Correct.

>    Although I have looked through the related documentation, I am still a 
> little confused about the java.net.URI, java.io.File and 
> org.apache.hadoop.fs.Path classes. It seems a URI could be 
> hdfs://XXX/XXX/FILENAME, while a Path can only be the path without the 
> scheme, hostname and port.  The File class is just an object for a 
> specific file.

Your understanding of Path is incorrect.  Path is really just a veneer over a 
URI.  A Path can be qualified with a scheme/authority, or just be absolute or 
relative.  If a Path is not scheme qualified, it uses the defaultFS.  If the 
Path is not absolute, it's qualified against the working directory.  Path 
also provides some niceties: it does not require percent encoding in the path 
portion of the URI, and it allows glob chars and the quoting thereof.
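
To make the qualification rules concrete, here is a rough sketch that uses 
only java.net.URI.  It is a model of the behaviour described above, not 
Hadoop's actual implementation: the class name, the qualify and fromRawPath 
helpers, the namenode:8020 address and the /user/alice working directory are 
all made up for illustration, and it assumes hierarchical (non-opaque) URIs.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PathQualifySketch {

    // Hypothetical helper: build a URI from a raw path string. The
    // multi-argument constructor percent-encodes illegal characters such
    // as spaces, so the caller does not have to.
    static URI fromRawPath(String rawPath) {
        try {
            return new URI(null, null, rawPath, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Hypothetical helper: qualify a path against a default filesystem URI
    // and an absolute working directory.
    static URI qualify(URI path, URI defaultFs, String workingDir) {
        String scheme = path.getScheme();
        String authority = path.getAuthority();
        String p = path.getPath(); // decoded path portion

        // Not absolute: resolve against the working directory.
        if (!p.startsWith("/")) {
            p = workingDir.endsWith("/") ? workingDir + p : workingDir + "/" + p;
        }
        // Not scheme qualified: fall back to the default filesystem.
        if (scheme == null) {
            scheme = defaultFs.getScheme();
            authority = defaultFs.getAuthority();
        }
        try {
            return new URI(scheme, authority, p, null, null).normalize();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode:8020"); // assumed address
        String workingDir = "/user/alice";                  // assumed working dir

        // Relative path: gets the working directory and the default filesystem.
        System.out.println(qualify(URI.create("data/part-0"), defaultFs, workingDir));
        // Absolute path: gets only the default filesystem.
        System.out.println(qualify(URI.create("/tmp/out"), defaultFs, workingDir));
        // Fully qualified path: left as it is.
        System.out.println(qualify(URI.create("hdfs://other:9000/logs"), defaultFs, workingDir));
        // Raw path with a space: encoded for us as %20.
        System.out.println(qualify(fromRawPath("/logs/my file.txt"), defaultFs, workingDir));
    }
}
```

In real code you would not write any of this yourself; new Path(...) together 
with path.getFileSystem(conf) does the equivalent work for you.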

I hope this helps!

Daryn
