Dear Daryn Sharp,

Your reply helps me a lot in reading the code of HDFS and the FileSystem interface.
Thanks.

yours,
Ling Kun

On Thu, Apr 11, 2013 at 10:53 PM, Daryn Sharp <da...@yahoo-inc.com> wrote:

> On Apr 11, 2013, at 5:33 AM, Ling Kun wrote:
>
> > Dear all,
> > I am a little confused about the URI, Home Directory and Working
> > Directory in FileSystem.java or HDFS.
> >
> > I have listed my understanding of these concepts; can someone please
> > point out whether I am correct? Thanks.
> >
> > The Home directory: This is usually a directory for a specific Hadoop
> > user, so the path is user specific. In HDFS, it is like
> > HDFS://NameNode:port/user/USERNAME.
>
> Correct.
>
> > The URI: Is this the root of the distributed filesystem? For HDFS, it
> > is just HDFS://NameNode:port/, and each file/directory in the distributed
> > filesystem is just a file or subdirectory under this path.
>
> Generally correct. However, I'd strongly suggest avoiding the use of URIs
> directly. It's better to obtain your filesystems via
> path.getFileSystem(conf) - it will extract the URI for the filesystem
> automatically. See below for the correct definition of a Path.
>
> > The working directory: I am a little confused about this variable. At
> > a given time, there exists only one instance of the filesystem class, and
> > the working dir is a private state of the FS. While a job is running,
> > hadoop will switch among several dirs, such as the shared system dir, the
> > home dir, or the input/output dir, and the working dir is modified each
> > time it switches.
>
> Correct.
>
> > Although I have looked through the related documentation, I am still a
> > little confused about the java.net.URI, java.io.File and
> > org.apache.hadoop.fs.Path classes. It seems a URI could be
> > hdfs://XXX/XXX/FILENAME, while a Path can only be the path without the
> > scheme, hostname and port. As for the File class, it is just an object
> > for a specific file.
>
> Your understanding of Path is incorrect. Path is really just a veneer
> over a URI. A Path can be qualified with a scheme/authority, or just be
> absolute or relative. If a Path is not scheme qualified, it uses the
> defaultFS. If the Path is not absolute, it's qualified against the working
> directory. Path provides some niceties like not requiring percent encoding
> in the path portion of the URI, and allows use of glob chars and the
> quoting thereof.
>
> I hope this helps!
>
> Daryn

--
http://www.lingcc.com
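
For readers following the thread, the qualification rules Daryn describes for Path (no scheme: use the defaultFS; not absolute: resolve against the working directory; no percent encoding needed in the path portion) can be modelled with plain java.net.URI. This is only a simplified sketch and not Hadoop's implementation: the class name PathQualifySketch, the method qualify, and the example values hdfs://namenode:8020/ and /user/alice are assumptions made up for illustration. It also ignores globs and Windows drive letters.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Simplified model of how a path string becomes a fully qualified URI.
// Hypothetical illustration only; this is not org.apache.hadoop.fs.Path.
final class PathQualifySketch {

    static URI qualify(String path, URI defaultFs, String workingDir) {
        String scheme = null;
        String authority = null;
        String rest = path;

        // A colon before the first slash marks a scheme, e.g. "hdfs:".
        int colon = rest.indexOf(':');
        int slash = rest.indexOf('/');
        if (colon > 0 && (slash == -1 || colon < slash)) {
            scheme = rest.substring(0, colon);
            rest = rest.substring(colon + 1);
        }

        // A leading "//" marks an authority, e.g. "//host:port".
        if (rest.startsWith("//")) {
            int end = rest.indexOf('/', 2);
            if (end == -1) {
                end = rest.length();
            }
            authority = end > 2 ? rest.substring(2, end) : null;
            rest = rest.substring(end);
            if (rest.isEmpty()) {
                rest = "/";
            }
        }

        // Rule 1: not scheme qualified -> take scheme/authority from the default FS.
        if (scheme == null) {
            scheme = defaultFs.getScheme();
            if (authority == null) {
                authority = defaultFs.getAuthority();
            }
        }

        // Rule 2: not absolute -> resolve against the working directory.
        if (!rest.startsWith("/")) {
            rest = workingDir + "/" + rest;
        }

        try {
            // The multi-argument constructor percent-encodes illegal
            // characters (such as spaces) in the path for us.
            return new URI(scheme, authority, rest, null, null).normalize();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot qualify: " + path, e);
        }
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode:8020/"); // assumed default FS
        String workingDir = "/user/alice";                   // assumed working dir

        System.out.println(qualify("data/part 0.txt", defaultFs, workingDir));
        System.out.println(qualify("/tmp/x", defaultFs, workingDir));
        System.out.println(qualify("hdfs://other:9000/logs", defaultFs, workingDir));
    }
}
```

In real code none of this is needed: as Daryn suggests, construct a Path and call path.getFileSystem(conf), and let Hadoop do the qualification against the configured default filesystem and the filesystem's working directory.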