Sebastian Nagel created NUTCH-2281:
--------------------------------------

             Summary: Support non-default FileSystem
                 Key: NUTCH-2281
                 URL: https://issues.apache.org/jira/browse/NUTCH-2281
             Project: Nutch
          Issue Type: Improvement
    Affects Versions: 1.12
            Reporter: Sebastian Nagel
             Fix For: 1.13

If a path (input or output) does not belong to the configured default FileSystem, various Nutch tools may raise an exception like
{noformat}
Exception in ... java.lang.IllegalArgumentException: Wrong FS: s3a://..., expected: hdfs://...
{noformat}

This is fixed by getting a reference to the FileSystem from the Path object
{noformat}
FileSystem fs = path.getFileSystem(getConf());
{noformat}
instead of
{noformat}
FileSystem fs = FileSystem.get(getConf());
{noformat}

A given path (e.g., {{s3a://...}}) may not belong to the default file system ({{hdfs://}}, or {{file://}} in local mode), and even simple checks such as {{fs.exists(path)}} will then fail. Cf. [FileSystem.checkPath(path)|https://hadoop.apache.org/docs/r2.7.2/api/org/apache/hadoop/fs/FileSystem.html#checkPath(org.apache.hadoop.fs.Path)], and [FileSystem.get(conf)|https://hadoop.apache.org/docs/r2.7.2/api/org/apache/hadoop/fs/FileSystem.html#get(org.apache.hadoop.conf.Configuration)] vs. [FileSystem.get(URI,conf)|https://hadoop.apache.org/docs/r2.7.2/api/org/apache/hadoop/fs/FileSystem.html#get(java.net.URI,%20org.apache.hadoop.conf.Configuration)], which is called by [Path.getFileSystem(conf)|https://hadoop.apache.org/docs/r2.7.2/api/org/apache/hadoop/fs/Path.html#getFileSystem%28org.apache.hadoop.conf.Configuration%29].

Note that the FileSystem for input and output may be different, e.g., read from HDFS and write to S3.
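
To illustrate why the check fails, here is a minimal sketch that models the scheme/authority comparison using only {{java.net.URI}}. It is not Hadoop code and is much simpler than what {{FileSystem.checkPath}} really does (no default ports, no canonicalization). The class {{WrongFsSketch}}, its methods, the host {{namenode:8020}}, the bucket name, and the paths are all hypothetical and chosen only for illustration.

```java
import java.net.URI;

/** Simplified model of the "Wrong FS" check; an assumption-laden sketch, not Hadoop's implementation. */
final class WrongFsSketch {

    /** True if 'path' could be served by the file system identified by 'fsUri'. */
    static boolean belongsTo(URI fsUri, URI path) {
        if (path.getScheme() == null) {
            return true; // a path without a scheme is resolved against this file system
        }
        if (!path.getScheme().equalsIgnoreCase(fsUri.getScheme())) {
            return false; // e.g. s3a:// path checked against an hdfs:// file system
        }
        String pathAuthority = path.getAuthority();
        // Simplification: a path without authority is accepted for any authority.
        return pathAuthority == null || pathAuthority.equalsIgnoreCase(fsUri.getAuthority());
    }

    /** Models the failing check: throws when the path belongs to another file system. */
    static void checkPath(URI fsUri, URI path) {
        if (!belongsTo(fsUri, path)) {
            throw new IllegalArgumentException("Wrong FS: " + path + ", expected: " + fsUri);
        }
    }

    /** Models the fix: derive the file system from the path, falling back to the default. */
    static URI fileSystemOf(URI defaultFs, URI path) {
        // resolve("/") keeps scheme and authority and replaces the path with the root
        return path.getScheme() == null ? defaultFs : path.resolve("/");
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode:8020");        // hypothetical default FS
        URI input = URI.create("hdfs://namenode:8020/crawl/crawldb"); // hypothetical input
        URI output = URI.create("s3a://bucket/crawl/segments");       // hypothetical output

        checkPath(defaultFs, input); // passes: same scheme and authority

        try {
            checkPath(defaultFs, output); // analogous to using FileSystem.get(conf)
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        // Analogous to path.getFileSystem(conf): the check uses the path's own file system.
        checkPath(fileSystemOf(defaultFs, output), output);
        System.out.println("input FS:  " + fileSystemOf(defaultFs, input));
        System.out.println("output FS: " + fileSystemOf(defaultFs, output));
    }
}
```

In this model the input and output resolve to two different file systems, which mirrors the case of reading from HDFS and writing to S3: each path has to be checked against its own file system rather than the default one.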