We found the correct configs.

This post was helpful, but didn't entirely work for us out of the box since
we are running Hadoop in pseudo-distributed mode:
http://hortonworks.com/community/forums/topic/s3n-error-for-hdp-2-2/

We added a property to the core-site.xml file:

  <property>
    <name>fs.s3n.impl</name>
    <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
    <description>Tell Hadoop which class to use to access s3n URLs. This
    became necessary in Hadoop 2.6.0, when the S3 filesystems moved into
    the separate hadoop-aws module.</description>
  </property>
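
For what it's worth, that class now ships in the hadoop-aws jar rather than
in hadoop-common, which is why hadoop stopped finding it on its own. A quick
sanity check that the class is actually there (this assumes a stock 2.6.0
tarball layout and $HADOOP_HOME pointing at the install; adjust to taste):

  # hadoop-aws provides NativeS3FileSystem as of 2.6.0; change the version
  # in the jar name to match whatever your install ships
  jar tf $HADOOP_HOME/share/hadoop/tools/lib/hadoop-aws-2.6.0.jar | grep NativeS3FileSystem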

We also updated the classpath for mapreduce applications (typically set in
mapred-site.xml):

  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
    <description>The classpath specifically for mapreduce jobs. This
    override is necessary so that s3n URLs work on Hadoop 2.6.0+.</description>
  </property>
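
It's also worth confirming that the tools/lib directory the new classpath
entry points at actually contains the hadoop-aws jar, since that's what the
wildcard has to pick up. For example, with $HADOOP_MAPRED_HOME expanded to
your install location:

  # hadoop-aws-*.jar should show up here for the classpath entry above to matter
  ls $HADOOP_MAPRED_HOME/share/hadoop/tools/lib/ | grep aws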



William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Apr 20, 2015 at 9:54 AM, Billy Watson <williamrwat...@gmail.com>
wrote:

> I sent the same message to the hadoop mailing list b/c I'm not sure where
> the problem lies. I'm pretty sure it's the hadoop client, but the hadoop
> peeps may say it's b/c of a misconfiguration within pig, so JIC:
>
> I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the command
> line without issue. I have set some options in hadoop-env.sh to make sure
> all the S3 stuff for hadoop 2.6 is set up correctly. (This was very
> confusing, BTW, and there's not enough searchable documentation on the
> changes to the s3 stuff in hadoop 2.6, IMHO.)
>
> Anyways, when I run a pig job which accesses s3, it gets to 16%, does not
> fail in pig, but rather fails in mapreduce with "Error:
> java.io.IOException: No FileSystem for scheme: s3n".
>
> I have added [hadoop-install-loc]/lib and 
> [hadoop-install-loc]/share/hadoop/tools/lib/
> to the HADOOP_CLASSPATH env variable in hadoop-env.sh.erb. When I do not do
> this, the pig job will fail at 0% (before it ever gets to mapreduce) with a
> very similar "No FileSystem for scheme: s3n" error.
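>
> For reference, the hadoop-env.sh additions look roughly like this (the
> bracketed path is a placeholder for the install location):
>
>   export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:[hadoop-install-loc]/lib:[hadoop-install-loc]/share/hadoop/tools/lib/*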
>
> I feel like at this point I just have to add the share/hadoop/tools/lib
> directory (and maybe lib) to the right environment variable, but I can’t
> figure out which environment variable that should be.
>
> I appreciate any help, thanks!!
>
>
> Stack trace:
> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
> at org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
> at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>
>
> William Watson
> Software Engineer
> (904) 705-7056 PCS
>
