We found the correct configs. This post was helpful, but it didn't entirely work for us out of the box since we are running Hadoop in pseudo-distributed mode: http://hortonworks.com/community/forums/topic/s3n-error-for-hdp-2-2/
We added a property to the core-site.xml file:

  <property>
    <name>fs.s3n.impl</name>
    <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
    <description>Tell hadoop which class to use to access s3 URLs.
    This change became necessary in hadoop 2.6.0.</description>
  </property>

And updated the classpath for mapreduce applications:

  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
    <description>The classpath specifically for mapreduce jobs. This
    override is necessary so that s3n URLs work on hadoop 2.6.0+.</description>
  </property>
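For anyone else hitting this: to make the `hadoop fs` CLI itself see the s3n classes, we also point HADOOP_CLASSPATH at the tools jars in hadoop-env.sh. A rough sketch of what that looks like (the /usr/local/hadoop path below is just an example from a default tarball install, so substitute your own location):

  # hadoop-env.sh: add the tools jars (hadoop-aws, jets3t, etc.) so the
  # CLI can resolve s3n:// URLs
  export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/hadoop/share/hadoop/tools/lib/*

  # quick sanity check from a shell
  hadoop fs -ls s3n://my-s3-bucket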
William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Apr 20, 2015 at 9:54 AM, Billy Watson <williamrwat...@gmail.com> wrote:

> I sent the same message to the hadoop mailing list b/c I'm not sure where
> the problem lies. I'm pretty sure it's the hadoop client, but the hadoop
> peeps may say it's b/c of a misconfiguration within pig, so JIC:
>
> I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the command
> line without issue. I have set some options in hadoop-env.sh to make sure
> all the S3 stuff for hadoop 2.6 is set up correctly. (This was very
> confusing, BTW, and there's not enough searchable documentation on the
> changes to the s3 stuff in hadoop 2.6, IMHO.)
>
> Anyways, when I run a pig job which accesses s3, it gets to 16%, does not
> fail in pig, but rather fails in mapreduce with "Error:
> java.io.IOException: No FileSystem for scheme: s3n."
>
> I have added [hadoop-install-loc]/lib and
> [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env
> variable in hadoop-env.sh.erb. When I do not do this, the pig job will
> fail at 0% (before it ever gets to mapreduce) with a very similar "No
> FileSystem for scheme: s3n" error.
>
> I feel like at this point I just have to add the share/hadoop/tools/lib
> directory (and maybe lib) to the right environment variable, but I can't
> figure out which environment variable that should be.
>
> I appreciate any help, thanks!!
>
>
> Stack trace:
> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
> at org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
> at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>
>
> William Watson
> Software Engineer
> (904) 705-7056 PCS