Hi, I am able to run `hadoop fs -ls s3n://my-s3-bucket` from the command line without issue. I have set some options in hadoop-env.sh to make sure all the S3 stuff for Hadoop 2.6 is set up correctly. (This was very confusing, BTW, and there is not enough searchable documentation on the changes to the S3 stuff in Hadoop 2.6, IMHO.)
Anyway, when I run a Pig job that accesses S3, it gets to 16% and then fails, not in Pig but in MapReduce, with "Error: java.io.IOException: No FileSystem for scheme: s3n".

I have added [hadoop-install-loc]/lib and [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env variable in hadoop-env.sh.erb. When I do not do this, the Pig job fails at 0% (before it ever gets to MapReduce) with a very similar "No filesystem for scheme s3n" error.

I feel like at this point I just have to add the share/hadoop/tools/lib directory (and maybe lib) to the right environment variable, but I can't figure out which environment variable that should be.

I appreciate any help, thanks!!

Stack trace:

    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
    at org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Billy Watson

--
William Watson
Software Engineer
(904) 705-7056 PCS
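
One way to narrow this down might be to confirm which jars in the directories mentioned above actually contain the s3n filesystem class. The following is only a rough diagnostic sketch, not a fix: it assumes the class in question is `org.apache.hadoop.fs.s3native.NativeS3FileSystem`, and the helper name `jars_providing` is made up for illustration.

```python
import zipfile
from pathlib import Path

# Assumed class entry for the s3n filesystem implementation.
S3N_CLASS_ENTRY = "org/apache/hadoop/fs/s3native/NativeS3FileSystem.class"


def jars_providing(class_entry, directories):
    """Return paths of jars directly under `directories` that contain `class_entry`.

    Hypothetical helper: it only inspects *.jar files at the top level of
    each directory and silently skips missing directories and corrupt jars.
    """
    matches = []
    for directory in directories:
        root = Path(directory)
        if not root.is_dir():
            continue
        for jar in sorted(root.glob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as archive:
                    if class_entry in archive.namelist():
                        matches.append(str(jar))
            except zipfile.BadZipFile:
                # Not a readable zip/jar archive; ignore it.
                continue
    return matches
```

Calling `jars_providing(S3N_CLASS_ENTRY, [...])` with the real paths for [hadoop-install-loc]/lib and [hadoop-install-loc]/share/hadoop/tools/lib should list the jar(s) that would need to be visible to the MapReduce task JVMs, not only to the client-side `hadoop` command.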
