One thing I think I most likely missed completely: are you using an Amazon EMR cluster or something in-house?
---
Regards,
Jonathan Aquilina
Founder Eagle Eye T

On 2015-04-20 16:21, Billy Watson wrote:

> I appreciate the response. These JAR files aren't third-party. They're
> included with the Hadoop distribution, but in Hadoop 2.6 they stopped
> being loaded by default and now have to be loaded manually, if needed.
>
> Essentially the problem boils down to:
>
> - need to access s3n URLs
> - cannot access them without including the tools directory
> - after including the tools directory in HADOOP_CLASSPATH, failures
>   start happening later in the job
> - need to find the right env variable (or shell script, or whatever)
>   to include jets3t & the other JARs needed to access s3n URLs (I think)
>
> William Watson
> Software Engineer
> (904) 705-7056 PCS
>
> On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina
> <[email protected]> wrote:
>
> You mention an environment variable. In the step before you specify the
> steps to run to get to the result, you can specify a bash script that
> will allow you to put any third-party JAR files (for us it was ESRI) on
> the cluster and propagate them to all nodes in the cluster as well. You
> can ping me off-list if you need further help. Thing is, I haven't used
> Pig, but my boss and a coworker wrote the mappers and reducers. Getting
> these JARs to the entire cluster was a super small and simple bash
> script.
>
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
>
> On 2015-04-20 15:17, Billy Watson wrote:
>
> Hi,
>
> I am able to run `hadoop fs -ls s3n://my-s3-bucket` from the command
> line without issue. I have set some options in hadoop-env.sh to make
> sure all the S3 stuff for Hadoop 2.6 is set up correctly. (This was
> very confusing, BTW, and there is not enough searchable documentation
> on the changes to the S3 stuff in Hadoop 2.6, IMHO.)
>
> Anyway, when I run a Pig job which accesses S3, it gets to 16%, does
> not fail in Pig, but rather fails in MapReduce with "Error:
> java.io.IOException: No FileSystem for scheme: s3n."
> I have added [hadoop-install-loc]/lib and
> [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH
> env variable in hadoop-env.sh.erb. When I do not do this, the Pig job
> fails at 0% (before it ever gets to MapReduce) with a very similar
> "No filesystem for scheme s3n" error.
>
> I feel like at this point I just have to add the share/hadoop/tools/lib
> directory (and maybe lib) to the right environment variable, but I
> can't figure out which environment variable that should be.
>
> I appreciate any help, thanks!!
>
> Stack trace:
> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
>   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
>   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
>   at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
>   at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
>   at org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
>   at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>
> -- Billy Watson
>
> --
> William Watson
> Software Engineer
> (904) 705-7056 PCS
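For anyone landing on this thread with the same error: as Billy notes, in Hadoop 2.6 the s3n support stopped being loaded by default (it ships in the optional tools directory alongside jets3t), so it must be added to the classpath explicitly. A minimal sketch of the hadoop-env.sh change he describes, assuming a conventional layout; the `/usr/local/hadoop` default and the exact jar locations are illustrative, not confirmed from the thread:

```shell
# Sketch only: append Hadoop's optional tools jars (hadoop-aws, jets3t)
# to the client classpath. HADOOP_HOME default is an assumption.
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$HADOOP_HOME/share/hadoop/tools/lib/*"
echo "$HADOOP_CLASSPATH"
```

Note this only affects the client-side JVM, which is consistent with the symptom in the thread: the job passes the Pig/planning stage but fails once map tasks spawn in their own JVMs, which may need the same jars delivered via the MapReduce-side classpath configuration rather than hadoop-env.sh.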

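Jonathan's suggestion of a "super small and simple bash script" to propagate jars to every node might look roughly like the following. This is a hypothetical sketch: the host names, jar directory, and destination path are all invented for illustration, and the `echo` makes it a dry run so nothing is copied until the paths are verified:

```shell
# Hypothetical sketch of a jar-propagation script: push extra jars
# (e.g. hadoop-aws, jets3t) to every worker node. All paths and host
# names below are assumptions, not values from the thread.
JAR_DIR="/opt/extra-jars"
DEST="/usr/local/hadoop/share/hadoop/common/lib"
HOSTS="node1 node2 node3"   # normally read from the slaves file
for host in $HOSTS; do
  # dry run: drop the 'echo' to perform the real copy once confirmed
  echo scp "$JAR_DIR"/*.jar "$host:$DEST/"
done
```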