One thing I most likely missed completely: are you using an Amazon EMR
cluster or something in-house? 

---
Regards,
Jonathan Aquilina
Founder Eagle Eye T

On 2015-04-20 16:21, Billy Watson wrote: 

> I appreciate the response. These JAR files aren't 3rd party. They're included 
> with the Hadoop distribution, but as of Hadoop 2.6 they are no longer loaded by 
> default and have to be added to the classpath manually, if needed. 
> 
> Essentially the problem boils down to: 
> 
> - I need to access s3n:// URLs 
> - I cannot access them without including the tools directory on the classpath 
> - after including the tools directory in HADOOP_CLASSPATH, failures start 
> happening later in the job 
> - I need to find the right env variable (or shell script or whatever) to include 
> jets3t & the other JARs needed to access s3n URLs (I think); see the sketch 
> after this list 
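> 
> For what it's worth, here is the kind of invocation I'm experimenting with. 
> It's only a sketch: the install path and jar versions are assumed from a stock 
> 2.6 tarball, and I haven't confirmed that pig.additional.jars is the right 
> knob here. 
> 
>   # Ship the S3-related jars with the job so the MR tasks can load them.
>   # pig.additional.jars takes a colon-separated list of jars to register
>   # and ship to the backend.
>   TOOLS=/usr/local/hadoop/share/hadoop/tools/lib    # assumed install path
>   pig -Dpig.additional.jars="$TOOLS/hadoop-aws-2.6.0.jar:$TOOLS/jets3t-0.9.0.jar" \
>       my_script.pig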
> 
> William Watson
> Software Engineer 
> (904) 705-7056 PCS 
> 
> On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina <[email protected]> 
> wrote:
> 
> You mention an environment variable. In the step before you specify the steps 
> to run to get to the result, you can specify a bash script that will let you 
> put any 3rd-party JAR files on the cluster (for us it was Esri's) and 
> propagate them to all nodes in the cluster as well. You can ping me off-list 
> if you need further help. Thing is, I haven't used Pig; my boss and coworker 
> wrote the mappers and reducers. Getting these JARs to the entire cluster was a 
> super small and simple bash script, along the lines of the sketch below. 
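> 
> A minimal sketch of such a script (the bucket name and paths here are made up, 
> and the Hadoop lib directory varies by distribution): 
> 
>   #!/bin/bash
>   # Runs on every node before the job starts; copies extra jars into
>   # Hadoop's lib directory so they end up on every task's classpath.
>   set -e
>   aws s3 cp s3://my-bootstrap-bucket/jars/ /home/hadoop/extra-jars/ --recursive
>   sudo cp /home/hadoop/extra-jars/*.jar /usr/lib/hadoop/lib/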
> 
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
> 
> On 2015-04-20 15:17, Billy Watson wrote: 
> 
> Hi,
> 
> I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the command line 
> without issue. I have set some options in hadoop-env.sh to make sure all the 
> S3 stuff for Hadoop 2.6 is set up correctly. (This was very confusing, BTW, 
> and there is not enough searchable documentation on the changes to the S3 
> stuff in Hadoop 2.6, IMHO.)
> 
> Anyway, when I run a Pig job that accesses S3, it gets to 16%; it does not 
> fail in Pig, but rather fails in MapReduce with "Error: java.io.IOException: 
> No FileSystem for scheme: s3n." 
> 
> I have added [hadoop-install-loc]/lib and 
> [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env 
> variable in hadoop-env.sh.erb. When I do not do this, the pig job fails at 0% 
> (before it ever gets to MapReduce) with a very similar "No FileSystem for 
> scheme: s3n" error.
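> 
> Concretely, the hadoop-env.sh change looks roughly like this (the install 
> location is assumed; substitute your own): 
> 
>   # Put the Hadoop "tools" jars (hadoop-aws, jets3t, etc.) on the
>   # client-side classpath so `hadoop fs` can resolve s3n://.
>   export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:/usr/local/hadoop/lib/*:/usr/local/hadoop/share/hadoop/tools/lib/*"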
> 
> I feel like at this point I just have to add the share/hadoop/tools/lib 
> directory (and maybe lib) to the right environment variable, but I can't 
> figure out which environment variable that should be.
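> 
> In case it helps, these are the candidates I'm aware of but haven't verified 
> (the names are real Hadoop/Pig settings, but I don't know which one applies 
> here, and the path is assumed): 
> 
>   # Classpath used by the pig launcher itself:
>   export PIG_CLASSPATH="/usr/local/hadoop/share/hadoop/tools/lib/*"
>   # Or a cluster-side setting so the YARN task containers also get the
>   # jars, e.g. mapreduce.application.classpath in mapred-site.xml.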
> 
> I appreciate any help, thanks!!
> 
> Stack trace:
> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
> at org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
> at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> 
> -- Billy Watson
> 
> -- 
> 
> William Watson
> Software Engineer 
> (904) 705-7056 PCS
 
