Thanks, anyways. Anyone else run into this issue?

William Watson
Software Engineer
(904) 705-7056 PCS
On Mon, Apr 20, 2015 at 11:11 AM, Jonathan Aquilina <[email protected]> wrote:

> Sadly I'll have to pull back; I have only run a Hadoop MapReduce cluster
> with Amazon EMR.
>
> Sent from my iPhone
>
> On 20 Apr 2015, at 16:53, Billy Watson <[email protected]> wrote:
>
> This is an install on a CentOS 6 virtual machine used in our test
> environment. We use HDP in staging and production, and we discovered these
> issues while trying to build a new cluster using HDP 2.2, which upgrades
> from Hadoop 2.4 to Hadoop 2.6.
>
> William Watson
> Software Engineer
> (904) 705-7056 PCS
>
> On Mon, Apr 20, 2015 at 10:26 AM, Jonathan Aquilina
> <[email protected]> wrote:
>
>> One thing I think I most likely missed completely: are you using an
>> Amazon EMR cluster or something in-house?
>>
>> ---
>> Regards,
>> Jonathan Aquilina
>> Founder Eagle Eye T
>>
>> On 2015-04-20 16:21, Billy Watson wrote:
>>
>> I appreciate the response. These JAR files aren't 3rd party. They're
>> included with the Hadoop distribution, but in Hadoop 2.6 they stopped
>> being loaded by default and now they have to be loaded manually, if
>> needed.
>>
>> Essentially the problem boils down to:
>>
>> - need to access s3n URLs
>> - cannot access them without including the tools directory
>> - after including the tools directory in HADOOP_CLASSPATH, failures
>>   start happening later in the job
>> - need to find the right env variable (or shell script or w/e) to
>>   include jets3t & the other JARs needed to access s3n URLs (I think)
>>
>> William Watson
>> Software Engineer
>> (904) 705-7056 PCS
>>
>> On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina
>> <[email protected]> wrote:
>>
>>> You mention an environment variable. In the step before you specify
>>> the steps to run to get to the result, you can specify a bash script
>>> that will put any 3rd-party JAR files (for us it was ESRI) on the
>>> cluster and propagate them to all nodes as well. You can ping me off
>>> list if you need further help. Thing is, I haven't used Pig, but my
>>> boss and coworker wrote the mappers and reducers; getting those JARs
>>> to the entire cluster was a super small and simple bash script.
>>>
>>> ---
>>> Regards,
>>> Jonathan Aquilina
>>> Founder Eagle Eye T
>>>
>>> On 2015-04-20 15:17, Billy Watson wrote:
>>>
>>> Hi,
>>>
>>> I am able to run `hadoop fs -ls s3n://my-s3-bucket` from the command
>>> line without issue. I have set some options in hadoop-env.sh to make
>>> sure all the S3 stuff for Hadoop 2.6 is set up correctly. (This was
>>> very confusing, BTW, and there is not enough searchable documentation
>>> on the changes to the S3 stuff in Hadoop 2.6, IMHO.)
>>>
>>> Anyways, when I run a Pig job which accesses S3, it gets to 16%, does
>>> not fail in Pig, but rather fails in MapReduce with "Error:
>>> java.io.IOException: No FileSystem for scheme: s3n."
>>>
>>> I have added [hadoop-install-loc]/lib and
>>> [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH
>>> env variable in hadoop-env.sh.erb. When I do not do this, the Pig job
>>> fails at 0% (before it ever gets to MapReduce) with a very similar
>>> "No FileSystem for scheme: s3n" error.
>>>
>>> I feel like at this point I just have to add the
>>> share/hadoop/tools/lib directory (and maybe lib) to the right
>>> environment variable, but I can't figure out which environment
>>> variable that should be.
>>>
>>> I appreciate any help, thanks!!
>>>
>>> Stack trace:
>>> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
>>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
>>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
>>> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
>>> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
>>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
>>> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
>>> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
>>> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
>>> at org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
>>> at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755)
>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>
>>> — Billy Watson
>>>
>>> --
>>> William Watson
>>> Software Engineer
>>> (904) 705-7056 PCS
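[Editor's note: the configuration discussed above lives in two places. HADOOP_CLASSPATH in hadoop-env.sh only affects client-side JVMs (the `hadoop fs` command and the Pig front end); the MapReduce task containers build their classpath from mapreduce.application.classpath (plus yarn.application.classpath) instead, which is why a job can get past the front end and still die at 16% inside MapReduce. The following is a minimal sketch of both pieces, assuming a stock Apache Hadoop 2.6 layout under $HADOOP_HOME; HDP installs the same JARs under different paths, so treat the locations as illustrative rather than as the exact fix used in this thread.

    # hadoop-env.sh (or hadoop-env.sh.erb): client-side JVMs only.
    # Pulls hadoop-aws, jets3t and friends into `hadoop fs` and the Pig front end.
    export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/*"

    # Task containers do NOT inherit HADOOP_CLASSPATH from the gateway node.
    # The usual fix is to extend mapreduce.application.classpath (mapred-site.xml)
    # with the same tools/lib wildcard, on top of its defaults, e.g.:
    #
    #   mapreduce.application.classpath =
    #       $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,
    #       $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,
    #       $HADOOP_HOME/share/hadoop/tools/lib/*
    #
    # Depending on the distribution, fs.s3n.impl may also need to be set in
    # core-site.xml to org.apache.hadoop.fs.s3native.NativeS3FileSystem.

A common workaround that avoids the wildcard is to copy or symlink hadoop-aws-*.jar and jets3t-*.jar from share/hadoop/tools/lib into share/hadoop/common/lib, which is already on the default classpath of both the clients and the task containers.]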

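[Editor's note: the "super small and simple bash script" mentioned earlier is not shown anywhere in the thread. The following is a hypothetical sketch of that approach, written in the style of an EMR bootstrap action; the bucket, prefix, and target directory are made up for illustration, and the aws CLI is assumed to be present on each node.

    #!/bin/bash
    # Hypothetical bootstrap script: runs on every node as it comes up and
    # drops extra JARs into a directory that is already on Hadoop's classpath.
    set -euo pipefail

    EXTRA_JARS="s3://example-bucket/cluster-jars"   # illustrative location
    TARGET_DIR="/usr/lib/hadoop/lib"                # illustrative target

    # Pull the JARs down from S3 and stage them locally.
    aws s3 cp "$EXTRA_JARS/" /tmp/extra-jars/ --recursive --exclude "*" --include "*.jar"

    # Install them where the Hadoop daemons and tasks will pick them up.
    sudo mkdir -p "$TARGET_DIR"
    sudo cp /tmp/extra-jars/*.jar "$TARGET_DIR/"

On EMR this would be registered as a bootstrap action so it runs once per node before the cluster accepts work; on an in-house cluster the same script could be pushed out with pdsh, Ansible, or whatever configuration management is already in place.]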