Chris and Sato,

Thanks a bunch! I've been so swamped by these and the other issues we've run into while scrambling to upgrade our cluster that I forgot to file a bug. I certainly complained aloud that the docs were insufficient, but I didn't do anything to help the community, so thanks a bunch for recognizing that and helping me out!
William Watson
Software Engineer
(904) 705-7056 PCS

On Wed, Apr 22, 2015 at 3:06 AM, Takenori Sato <[email protected]> wrote:

> Hi Billy, Chris,
>
> Let me share a couple of my findings.
>
> I believe this was introduced by HADOOP-10893, which first appeared in
> 2.6.0 (HDP 2.2).
>
> 1. fs.s3n.impl
>
>> We added a property to the core-site.xml file:
>
> You don't need to set this explicitly. It has never been necessary in
> previous versions.
>
> Take a look at FileSystem#loadFileSystems, which is called from
> FileSystem#getFileSystemClass. Subclasses of FileSystem are loaded
> automatically if they are visible to the class loader you care about.
>
> So you just need to make sure hadoop-aws.jar is on a classpath. For the
> file system shell, this is done in hadoop-env.sh; for a MR job, in
> mapreduce.application.classpath; and for YARN, in
> yarn.application.classpath.
>
> 2. mapreduce.application.classpath
>
>> And updated the classpath for mapreduce applications:
>
> Note that it points to a distributed cache on the default HDP 2.2
> distribution.
>
> <property>
>   <name>mapreduce.application.classpath</name>
>   <value>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure</value>
> </property>
>
> * $PWD/mr-framework/hadoop/share/hadoop/tools/lib/* is the entry that
> contains hadoop-aws.jar (NativeS3FileSystem).
>
> While on a vanilla Hadoop, it uses standard paths like yours:
>
> <property>
>   <name>mapreduce.application.classpath</name>
>   <value>/hadoop-2.6.0/etc/hadoop:/hadoop-2.6.0/share/hadoop/common/lib/*:/hadoop-2.6.0/share/hadoop/common/*:/hadoop-2.6.0/share/hadoop/hdfs:/hadoop-2.6.0/share/hadoop/hdfs/lib/*:/hadoop-2.6.0/share/hadoop/hdfs/*:/hadoop-2.6.0/share/hadoop/yarn/lib/*:/hadoop-2.6.0/share/hadoop/yarn/*:/hadoop-2.6.0/share/hadoop/mapreduce/lib/*:/hadoop-2.6.0/share/hadoop/mapreduce/*:/hadoop-2.6.0/contrib/capacity-scheduler/*.jar:/hadoop-2.6.0/share/hadoop/tools/lib/*</value>
> </property>
>
> Thanks,
> Sato
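For anyone who finds this thread later: the automatic loading Sato describes is easy to verify with a standalone probe. Here is a minimal sketch (the probe class is hypothetical, written just for illustration); FileSystem.getFileSystemClass is the same public method that appears in the stack trace further down the thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class S3nResolutionProbe {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // With hadoop-aws.jar on the classpath, the ServiceLoader registry
            // (META-INF/services/org.apache.hadoop.fs.FileSystem inside that
            // jar) supplies the implementation -- no fs.s3n.impl property is
            // needed. Without the jar, this throws
            // "No FileSystem for scheme: s3n".
            Class<? extends FileSystem> cls =
                FileSystem.getFileSystemClass("s3n", conf);
            System.out.println("s3n resolves to: " + cls.getName());
        }
    }

On a correctly configured node this should print org.apache.hadoop.fs.s3native.NativeS3FileSystem.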
> On Wed, Apr 22, 2015 at 3:10 PM, Chris Nauroth <[email protected]> wrote:
>
>> Hello Billy,
>>
>> I think your experience indicates that our documentation is
>> insufficient for discussing how to configure and use the alternative
>> file systems. I filed issue HADOOP-11863 to track a documentation
>> enhancement.
>>
>> https://issues.apache.org/jira/browse/HADOOP-11863
>>
>> Please feel free to watch that issue if you'd like to be informed as
>> it makes progress. Thank you for reporting back to the thread after
>> you had a solution.
>>
>> Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>> From: Billy Watson <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Monday, April 20, 2015 at 11:14 AM
>> To: "[email protected]" <[email protected]>
>> Subject: Re: Unable to Find S3N Filesystem Hadoop 2.6
>>
>> We found the correct configs.
>>
>> This post was helpful, but didn't entirely work for us out of the box,
>> since we are running Hadoop in pseudo-distributed mode:
>> http://hortonworks.com/community/forums/topic/s3n-error-for-hdp-2-2/
>>
>> We added a property to the core-site.xml file:
>>
>> <property>
>>   <name>fs.s3n.impl</name>
>>   <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>>   <description>Tell hadoop which class to use to access s3 URLs. This
>>   change became necessary in hadoop 2.6.0</description>
>> </property>
>>
>> And updated the classpath for mapreduce applications:
>>
>> <property>
>>   <name>mapreduce.application.classpath</name>
>>   <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
>>   <description>The classpath specifically for mapreduce jobs. This
>>   override is necessary so that s3n URLs work on hadoop 2.6.0+</description>
>> </property>
>>
>> William Watson
>> Software Engineer
>> (904) 705-7056 PCS
>>
>> On Mon, Apr 20, 2015 at 11:13 AM, Billy Watson <[email protected]> wrote:
>>
>>> Thanks, anyways. Anyone else run into this issue?
>>>
>>> William Watson
>>> Software Engineer
>>> (904) 705-7056 PCS
>>>
>>> On Mon, Apr 20, 2015 at 11:11 AM, Jonathan Aquilina <[email protected]> wrote:
>>>
>>>> Sadly I'll have to pull back; I have only run a Hadoop MapReduce
>>>> cluster with Amazon EMR.
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On 20 Apr 2015, at 16:53, Billy Watson <[email protected]> wrote:
>>>>
>>>> This is an install on a CentOS 6 virtual machine used in our test
>>>> environment. We use HDP in staging and production, and we discovered
>>>> these issues while trying to build a new cluster using HDP 2.2, which
>>>> upgrades from Hadoop 2.4 to Hadoop 2.6.
>>>>
>>>> William Watson
>>>> Software Engineer
>>>> (904) 705-7056 PCS
>>>>
>>>> On Mon, Apr 20, 2015 at 10:26 AM, Jonathan Aquilina <[email protected]> wrote:
>>>>
>>>>> One thing I most likely missed completely: are you using an Amazon
>>>>> EMR cluster or something in-house?
>>>>>
>>>>> ---
>>>>> Regards,
>>>>> Jonathan Aquilina
>>>>> Founder Eagle Eye T
>>>>>
>>>>> On 2015-04-20 16:21, Billy Watson wrote:
>>>>>
>>>>> I appreciate the response. These JAR files aren't 3rd party. They're
>>>>> included with the Hadoop distribution, but in Hadoop 2.6 they
>>>>> stopped being loaded by default and now have to be loaded manually,
>>>>> if needed.
>>>>>
>>>>> Essentially the problem boils down to:
>>>>>
>>>>> - need to access s3n URLs
>>>>> - cannot access without including the tools directory
>>>>> - after including the tools directory in HADOOP_CLASSPATH, failures
>>>>>   start happening later in the job
>>>>> - need to find the right env variable (or shell script or whatever)
>>>>>   to include jets3t & the other JARs needed to access s3n URLs (I
>>>>>   think)
>>>>>
>>>>> William Watson
>>>>> Software Engineer
>>>>> (904) 705-7056 PCS
>>>>>
>>>>> On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina <[email protected]> wrote:
>>>>>
>>>>>> You mention an environment variable. In the step before the steps
>>>>>> that run to get to the result, you can specify a bash script that
>>>>>> will put any 3rd-party JAR files (for us it was Esri) on the
>>>>>> cluster and propagate them to all nodes as well. You can ping me
>>>>>> off list if you need further help. Thing is, I haven't used Pig,
>>>>>> but my boss and coworker wrote the mappers and reducers. Getting
>>>>>> these JARs to the entire cluster was a super small and simple bash
>>>>>> script.
>>>>>>
>>>>>> ---
>>>>>> Regards,
>>>>>> Jonathan Aquilina
>>>>>> Founder Eagle Eye T
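A handy way to tell which side is missing the jars (the client, whose classpath comes from HADOOP_CLASSPATH in hadoop-env.sh, versus the tasks, whose classpath comes from mapreduce.application.classpath) is a probe that simply checks whether the required classes are visible. A minimal sketch; the class names are the ones discussed in this thread, while the probe class itself is hypothetical:

    public class S3nClasspathProbe {
        // The two pieces the s3n:// scheme needs at runtime. In Hadoop 2.6
        // both live under share/hadoop/tools/lib, which is no longer on the
        // default classpath.
        private static final String[] REQUIRED = {
            "org.apache.hadoop.fs.s3native.NativeS3FileSystem", // hadoop-aws.jar
            "org.jets3t.service.S3Service"                      // jets3t jar
        };

        public static void main(String[] args) {
            for (String name : REQUIRED) {
                try {
                    Class.forName(name);
                    System.out.println("OK      " + name);
                } catch (ClassNotFoundException e) {
                    System.out.println("MISSING " + name);
                }
            }
        }
    }

Run from the shell, it exercises the client classpath; wrapped in a trivial map-only job, it exercises mapreduce.application.classpath. A mismatch between the two would produce exactly the "works from the command line, dies partway into the MapReduce job" behavior described below.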
>>>>>> On 2015-04-20 15:17, Billy Watson wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the
>>>>>> command line without issue. I have set some options in hadoop-env.sh
>>>>>> to make sure all the S3 stuff for Hadoop 2.6 is set up correctly.
>>>>>> (This was very confusing, BTW, and there is not enough searchable
>>>>>> documentation on the changes to the S3 stuff in Hadoop 2.6, IMHO.)
>>>>>>
>>>>>> Anyways, when I run a Pig job which accesses S3, it gets to 16%,
>>>>>> does not fail in Pig, but rather fails in MapReduce with "Error:
>>>>>> java.io.IOException: No FileSystem for scheme: s3n."
>>>>>>
>>>>>> I have added [hadoop-install-loc]/lib and
>>>>>> [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH
>>>>>> env variable in hadoop-env.sh.erb. When I do not do this, the Pig
>>>>>> job will fail at 0% (before it ever gets to MapReduce) with a very
>>>>>> similar "No filesystem for scheme s3n" error.
>>>>>>
>>>>>> I feel like at this point I just have to add the
>>>>>> share/hadoop/tools/lib directory (and maybe lib) to the right
>>>>>> environment variable, but I can't figure out which environment
>>>>>> variable that should be.
>>>>>>
>>>>>> I appreciate any help, thanks!!
>>>>>>
>>>>>> Stack trace:
>>>>>> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
>>>>>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
>>>>>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
>>>>>> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
>>>>>> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
>>>>>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
>>>>>> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
>>>>>> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
>>>>>> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
>>>>>> at org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
>>>>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
>>>>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
>>>>>> at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
>>>>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755)
>>>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>>>>>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>>>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>>>>
>>>>>> — Billy Watson
>>>>>>
>>>>>> --
>>>>>> William Watson
>>>>>> Software Engineer
>>>>>> (904) 705-7056 PCS
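For completeness, the failing path in the stack trace can be reproduced without Pig, since FileInputFormat.setInputPaths qualifies each input path and that forces the scheme lookup. A minimal sketch (the class is hypothetical; the bucket name follows the example above and the /input suffix is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class S3nRepro {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration());
            // Same frames as the trace: setInputPaths -> Path.getFileSystem
            // -> FileSystem.get. Without hadoop-aws.jar on the classpath this
            // throws java.io.IOException: No FileSystem for scheme: s3n. With
            // the jar present, NativeS3FileSystem loads (its initialization
            // then expects AWS credentials in the configuration).
            FileInputFormat.setInputPaths(job, new Path("s3n://my-s3-bucket/input"));
            System.out.println("s3n scheme resolved; input path accepted");
        }
    }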
