Re: Hadoop streaming cacheArchive
Amareshwari, thanks for your help. This turned out to be user error: when packaging my JAR, I inadvertently included a lib directory, so the libraries actually existed in HDFS as ./lib/lib/perl..., when I was only expecting ./lib/perl...

Thanks again,
Norbert

On Thu, Mar 20, 2008 at 3:03 AM, Amareshwari Sriramadasu <[EMAIL PROTECTED]> wrote:
> Norbert Burger wrote:
> > I'm trying to use the cacheArchive command-line option with
> > hadoop-0.15.3-streaming.jar. I'm using the option as follows:
> >
> > -cacheArchive hdfs://host:50001/user/root/lib.jar#lib
> >
> > Unfortunately, my Perl scripts fail with an error consistent with not
> > being able to find the 'lib' directory (which, as I understand, should
> > point back to an extracted version of the lib.jar).
> >
> Here, lib is created as a symlink in the task's working directory. It will
> have the jar file and an extracted version of the jar file.
> Where are your Perl scripts searching for the lib? Is '.' included in
> your classpath?
> Otherwise you can use the "mapred.job.classpath.archives" config item;
> this adds the files to the classpath and also to the distributed cache.
> You can use
> -jobconf "mapred.job.classpath.archives=hdfs://host:50001/user/root/lib.jar#lib"
>
> > I know that the original JAR exists in HDFS, but I don't see any
> > evidence of lib.jar or a link called 'lib' inside my job.jar.
> >
> The link 'lib' will not be part of job.jar, but it will be distributed to
> all the nodes during task launch, and the task's current working directory
> will have the link 'lib' to the jar in the cache.
> > How can I troubleshoot cacheArchive further? Should the files/dirs
> > specified via cacheArchive be contained inside the job.jar? If not,
> > where should they be in HDFS?
> >
> They can be anywhere on HDFS. You need to give the complete path to add
> them to the cache.
> > Thanks for any help.
> >
> > Norbert
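The packaging slip described above can be reproduced in isolation. The following sketch (file names like Util.pm are placeholders, not from the actual job) shows how the directory you package from determines the paths that appear under the '#lib' symlink after extraction:

```shell
#!/bin/sh
set -e
workdir=$(mktemp -d)
mkdir -p "$workdir/build/lib/perl"
touch "$workdir/build/lib/perl/Util.pm"

# Packaging from build/ (e.g. `jar cf lib.jar -C build .`) keeps the lib/
# prefix inside the archive, so extraction under the '#lib' symlink yields
# ./lib/lib/perl/Util.pm in the task directory:
paths_from_build=$(cd "$workdir/build" && find . -type f)
echo "$paths_from_build"    # ./lib/perl/Util.pm

# Packaging from build/lib/ (e.g. `jar cf lib.jar -C build/lib .`) drops
# the prefix, giving the expected ./lib/perl/Util.pm:
paths_from_lib=$(cd "$workdir/build/lib" && find . -type f)
echo "$paths_from_lib"      # ./perl/Util.pm
```

A quick `jar tf lib.jar` before uploading to HDFS would reveal the unwanted lib/ prefix in the entry names.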
Re: Hadoop streaming cacheArchive
Norbert Burger wrote:
> I'm trying to use the cacheArchive command-line option with
> hadoop-0.15.3-streaming.jar. I'm using the option as follows:
>
> -cacheArchive hdfs://host:50001/user/root/lib.jar#lib
>
> Unfortunately, my Perl scripts fail with an error consistent with not
> being able to find the 'lib' directory (which, as I understand, should
> point back to an extracted version of the lib.jar).

Here, lib is created as a symlink in the task's working directory. It will have the jar file and an extracted version of the jar file. Where are your Perl scripts searching for the lib? Is '.' included in your classpath?

Otherwise you can use the "mapred.job.classpath.archives" config item; this adds the files to the classpath and also to the distributed cache. You can use

-jobconf "mapred.job.classpath.archives=hdfs://host:50001/user/root/lib.jar#lib"

> I know that the original JAR exists in HDFS, but I don't see any
> evidence of lib.jar or a link called 'lib' inside my job.jar.

The link 'lib' will not be part of job.jar, but it will be distributed to all the nodes during task launch, and the task's current working directory will have the link 'lib' to the jar in the cache.

> How can I troubleshoot cacheArchive further? Should the files/dirs
> specified via cacheArchive be contained inside the job.jar? If not,
> where should they be in HDFS?

They can be anywhere on HDFS. You need to give the complete path to add them to the cache.

> Thanks for any help.
>
> Norbert
Hadoop streaming cacheArchive
I'm trying to use the cacheArchive command-line option with hadoop-0.15.3-streaming.jar. I'm using the option as follows:

-cacheArchive hdfs://host:50001/user/root/lib.jar#lib

Unfortunately, my Perl scripts fail with an error consistent with not being able to find the 'lib' directory (which, as I understand, should point back to an extracted version of the lib.jar). I know that the original JAR exists in HDFS, but I don't see any evidence of lib.jar or a link called 'lib' inside my job.jar.

How can I troubleshoot cacheArchive further? Should the files/dirs specified via cacheArchive be contained inside the job.jar? If not, where should they be in HDFS?

Thanks for any help.

Norbert
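For context, a full streaming invocation using this option might look like the sketch below. The input/output paths and mapper name are placeholders, not taken from the job in this thread; only the -cacheArchive argument comes from the message above:

```shell
# Hypothetical command line (input, output, and mapper are placeholders);
# the part after '#' names the symlink created in each task's working
# directory, pointing at the extracted archive:
hadoop jar hadoop-0.15.3-streaming.jar \
    -input  /user/root/input \
    -output /user/root/output \
    -mapper mapper.pl \
    -file   mapper.pl \
    -cacheArchive "hdfs://host:50001/user/root/lib.jar#lib"
```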