You shouldn't have to modify the Hadoop environment, no. You just have to roll all the dependencies into your job jar file. You want to use Mahout's ".job" file which contains all of its dependencies. Merge it with your classes and use that.
On Mon, Feb 21, 2011 at 5:52 PM, Zhengguo 'Mike' SUN <[email protected]> wrote:

> Hi Lokendra,
>
> The thing is that I am using a shared cluster, whose environment I don't
> have control over. I can only attach the needed jars in my own jar.
>
> From: Lokendra Singh <[email protected]>
> To: [email protected]; Zhengguo 'Mike' SUN <[email protected]>
> Sent: Monday, February 21, 2011 11:31 AM
> Subject: Re: LanczosSolver and ClassNotFoundException
>
> Hi,
>
> If you are mainly facing ClassNotFound problems in the Hadoop environment,
> I would suggest putting all the required jars (including the Mahout ones)
> on HADOOP_CLASSPATH in '$HADOOP_HOME/conf/hadoop-env.sh'. Also, while
> running the MR job, make sure that $HADOOP_HOME/conf is on your classpath.
>
> Regards
> Lokendra
>
> On Mon, Feb 21, 2011 at 9:50 PM, Zhengguo 'Mike' SUN <[email protected]> wrote:
>
>> Hi All,
>>
>> I was playing with the LanczosSolver class in Mahout. What I did was copy
>> the code in TestDistributedLanczosSolver.java and try to run it on a
>> shared cluster. I also packaged five jars (core, core-test, math,
>> math-test, and mahout-collections) under the lib/ directory of my own
>> jar. This new jar worked correctly on my local machine under Hadoop's
>> local mode. When I submitted it to the cluster, I got a
>> ClassNotFoundException when running the TimesSquaredJob.
>> The stack trace is as follows:
>>
>> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>>   at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>>   at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
>>   at java.lang.Class.forName0(Native Method)
>>   at java.lang.Class.forName(Class.java:247)
>>   at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:866)
>>   at org.apache.hadoop.io.WritableName.getClass(WritableName.java:71)
>>   at org.apache.hadoop.io.SequenceFile$Reader.getValueClass(SequenceFile.java:1613)
>>   at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1555)
>>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
>>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>>   at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
>>   at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:63)
>>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338)
>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>   at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> I also wrote a simple MapReduce job to test whether I could access the
>> Vector class, with some naive code like the following:
>>
>> Vector v = new DenseVector(100);
>> v.assign(3.14);
>>
>> This job worked fine on the cluster. Thus it seems that referencing the
>> Vector class is not the problem. What could be wrong if it is not a
>> dependency problem?
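For completeness, the hadoop-env.sh route Lokendra suggests in the quoted thread would look roughly like the fragment below. It is only an option when you can edit the cluster's configuration, which is exactly what a shared cluster rules out; the jar paths are illustrative:

```shell
# In $HADOOP_HOME/conf/hadoop-env.sh (paths are examples):
# prepend the Mahout jars so daemons and tasks can load their classes.
export HADOOP_CLASSPATH="/opt/mahout/mahout-core-0.4.jar:/opt/mahout/mahout-math-0.4.jar:$HADOOP_CLASSPATH"
```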
