But the missing class org.apache.mahout.math.Vector is in the mahout-math jar, which I have already packaged under /lib of my own jar. Also, my little experiment showed that there is no problem accessing the Vector class in Mappers. Thus, I tend to think this may not be a dependency problem. The exception was thrown when Hadoop tried to read the SequenceFile that has the Vector class as its value type. Is there any difference between accessing a class in the Mapper (my little experiment) and accessing a class as the value of a SequenceFile in the RecordReader?
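For what it's worth, the stack trace below shows the value class being resolved *by name* through Configuration.getClassByName, not through a direct reference like the one in the Mapper experiment. A by-name lookup succeeds only if the classloader it runs against can see the jar. A minimal stdlib-only sketch of that distinction (the class name here is mine, not Hadoop's):

```java
import java.net.URL;
import java.net.URLClassLoader;

// Sketch (not Hadoop code): a direct reference is resolved by the
// classloader that loaded the referring class, while a by-name lookup
// (what Configuration.getClassByName does internally) goes through
// whatever classloader it is handed. If that loader cannot see the
// jars under lib/, the lookup fails even though direct references in
// the Mapper succeed.
public class ByNameLookupDemo {
    public static void main(String[] args) throws Exception {
        // Direct reference: this class was already found and loaded,
        // so constructing it cannot fail with ClassNotFoundException.
        ByNameLookupDemo direct = new ByNameLookupDemo();

        // By-name lookup through a loader with an empty search path and
        // only the bootstrap loader as parent: it cannot see the
        // application classpath, so the very same class is "not found".
        ClassLoader empty = new URLClassLoader(new URL[0], null);
        try {
            Class.forName("ByNameLookupDemo", false, empty);
            System.out.println("found");
        } catch (ClassNotFoundException e) {
            System.out.println("ClassNotFoundException: " + e.getMessage());
        }
    }
}
```

The same class is visible or invisible depending purely on which classloader performs the lookup, which is why a Mapper-side test can pass while the SequenceFile reader fails.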
From: Sean Owen <[email protected]>
To: [email protected]; Zhengguo 'Mike' SUN <[email protected]>
Sent: Monday, February 21, 2011 1:00 PM
Subject: Re: LanczosSolver and ClassNotFoundException

You shouldn't have to modify the Hadoop environment, no. You just have to
roll all the dependencies into your job jar file. You want to use Mahout's
".job" file, which contains all of its dependencies. Merge it with your
classes and use that.

On Mon, Feb 21, 2011 at 5:52 PM, Zhengguo 'Mike' SUN <[email protected]> wrote:
> Hi Lokendra,
>
> The thing is that I am using a shared cluster, whose environment I have
> no control over. I can only attach the needed jars inside my own jar.
>
> From: Lokendra Singh <[email protected]>
> To: [email protected]; Zhengguo 'Mike' SUN <[email protected]>
> Sent: Monday, February 21, 2011 11:31 AM
> Subject: Re: LanczosSolver and ClassNotFoundException
>
> Hi,
>
> If you are mainly facing ClassNotFound problems in the Hadoop
> environment, I would suggest you put all the required (including Mahout)
> jars on HADOOP_CLASSPATH in '$HADOOP_HOME/conf/hadoop-env.sh'. Also,
> while running the MR job, make sure that $HADOOP_HOME/conf is on your
> classpath.
>
> Regards
> Lokendra
>
> On Mon, Feb 21, 2011 at 9:50 PM, Zhengguo 'Mike' SUN
> <[email protected]> wrote:
>> Hi All,
>>
>> I was playing with the LanczosSolver class in Mahout. What I did was
>> copy the code from TestDistributedLanczosSolver.java and try to run it
>> on a shared cluster. I also packaged five jars (core, core-test, math,
>> math-test, and mahout-collections) under the lib/ directory of my own
>> jar. This new jar worked correctly on my local machine in Hadoop's
>> local mode. When I submitted it to the cluster, I got a
>> ClassNotFoundException while running the TimesSquaredJob.
>> The stack trace is as follows:
>>
>> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>>     at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
>>     at java.lang.Class.forName0(Native Method)
>>     at java.lang.Class.forName(Class.java:247)
>>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:866)
>>     at org.apache.hadoop.io.WritableName.getClass(WritableName.java:71)
>>     at org.apache.hadoop.io.SequenceFile$Reader.getValueClass(SequenceFile.java:1613)
>>     at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1555)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>>     at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
>>     at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:63)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> I also wrote a simple MapReduce job to test whether I can access the
>> Vector class, with some naive code like the following:
>>
>> Vector v = new DenseVector(100);
>> v.assign(3.14);
>>
>> This job worked fine on the cluster. Thus, it seems it is not a problem
>> to reference the Vector class. What could be wrong if it is not a
>> dependency problem?
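Sean's suggestion above, rolling Mahout's ".job" file and your own classes into one jar, amounts to copying every entry of both archives into a single output jar. A sketch of that merge using only java.util.zip (the file names in main are hypothetical placeholders, not paths from this thread):

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Sketch: merge several jar/zip files into one, keeping the first
// occurrence of each entry name (so your classes can be listed first
// to take precedence over same-named entries in the .job file).
public class MergeJars {
    public static void merge(String outJar, String... inJars) throws IOException {
        Set<String> seen = new HashSet<>();
        try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(outJar))) {
            for (String in : inJars) {
                try (ZipInputStream zin = new ZipInputStream(new FileInputStream(in))) {
                    ZipEntry e;
                    while ((e = zin.getNextEntry()) != null) {
                        if (!seen.add(e.getName())) continue; // duplicate: first wins
                        out.putNextEntry(new ZipEntry(e.getName()));
                        zin.transferTo(out);
                        out.closeEntry();
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // e.g. java MergeJars combined.job my-classes.jar mahout-core-0.4.job
        if (args.length >= 2) {
            merge(args[0], Arrays.copyOfRange(args, 1, args.length));
        }
    }
}
```

In practice the same result is usually produced at build time (e.g. an assembly that unpacks the .job file alongside your classes); the point is simply that one self-contained job jar reaches every task's classloader, unlike jars that only exist on the submitting machine.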
