I haven't been actively running Mahout for a while, but I do watch plenty of Hadoop students run into the ClassNotFoundException problem.
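For anyone who hasn't seen it, the symptom is typically a trace along these lines in the task logs (a representative example suggested by Jake's report below, not output from an actual run; the missing class varies):

    java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
            at java.net.URLClassLoader.findClass(URLClassLoader.java)
            at java.lang.ClassLoader.loadClass(ClassLoader.java)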
A standard Hadoop job jar has a lib subdir, which contains (as jars) all of the dependencies. Typically the missing class problem is caused by somebody building their own Hadoop job jar and not including a dependent jar (such as mahout-math) in the lib subdir. Or somebody is trying to run a job locally, using the job jar directly, which then has to be unpacked, as otherwise the embedded lib/*.jar classes aren't on the classpath.

But neither of those seems to match what Jake was doing:

> (just running things like "./bin/mahout svd -i <input> -o <output> etc... ")

I was going to try this out from trunk, but an svn up on trunk followed by "mvn install" failed one of the tests:

> Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.025 sec <<< FAILURE!
> fullRankTall(org.apache.mahout.math.QRDecompositionTest)  Time elapsed: 0.014 sec  <<< ERROR!
> java.lang.NoSuchFieldError: MAX
>         at org.apache.mahout.math.QRDecompositionTest.assertEquals(QRDecompositionTest.java:122)
>         at org.apache.mahout.math.QRDecompositionTest.fullRankTall(QRDecompositionTest.java:38)

-- Ken

On May 8, 2011, at 3:44pm, Sean Owen wrote:

> It definitely works for me to package everything into one jar. Is this
> merely "icky", or does it not work for another reason?
> Yes, I'm not suggesting we make users tweak the Maven build, but that
> we make this tweak ourselves. It's just removing the overriding of the
> "unpack" behavior in job.xml files that I mean.
>
> On Sun, May 8, 2011 at 11:36 PM, Benson Margulies <[email protected]> wrote:
>> There isn't a good solution for 0.5.
>>
>> The code that calls setJarByClass has to pass a class that is NOT in
>> the lib directory, but rather in the unpacked classes. It's really
>> easy to build a Hadoop job with Mahout that violates that rule, due to
>> all the static methods that create jobs.
>>
>> We seem to have a consensus to rework all the jobs as beans so that
>> this can be wrestled under control.
>>
>> On Sun, May 8, 2011 at 6:16 PM, Jake Mannix <[email protected]> wrote:
>>> On Sun, May 8, 2011 at 2:58 PM, Sean Owen <[email protected]> wrote:
>>>
>>>> If I recall the last discussion on this correctly --
>>>>
>>>> No, you don't want to put anything in Hadoop's lib/ directory. Even if
>>>> you can, that's not the "right" way. You want to use the job file
>>>> indeed, which should contain all dependencies. However, it packages
>>>> dependencies as jars-in-the-jar, which doesn't work for Hadoop.
>>>
>>> I thought that Hadoop was totally fine with jars inside of the jar,
>>> if they're in the lib directory?
>>>
>>>> I think if you modify the Maven build to just repackage all classes
>>>> into the main jar, it works. It works for me, at least.
>>>
>>> Clearly we're not expecting people to do this. I wasn't even running
>>> with special new classes; it wasn't finding *Vector* -- if this doesn't
>>> work on a real cluster, then most of our entire codebase (which
>>> requires mahout-math) doesn't work.
>>>
>>> -jake

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g
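As a footnote to Benson's setJarByClass point above, here is a minimal Java sketch of the rule; the MyDriver class is hypothetical, and this is illustrative rather than Mahout's actual job-setup code:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class MyDriver {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "example");
        // Right: MyDriver lives in the job jar's own unpacked classes,
        // so Hadoop finds and ships the whole job jar (lib/ included).
        job.setJarByClass(MyDriver.class);
        // Wrong, per Benson's point: a class that lives in a jar under
        // lib/ (e.g. anything from mahout-math) -- Hadoop would locate
        // that dependency jar and ship it as the job jar instead.
        // job.setJarByClass(org.apache.mahout.math.DenseVector.class);
      }
    }

That's also why the static job-creating methods Benson mentions are risky: they tend to call setJarByClass with one of the library's own classes rather than with the caller's driver class.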
