It definitely works for me to package all the classes into one jar. Is this merely "icky", or does it not work for another reason? And yes, I'm not suggesting we make users tweak the Maven build, but that we make this tweak ourselves. All I mean is removing the override of the "unpack" behavior in the job.xml files.
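Roughly the kind of change I mean, assuming the dependencySet in our job.xml assembly descriptor currently pins unpack to false (a sketch, not the exact descriptor):

    <dependencySets>
      <dependencySet>
        <!-- Unpack each dependency's classes straight into the job jar,
             rather than nesting the jars under lib/. -->
        <unpack>true</unpack>
        <scope>runtime</scope>
        <outputDirectory>/</outputDirectory>
      </dependencySet>
    </dependencySets>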
On Sun, May 8, 2011 at 11:36 PM, Benson Margulies <[email protected]> wrote:
> There isn't a good solution for 0.5.
>
> The code that calls setJarByClass has to pass a class that is NOT in
> the lib directory, but rather in the unpacked classes. It's really
> easy to build a hadoop job with Mahout that violates that rule due to
> all the static methods that create jobs.
>
> We seem to have a consensus to rework all the jobs as beans so that
> this can be wrestled into control.
>
> On Sun, May 8, 2011 at 6:16 PM, Jake Mannix <[email protected]> wrote:
>> On Sun, May 8, 2011 at 2:58 PM, Sean Owen <[email protected]> wrote:
>>
>>> If I recall the last discussion on this correctly --
>>>
>>> No you don't want to put anything in Hadoop's lib/ directory. Even if
>>> you can, that's not the "right" way.
>>> You want to use the job file indeed, which should contain all dependencies.
>>> However, it packages dependencies as jars-in-the-jar, which doesn't
>>> work for Hadoop.
>>
>> I thought that hadoop was totally fine with jars inside of the jar, if
>> they're in the lib directory?
>>
>>> I think if you modify the Maven build to just repackage all classes
>>> into the main jar, it works. It works for me at least.
>>
>> Clearly we're not expecting people to do this. I wasn't even running with
>> special new classes, it wasn't finding *Vector* - if this doesn't work on
>> a real cluster, then most of our entire codebase (which requires
>> mahout-math) doesn't work.
>>
>> -jake
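To make the setJarByClass rule Benson describes above concrete, here's a minimal sketch (ExampleDriver is a made-up name; the commented-out line shows the failure mode):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ExampleDriver {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "example");
        // OK: ExampleDriver lives in the job jar's own unpacked classes,
        // so Hadoop resolves the enclosing jar to the job jar and ships it.
        job.setJarByClass(ExampleDriver.class);
        // Broken: Vector lives in mahout-math.jar, so Hadoop would ship
        // that jar instead of the job jar, and the job's classes and its
        // lib/ dependencies never reach the cluster.
        // job.setJarByClass(org.apache.mahout.math.Vector.class);
      }
    }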
