I think we are certainly broken for backend use of Mahout (e.g. for stuff like Lucene analyzer strategies), but FWIW the last time I tried to run the SSVD code it worked, and it does use the math stuff and it also calls setJarByClass.
Unfortunately, I can't run much else at the moment.

On Mon, May 9, 2011 at 1:20 PM, Jake Mannix <[email protected]> wrote:

> On Mon, May 9, 2011 at 1:09 PM, Benson Margulies <[email protected]> wrote:
>
>> Once more from the top.
>>
>> There is a hadoop convention. It has nothing to do with the
>> MANIFEST.MF as I read the code.
>>
>
> Ah, sorry, that was something we do with these lib/-ified jars here at
> work (it's pretty common practice to do this, it's too bad it's not a
> java-supported spec).
>
>
>> I'm not an evangelist for the maven-shade-plugin, but my very
>> unscientific impression is that people walk up to mahout and expect
>> the mahout command to just 'work'. Unless someone can unveil a way to
>> script the exploitation of the distributed cache, that means that the
>> jar file that the mahout command hands to the hadoop command has to
>> use the 'lib/' convention, and have the correct structure of raw and
>> lib-ed classes.
>>
>
> Totally agree, if it works.
>
>
>> Further, any unsophisticated user who goes to incorporate Mahout into
>> a larger structure has to do likewise.
>>
>
> Well, users who want to incorporate mahout into a larger structure
> will have their own build system to interact with, and will need
> to be instructed to take our individual jars and package them
> up properly, no?
>
>
>> We could avoid exciting uses of the shade plugin altogether if we
>> didn't have these static methods that initialize jobs and call
>> setJarByClass on themselves. However, I don't see that for 0.5 unless
>> we want to push the schedule back and make a concerted effort.
>>
>> Further, I am concerned, based on Jake's remarks, that even following
>> the hadoop lib/ convention correctly doesn't always work, and we have
>> no diagnostic insight into the nature of the failure.
>>
>
> Can someone please try out our current code on another real cluster,
> so we have another data point? My worry is that even without
> this setJarByClass business, we're not working properly. If we are,
> I'm fine fixing this classpath stuff in 0.6
>
> If we're broken now, it needs fixing, asap.
>
> -jake
