OK, for what it's worth, -1 on the shade plugin. The lack of problematic dependencies at present doesn't make it good practice, IMO.
It costs the same to manage the backend classpath "correctly" (i.e., in a way compatible with all future dependencies) as to manage it in an "incompatible" way, if we implement it now. It may not be so easy in the future if we lock ourselves into doing it one particular way. I don't see any drawback (including "user support emails") to Mahout's driver code (aka AbstractJob) doing it on its own vs. handling it with shade. But there is something to gain, such as compliance with the Java JAR spec.

On Mon, May 9, 2011 at 10:42 AM, Benson Margulies <[email protected]> wrote:
> Paul,
>
> The usual maven tool for the purpose is the shade plugin. As
> previously noted, there are some possible pitfalls, but there's
> evidence (including yours) that they are not relevant to Mahout at the
> moment.
>
> --benson
>
>
> On Mon, May 9, 2011 at 1:37 PM, Paul Mahon <[email protected]> wrote:
>> I use the one big jar technique for regular hadoop and mahout jobs because
>> of these kinds of problems. I use the jarjar task with ant, I expect Maven
>> has something similar. I haven't had any of the class not found problems
>> since I started doing it.
>>
>> On 05/09/2011 10:32 AM, Benson Margulies wrote:
>>>>
>>>> So that explains how some user rebundlings don't work with us, sometimes.
>>>> What it doesn't explain is why running the regular, not-rebundled
>>>> "mahout-examples-0.5-SNAPSHOT-job.jar" via the bin/mahout shell
>>>> script is throwing this ClassNotFoundException for me (and it's happened
>>>> to Sean, and according to the list archives, others as well) in a
>>>> production cluster.
>>>
>>> I agree that it doesn't explain. However, the code in hadoop that
>>> implements this mechanism, well, if you ask me ... it STINKS. It
>>> wouldn't surprise me if it fails in some case we haven't
>>> characterized. This would argue for Sean's 'one big jar' approach.
>>
>
