>
> I'm not an evangelist for the maven-shade-plugin, but my very
> unscientific impression is that people walk up to mahout and expect
> the mahout command to just 'work'. Unless someone can unveil a way to
> script the exploitation of the distributed cache, that means that the
> jar file that the mahout command hands to the hadoop command has to
> use the 'lib/' convention, and have the correct structure of raw and
> lib-ed classes.

Here is what I think:

We require setting up MAHOUT_HOME; well, most Hadoop projects require
something of the sort.

Then AbstractJob implements walking the lib tree and adding those
paths (based on MAHOUT_HOME
or otherwise derived knowledge of the lib location), throwing all the
jars found there onto the backend classpath. All Mahout projects
do something similar. Where's the complexity in that?
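For illustration, here is a rough sketch of that walk (the class name
LibJars is mine, and I'm assuming the "tmpjars" job property, which is
the same hook Hadoop's own -libjars option goes through; JobClient then
ships those jars to the backend via the distributed cache):

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;

public class LibJars {

  /** Walks MAHOUT_HOME/lib and adds every jar to the job's backend classpath. */
  public static void addLibJars(Configuration conf) {
    String home = System.getenv("MAHOUT_HOME");
    if (home == null) {
      throw new IllegalStateException("MAHOUT_HOME is not set");
    }
    List<String> jars = new ArrayList<String>();
    collectJars(new File(home, "lib"), jars);
    if (jars.isEmpty()) {
      return;
    }
    String existing = conf.get("tmpjars");
    conf.set("tmpjars",
        existing == null || existing.isEmpty()
            ? join(jars) : existing + ',' + join(jars));
  }

  private static void collectJars(File dir, List<String> jars) {
    File[] files = dir.listFiles();
    if (files == null) {
      return;
    }
    for (File f : files) {
      if (f.isDirectory()) {
        collectJars(f, jars); // recurse: walk the whole lib tree
      } else if (f.getName().endsWith(".jar")) {
        jars.add(f.toURI().toString()); // file:/... URIs, as -libjars would produce
      }
    }
  }

  private static String join(List<String> jars) {
    StringBuilder sb = new StringBuilder();
    for (String jar : jars) {
      if (sb.length() > 0) {
        sb.append(',');
      }
      sb.append(jar);
    }
    return sb.toString();
  }
}

A driver would just call LibJars.addLibJars(getConf()) before submitting the job.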

>
> Further, any unsophisticated user who goes to incorporate Mahout into
> a larger structure has to do likewise.

Yes.

There are two issues here:
1) Client-side API use.

That should be fine as long as MAHOUT_HOME points to the right place.
Since the user is not involved in writing driver code, we are golden.

2) Backend-side use of Mahout? Not terribly expected, but maybe. E.g.
if Mahout allows specifying external
strategies to do 'stuff', such as an external Lucene analyzer in
seq2sparse, then yes.

In this case, well, we need to figure out how to handle this ad hoc
through the command line.
Let's look at how other projects deal with the problem. Oh yes, they
all implement their own custom
mechanisms for these cases too!

Such as:

-- Pig uses the custom REGISTER 'jar' command
-- Hive has an auxlib folder under HIVE_HOME where it expects to find user jars!

Something similar should be good for us as part of the ecosystem, should it not?
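Purely as a hypothetical sketch (the auxlib directory and the
mahout.user.jars property are my inventions, not an existing feature):
pick up user jars, e.g. a custom Lucene analyzer for seq2sparse, and
ship them the same way as the lib tree above.

import java.io.File;
import org.apache.hadoop.conf.Configuration;

public class UserJars {

  public static void addUserJars(Configuration conf) {
    // 1) a well-known drop folder, like Hive's auxlib
    String home = System.getenv("MAHOUT_HOME");
    if (home != null) {
      File[] files = new File(home, "auxlib").listFiles();
      if (files != null) {
        for (File f : files) {
          if (f.getName().endsWith(".jar")) {
            appendTmpJar(conf, f.toURI().toString());
          }
        }
      }
    }
    // 2) ad-hoc jars named on the command line, Pig REGISTER style
    String extra = conf.get("mahout.user.jars"); // hypothetical property
    if (extra != null) {
      for (String jar : extra.split(",")) {
        appendTmpJar(conf, new File(jar.trim()).toURI().toString());
      }
    }
  }

  private static void appendTmpJar(Configuration conf, String uri) {
    String existing = conf.get("tmpjars");
    conf.set("tmpjars",
        existing == null || existing.isEmpty() ? uri : existing + ',' + uri);
  }
}

On the command line that could look like
-Dmahout.user.jars=/path/to/my-analyzer.jar ahead of the seq2sparse
arguments, assuming the driver runs through ToolRunner so that -D
properties land in the Configuration.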
