Aside from the issues you describe, I would suggest moving to 0.4 or even trunk. A LOT of improvements have happened since 0.3.
On Fri, Feb 4, 2011 at 3:36 PM, Quiroz Hernandez, Andres <[email protected]> wrote:

> Hello,
>
> I have set up a release version (pre-compiled) of Mahout 0.3 on top of a
> hadoop cluster with version 0.20.2 and am able to run mahout algorithms
> from the command line without a problem, e.g.:
>
> mahout seq2sparse -i input_dir -o output_dir -wt tf -seq
>
> However, I tried invoking the algorithms within a java program using the
> MahoutDriver class in the following way:
>
> MahoutDriver.main(args);
>
> Where args = {"seq2sparse", "-i", "input_dir", "-o", "output_dir",
> "-wt", "tf", "-seq"}
>
> This call fails with the message:
>
> 11/02/04 18:20:58 ERROR driver.MahoutDriver: MahoutDriver failed with
> args: [seq2sparse, -i, input_dir, -o, output_dir, -wt, tf, -seq]
>
> I believe that the problem is that I am not passing all of the jar
> dependencies that the mahout driver class needs to run the algorithm,
> and that this is taken care of by the mahout run script, but I am not
> very familiar with shell scripting and cannot tell exactly how that is
> taken care of. If I am correct, please let me know how I can include
> those dependencies (all of which I assume are in the $MAHOUT_HOME/lib
> folder), either in the arguments or otherwise. If not, please let me
> know what is the correct way to start the algorithms from the code.
>
> I also tried using the SparseVectorsFromSequenceFiles class (or any
> other algorithm driver class) directly with the corresponding arguments
> except for the short name (seq2sparse), and that call fails more
> explicitly with a ClassNotFoundException (which is why I concluded
> dependencies are the problem).
>
> Thank you for your help,
>
> Andres
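On the classpath question in the quoted message: the bin/mahout script essentially appends every jar it finds to CLASSPATH before starting the JVM, so your own program needs the same jars on its classpath. Below is a minimal sketch of building that classpath from Java, assuming Mahout's own jars sit directly in $MAHOUT_HOME and its dependencies in $MAHOUT_HOME/lib (check the layout of your release). The class MahoutClasspath and its methods are hypothetical helpers for illustration, not part of Mahout.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not a Mahout class: collects jars the way a
// "for f in dir/*.jar; do CLASSPATH=$CLASSPATH:$f; done" loop would.
public class MahoutClasspath {

    /** Returns every *.jar directly inside each directory, sorted per directory. */
    public static List<Path> findJars(List<Path> dirs) throws IOException {
        List<Path> jars = new ArrayList<>();
        for (Path dir : dirs) {
            if (!Files.isDirectory(dir)) {
                continue; // skip directories that do not exist instead of failing
            }
            List<Path> found = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
                for (Path p : stream) {
                    found.add(p);
                }
            }
            Collections.sort(found); // stable, predictable order within a directory
            jars.addAll(found);
        }
        return jars;
    }

    /** Joins jar paths with the platform separator (':' on Unix, ';' on Windows). */
    public static String toClasspath(List<Path> jars) {
        List<String> parts = new ArrayList<>();
        for (Path p : jars) {
            parts.add(p.toString());
        }
        return String.join(File.pathSeparator, parts);
    }

    public static void main(String[] args) throws IOException {
        String home = System.getenv("MAHOUT_HOME");
        if (home == null) {
            System.err.println("MAHOUT_HOME is not set");
            return;
        }
        Path root = Paths.get(home);
        // Assumption: Mahout jars in $MAHOUT_HOME, dependency jars in $MAHOUT_HOME/lib
        List<Path> jars = findJars(Arrays.asList(root, root.resolve("lib")));
        System.out.println(toClasspath(jars));
    }
}
```

You would then launch your own program with something like `java -cp "myapp.jar:<printed classpath>" com.example.MyApp`, where myapp.jar and com.example.MyApp are placeholders for your code. On Java 6 and later, a wildcard entry such as `"$MAHOUT_HOME/lib/*"` in `-cp` also picks up all jars in that directory. With the jars on the classpath, the ClassNotFoundException should go away; whether the job then runs correctly on the cluster is a separate question.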
