I just ran seq2sparse on a clean checkout of trunk against a cluster started by Whirr, and it works without problems:

frank@franktop:~/Desktop/mahout$ bin/mahout seq2sparse --input target/posts --output target/seq2sparse --weight tfidf --namedVector
Running on hadoop, using HADOOP_HOME=/usr/local/hadoop
HADOOP_CONF_DIR=/home/frank/.whirr/frank-cluster/
11/05/11 17:57:17 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
11/05/11 17:57:18 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
11/05/11 17:57:18 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
11/05/11 17:57:18 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
11/05/11 17:57:19 INFO common.HadoopUtil: Deleting target/seq2sparse
11/05/11 17:58:42 INFO input.FileInputFormat: Total input paths to process : 1
11/05/11 17:58:45 INFO mapred.JobClient: Running job: job_201105111409_0009
11/05/11 17:58:46 INFO mapred.JobClient: map 0% reduce 0%
11/05/11 17:59:00 INFO mapred.JobClient: map 100% reduce 0%

Frank

On Tue, May 10, 2011 at 5:34 PM, Jake Mannix <[email protected]> wrote:

> On Tue, May 10, 2011 at 8:24 AM, Sean Owen <[email protected]> wrote:
>
>> I peeked in the examples job jar and it definitely does have this class,
>> along with the other dependencies (after my patch). Double-check that you've
>> done the clean build an "install" again? and maybe even print out MAHOUT_JOB
>> in the script to double-check what it is using?
>>
>
> [jake@smf1-ady-15-sr1 bla]$ jar -tf mahout-examples-0.5-SNAPSHOT-job.jar | grep "/Analyzer.class"
> org/apache/lucene/analysis/Analyzer.class
>
> [swap exec for echo in last line of bin/mahout ]
>
> [jake@smf1-ady-15-sr1 mahout-distribution-0.5-SNAPSHOT]$ ./bin/mahout
> Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop-0.20
> No HADOOP_CONF_DIR set, using /usr/lib/hadoop-0.20/src/conf
> /usr/lib/hadoop-0.20/bin/hadoop jar /home/jake/mahout-distribution-0.5-SNAPSHOT/mahout-examples-0.5-SNAPSHOT-job.jar org.apache.mahout.driver.MahoutDriver
>
> :\
>
>
>> On Tue, May 10, 2011 at 12:40 AM, Jake Mannix <[email protected]> wrote:
>>
>> > wah. Even trying to do seq2sparse doesn't work for me:
>> >
>> > [jake@smf1-ady-15-sr1 mahout-distribution-0.5-SNAPSHOT]$ ./bin/mahout seq2sparse -i hdfs://<namenode>/user/jake/text_temp -o hdfs://<namenode>/user/jake/text_vectors_temp
>> > Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop-0.20
>> > No HADOOP_CONF_DIR set, using /usr/lib/hadoop-0.20/src/conf
>> > 11/05/09 23:36:01 WARN driver.MahoutDriver: No seq2sparse.props found on classpath, will use command-line arguments only
>> > 11/05/09 23:36:01 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
>> > 11/05/09 23:36:01 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
>> > 11/05/09 23:36:01 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
>> > 11/05/09 23:36:04 INFO input.FileInputFormat: Total input paths to process : 1
>> > 11/05/09 23:36:10 INFO mapred.JobClient: Running job: job_201104300433_126621
>> > 11/05/09 23:36:12 INFO mapred.JobClient: map 0% reduce 0%
>> > 11/05/09 23:36:47 INFO mapred.JobClient: Task Id : attempt_201104300433_126621_m_000000_0, Status : FAILED
>> > 11/05/09 23:37:07 INFO mapred.JobClient: Task Id : attempt_201104300433_126621_m_000000_1, Status : FAILED
>> > Error: java.lang.ClassNotFoundException: org.apache.lucene.analysis.Analyzer
>> >
>> > ----
>> >
>> > Note I'm not specifying any fancy analyzer. Just trying to run with the
>> > defaults. :\
>> >
>> > -jake
>> >
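
For anyone repeating the jar check quoted above (the jar -tf ... | grep step), here is a minimal Python sketch of the same idea. The function name jar_contains_class is hypothetical, not part of Mahout or Hadoop. It assumes the job jar is an ordinary zip archive, and it also looks inside any lib/*.jar entries, since Hadoop job jars may carry dependencies that way. It only tells you whether the class is packaged in the jar, not whether the task JVM on the cluster actually sees it, so a positive result (as in the quoted output) does not rule out a classpath problem at runtime.

```python
import io
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if jar_path packages the fully qualified class_name.

    Checks top-level entries first, then any jars nested under lib/.
    """
    # org.apache.lucene.analysis.Analyzer -> org/apache/lucene/analysis/Analyzer.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        names = jar.namelist()
        if entry in names:
            return True
        # Dependencies may be bundled as whole jars under lib/.
        for name in names:
            if name.startswith("lib/") and name.endswith(".jar"):
                nested_bytes = io.BytesIO(jar.read(name))
                with zipfile.ZipFile(nested_bytes) as nested:
                    if entry in nested.namelist():
                        return True
    return False
```

Calling jar_contains_class("mahout-examples-0.5-SNAPSHOT-job.jar", "org.apache.lucene.analysis.Analyzer") from the directory holding the jar would be the equivalent of the quoted grep, with the nested lib/ lookup added.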
