Just ran seq2sparse on a clean checkout of trunk with a cluster
started by Whirr. This works without problems.

frank@franktop:~/Desktop/mahout$ bin/mahout seq2sparse --input
target/posts --output target/seq2sparse --weight tfidf  --namedVector
Running on hadoop, using HADOOP_HOME=/usr/local/hadoop
HADOOP_CONF_DIR=/home/frank/.whirr/frank-cluster/
11/05/11 17:57:17 WARN conf.Configuration: DEPRECATED: hadoop-site.xml
found in the classpath. Usage of hadoop-site.xml is deprecated.
Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to
override properties of core-default.xml, mapred-default.xml and
hdfs-default.xml respectively
11/05/11 17:57:18 INFO vectorizer.SparseVectorsFromSequenceFiles:
Maximum n-gram size is: 1
11/05/11 17:57:18 INFO vectorizer.SparseVectorsFromSequenceFiles:
Minimum LLR value: 1.0
11/05/11 17:57:18 INFO vectorizer.SparseVectorsFromSequenceFiles:
Number of reduce tasks: 1
11/05/11 17:57:19 INFO common.HadoopUtil: Deleting target/seq2sparse
11/05/11 17:58:42 INFO input.FileInputFormat: Total input paths to process : 1
11/05/11 17:58:45 INFO mapred.JobClient: Running job: job_201105111409_0009
11/05/11 17:58:46 INFO mapred.JobClient:  map 0% reduce 0%
11/05/11 17:59:00 INFO mapred.JobClient:  map 100% reduce 0%

Frank

On Tue, May 10, 2011 at 5:34 PM, Jake Mannix <[email protected]> wrote:
> On Tue, May 10, 2011 at 8:24 AM, Sean Owen <[email protected]> wrote:
>
>> I peeked in the examples job jar and it definitely does have this class,
>> along with the other dependencies (after my patch). Double-check that you've
>> done the clean build and "install" again? And maybe even print out MAHOUT_JOB
>> in the script to double-check what it is using?
>>
>
> [jake@smf1-ady-15-sr1 bla]$ jar -tf mahout-examples-0.5-SNAPSHOT-job.jar |
> grep "/Analyzer.class"
> org/apache/lucene/analysis/Analyzer.class
>
> [swapped exec for echo in the last line of bin/mahout]
>
> [jake@smf1-ady-15-sr1 mahout-distribution-0.5-SNAPSHOT]$ ./bin/mahout
> Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop-0.20
> No HADOOP_CONF_DIR set, using /usr/lib/hadoop-0.20/src/conf
> /usr/lib/hadoop-0.20/bin/hadoop jar
> /home/jake/mahout-distribution-0.5-SNAPSHOT/mahout-examples-0.5-SNAPSHOT-job.jar
> org.apache.mahout.driver.MahoutDriver
>
> :\
>
>
>> On Tue, May 10, 2011 at 12:40 AM, Jake Mannix <[email protected]>
>> wrote:
>>
>> > wah.  Even trying to do seq2sparse doesn't work for me:
>> >
>> > [jake@smf1-ady-15-sr1 mahout-distribution-0.5-SNAPSHOT]$ ./bin/mahout
>> > seq2sparse -i hdfs://<namenode>/user/jake/text_temp -o
>> > hdfs://<namenode>/user/jake/text_vectors_temp
>> > Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop-0.20
>> > No HADOOP_CONF_DIR set, using /usr/lib/hadoop-0.20/src/conf
>> > 11/05/09 23:36:01 WARN driver.MahoutDriver: No seq2sparse.props found on
>> > classpath, will use command-line arguments only
>> > 11/05/09 23:36:01 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum
>> > n-gram size is: 1
>> > 11/05/09 23:36:01 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum
>> > LLR value: 1.0
>> > 11/05/09 23:36:01 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of
>> > reduce tasks: 1
>> > 11/05/09 23:36:04 INFO input.FileInputFormat: Total input paths to process : 1
>> > 11/05/09 23:36:10 INFO mapred.JobClient: Running job:
>> > job_201104300433_126621
>> > 11/05/09 23:36:12 INFO mapred.JobClient:  map 0% reduce 0%
>> > 11/05/09 23:36:47 INFO mapred.JobClient: Task Id :
>> > attempt_201104300433_126621_m_000000_0, Status : FAILED
>> > 11/05/09 23:37:07 INFO mapred.JobClient: Task Id :
>> > attempt_201104300433_126621_m_000000_1, Status : FAILED
>> > Error: java.lang.ClassNotFoundException:
>> > org.apache.lucene.analysis.Analyzer
>> >
>> > ----
>> >
>> > Note I'm not specifying any fancy analyzer.  Just trying to run with the
>> > defaults. :\
>> >
>> >  -jake
>>
>
