Has anyone else seen this error while trying to create sparse vectors from 
a sequenced directory?


First, sequencing the directory:

bin/mahout seqdirectory -i /user/hadoop/htmlless_articles -o /user/hadoop/htmless_articles_seq -ow
Running on hadoop, using HADOOP_HOME=/usr/local/hadoop/
No HADOOP_CONF_DIR set, using /usr/local/hadoop//src/conf 
11/06/20 13:31:35 WARN driver.MahoutDriver: No seqdirectory.props found on classpath, will use command-line arguments only
11/06/20 13:31:35 INFO common.AbstractJob: Command line arguments: {--charset=UTF-8, --chunkSize=64, --endPhase=2147483647, --fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter, --input=/user/hadoop/htmlless_articles, --keyPrefix=, --output=/user/hadoop/htmless_articles_seq, --overwrite=null, --startPhase=0, --tempDir=temp}
11/06/20 13:31:35 INFO common.HadoopUtil: Deleting /user/hadoop/htmless_articles_seq
11/06/20 13:31:43 INFO driver.MahoutDriver: Program took 8298 ms


Then, when trying to create sparse vectors, the map task fails:


bin/mahout seq2sparse -i /htmless_articles_seq -o /htmless_articles_vectors_1 -wt tfidf

11/06/20 13:13:20 INFO mapred.JobClient: Task Id : attempt_201104261414_0625_m_000000_0, Status : FAILED
Error: LUCENE_31
