Has anyone else seen this error while trying to create sparse vectors from
a sequenced directory?
First, the sequencing step:
bin/mahout seqdirectory -i /user/hadoop/htmlless_articles -o /user/hadoop/htmless_articles_seq -ow
Running on hadoop, using HADOOP_HOME=/usr/local/hadoop/
No HADOOP_CONF_DIR set, using /usr/local/hadoop//src/conf
11/06/20 13:31:35 WARN driver.MahoutDriver: No seqdirectory.props found on
classpath, will use command-line arguments only
11/06/20 13:31:35 INFO common.AbstractJob: Command line arguments:
{--charset=UTF-8, --chunkSize=64, --endPhase=2147483647,
--fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter,
--input=/user/hadoop/htmlless_articles, --keyPrefix=,
--output=/user/hadoop/htmless_articles_seq, --overwrite=null, --startPhase=0,
--tempDir=temp}
11/06/20 13:31:35 INFO common.HadoopUtil: Deleting
/user/hadoop/htmless_articles_seq
11/06/20 13:31:43 INFO driver.MahoutDriver: Program took 8298 ms
Then, when trying to create sparse vectors:
bin/mahout seq2sparse -i /htmless_articles_seq -o /htmless_articles_vectors_1 -wt tfidf
11/06/20 13:13:20 INFO mapred.JobClient: Task Id :
attempt_201104261414_0625_m_000000_0, Status : FAILED
Error: LUCENE_31