Hey all, I'm trying the Latent Dirichlet Allocation operator. I made my term vectors as specified here: https://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html with these commands:
~/Scripts/Mahout/trunk/bin/mahout seqdirectory --input /home/ben/Scripts/eipi/files --output /home/ben/Scripts/eipi/mahout_out -chunk 1
~/Scripts/Mahout/trunk/bin/mahout seq2sparse -i /home/ben/Scripts/eipi/mahout_out -o /home/ben/Scripts/eipi/termvecs -wt tf -seq

Then I run this, trying to follow these instructions: https://cwiki.apache.org/MAHOUT/latent-dirichlet-allocation.html

~/Scripts/Mahout/trunk/bin/mahout lda -i /home/ben/Scripts/eipi/termvecs -o /home/ben/Scripts/eipi/lda_working -k 2 -v 100

And I get:

MAHOUT-JOB: /home/ben/Scripts/Mahout/trunk/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar
11/09/04 16:28:59 INFO common.AbstractJob: Command line arguments: {--endPhase=2147483647, --input=/home/ben/Scripts/eipi/termvecs, --maxIter=-1, --numTopics=2, --numWords=100, --output=/home/ben/Scripts/eipi/lda_working, --startPhase=0, --tempDir=temp, --topicSmoothing=-1.0}
11/09/04 16:29:00 INFO lda.LDADriver: LDA Iteration 1
11/09/04 16:29:01 INFO input.FileInputFormat: Total input paths to process : 4
11/09/04 16:29:01 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-ben/mapred/staging/ben692167368/.staging/job_local_0001
Exception in thread "main" java.io.FileNotFoundException: File file:/home/ben/Scripts/eipi/termvecs/tokenized-documents/data does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:63)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
    at ...

Does anyone know what I'm doing wrong?
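A minimal diagnostic sketch, assuming the failure comes from Hadoop's SequenceFileInputFormat: judging by the stack trace (SequenceFileInputFormat.listStatus calling getFileStatus on tokenized-documents/data), it seems to treat every subdirectory of the input path as a MapFile and look for a "data" file inside it. The function find_non_mapfile_dirs below is a hypothetical helper, not part of Mahout or Hadoop; it lists the subdirectories of a local input directory (the trace shows file:/ paths, so the local filesystem is assumed) that have no "data" file and would therefore fail the same way.

```shell
#!/bin/sh
# Hypothetical diagnostic helper (not part of Mahout or Hadoop).
# Assumption: SequenceFileInputFormat treats each directory inside the input
# path as a MapFile and opens <dir>/data, so every directory printed here
# would raise the same FileNotFoundException as tokenized-documents/data.
find_non_mapfile_dirs() {
  input_dir=$1
  for entry in "$input_dir"/*; do
    # Plain files (e.g. part-r-00000) are read directly; only directories matter.
    [ -d "$entry" ] || continue
    # Hadoop's default input filter skips names starting with "_" or ".".
    case "$(basename "$entry")" in
      _*|.*) continue ;;
    esac
    # A real MapFile directory contains a "data" file; report the ones that don't.
    [ -f "$entry/data" ] || printf '%s\n' "$entry"
  done
}

# Example: check the directory that was passed to "mahout lda -i".
# find_non_mapfile_dirs /home/ben/Scripts/eipi/termvecs
```

If this prints tokenized-documents (and the other seq2sparse output subdirectories), it may be worth pointing lda -i at the subdirectory that holds the vectors themselves rather than at the whole seq2sparse output directory.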
