The "all_documents" folder is a directory of files in plain text format.
________________________________
From: Suneel Marthi <[email protected]>
To: "[email protected]" <[email protected]>; Michael Constantine <[email protected]>
Sent: Friday, July 13, 2012 12:27 PM
Subject: Re: SeqDirectory

The command seems ok. I can only ask u to verify that ur input path is correct and has documents that need to be processed.

________________________________
From: Michael Constantine <[email protected]>
To: "[email protected]" <[email protected]>; Suneel Marthi <[email protected]>
Sent: Friday, July 13, 2012 10:58 AM
Subject: Re: SeqDirectory

hduser@ubuntu:~$ $MAHOUT_HOME/bin/mahout seqdirectory --input Documents/all_documents --output Documents/output -c UTF-8 -chunk 5
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/usr/local/hadoop
No HADOOP_CONF_DIR set, using /usr/local/hadoop/conf
MAHOUT-JOB: /usr/local/mahout/examples/target/mahout-examples-0.6-job.jar
12/07/13 06:50:14 INFO common.AbstractJob: Command line arguments: {--charset=UTF-8, --chunkSize=5, --endPhase=2147483647, --fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter, --input=Documents/all_documents, --keyPrefix=, --output=Documents/output, --startPhase=0, --tempDir=temp}
12/07/13 06:50:16 INFO driver.MahoutDriver: Program took 2059 ms (Minutes: 0.03431666666666667)

Here is my terminal screen. However, the directory "output" is empty.

________________________________
From: Suneel Marthi <[email protected]>
To: "[email protected]" <[email protected]>; Michael Constantine <[email protected]>
Sent: Friday, July 13, 2012 10:49 AM
Subject: Re: SeqDirectory

What's the command u were using for SeqDirectory?

________________________________
From: Michael Constantine <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Friday, July 13, 2012 10:38 AM
Subject: SeqDirectory

Hello,

I have a directory of the enron e-mails in plain text format. When I run the SeqDirectory on them to try to convert them to SequenceFile, it finishes in seconds and says it has completed the job, but the output folder is empty. Is there something that I am missing?

Thanks,
Mike
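
One thing that may be worth checking, going only by the terminal output quoted in this thread: the log says "MAHOUT_LOCAL is not set" and "Running on hadoop", so the relative paths Documents/all_documents and Documents/output may have been resolved against HDFS rather than the local disk. That is an assumption, not something the thread confirms. A rough sketch of how one might test it, reusing the same command and paths from the message above:

```shell
# Assumption: the job ran against HDFS, as the "Running on hadoop" log line suggests.
# If so, the input and output would live in HDFS, not in the local ~/Documents folder.
if command -v hadoop >/dev/null 2>&1; then
  # List the paths as Hadoop sees them (relative to the user's HDFS home directory).
  hadoop fs -ls Documents/all_documents
  hadoop fs -ls Documents/output
fi

# Alternative: ask the mahout launcher script to run locally against the local
# file system. The script only checks that MAHOUT_LOCAL is non-empty.
export MAHOUT_LOCAL=true

# Re-run the same command from the thread, but only if mahout is installed here.
if [ -x "${MAHOUT_HOME:-}/bin/mahout" ]; then
  "$MAHOUT_HOME/bin/mahout" seqdirectory \
    --input Documents/all_documents \
    --output Documents/output \
    -c UTF-8 -chunk 5
fi
```

If the first listing shows no input files in HDFS, that would be consistent with the job finishing in about two seconds and writing nothing; copying the documents into HDFS or running with MAHOUT_LOCAL set are the two obvious things to try.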
