What does org.apache.mahout.text.PrefixAdditionFilter do? And is it what you want?
You can run these apps from inside Eclipse/IntelliJ, and single step where it walks files.

On Wed, Feb 22, 2012 at 7:01 PM, Temese Szalai <[email protected]> wrote:
> Hello -
>
> I'm new to Mahout and I'm not having any luck trying to use seqdirectory to
> create seqfiles so that i can then generate vectors from text files.
> Seems like this operation should work like a charm.
>
> Here is the command that I used to attempt to process the Reuters corpus
> into seqfiles and the output that I got in the terminal.
>
> *$ bin/mahout seqdirectory -c UTF-8 -i examples/reuters-extracted/ -o
> reuters-seqfiles*
> *Running on hadoop, using
> HADOOP_HOME=/Users/temeseszalai/Desktop/hadoop-0.20.203.0*
> *No HADOOP_CONF_DIR set, using
> /Users/temeseszalai/Desktop/hadoop-0.20.203.0/src/conf *
> *12/02/22 16:29:01 INFO common.AbstractJob: Command line arguments:
> {--charset=UTF-8, --chunkSize=64, --endPhase=2147483647,
> --fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter,
> --input=examples/reuters-extracted/, --keyPrefix=,
> --output=reuters-seqfiles, --startPhase=0, --tempDir=temp}*
> *12/02/22 16:29:02 INFO driver.MahoutDriver: Program took 418 ms*
>
> I am using mahout-distribution-0.5 on Mac OSX (10.7.3).
> I don't get any error messages from seqdirectory. I just don't get any
> seqfiles.
>
> the output directory is always empty and the time it takes to run is always
> minimal.. have tried with different data, different paths, have had someone
> else with
> considerably more java experience sanity check and still no luck.
>
> I'm clearly doing something wrong ... No idea what ... I've tried poking
> around to see if anyone else has had the same issue and haven't turned up
> much that is useful.
>
> Any thoughts? Guidance would definitely be appreciated.
>
> Thanks in advance.
> Temese

--
Lance Norskog
[email protected]
