Hello -
I'm new to Mahout, and I'm not having any luck using seqdirectory to create
seqfiles so that I can then generate vectors from text files. It seems like
this operation should just work.
Here is the command I used to try to convert the Reuters corpus into seqfiles,
and the output I got in the terminal:
$ bin/mahout seqdirectory -c UTF-8 -i examples/reuters-extracted/ -o reuters-seqfiles
Running on hadoop, using HADOOP_HOME=/Users/temeseszalai/Desktop/hadoop-0.20.203.0
No HADOOP_CONF_DIR set, using /Users/temeseszalai/Desktop/hadoop-0.20.203.0/src/conf
12/02/22 16:29:01 INFO common.AbstractJob: Command line arguments: {--charset=UTF-8, --chunkSize=64, --endPhase=2147483647, --fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter, --input=examples/reuters-extracted/, --keyPrefix=, --output=reuters-seqfiles, --startPhase=0, --tempDir=temp}
12/02/22 16:29:02 INFO driver.MahoutDriver: Program took 418 ms
I am using mahout-distribution-0.5 on Mac OS X (10.7.3).
I don't get any error messages from seqdirectory; I just don't get any
seqfiles. The output directory is always empty, and the run time is always
minimal. I have tried different data and different paths, and I have had
someone with considerably more Java experience sanity-check my setup, still
with no luck.
I'm clearly doing something wrong, but I have no idea what. I've searched to
see whether anyone else has had the same issue and haven't turned up much
that is useful.
Any thoughts? Guidance would definitely be appreciated.
Thanks in advance.
Temese