I think it's a version/doc mismatch. The current version no longer has a
--seqFileDir option; it just takes the input path (--input) as seqFileDir:
seqFileDir = getInputPath();
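So, assuming that reading is right, swapping --seqFileDir for --input should be enough. A minimal sketch that reuses the paths from your command (untested here; the option names --input, --pointsDir and --output are taken from the help output you pasted):

```shell
# Same invocation as in the manual, with --input in place of --seqFileDir.
# Paths are the ones from the original command; adjust them to your setup.
$MAHOUT_HOME/bin/mahout clusterdump \
  --input output/clusters-10 \
  --pointsDir output/clusteredPoints \
  --output $MAHOUT_HOME/examples/output/clusteranalyze.txt
```

The short form -i should be equivalent to --input, going by the same help listing.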
On 05-09-2012 12:56, javaboom wrote:
I've tried to use "clusterdump", following this manual:
https://cwiki.apache.org/MAHOUT/cluster-dumper.html
I ran the following command line:
$MAHOUT_HOME/bin/mahout clusterdump --seqFileDir output/clusters-10 \
  --pointsDir output/clusteredPoints \
  --output $MAHOUT_HOME/examples/output/clusteranalyze.txt
The problem is that "clusterdump" does not recognize the option
"--seqFileDir". I then checked the command's help output, as follows:
============================================================================
root@ubuntu:~/trunk/bin# ./mahout clusterdump --help
Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /root/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
Usage:
[--input <input> --output <output> --outputFormat <outputFormat>
--substring <substring> --numWords <numWords> --pointsDir <pointsDir>
--samplePoints <samplePoints> --dictionary <dictionary>
--dictionaryType <dictionaryType> --evaluate
--distanceMeasure <distanceMeasure> --help --tempDir <tempDir>
--startPhase <startPhase> --endPhase <endPhase>]
Job-Specific Options:
--input (-i) input                        Path to job input directory.
--output (-o) output                      The directory pathname for output.
--outputFormat (-of) outputFormat         The optional output format to write
                                          the results as. Options: TEXT, CSV
                                          or GRAPH_ML
--substring (-b) substring                The number of chars of the
                                          asFormatString() to print
--numWords (-n) numWords                  The number of top terms to print
--pointsDir (-p) pointsDir                The directory containing points
                                          sequence files mapping input vectors
                                          to their cluster. If specified, then
                                          the program will output the points
                                          associated with a cluster
--samplePoints (-sp) samplePoints         Specifies the maximum number of
                                          points to include _per_ cluster. The
                                          default is to include all points
--dictionary (-d) dictionary              The dictionary file
--dictionaryType (-dt) dictionaryType     The dictionary file type
                                          (text|sequencefile)
--evaluate (-e)                           Run ClusterEvaluator and
                                          CDbwEvaluator over the input. The
                                          output will be appended to the rest
                                          of the output at the end.
--distanceMeasure (-dm) distanceMeasure   The classname of the
                                          DistanceMeasure. Default is
                                          SquaredEuclidean
--help (-h)                               Print out help
--tempDir tempDir                         Intermediate output directory
--startPhase startPhase                   First phase to run
--endPhase endPhase                       Last phase to run
Specify HDFS directories while running on hadoop; else specify local file
system directories
12/09/05 15:17:25 INFO driver.MahoutDriver: Program took 170 ms (Minutes: 0.0028333333333333335)
============================================================================
Could you please help me solve this problem? Am I using a different
Mahout version from the one the manual describes?
Thank you in advance.