Never mind, I got it to work; I had to fix the script, though. Thanks,
Gustavo
On Fri, Sep 7, 2012 at 5:54 PM, Gustavo Enrique Salazar Torres <[email protected]> wrote:
> Hi there:
>
> I'm trying to finish an improvement to the k-means algorithm, but first I
> need to get it running so that I can compare results.
> But when I run the cluster-reuters.sh script I get this error:
>
> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> Running on hadoop, using /home/gustavo/Desktop/yandex_data/hadoop-0.20.203.0/bin/hadoop and HADOOP_CONF_DIR=/home/gustavo/Desktop/yandex_data/hadoop-0.20.203.0/conf
> MAHOUT-JOB: /home/gustavo/Desktop/yandex_data/mahout-distribution-0.7/mahout-examples-0.7-job.jar
> 12/09/07 17:47:43 INFO common.AbstractJob: Command line arguments: {--clustering=null, --clusters=[./reuters-kmeans-clusters], --convergenceDelta=[0.5], --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure], --endPhase=[2147483647], --input=[./reuters_out_seqdir_kmeans/tfidf-vectors], --maxIter=[10], --method=[mapreduce], --numClusters=[20], --output=[./reuters-kmeans], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
> 12/09/07 17:47:44 INFO common.HadoopUtil: Deleting reuters-kmeans-clusters
> 12/09/07 17:47:44 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> 12/09/07 17:47:44 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
> 12/09/07 17:47:44 INFO compress.CodecPool: Got brand-new compressor
> 12/09/07 17:47:44 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to reuters-kmeans-clusters/part-randomSeed
> 12/09/07 17:47:44 INFO kmeans.KMeansDriver: Input: reuters_out_seqdir_kmeans/tfidf-vectors Clusters In: reuters-kmeans-clusters/part-randomSeed Out: reuters-kmeans Distance: org.apache.mahout.common.distance.CosineDistanceMeasure
> 12/09/07 17:47:44 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
> 12/09/07 17:47:44 INFO compress.CodecPool: Got brand-new decompressor
> Exception in thread "main" java.lang.IllegalStateException: No input clusters found in reuters-kmeans-clusters/part-randomSeed. Check your -c argument.
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:218)
>
> As you can see, the initial clusters are being created, but for some reason
> I don't understand, they aren't being found.
> Below is the output of 'cat' on the part file containing the clusters:
>
> $ dfs -cat reuters-kmeans-clusters/part-randomSeed
> SEQ
> org.apache.hadoop.io.Text5org.apache.mahout.clustering.iterator.ClusterWritable
> *org.apache.hadoop.io.compress.DefaultCodec [binary data]
>
> Can anyone help me, please?
>
> Thanks,
> Gustavo Salazar
>
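The `dfs -cat` dump above is the binary header of a Hadoop SequenceFile printed as raw bytes. The stray `5` and `*` are not part of the class names: they are one-byte length prefixes (53 and 42, the lengths of the `ClusterWritable` and `DefaultCodec` class names that follow them). Below is a minimal, stdlib-only Java sketch that decodes such a header, assuming the version-6 layout written by Hadoop 0.20's `SequenceFile.Writer`: the `SEQ` magic, a version byte, the key and value class names as length-prefixed strings, two compression flags, and the codec class name when compression is on. `SeqHeader` is a hypothetical helper, not part of Hadoop or Mahout, and it deliberately stops before the metadata, sync marker and records.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Hadoop or Mahout): decodes only the fixed
// header of a SequenceFile, assuming the version-6 layout of Hadoop 0.20.
final class SeqHeader {
    final int version;
    final String keyClass;
    final String valueClass;
    final boolean compressed;
    final boolean blockCompressed;
    final String codecClass; // null when the file is not compressed

    private SeqHeader(int version, String keyClass, String valueClass,
                      boolean compressed, boolean blockCompressed, String codecClass) {
        this.version = version;
        this.keyClass = keyClass;
        this.valueClass = valueClass;
        this.compressed = compressed;
        this.blockCompressed = blockCompressed;
        this.codecClass = codecClass;
    }

    // Hadoop's variable-length long encoding: values in -112..127 take one byte;
    // otherwise the first byte encodes the sign and how many big-endian bytes follow.
    static long readVLong(DataInputStream in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) {
            return first;
        }
        boolean negative = first < -120;
        int extra = negative ? -120 - first : -112 - first;
        long value = 0;
        for (int i = 0; i < extra; i++) {
            value = (value << 8) | (in.readByte() & 0xFF);
        }
        return negative ? ~value : value;
    }

    // A string as written by Text.writeString: vint byte length, then UTF-8 bytes.
    static String readText(DataInputStream in) throws IOException {
        int len = (int) readVLong(in);
        byte[] buf = new byte[len];
        in.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    static SeqHeader read(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("missing SEQ magic: not a SequenceFile");
        }
        int version = in.readUnsignedByte();
        if (version != 6) {
            throw new IOException("this sketch only handles version 6, got " + version);
        }
        String keyClass = readText(in);
        String valueClass = readText(in);
        boolean compressed = in.readBoolean();
        boolean blockCompressed = in.readBoolean();
        String codecClass = compressed ? readText(in) : null;
        // Metadata, the sync marker and the records follow; they are not read here.
        return new SeqHeader(version, keyClass, valueClass,
                compressed, blockCompressed, codecClass);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SeqHeader <local-sequence-file>");
            return;
        }
        try (InputStream f = new FileInputStream(args[0])) {
            SeqHeader h = read(f);
            System.out.println("version:          " + h.version);
            System.out.println("key class:        " + h.keyClass);
            System.out.println("value class:      " + h.valueClass);
            System.out.println("compressed:       " + h.compressed);
            System.out.println("block compressed: " + h.blockCompressed);
            System.out.println("codec:            " + h.codecClass);
        }
    }
}
```

Run against a local copy of the file (for example one fetched with `hadoop fs -get`), it prints the key class, value class and codec. Note that a readable header only shows that the file was created with those classes; it says nothing about whether any cluster records follow it.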
