Hi Paritosh: I made it work in Hadoop mode, not local. I don't know if that's desirable. I also got this error when running locally: "Hadoop libraries are missing", and from what I saw in the mahout script, it simply discards all libraries when MAHOUT_LOCAL is set. So, is local mode used for anything? (Please forgive my ignorance, I don't know the whole project.)
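
For reference, the branch I mean looks roughly like this; this is only a sketch paraphrased from the behaviour and the startup message, not the verbatim bin/mahout script:

    if [ -n "$MAHOUT_LOCAL" ]; then
      # local mode: Hadoop jars and HADOOP_CONF_DIR are left off the classpath
      echo "MAHOUT_LOCAL is set, running locally"
    else
      echo "MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath."
      CLASSPATH="${CLASSPATH}:${HADOOP_CONF_DIR}"
    fi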
Gustavo

On Sat, Sep 8, 2012 at 2:35 AM, Paritosh Ranjan <[email protected]> wrote:

> Can you open a jira describing the problem and submit the patch for your fix?
> https://issues.apache.org/jira/browse/MAHOUT
>
> On 08-09-2012 09:40, Gustavo Enrique Salazar Torres wrote:
>
>> Never mind, got it to work; I had to fix the script, though.
>>
>> Thanks.
>> Gustavo
>>
>> On Fri, Sep 7, 2012 at 5:54 PM, Gustavo Enrique Salazar Torres <
>> [email protected]> wrote:
>>
>>> Hi there:
>>>
>>> I'm trying to finish an improvement to the k-means algorithm, but I first
>>> need to get it to run in order to compare results.
>>> When I run the cluster-reuters.sh script, I get this error:
>>>
>>> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
>>> Running on hadoop, using /home/gustavo/Desktop/yandex_data/hadoop-0.20.203.0/bin/hadoop and
>>> HADOOP_CONF_DIR=/home/gustavo/Desktop/yandex_data/hadoop-0.20.203.0/conf
>>> MAHOUT-JOB: /home/gustavo/Desktop/yandex_data/mahout-distribution-0.7/mahout-examples-0.7-job.jar
>>> 12/09/07 17:47:43 INFO common.AbstractJob: Command line arguments:
>>> {--clustering=null, --clusters=[./reuters-kmeans-clusters],
>>> --convergenceDelta=[0.5],
>>> --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure],
>>> --endPhase=[2147483647],
>>> --input=[./reuters_out_seqdir_kmeans/tfidf-vectors], --maxIter=[10],
>>> --method=[mapreduce], --numClusters=[20], --output=[./reuters-kmeans],
>>> --overwrite=null, --startPhase=[0], --tempDir=[temp]}
>>> 12/09/07 17:47:44 INFO common.HadoopUtil: Deleting reuters-kmeans-clusters
>>> 12/09/07 17:47:44 INFO util.NativeCodeLoader: Loaded the native-hadoop library
>>> 12/09/07 17:47:44 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
>>> 12/09/07 17:47:44 INFO compress.CodecPool: Got brand-new compressor
>>> 12/09/07 17:47:44 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to reuters-kmeans-clusters/part-randomSeed
>>> 12/09/07 17:47:44 INFO kmeans.KMeansDriver: Input: reuters_out_seqdir_kmeans/tfidf-vectors
>>> Clusters In: reuters-kmeans-clusters/part-randomSeed Out: reuters-kmeans
>>> Distance: org.apache.mahout.common.distance.CosineDistanceMeasure
>>> 12/09/07 17:47:44 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10
>>> num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
>>> 12/09/07 17:47:44 INFO compress.CodecPool: Got brand-new decompressor
>>> Exception in thread "main" java.lang.IllegalStateException: No input clusters found in
>>> reuters-kmeans-clusters/part-randomSeed. Check your -c argument.
>>>     at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:218)
>>>
>>> As you can see, the initial clusters are being created, but for a reason I
>>> don't understand they are not being found.
>>> Below is the output of 'cat' on the part file containing the clusters:
>>>
>>> $ dfs -cat reuters-kmeans-clusters/part-randomSeed
>>> SEQ org.apache.hadoop.io.Text5org.apache.mahout.clustering.iterator.ClusterWritable
>>> *org.apache.hadoop.io.compress.DefaultCodec b�W3 K�E�߇H��Vgustavo
>>>
>>> Can anyone help me, please?
>>>
>>> Thanks
>>> Gustavo Salazar
>>
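
One illustrative sanity check for the "Check your -c argument" error, assuming the relative paths shown in the log above: when MAHOUT_LOCAL is unset the job runs against the Hadoop FileSystem, so a relative -c path such as ./reuters-kmeans-clusters resolves under the HDFS home directory, not the local working directory. Listing both filesystems shows where the seed clusters actually landed:

    hadoop fs -ls reuters-kmeans-clusters    # HDFS view (what the mapreduce job reads)
    ls -l reuters-kmeans-clusters            # local filesystem view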
