Regarding the errors: which version of Mahout are you using?
There was a problem in cluster-reuters.sh (build-reuters.sh calls
cluster-reuters.sh) that has been fixed in the latest release, 0.7.
________________________________________
From: Svet [[email protected]]
Sent: Tuesday, June 19, 2012 2:51 PM
To: [email protected]
Subject: several info
Hi all,
First of all, I would like to thank Praveenesh Kumar for helping me with Hadoop
and Mahout!
I still have several questions about Mahout.
1) I need Mahout to work with SOLR. Can somebody point me to a good tutorial on
getting the two working together?
2) What input and output file formats does Mahout support (especially when
Mahout works with SOLR; I know that SOLR's output is XML)?
3) Which of these algorithms use Hadoop? Please complete the list if I
forgot any:
- Canopy, KMeans, Dirichlet, Mean-shift, Latent Dirichlet Allocation
4) I also tried running "./build-reuters.sh" and choosing kmeans
clustering (I get the same error with fuzzykmeans).
Can somebody help me with this error? (But see 8) below!)
###########################
12/06/19 13:33:52 WARN mapred.LocalJobRunner: job_local_0001
java.lang.IllegalStateException: No clusters found. Check your -c path.
	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:59)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
12/06/19 13:33:52 INFO mapred.JobClient: map 0% reduce 0%
12/06/19 13:33:52 INFO mapred.JobClient: Job complete: job_local_0001
12/06/19 13:33:52 INFO mapred.JobClient: Counters: 0
Exception in thread "main" java.lang.InterruptedException: K-Means Iteration failed processing /tmp/mahout-work-hduser/reuters-kmeans-clusters/part-randomSeed
	at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:371)
	at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClustersMR(KMeansDriver.java:316)
	at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:239)
	at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:154)
	at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:112)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:61)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
###########################
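The k-means mapper (KMeansMapper.setup) reports that it found no clusters at the -c path, and the driver names /tmp/mahout-work-hduser/reuters-kmeans-clusters/part-randomSeed. A first check is to list what is actually stored under that directory. The sketch below is a hypothetical helper (ClusterPathCheck is not part of Mahout) and assumes a local run (MAHOUT_LOCAL is set in the logs), so the path is on the local filesystem rather than HDFS. Note that a non-zero file size does not by itself prove that clusters were written, since a Hadoop SequenceFile carries a header even when it holds no records.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical diagnostic helper, NOT part of Mahout. Assumes a local run
// (MAHOUT_LOCAL set), so the -c path is on the local filesystem, not HDFS.
public class ClusterPathCheck {

    // Returns "path<TAB>size-in-bytes" for every regular file at or below root,
    // sorted by path; returns an empty list if root does not exist.
    static List<String> describe(Path root) throws IOException {
        List<String> out = new ArrayList<>();
        if (!Files.exists(root)) {
            return out;
        }
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
        for (Path p : files) {
            out.add(p + "\t" + Files.size(p));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the clusters directory named in the k-means error above.
        Path root = Paths.get(args.length > 0 ? args[0]
                : "/tmp/mahout-work-hduser/reuters-kmeans-clusters");
        List<String> entries = describe(root);
        if (entries.isEmpty()) {
            System.out.println("No files found under " + root);
        }
        for (String e : entries) {
            System.out.println(e);
        }
    }
}
```

If the directory is missing or holds only a tiny part-randomSeed file, the seeding step probably produced no initial clusters, which would match the "No clusters found" message.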
5) There is also a problem with "./build-reuters" when choosing lda (but see 8) below!)
############################
12/06/19 13:40:01 WARN mapred.LocalJobRunner: job_local_0001
java.lang.IllegalArgumentException
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
	at org.apache.mahout.clustering.lda.LDADriver.createState(LDADriver.java:124)
	at org.apache.mahout.clustering.lda.LDADriver.createState(LDADriver.java:92)
	at org.apache.mahout.clustering.lda.LDAWordTopicMapper.configure(LDAWordTopicMapper.java:96)
	at org.apache.mahout.clustering.lda.LDAWordTopicMapper.setup(LDAWordTopicMapper.java:102)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
12/06/19 13:40:02 INFO mapred.JobClient: map 0% reduce 0%
12/06/19 13:40:02 INFO mapred.JobClient: Job complete: job_local_0001
12/06/19 13:40:02 INFO mapred.JobClient: Counters: 0
Exception in thread "main" java.lang.InterruptedException: LDA Iteration failed processing /tmp/mahout-work-hduser/reuters-lda/state-0
	at org.apache.mahout.clustering.lda.LDADriver.runIteration(LDADriver.java:449)
	at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:249)
	at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:169)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.mahout.clustering.lda.LDADriver.main(LDADriver.java:88)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
############################
6) However, when I ran "./build-reuters" with dirichlet clustering, it wrote
20 clusters without problems (but see 8) below!).
The result is:
############################
...
12/06/19 13:45:12 INFO driver.MahoutDriver: Program took 142609 ms (Minutes: 2.3768166666666666)
MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
MAHOUT_LOCAL is set, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/mahout-examples-0.6-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/lib/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
12/06/19 13:45:13 INFO common.AbstractJob: Command line arguments: {--dictionary=/tmp/mahout-work-hduser/reuters-out-seqdir-sparse-dirichlet/dictionary.file-0, --dictionaryType=sequencefile, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --numWords=20, --outputFormat=TEXT, --seqFileDir=/tmp/mahout-work-hduser/reuters-dirichlet/clusters-20-final, --startPhase=0, --substring=100, --tempDir=temp}
DC-0 total= 0 model= DMC:0{n=0 c=[] r=[]}
Top Terms:
DC-1 total= 0 model= DMC:1{n=0 c=[] r=[]}
Top Terms:
DC-10 total= 0 model= DMC:10{n=0 c=[] r=[]}
Top Terms:
DC-11 total= 0 model= DMC:11{n=0 c=[] r=[]}
Top Terms:
DC-12 total= 0 model= DMC:12{n=0 c=[] r=[]}
Top Terms:
DC-13 total= 0 model= DMC:13{n=0 c=[] r=[]}
Top Terms:
DC-14 total= 0 model= DMC:14{n=0 c=[] r=[]}
Top Terms:
DC-15 total= 0 model= DMC:15{n=0 c=[] r=[]}
Top Terms:
DC-16 total= 0 model= DMC:16{n=0 c=[] r=[]}
Top Terms:
DC-17 total= 0 model= DMC:17{n=0 c=[] r=[]}
Top Terms:
DC-18 total= 0 model= DMC:18{n=0 c=[] r=[]}
Top Terms:
DC-19 total= 0 model= DMC:19{n=0 c=[] r=[]}
Top Terms:
DC-2 total= 0 model= DMC:2{n=0 c=[] r=[]}
Top Terms:
DC-3 total= 0 model= DMC:3{n=0 c=[] r=[]}
Top Terms:
DC-4 total= 0 model= DMC:4{n=0 c=[] r=[]}
Top Terms:
DC-5 total= 0 model= DMC:5{n=0 c=[] r=[]}
Top Terms:
DC-6 total= 0 model= DMC:6{n=0 c=[] r=[]}
Top Terms:
DC-7 total= 0 model= DMC:7{n=0 c=[] r=[]}
Top Terms:
DC-8 total= 0 model= DMC:8{n=0 c=[] r=[]}
Top Terms:
DC-9 total= 0 model= DMC:9{n=0 c=[] r=[]}
Top Terms:
12/06/19 13:45:14 INFO clustering.ClusterDumper: Wrote 20 clusters
12/06/19 13:45:14 INFO driver.MahoutDriver: Program took 789 ms (Minutes: 0.01315)
############################
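Although ClusterDumper reports "Wrote 20 clusters", every cluster in the dump above shows total= 0 and an empty model (n=0 c=[] r=[]), which suggests that no points were assigned to any of them. In a longer dump this is easy to miss, so one option is to count the cluster header lines whose total is zero. The sketch below is a hypothetical helper (EmptyClusterCount is not part of Mahout); it assumes the "<ID>-<n> total= <count> model= ..." line layout seen in this particular output, which may differ for other algorithms or Mahout versions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of Mahout. Counts clusters in ClusterDumper
// text output whose header line reports "total= 0". The layout
// "DC-<id> total= <n> model= ..." is assumed from the dump above.
public class EmptyClusterCount {

    // Matches e.g. "DC-0 total= 0" at the start of a line.
    private static final Pattern HEADER =
            Pattern.compile("^(\\w+-\\d+) total= (\\d+)\\b");

    // Returns {cluster header lines seen, headers with total == 0}.
    static int[] count(List<String> lines) {
        int seen = 0;
        int empty = 0;
        for (String line : lines) {
            Matcher m = HEADER.matcher(line.trim());
            if (m.find()) {
                seen++;
                if (Long.parseLong(m.group(2)) == 0L) {
                    empty++;
                }
            }
        }
        return new int[] {seen, empty};
    }

    public static void main(String[] args) throws IOException {
        // Usage: java EmptyClusterCount dump.txt
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        int[] c = count(lines);
        System.out.println(c[1] + " of " + c[0] + " clusters are empty");
    }
}
```

Saved to a file, the ClusterDumper output above (DC-0 through DC-19, all with total= 0) should make this print "20 of 20 clusters are empty".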
7) And finally: "./build-reuters" with minhash clustering
works fine!
8) For 4), 5), 6) and 7), there is a SUCCESS file in /tmp/mahout-work-hduser/
...
Thanks, everybody
Regards