Hi all,

        
First of all, I would like to thank Praveenesh Kumar for helping me with Hadoop 
and Mahout!

Nevertheless, I have several questions about Mahout.

1) I need Mahout working with Solr. Can somebody point me to a good tutorial on 
getting them running together?

2) What exactly are the possible input and output file formats for Mahout 
(especially when Mahout works with Solr; I know that the output of Solr is XML)?

3) Which of these algorithms use Hadoop? Please complete the list if I 
forgot some.
          - Canopy, KMeans, Dirichlet, Mean-shift, Latent Dirichlet Allocation




4) Moreover, I tried to run "./build-reuters.sh" and chose kmeans clustering 
(the error is the same with fuzzykmeans). Can somebody help me with this error? 
(But see point 8.)
###########################
12/06/19 13:33:52 WARN mapred.LocalJobRunner: job_local_0001
java.lang.IllegalStateException: No clusters found. Check your -c path.
        at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:59)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
12/06/19 13:33:52 INFO mapred.JobClient:  map 0% reduce 0%
12/06/19 13:33:52 INFO mapred.JobClient: Job complete: job_local_0001
12/06/19 13:33:52 INFO mapred.JobClient: Counters: 0
Exception in thread "main" java.lang.InterruptedException: K-Means Iteration failed processing /tmp/mahout-work-hduser/reuters-kmeans-clusters/part-randomSeed
        at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:371)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClustersMR(KMeansDriver.java:316)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:239)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:154)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:112)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:61)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)

###########################


5) "./build-reuters.sh" also fails when I choose lda (but see point 8):
############################
12/06/19 13:40:01 WARN mapred.LocalJobRunner: job_local_0001
java.lang.IllegalArgumentException
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
        at org.apache.mahout.clustering.lda.LDADriver.createState(LDADriver.java:124)
        at org.apache.mahout.clustering.lda.LDADriver.createState(LDADriver.java:92)
        at org.apache.mahout.clustering.lda.LDAWordTopicMapper.configure(LDAWordTopicMapper.java:96)
        at org.apache.mahout.clustering.lda.LDAWordTopicMapper.setup(LDAWordTopicMapper.java:102)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
12/06/19 13:40:02 INFO mapred.JobClient:  map 0% reduce 0%
12/06/19 13:40:02 INFO mapred.JobClient: Job complete: job_local_0001
12/06/19 13:40:02 INFO mapred.JobClient: Counters: 0
Exception in thread "main" java.lang.InterruptedException: LDA Iteration failed processing /tmp/mahout-work-hduser/reuters-lda/state-0
        at org.apache.mahout.clustering.lda.LDADriver.runIteration(LDADriver.java:449)
        at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:249)
        at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:169)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.mahout.clustering.lda.LDADriver.main(LDADriver.java:88)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
############################


6) However, when I ran "./build-reuters.sh" with dirichlet clustering, it wrote 
20 clusters without problems (but see point 8).
The result is:
############################
...
12/06/19 13:45:12 INFO driver.MahoutDriver: Program took 142609 ms (Minutes: 2.3768166666666666)
MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
MAHOUT_LOCAL is set, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/mahout-examples-0.6-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/lib/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
12/06/19 13:45:13 INFO common.AbstractJob: Command line arguments: {--dictionary=/tmp/mahout-work-hduser/reuters-out-seqdir-sparse-dirichlet/dictionary.file-0, --dictionaryType=sequencefile, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --numWords=20, --outputFormat=TEXT, --seqFileDir=/tmp/mahout-work-hduser/reuters-dirichlet/clusters-20-final, --startPhase=0, --substring=100, --tempDir=temp}
DC-0 total= 0 model= DMC:0{n=0 c=[] r=[]}
        Top Terms: 
DC-1 total= 0 model= DMC:1{n=0 c=[] r=[]}
        Top Terms: 
DC-10 total= 0 model= DMC:10{n=0 c=[] r=[]}
        Top Terms: 
DC-11 total= 0 model= DMC:11{n=0 c=[] r=[]}
        Top Terms: 
DC-12 total= 0 model= DMC:12{n=0 c=[] r=[]}
        Top Terms: 
DC-13 total= 0 model= DMC:13{n=0 c=[] r=[]}
        Top Terms: 
DC-14 total= 0 model= DMC:14{n=0 c=[] r=[]}
        Top Terms: 
DC-15 total= 0 model= DMC:15{n=0 c=[] r=[]}
        Top Terms: 
DC-16 total= 0 model= DMC:16{n=0 c=[] r=[]}
        Top Terms: 
DC-17 total= 0 model= DMC:17{n=0 c=[] r=[]}
        Top Terms: 
DC-18 total= 0 model= DMC:18{n=0 c=[] r=[]}
        Top Terms: 
DC-19 total= 0 model= DMC:19{n=0 c=[] r=[]}
        Top Terms: 
DC-2 total= 0 model= DMC:2{n=0 c=[] r=[]}
        Top Terms: 
DC-3 total= 0 model= DMC:3{n=0 c=[] r=[]}
        Top Terms: 
DC-4 total= 0 model= DMC:4{n=0 c=[] r=[]}
        Top Terms: 
DC-5 total= 0 model= DMC:5{n=0 c=[] r=[]}
        Top Terms: 
DC-6 total= 0 model= DMC:6{n=0 c=[] r=[]}
        Top Terms: 
DC-7 total= 0 model= DMC:7{n=0 c=[] r=[]}
        Top Terms: 
DC-8 total= 0 model= DMC:8{n=0 c=[] r=[]}
        Top Terms: 
DC-9 total= 0 model= DMC:9{n=0 c=[] r=[]}
        Top Terms: 
12/06/19 13:45:14 INFO clustering.ClusterDumper: Wrote 20 clusters
12/06/19 13:45:14 INFO driver.MahoutDriver: Program took 789 ms (Minutes: 0.01315)
############################
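
Every cluster in the dump above reports n=0. As a quick way to confirm that on a longer dump, here is a minimal Python sketch that counts the empty clusters. It is not part of Mahout: the function name count_empty_clusters is hypothetical, and the regular expression is an assumption based only on the "DC-<id> total= ... {n=<count> ..." line format shown in the output above.

```python
import re

# Matches dump lines such as: DC-0 total= 0 model= DMC:0{n=0 c=[] r=[]}
# The pattern is assumed from the console output above, not from Mahout docs.
CLUSTER_LINE = re.compile(
    r"^DC-(\d+)\s+total=\s*(\d+)\s+model=\s*DMC:\d+\{n=(\d+)\b"
)


def count_empty_clusters(dump_text):
    """Return (total_clusters, empty_clusters) for a text cluster dump."""
    total = 0
    empty = 0
    for line in dump_text.splitlines():
        match = CLUSTER_LINE.match(line.strip())
        if match is None:
            continue  # skip log lines and "Top Terms:" lines
        total += 1
        if int(match.group(3)) == 0:  # group 3 is the n=<count> value
            empty += 1
    return total, empty


sample = """\
DC-0 total= 0 model= DMC:0{n=0 c=[] r=[]}
        Top Terms:
DC-1 total= 0 model= DMC:1{n=0 c=[] r=[]}
        Top Terms:
"""
print(count_empty_clusters(sample))
```

If both numbers are equal, as they would be for the 20 clusters listed above, no points were assigned to any cluster.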


7) And finally: "./build-reuters.sh" with minhash clustering.
It works fine!


8) For 4), 5), 6) and 7) there is a SUCCESS file in /tmp/mahout-work-hduser/

...



Thanks, everybody.
Regards
