How do you want to combine Mahout and Solr? Also, Solr is a web
service and can receive and supply data in several different formats.
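
To illustrate the point about formats, here is a minimal sketch of asking Solr for the same query in different response formats through its `wt` (response writer) parameter. The host, port, and path are assumptions for a default local install, and `solr_query_url` is a hypothetical helper name, not part of Solr or Mahout.

```python
from urllib.parse import urlencode

# Assumed location of a local Solr instance; adjust host, port, and path
# (for example a core name) to match your own setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def solr_query_url(query, response_format="json", rows=10,
                   base_url=SOLR_SELECT_URL):
    """Build a Solr select URL (hypothetical helper).

    `wt` selects Solr's response writer, e.g. xml, json, or csv.
    """
    params = {"q": query, "wt": response_format, "rows": rows}
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Print one URL per response format; fetch them with any HTTP client.
    for fmt in ("xml", "json", "csv"):
        print(solr_query_url("mahout", response_format=fmt))
```

Whatever you fetch this way would still have to be converted into Mahout's own input format before clustering; the sketch only covers the Solr side.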

On Tue, Jun 19, 2012 at 6:04 AM, Paritosh Ranjan <[email protected]> wrote:
> Regarding the errors:
> which version of Mahout are you using?
> There was a problem in cluster-reuters.sh (build-reuters.sh calls
> cluster-reuters.sh) that was fixed in the latest release, 0.7.
> ________________________________________
> From: Svet [[email protected]]
> Sent: Tuesday, June 19, 2012 2:51 PM
> To: [email protected]
> Subject: several info
>
> Hi all,
>
>
> First of all, I would like to thank Praveenesh Kumar for helping me with
> Hadoop and Mahout!
>
> Nevertheless, I have several questions about Mahout.
>
> 1) I need Mahout working with Solr. Can somebody point me to a good tutorial
> for getting them running together?
>
> 2) What exactly are the possible input and output file formats of Mahout
> (especially when Mahout works with Solr; I know that Solr's output is XML)?
>
> 3) Which of these algorithms use Hadoop? Please complete the list if I
> forgot some.
>          -Canopy, KMeans, Dirichlet, Mean-shift, Latent Dirichlet Allocation
>
>
>
>
> 4) I also tried to run "./build-reuters.sh" with kmeans clustering (the
> error is the same with fuzzykmeans).
> Can somebody help me with this error? (But see 8!)
> ###########################
> 12/06/19 13:33:52 WARN mapred.LocalJobRunner: job_local_0001
> java.lang.IllegalStateException: No clusters found. Check your -c path.
>        at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:59)
>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 12/06/19 13:33:52 INFO mapred.JobClient:  map 0% reduce 0%
> 12/06/19 13:33:52 INFO mapred.JobClient: Job complete: job_local_0001
> 12/06/19 13:33:52 INFO mapred.JobClient: Counters: 0
> Exception in thread "main" java.lang.InterruptedException: K-Means Iteration failed processing /tmp/mahout-work-hduser/reuters-kmeans-clusters/part-randomSeed
>        at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:371)
>        at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClustersMR(KMeansDriver.java:316)
>        at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:239)
>        at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:154)
>        at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:112)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:61)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:601)
>        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
>
> ###########################
>
>
> 5) There is also a problem with "./build-reuters" when choosing lda (but see 8!)
> ############################
> 12/06/19 13:40:01 WARN mapred.LocalJobRunner: job_local_0001
> java.lang.IllegalArgumentException
>        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>        at org.apache.mahout.clustering.lda.LDADriver.createState(LDADriver.java:124)
>        at org.apache.mahout.clustering.lda.LDADriver.createState(LDADriver.java:92)
>        at org.apache.mahout.clustering.lda.LDAWordTopicMapper.configure(LDAWordTopicMapper.java:96)
>        at org.apache.mahout.clustering.lda.LDAWordTopicMapper.setup(LDAWordTopicMapper.java:102)
>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 12/06/19 13:40:02 INFO mapred.JobClient:  map 0% reduce 0%
> 12/06/19 13:40:02 INFO mapred.JobClient: Job complete: job_local_0001
> 12/06/19 13:40:02 INFO mapred.JobClient: Counters: 0
> Exception in thread "main" java.lang.InterruptedException: LDA Iteration failed processing /tmp/mahout-work-hduser/reuters-lda/state-0
>        at org.apache.mahout.clustering.lda.LDADriver.runIteration(LDADriver.java:449)
>        at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:249)
>        at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:169)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.mahout.clustering.lda.LDADriver.main(LDADriver.java:88)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:601)
>        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
> ############################
>
>
> 6) However, when I ran "./build-reuters" with dirichlet clustering, it wrote
> 20 clusters without problems (but see 8!)
> The result is:
> ############################
> ...
> 12/06/19 13:45:12 INFO driver.MahoutDriver: Program took 142609 ms (Minutes: 2.3768166666666666)
> MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
> MAHOUT_LOCAL is set, running locally
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/mahout-examples-0.6-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/lib/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> 12/06/19 13:45:13 INFO common.AbstractJob: Command line arguments: {--dictionary=/tmp/mahout-work-hduser/reuters-out-seqdir-sparse-dirichlet/dictionary.file-0, --dictionaryType=sequencefile, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --numWords=20, --outputFormat=TEXT, --seqFileDir=/tmp/mahout-work-hduser/reuters-dirichlet/clusters-20-final, --startPhase=0, --substring=100, --tempDir=temp}
> DC-0 total= 0 model= DMC:0{n=0 c=[] r=[]}
>        Top Terms:
> DC-1 total= 0 model= DMC:1{n=0 c=[] r=[]}
>        Top Terms:
> DC-10 total= 0 model= DMC:10{n=0 c=[] r=[]}
>        Top Terms:
> DC-11 total= 0 model= DMC:11{n=0 c=[] r=[]}
>        Top Terms:
> DC-12 total= 0 model= DMC:12{n=0 c=[] r=[]}
>        Top Terms:
> DC-13 total= 0 model= DMC:13{n=0 c=[] r=[]}
>        Top Terms:
> DC-14 total= 0 model= DMC:14{n=0 c=[] r=[]}
>        Top Terms:
> DC-15 total= 0 model= DMC:15{n=0 c=[] r=[]}
>        Top Terms:
> DC-16 total= 0 model= DMC:16{n=0 c=[] r=[]}
>        Top Terms:
> DC-17 total= 0 model= DMC:17{n=0 c=[] r=[]}
>        Top Terms:
> DC-18 total= 0 model= DMC:18{n=0 c=[] r=[]}
>        Top Terms:
> DC-19 total= 0 model= DMC:19{n=0 c=[] r=[]}
>        Top Terms:
> DC-2 total= 0 model= DMC:2{n=0 c=[] r=[]}
>        Top Terms:
> DC-3 total= 0 model= DMC:3{n=0 c=[] r=[]}
>        Top Terms:
> DC-4 total= 0 model= DMC:4{n=0 c=[] r=[]}
>        Top Terms:
> DC-5 total= 0 model= DMC:5{n=0 c=[] r=[]}
>        Top Terms:
> DC-6 total= 0 model= DMC:6{n=0 c=[] r=[]}
>        Top Terms:
> DC-7 total= 0 model= DMC:7{n=0 c=[] r=[]}
>        Top Terms:
> DC-8 total= 0 model= DMC:8{n=0 c=[] r=[]}
>        Top Terms:
> DC-9 total= 0 model= DMC:9{n=0 c=[] r=[]}
>        Top Terms:
> 12/06/19 13:45:14 INFO clustering.ClusterDumper: Wrote 20 clusters
> 12/06/19 13:45:14 INFO driver.MahoutDriver: Program took 789 ms (Minutes: 0.01315)
> ############################
>
>
> 7) And finally: "./build-reuters" with minhash clustering.
> Works well!
>
>
> 8) For 4), 5), 6) and 7) there is a SUCCESS file in /tmp/mahout-work-hduser/
>
> ...
>
>
>
> Thanks everybody
> Regards
>



-- 
Lance Norskog
[email protected]
