How do you want to combine Mahout and Solr? Also, Solr is a web service and can receive and supply data in several different formats.
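
To illustrate the point about formats: Solr picks its response writer from the `wt` request parameter, so the same query can come back as XML, JSON, or CSV. The snippet below is only a rough sketch that builds such query URLs. The host, port, and `/solr/select` path are assumptions about a default local install, and `solr_query_url` is a hypothetical helper, not part of Solr or Mahout.

```python
from urllib.parse import urlencode

# Assumed location of a local Solr instance; adjust host, port, and path.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def solr_query_url(query, response_format="json", rows=10):
    """Build a select URL that asks Solr for `response_format` via `wt`."""
    params = {"q": query, "wt": response_format, "rows": rows}
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Same query, three different response formats.
    for fmt in ("xml", "json", "csv"):
        print(solr_query_url("*:*", fmt))
```

Fetching any of these URLs (for example with `urllib.request.urlopen`) returns the same documents in the requested format, so XML is not the only way to get data out of Solr.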

On Tue, Jun 19, 2012 at 6:04 AM, Paritosh Ranjan <[email protected]> wrote:
> Regarding the errors: which version of Mahout are you using?
> There was a problem in cluster-reuters.sh (build-reuters.sh calls
> cluster-reuters.sh) which has been fixed in the latest release, 0.7.
> ________________________________________
> From: Svet [[email protected]]
> Sent: Tuesday, June 19, 2012 2:51 PM
> To: [email protected]
> Subject: several info
>
> Hi all,
>
> First of all, I would like to thank Praveenesh Kumar for helping me with
> Hadoop and Mahout!
>
> Nevertheless, I have several questions about Mahout.
>
> 1) I need Mahout working with Solr. Can somebody point me to a good tutorial
> for getting them running together?
>
> 2) What exactly are the possible input and output file formats for Mahout
> (especially when Mahout works with Solr; I know that Solr's output is XML)?
>
> 3) Which of these algorithms use Hadoop? Please complete the list if I
> forgot some.
> - Canopy, KMeans, Dirichlet, Mean-shift, Latent Dirichlet Allocation
>
> 4) Moreover, I tried running "./build-reuters.sh" and chose kmeans
> clustering (the error is the same with fuzzykmeans).
> Can somebody help me with this error? (But see 8)!)
> ###########################
> 12/06/19 13:33:52 WARN mapred.LocalJobRunner: job_local_0001
> java.lang.IllegalStateException: No clusters found. Check your -c path.
>     at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:59)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 12/06/19 13:33:52 INFO mapred.JobClient: map 0% reduce 0%
> 12/06/19 13:33:52 INFO mapred.JobClient: Job complete: job_local_0001
> 12/06/19 13:33:52 INFO mapred.JobClient: Counters: 0
> Exception in thread "main" java.lang.InterruptedException: K-Means Iteration failed processing /tmp/mahout-work-hduser/reuters-kmeans-clusters/part-randomSeed
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:371)
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClustersMR(KMeansDriver.java:316)
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:239)
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:154)
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:112)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:61)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
> ###########################
>
> 5) There is also a problem with "./build-reuters" when choosing lda
> (but see 8)!)
> ############################
> 12/06/19 13:40:01 WARN mapred.LocalJobRunner: job_local_0001
> java.lang.IllegalArgumentException
>     at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>     at org.apache.mahout.clustering.lda.LDADriver.createState(LDADriver.java:124)
>     at org.apache.mahout.clustering.lda.LDADriver.createState(LDADriver.java:92)
>     at org.apache.mahout.clustering.lda.LDAWordTopicMapper.configure(LDAWordTopicMapper.java:96)
>     at org.apache.mahout.clustering.lda.LDAWordTopicMapper.setup(LDAWordTopicMapper.java:102)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 12/06/19 13:40:02 INFO mapred.JobClient: map 0% reduce 0%
> 12/06/19 13:40:02 INFO mapred.JobClient: Job complete: job_local_0001
> 12/06/19 13:40:02 INFO mapred.JobClient: Counters: 0
> Exception in thread "main" java.lang.InterruptedException: LDA Iteration failed processing /tmp/mahout-work-hduser/reuters-lda/state-0
>     at org.apache.mahout.clustering.lda.LDADriver.runIteration(LDADriver.java:449)
>     at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:249)
>     at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:169)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.mahout.clustering.lda.LDADriver.main(LDADriver.java:88)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
> ############################
>
> 6) However, I ran "./build-reuters" with dirichlet clustering and it wrote
> 20 clusters without problems (but see 8)!)
> The result is:
> ############################
> ...
> 12/06/19 13:45:12 INFO driver.MahoutDriver: Program took 142609 ms (Minutes: 2.3768166666666666)
> MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
> MAHOUT_LOCAL is set, running locally
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/mahout-examples-0.6-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/lib/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> 12/06/19 13:45:13 INFO common.AbstractJob: Command line arguments: {--dictionary=/tmp/mahout-work-hduser/reuters-out-seqdir-sparse-dirichlet/dictionary.file-0, --dictionaryType=sequencefile, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --numWords=20, --outputFormat=TEXT, --seqFileDir=/tmp/mahout-work-hduser/reuters-dirichlet/clusters-20-final, --startPhase=0, --substring=100, --tempDir=temp}
> DC-0 total= 0 model= DMC:0{n=0 c=[] r=[]}
> Top Terms:
> DC-1 total= 0 model= DMC:1{n=0 c=[] r=[]}
> Top Terms:
> DC-10 total= 0 model= DMC:10{n=0 c=[] r=[]}
> Top Terms:
> DC-11 total= 0 model= DMC:11{n=0 c=[] r=[]}
> Top Terms:
> DC-12 total= 0 model= DMC:12{n=0 c=[] r=[]}
> Top Terms:
> DC-13 total= 0 model= DMC:13{n=0 c=[] r=[]}
> Top Terms:
> DC-14 total= 0 model= DMC:14{n=0 c=[] r=[]}
> Top Terms:
> DC-15 total= 0 model= DMC:15{n=0 c=[] r=[]}
> Top Terms:
> DC-16 total= 0 model= DMC:16{n=0 c=[] r=[]}
> Top Terms:
> DC-17 total= 0 model= DMC:17{n=0 c=[] r=[]}
> Top Terms:
> DC-18 total= 0 model= DMC:18{n=0 c=[] r=[]}
> Top Terms:
> DC-19 total= 0 model= DMC:19{n=0 c=[] r=[]}
> Top Terms:
> DC-2 total= 0 model= DMC:2{n=0 c=[] r=[]}
> Top Terms:
> DC-3 total= 0 model= DMC:3{n=0 c=[] r=[]}
> Top Terms:
> DC-4 total= 0 model= DMC:4{n=0 c=[] r=[]}
> Top Terms:
> DC-5 total= 0 model= DMC:5{n=0 c=[] r=[]}
> Top Terms:
> DC-6 total= 0 model= DMC:6{n=0 c=[] r=[]}
> Top Terms:
> DC-7 total= 0 model= DMC:7{n=0 c=[] r=[]}
> Top Terms:
> DC-8 total= 0 model= DMC:8{n=0 c=[] r=[]}
> Top Terms:
> DC-9 total= 0 model= DMC:9{n=0 c=[] r=[]}
> Top Terms:
> 12/06/19 13:45:14 INFO clustering.ClusterDumper: Wrote 20 clusters
> 12/06/19 13:45:14 INFO driver.MahoutDriver: Program took 789 ms (Minutes: 0.01315)
> ############################
>
> 7) And finally: "./build-reuters" with minhash clustering.
> Works well!
>
> 8) For 4), 5), 6) and 7) there is a SUCCESS file in /tmp/mahout-work-hduser/
>
> ...
>
> Thanks everybody
> Regards

--
Lance Norskog
[email protected]
