Hi Lance,

Thank you for your fast answer. I changed my classpath to:

CLASSPATH=/opt/lucene-3.6.0/lucene-core-3.6.0.jar:/opt/lucene-3.6.0/lucene-core-3.6.0-javadoc.jar:/opt/lucene-3.6.0/lucene-test-framework-3.6.0.jar:/opt/lucene-3.6.0/lucene-test-framework-3.6.0-javadoc.jar:.
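A quick way to double-check which Lucene version is actually on the classpath (the jar paths below simply mirror the CLASSPATH above, so adjust them for your own setup) is to print it one entry per line:

```shell
# The CLASSPATH entries below mirror the ones quoted above; adjust as needed.
CLASSPATH=/opt/lucene-3.6.0/lucene-core-3.6.0.jar:/opt/lucene-3.6.0/lucene-core-3.6.0-javadoc.jar:/opt/lucene-3.6.0/lucene-test-framework-3.6.0.jar:/opt/lucene-3.6.0/lucene-test-framework-3.6.0-javadoc.jar:.

# Split on ':' and keep only the Lucene jars, so the version is easy to read.
echo "$CLASSPATH" | tr ':' '\n' | grep lucene
```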
And put 3.6.0 in the pom.xml. But:

csi@csi-SCENIC-W:/usr/local/apache-mahout-d6d6ee8$ ./bin/mahout seq2sparse --input ./examples/output/ --output ./toto/output/
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/apache-mahout-d6d6ee8/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/apache-mahout-d6d6ee8/examples/target/dependency/slf4j-jcl-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/apache-mahout-d6d6ee8/examples/target/dependency/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
12/07/19 09:03:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
12/07/19 09:03:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
12/07/19 09:03:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
12/07/19 09:03:56 INFO input.FileInputFormat: Total input paths to process : 15
12/07/19 09:03:56 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-csi/mapred/staging/csi-379951768/.staging/job_local_0001
Exception in thread "main" java.io.FileNotFoundException: File file:/usr/local/apache-mahout-d6d6ee8/examples/output/clusters-8/data does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
        at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:63)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
        at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:919)
        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:936)
        at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:854)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:807)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:807)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:465)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:495)
        at org.apache.mahout.vectorizer.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:93)
        at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:255)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:55)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)

csi@csi-SCENIC-W:/usr/local/apache-mahout-d6d6ee8/examples/output/clusters-8$ ls
_logs  part-r-00000  _policy  _SUCCESS

There is no /usr/local/apache-mahout-d6d6ee8/examples/output/clusters-8/data here!

Thank you

-----Original Message-----
From: Lance Norskog [mailto:[email protected]]
Sent: Thursday, 19 July 2012 09:33
To: [email protected]
Subject: Re: .txt to vector

Yes, the Mahout analyzer would have to be updated for Lucene 4.0. I suggest using an earlier one. Mahout uses Lucene in a very simple way, and it is OK to use any earlier Lucene from 3.1 to 3.6.

On Wed, Jul 18, 2012 at 11:50 PM, Videnova, Svetlana <[email protected]> wrote:
> Hi Sean,
>
> In fact I was using Lucene version 3.6.0 (saw that in the pom.xml). But
> in my classpath I was using Lucene version 4.0.0.
>
> I changed pom.xml to 4.0.0 => <lucene.version>4.0.0</lucene.version>
>
> But still the same error:
> ###
> Exception in thread "main" java.lang.VerifyError: class
> org.apache.mahout.vectorizer.DefaultAnalyzer overrides final method
> tokenStream.(Ljava/lang/String;Ljava/io/Reader;)Lorg/apache/lucene/analysis/TokenStream;
> ###
>
> Should I change something else? Or maybe Lucene 4.0 is too recent for
> Mahout!?
>
> Thank you
>
> -----Original Message-----
> From: Sean Owen [mailto:[email protected]]
> Sent: Wednesday, 18 July 2012 22:52
> To: [email protected]
> Subject: Re: .txt to vector
>
> This means you're using it with an incompatible version of Lucene. I think
> we're on 3.1. Check the version that Mahout depends upon and use at least
> that version or later.
>
> On Wed, Jul 18, 2012 at 6:04 PM, Videnova, Svetlana <[email protected]> wrote:
>
>> I'm working with Mahout. I'm trying to write a web service in Java
>> myself that will take the output of Solr and give this file to Mahout.
>> For the moment I have successfully done the recommendation part.
>> Now I'm trying to cluster.
>> For this I have to vectorise the output of Solr.
>> Do you have any idea how to do it, please? I was following
>> https://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html
>> BUT: it doesn't work very well (at all...).
>>
>> I'm trying to find out how to transform .txt to vectors for Mahout in
>> order to cluster and categorise my information. Is it possible? I
>> saw that I have to use seqdirectory and seq2sparse.
>>
>> seqdirectory creates a file (with some numbers and everything...); this
>> step is OK. But then when I use seq2sparse it gives me this error:
>>
>> csi@csi-SCENIC-W:/usr/local/apache-mahout-d6d6ee8$ ./bin/mahout seq2sparse --input ./examples/output/ --output ./toto/output/
>> hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
>> SLF4J: Class path contains multiple SLF4J bindings.
>> SLF4J: Found binding in [jar:file:/usr/local/apache-mahout-d6d6ee8/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in [jar:file:/usr/local/apache-mahout-d6d6ee8/examples/target/dependency/slf4j-jcl-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in [jar:file:/usr/local/apache-mahout-d6d6ee8/examples/target/dependency/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>> 12/07/18 15:53:33 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
>> 12/07/18 15:53:33 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
>> 12/07/18 15:53:33 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
>> Exception in thread "main" java.lang.VerifyError: class org.apache.mahout.vectorizer.DefaultAnalyzer overrides final method tokenStream.(Ljava/lang/String;Ljava/io/Reader;)Lorg/apache/lucene/analysis/TokenStream;
>>         at java.lang.ClassLoader.defineClass1(Native Method)
>>         at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
>>         at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
>>         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>>         at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>>         at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>         at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:199)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>         at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:55)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>>
>> I'm using only Lucene 4.0!
>>
>> CLASSPATH=/opt/lucene-4.0.0-ALPHA/demo/lucene-demo-4.0.0-ALPHA.jar:/opt/lucene-4.0.0-ALPHA/core/lucene-core-4.0.0-ALPHA.jar:/opt/lucene-4.0.0-ALPHA/analysis/common/lucene-analyzers-common-4.0.0-ALPHA.jar:/opt/lucene-4.0.0-ALPHA/queryparser/lucene-queryparser-4.0.0-ALPHA.jar:.
>>
>> Please, where am I wrong?
>>
>> Thank you all
>> Regards
>>
>> Think green - keep it on the screen.
>>
>> This e-mail and any attachment is for authorised use by the intended
>> recipient(s) only. It may contain proprietary material, confidential
>> information and/or be subject to legal privilege. It should not be
>> copied, disclosed to, retained or used by, any other party. If you
>> are not an intended recipient then please promptly delete this e-mail
>> and any attachment and all copies and inform the sender. Thank you.

--
Lance Norskog
[email protected]
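For reference, the text-to-vectors pipeline discussed in this thread is normally run as below (directory names are examples only). The key point is that seq2sparse expects its --input to be the SequenceFiles written by seqdirectory, not clustering output; pointing it at examples/output/, which contains clustering results such as clusters-8, appears to be what triggers the FileNotFoundException above.

```shell
# Sketch only; directory names are hypothetical examples.
# seqdirectory turns a folder of .txt files into SequenceFiles of
# <docId, text>; seq2sparse then tokenizes those and writes sparse
# TF/TF-IDF vectors for clustering.
./bin/mahout seqdirectory --input ./mytext/ --output ./mytext-seq/
./bin/mahout seq2sparse --input ./mytext-seq/ --output ./mytext-vectors/
```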
