Re: .txt to vector

Alexander Aristov Thu, 19 Jul 2012 03:06:10 -0700

You've got another problem now:

Exception in thread "main" java.io.FileNotFoundException: File
file:/usr/local/apache-mahout-d6d6ee8/examples/output/clusters-8/data does
not exist.
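The path in the error suggests that seq2sparse was given ./examples/output/, which seems to hold clustering output (clusters-8 contains part-r-00000) rather than only the seqdirectory output. As far as I know, Hadoop's SequenceFileInputFormat treats every subdirectory of the input path as a MapFile and opens its `data` file. That would explain why it asks for clusters-8/data. The sketch below lists the subdirectories that would trigger this error. It assumes that behaviour and a local filesystem. `InputDirCheck` and `subdirsWithoutDataFile` are hypothetical names and are not part of Mahout or Hadoop.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class InputDirCheck {

    /**
     * Returns the names of the immediate subdirectories of inputDir that have no "data" entry.
     * Assumption: the input format treats each subdirectory as a MapFile and opens
     * <subdir>/data, so every name returned here would raise FileNotFoundException.
     * Names starting with '_' or '.' are skipped, because Hadoop's input listing hides them.
     */
    static List<String> subdirsWithoutDataFile(File inputDir) {
        List<String> offenders = new ArrayList<>();
        File[] children = inputDir.listFiles();
        if (children == null) {               // missing, not a directory, or unreadable
            return offenders;
        }
        for (File child : children) {
            String name = child.getName();
            if (name.startsWith("_") || name.startsWith(".")) {
                continue;                     // hidden entries such as _logs
            }
            if (child.isDirectory() && !new File(child, "data").exists()) {
                offenders.add(name);
            }
        }
        Collections.sort(offenders);          // deterministic output order
        return offenders;
    }

    public static void main(String[] args) {
        File input = new File(args.length > 0 ? args[0] : "./examples/output/");
        List<String> offenders = subdirsWithoutDataFile(input);
        if (offenders.isEmpty()) {
            System.out.println("No subdirectory without a data file under " + input);
        }
        for (String name : offenders) {
            System.out.println(name + "/data does not exist");
        }
    }
}
```

Pass the same path you gave to `--input`. If the tool reports clusters-* directories, the input should probably be the directory that seqdirectory wrote, kept separate from the clustering output.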


Best Regards
Alexander Aristov


On 19 July 2012 12:30, Videnova, Svetlana <[email protected]> wrote:

> Hi Lance,
>
> Thank you for your fast answer.
> I changed my CLASSPATH to:
> CLASSPATH=/opt/lucene-3.6.0/lucene-core-3.6.0.jar:/opt/lucene-3.6.0/lucene-core-3.6.0-javadoc.jar:/opt/lucene-3.6.0/lucene-test-framework-3.6.0.jar:/opt/lucene-3.6.0/lucene-test-framework-3.6.0-javadoc.jar:.
>
> and set <lucene.version> back to 3.6.0 in the pom.xml.
>
>
> But:
>
> csi@csi-SCENIC-W:/usr/local/apache-mahout-d6d6ee8$ ./bin/mahout seq2sparse --input ./examples/output/ --output ./toto/output/
> hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
> [jar:file:/usr/local/apache-mahout-d6d6ee8/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
> [jar:file:/usr/local/apache-mahout-d6d6ee8/examples/target/dependency/slf4j-jcl-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
> [jar:file:/usr/local/apache-mahout-d6d6ee8/examples/target/dependency/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 12/07/19 09:03:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
> 12/07/19 09:03:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
> 12/07/19 09:03:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
> 12/07/19 09:03:56 INFO input.FileInputFormat: Total input paths to process : 15
> 12/07/19 09:03:56 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-csi/mapred/staging/csi-379951768/.staging/job_local_0001
> Exception in thread "main" java.io.FileNotFoundException: File file:/usr/local/apache-mahout-d6d6ee8/examples/output/clusters-8/data does not exist.
>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
>         at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>         at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:63)
>         at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
>         at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:919)
>         at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:936)
>         at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:854)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:807)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:807)
>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:465)
>         at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:495)
>         at org.apache.mahout.vectorizer.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:93)
>         at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:255)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:55)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>
> csi@csi-SCENIC-W:/usr/local/apache-mahout-d6d6ee8/examples/output/clusters-8$ ls
> _logs  part-r-00000  _policy  _SUCCESS
>
> There is no /usr/local/apache-mahout-d6d6ee8/examples/output/clusters-8/data here!
>
>
> Thank you
>
> -----Original Message-----
> From: Lance Norskog [mailto:[email protected]]
> Sent: Thursday, 19 July 2012 09:33
> To: [email protected]
> Subject: Re: .txt to vector
>
> Yes, the Mahout analyzer would have to be updated for Lucene 4.0. I
> suggest using an earlier version. Mahout uses Lucene in a very simple way,
> and it is OK to use any Lucene release from 3.1 to 3.6.
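The VerifyError appeared because Lucene 4.0 jars were on the CLASSPATH while the pom.xml said 3.6.0. Checking the CLASSPATH string can catch that kind of mismatch before you run seq2sparse. The sketch below makes two assumptions. First, it assumes Lucene's core jar is named `lucene-core-<version>[-qualifier].jar`, as in the paths quoted in this thread. Second, it encodes the 3.1 to 3.6 range suggested above. `LuceneClasspathCheck` is a hypothetical helper, and it only inspects file names, not jar contents.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LuceneClasspathCheck {

    // Assumed jar naming: lucene-core-3.6.0.jar, lucene-core-4.0.0-ALPHA.jar, ...
    private static final Pattern CORE_JAR = Pattern.compile("lucene-core-(\\d+)\\.(\\d+).*\\.jar");

    /** Returns "major.minor" for each lucene-core jar in the classpath string, skipping javadoc/sources jars. */
    static List<String> luceneCoreVersions(String classpath) {
        List<String> versions = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            String name = new File(entry).getName();
            if (name.endsWith("-javadoc.jar") || name.endsWith("-sources.jar")) {
                continue;   // documentation/source jars contain no classes to load
            }
            Matcher m = CORE_JAR.matcher(name);
            if (m.matches()) {
                versions.add(m.group(1) + "." + m.group(2));
            }
        }
        return versions;
    }

    /** True if "major.minor" lies in the 3.1 to 3.6 range suggested on the list. */
    static boolean inSuggestedRange(String majorMinor) {
        String[] parts = majorMinor.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        return major == 3 && minor >= 1 && minor <= 6;
    }

    public static void main(String[] args) {
        // Check the argument if given, otherwise the CLASSPATH environment variable.
        String cp = args.length > 0 ? args[0] : System.getenv().getOrDefault("CLASSPATH", "");
        List<String> versions = luceneCoreVersions(cp);
        if (versions.isEmpty()) {
            System.out.println("No lucene-core jar found");
        }
        for (String v : versions) {
            System.out.println("lucene-core " + v + (inSuggestedRange(v) ? ": OK" : ": outside 3.1-3.6"));
        }
        if (new LinkedHashSet<>(versions).size() > 1) {
            System.out.println("Warning: several lucene-core versions; the first on the classpath wins");
        }
    }
}
```

Against the two CLASSPATH values quoted in this thread, the check reports 3.6 for the first and 4.0 for the second. Javadoc and test-framework jars are ignored.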
>
> On Wed, Jul 18, 2012 at 11:50 PM, Videnova, Svetlana <
> [email protected]> wrote:
> > Hi Sean,
> >
> > In fact, the pom.xml specifies Lucene 3.6.0, but my classpath pointed
> > to Lucene 4.0.0.
> >
> > I changed the pom.xml to 4.0.0: <lucene.version>4.0.0</lucene.version>
> >
> > But I still get the same error:
> > ###
> > Exception in thread "main" java.lang.VerifyError: class
> > org.apache.mahout.vectorizer.DefaultAnalyzer overrides final method
> > tokenStream.(Ljava/lang/String;Ljava/io/Reader;)Lorg/apache/lucene/analysis/TokenStream;
> > ###
> >
> > Should I change something else? Or maybe Lucene 4.0 is too recent for
> > Mahout?
> >
> >
> >
> > Thank you
> >
> > -----Original Message-----
> > From: Sean Owen [mailto:[email protected]]
> > Sent: Wednesday, 18 July 2012 22:52
> > To: [email protected]
> > Subject: Re: .txt to vector
> >
> > This means you're using it with an incompatible version of Lucene. I
> > think we're on 3.1. Check the version that Mahout depends on and use at
> > least that version.
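The VerifyError says that the Analyzer class actually loaded declares tokenStream(String, Reader) as final. That is the case in Lucene 4.0 but not in 3.x, so the real question is which jar the JVM picked. The diagnostic sketch below uses only standard reflection: `Class.forName`, `ProtectionDomain.getCodeSource()` and `Package.getImplementationVersion()`. It prints a version only if the jar manifest provides one. `WhichJar` is a hypothetical helper, and the default class names in `main` are the ones named in the error. Run it with the same classpath Mahout uses.

```java
import java.net.URL;
import java.security.CodeSource;

final class WhichJar {

    /** Describes where className is loaded from; a diagnostic sketch, not part of Mahout or Lucene. */
    static String describe(String className) {
        Class<?> cls;
        try {
            // initialize=false: load the class without running its static initializers
            cls = Class.forName(className, false, WhichJar.class.getClassLoader());
        } catch (ClassNotFoundException | LinkageError e) {
            // A VerifyError (a LinkageError) here reproduces the Lucene mismatch directly
            return className + " -> not loadable (" + e.getClass().getSimpleName() + ")";
        }
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        URL location = (src == null) ? null : src.getLocation();
        Package pkg = cls.getPackage();
        String version = (pkg == null) ? null : pkg.getImplementationVersion();
        return className + " -> "
                + (location == null ? "bootstrap/unknown location" : location.toString())
                + (version == null ? "" : " (Implementation-Version " + version + ")");
    }

    public static void main(String[] args) {
        String[] names = args.length > 0 ? args
                : new String[] {"org.apache.lucene.analysis.Analyzer",
                                "org.apache.mahout.vectorizer.DefaultAnalyzer"};
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

If Analyzer resolves to a lucene-core-4.x jar and DefaultAnalyzer reports a VerifyError, the classpath rather than the pom.xml is what needs fixing.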
> >
> > On Wed, Jul 18, 2012 at 6:04 PM, Videnova, Svetlana <[email protected]> wrote:
> >
> >> I'm working with Mahout. I'm trying to write a Java web service myself
> >> that takes the output of Solr and passes this file to Mahout.
> >> So far the recommendation part works.
> >> Now I'm trying to do clustering. For this I have to vectorise the output
> >> of Solr.
> >> Do you have any idea how to do it, please? I was following
> >> https://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html
> >> but it doesn't work very well (at all...).
> >>
> >> I'm trying to find out how to transform .txt files into vectors for Mahout
> >> in order to cluster and categorise my information. Is it possible? I
> >> saw that I have to use seqdirectory and seq2sparse.
> >>
> >> seqdirectory creates a file (with some numbers and everything...); this
> >> step is OK. But when I then run seq2sparse, it gives me this
> >> error:
> >>
> >> csi@csi-SCENIC-W:/usr/local/apache-mahout-d6d6ee8$ ./bin/mahout seq2sparse --input ./examples/output/ --output ./toto/output/
> >> hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
> >> SLF4J: Class path contains multiple SLF4J bindings.
> >> SLF4J: Found binding in [jar:file:/usr/local/apache-mahout-d6d6ee8/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> >> SLF4J: Found binding in [jar:file:/usr/local/apache-mahout-d6d6ee8/examples/target/dependency/slf4j-jcl-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> >> SLF4J: Found binding in [jar:file:/usr/local/apache-mahout-d6d6ee8/examples/target/dependency/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> >> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> >> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> >> 12/07/18 15:53:33 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
> >> 12/07/18 15:53:33 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
> >> 12/07/18 15:53:33 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
> >> Exception in thread "main" java.lang.VerifyError: class org.apache.mahout.vectorizer.DefaultAnalyzer overrides final method tokenStream.(Ljava/lang/String;Ljava/io/Reader;)Lorg/apache/lucene/analysis/TokenStream;
> >>         at java.lang.ClassLoader.defineClass1(Native Method)
> >>         at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
> >>         at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
> >>         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
> >>         at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
> >>         at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
> >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
> >>         at java.security.AccessController.doPrivileged(Native Method)
> >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> >>         at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:199)
> >>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> >>         at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:55)
> >>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>         at java.lang.reflect.Method.invoke(Method.java:597)
> >>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> >>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> >>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> >>
> >> I'm using only Lucene 4.0!
> >>
> >> CLASSPATH=/opt/lucene-4.0.0-ALPHA/demo/lucene-demo-4.0.0-ALPHA.jar:/opt/lucene-4.0.0-ALPHA/core/lucene-core-4.0.0-ALPHA.jar:/opt/lucene-4.0.0-ALPHA/analysis/common/lucene-analyzers-common-4.0.0-ALPHA.jar:/opt/lucene-4.0.0-ALPHA/queryparser/lucene-queryparser-4.0.0-ALPHA.jar:.
> >>
> >> Where am I going wrong?
> >>
> >>
> >> Thank you all
> >> Regards
> >>
> >>
> >>
> >>
> >>
> >>
> >> Think green - keep it on the screen.
> >>
> >> This e-mail and any attachment is for authorised use by the intended
> >> recipient(s) only. It may contain proprietary material, confidential
> >> information and/or be subject to legal privilege. It should not be
> >> copied, disclosed to, retained or used by, any other party. If you
> >> are not an intended recipient then please promptly delete this e-mail
> >> and any attachment and all copies and inform the sender. Thank you.
> >>
> >>
> >
> >
>
>
>
> --
> Lance Norskog
> [email protected]
>
>
>
>
