Yes. If you make a subclass of StandardAnalyzer (or of your own Analyzer) that has a constructor with no arguments, one that calls the superclass constructor with the arguments you want, that should work nicely. You could also just add a zero-argument constructor to your own custom analyzer.
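To make the nullary-constructor requirement concrete: a tool that is handed only a class name (like the -a option of seq2sparse) typically instantiates it reflectively through a no-argument constructor, which is what the advice above assumes. The sketch below is a stdlib-only model of that situation, not Mahout's actual loading code. VersionedAnalyzer, NullaryAnalyzer, FakeVersion and AnalyzerLoader.instantiate are hypothetical names standing in for StandardAnalyzer, your subclass, Lucene's Version, and whatever reflection Mahout does internally. With real Lucene 3.0 the subclass constructor would look something like `public MyAnalyzer() { super(Version.LUCENE_30); }`.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in for Lucene's Version enum (illustration only).
enum FakeVersion { V30 }

// Hypothetical base "analyzer": like StandardAnalyzer in Lucene 3.0,
// it only offers a constructor that takes a version argument.
class VersionedAnalyzer {
    final FakeVersion version;

    VersionedAnalyzer(FakeVersion version) {
        this.version = version;
    }
}

// Hypothetical subclass that adds the nullary constructor a reflective
// loader needs; it picks the version itself and passes it to super.
class NullaryAnalyzer extends VersionedAnalyzer {
    public NullaryAnalyzer() {
        super(FakeVersion.V30);
    }
}

public class AnalyzerLoader {
    // Hypothetical loader: given only a class name, the sole option is the
    // no-argument constructor. getDeclaredConstructor() with no parameter
    // types throws NoSuchMethodException if the class has none.
    static <T> T instantiate(String className, Class<T> type) throws Exception {
        Class<? extends T> cls = Class.forName(className).asSubclass(type);
        Constructor<? extends T> ctor = cls.getDeclaredConstructor();
        return ctor.newInstance();
    }

    public static void main(String[] args) throws Exception {
        // Works: NullaryAnalyzer supplies a no-arg constructor.
        VersionedAnalyzer ok =
            instantiate(NullaryAnalyzer.class.getName(), VersionedAnalyzer.class);
        System.out.println("loaded " + ok.getClass().getSimpleName()
            + " with " + ok.version);

        // Fails: VersionedAnalyzer only has VersionedAnalyzer(FakeVersion).
        try {
            instantiate(VersionedAnalyzer.class.getName(), VersionedAnalyzer.class);
        } catch (NoSuchMethodException e) {
            System.out.println("no nullary constructor: " + e.getMessage());
        }
    }
}
```

Run as-is, the first call succeeds and the second fails with a NoSuchMethodException; the older Class.newInstance() reports the same missing constructor as an InstantiationException whose cause is that NoSuchMethodException.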
On Wed, Apr 20, 2011 at 1:25 PM, Camilo Lopez <[email protected]> wrote:
> Ian,
>
> Using 3.0.x (the one that comes by default in Mahout's trunk now).
> By nullary constructor, do you mean I should overload the constructor to
> receive no args in my own custom class?
>
>
> On 2011-04-20, at 1:23 PM, Ian Helmke wrote:
>
>> What version of Lucene are you using? If you use Lucene 3.0 or later,
>> you can't use StandardAnalyzer as-is because it has no no-args
>> constructor. You could try the Mahout DefaultAnalyzer (which wraps the
>> Lucene analyzer in a no-argument constructor). I have gotten custom
>> analyzers to work, but they need to have a nullary constructor.
>>
>>
>> On Wed, Apr 20, 2011 at 12:58 PM, Camilo Lopez <[email protected]>
>> wrote:
>>> Hi List,
>>>
>>> Trying to run custom analyzer classes, I always get an
>>> InstantiationException. At first I suspected my own code, but when trying
>>> what is supposed to be the default value,
>>> 'org.apache.lucene.analysis.standard.StandardAnalyzer', I still get the same
>>> exception.
>>>
>>> This is the command:
>>>
>>> bin/mahout seq2sparse -i /htmless_articles_seq -o /htmless_articles_vectors_1 -ng 3 -x35 -wt tfidf -a org.apache.lucene.analysis.standard.StandardAnalyzer -nv
>>>
>>>
>>> Looking a little deeper (i.e. catching the InstantiationException and
>>> throwing getCause()), it turns out the problem is caused by a
>>> NullPointerException:
>>>
>>> Exception in thread "main" java.lang.NullPointerException
>>>   at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:211)
>>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>   at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:52)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>   at java.lang.reflect.Method.invoke(Method.java:597)
>>>   at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>>   at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>>   at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>   at java.lang.reflect.Method.invoke(Method.java:597)
>>>   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>>
>>>
>>> Am I missing something? Is there another way to create/use custom
>>> analyzers in seq2sparse?
>>>
>>>
>>>
>
>
