Re: Custom analyzers for seq2sparse

Ian Helmke Wed, 20 Apr 2011 10:33:22 -0700

Yes, if you make a subclass of StandardAnalyzer or your own Analyzer
that has a constructor with no arguments (presumably one that calls a
superclass constructor with the arguments you want), that should work
nicely. (You could also just add a zero-argument constructor to your
own custom analyzer.)
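A minimal sketch of both options, using only the JDK so it runs without Lucene on the classpath. `VersionedAnalyzer` is a hypothetical stand-in for a Lucene 3.x analyzer such as StandardAnalyzer, whose constructor requires an argument; `MyAnalyzer`, `MyCustomAnalyzer`, and `canInstantiate` are hypothetical names. The helper assumes the tool creates the analyzer by reflection through a no-arg constructor, which is the requirement described in this thread. With real Lucene 3.0 the subclass constructor would presumably call `super(Version.LUCENE_30)` instead of passing a string.

```java
public class NullaryConstructorDemo {

    // Hypothetical stand-in for a Lucene 3.x analyzer such as StandardAnalyzer:
    // its only constructor takes an argument, so there is no no-arg constructor.
    static class VersionedAnalyzer {
        final String matchVersion;

        VersionedAnalyzer(String matchVersion) {
            this.matchVersion = matchVersion;
        }
    }

    // Option 1: a subclass whose nullary constructor passes the argument
    // you want up to the superclass constructor.
    static class MyAnalyzer extends VersionedAnalyzer {
        public MyAnalyzer() {
            super("LUCENE_30");
        }
    }

    // Option 2: your own analyzer with an overloaded zero-argument
    // constructor that delegates to the configurable one.
    static class MyCustomAnalyzer {
        final String matchVersion;

        public MyCustomAnalyzer(String matchVersion) {
            this.matchVersion = matchVersion;
        }

        public MyCustomAnalyzer() {
            this("LUCENE_30");
        }
    }

    // Create an instance knowing only the class, as a tool that receives a
    // class name on the command line must; fails without a no-arg constructor.
    static boolean canInstantiate(Class<?> cls) {
        try {
            cls.getDeclaredConstructor().newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canInstantiate(VersionedAnalyzer.class)); // false
        System.out.println(canInstantiate(MyAnalyzer.class));        // true
        System.out.println(canInstantiate(MyCustomAnalyzer.class));  // true
    }
}
```

Option 1 suits a stock analyzer you cannot edit; option 2 is the simpler choice when the analyzer is your own code.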


On Wed, Apr 20, 2011 at 1:25 PM, Camilo Lopez <[email protected]> wrote:
> Ian,
>
> Using 3.0.x (the one that comes by default in Mahout's trunk now),
> by nullary constructor do you mean I should overload the constructor to take
> no args in my own custom class?
>
>
> On 2011-04-20, at 1:23 PM, Ian Helmke wrote:
>
>> What version of Lucene are you using? If you use Lucene 3.0 or later,
>> you can't use StandardAnalyzer as-is because it has no no-args
>> constructor. You could try the Mahout DefaultAnalyzer (which wraps the
>> Lucene analyzer in a no-argument constructor). I have gotten custom
>> analyzers to work, but they need to have a nullary constructor.
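The DefaultAnalyzer approach mentioned above is delegation rather than subclassing: a class with a nullary constructor builds the configured analyzer internally and forwards calls to it. The sketch below shows that pattern with plain JDK classes. `ConfiguredAnalyzer`, `WrappingAnalyzer`, and `tokenize` are hypothetical names, and the regex split is only a placeholder; it is not Lucene's tokenization or Mahout's actual DefaultAnalyzer code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WrappingAnalyzerDemo {

    // Hypothetical stand-in for the analyzer you actually want: it needs a
    // constructor argument, so it cannot be created by name without one.
    static class ConfiguredAnalyzer {
        private final boolean lowercase;

        ConfiguredAnalyzer(boolean lowercase) {
            this.lowercase = lowercase;
        }

        // Placeholder tokenizer: split on runs of non-word characters.
        List<String> tokenize(String text) {
            List<String> tokens = new ArrayList<>();
            for (String t : text.split("\\W+")) {
                if (!t.isEmpty()) {
                    tokens.add(lowercase ? t.toLowerCase(Locale.ROOT) : t);
                }
            }
            return tokens;
        }
    }

    // Wrapper with a nullary constructor: it fixes the configuration itself
    // and forwards every call to the wrapped analyzer.
    public static class WrappingAnalyzer {
        private final ConfiguredAnalyzer delegate = new ConfiguredAnalyzer(true);

        public WrappingAnalyzer() {
        }

        public List<String> tokenize(String text) {
            return delegate.tokenize(text);
        }
    }

    public static void main(String[] args) throws Exception {
        // Created purely from the class object, as a command-line tool would.
        WrappingAnalyzer a = WrappingAnalyzer.class.getDeclaredConstructor().newInstance();
        System.out.println(a.tokenize("Custom Analyzers, for seq2sparse!"));
        // prints [custom, analyzers, for, seq2sparse]
    }
}
```

Delegation keeps the wrapped class untouched, which helps when it is final or when you want to hide its configurable constructor entirely.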
>>
>>
>> On Wed, Apr 20, 2011 at 12:58 PM, Camilo Lopez <[email protected]> wrote:
>>> Hi List,
>>>
>>> Trying to run custom analyzer classes, I'm always getting an
>>> InstantiationException. At first I suspected my own code, but trying with
>>> what is supposed to be the default value,
>>> 'org.apache.lucene.analysis.standard.StandardAnalyzer', I still get the same
>>> exception.
>>>
>>> This is the command:
>>>
>>> bin/mahout seq2sparse -i /htmless_articles_seq -o /htmless_articles_vectors_1 -ng 3 -x35 -wt tfidf -a org.apache.lucene.analysis.standard.StandardAnalyzer -nv
>>>
>>>
>>> Looking a little deeper (i.e. catching the InstantiationException and
>>> throwing getCause()), it turns out the InstantiationException is
>>> caused by a NullPointerException:
>>>
>>> Exception in thread "main" java.lang.NullPointerException
>>>        at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:211)
>>>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>        at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:52)
>>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>>        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>>        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>>        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
>>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>>        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>>
>>>
>>> Am I missing something? Is there another way to create/use custom analyzers
>>> in seq2sparse?
>>>
>>>
>>>
>
>
