Re: Clustering techniques, tips and tricks

Abbas Wed, 07 Mar 2012 01:30:38 -0800

Hi Bogdan,

This is in reply to your previous post where you asked about removing stop words (word stoppers) in Mahout.

Well, I was recently fighting with the same thing and found a solution that worked perfectly. Here is what you should do:
1. Create your own custom Lucene Analyzer by extending the Analyzer class and overriding its tokenStream method, so that it filters out your stop words.

2. Package your custom analyzer in a jar file. Make sure the Lucene jar it depends on is referenced in the jar's MANIFEST.MF (via the Class-Path attribute).

3. Place the jar in mahout/examples/target/dependency. If you get a ClassNotFoundException in the next step, try also copying both jar files (your analyzer jar and the Lucene jar) into hadoop/lib/. You can also try adding both jars to the HADOOP_CLASSPATH and CLASSPATH environment variables.

4. Run seq2sparse, passing the fully qualified class name of your custom analyzer with the -a parameter.

5. Run k-means as you normally would.
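
Step 1 is where the stop words actually get removed. A Lucene-based analyzer would typically chain a tokenizer with Lucene's LowerCaseFilter and StopFilter inside tokenStream(), but the exact constructors differ between Lucene versions, so rather than guess at the version bundled with your Mahout release, here is a minimal plain-Java sketch of the same pipeline (tokenize, lowercase, drop stop words) with no Lucene dependency. The class name StopWordAnalyzerSketch and the stop list are hypothetical; this illustrates the filtering logic only and is not a drop-in Analyzer for seq2sparse.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical plain-Java sketch (no Lucene) of the token pipeline a custom
 * stop-word analyzer builds: tokenize, lowercase, then drop stop words.
 * In a real Lucene analyzer these stages would be a Tokenizer followed by
 * filters such as LowerCaseFilter and StopFilter.
 */
class StopWordAnalyzerSketch {

    // Illustrative stop list only; a real one would be larger or loaded from a file.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "in", "is", "of", "on", "the", "to"));

    /** Splits on runs of non-letter/non-digit characters, lowercases, and removes stop words. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (raw.isEmpty()) {
                continue; // split() yields a leading empty string when text starts with a delimiter
            }
            // Locale.ROOT avoids locale-specific case rules (e.g. Turkish dotless i).
            String token = raw.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The quick fox jumped on the log, and the dog slept."));
    }
}
```

Running main prints [quick, fox, jumped, log, dog, slept]. Lowercasing before the stop-word check matters: otherwise "The" at the start of a sentence would slip past a lowercase stop list.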

Hope this helps.

If you need the complete code for the custom analyzer, let me know.

Thanks
Abbas