Hi Bogdan,

This is in reply to your previous post, where you asked about using stop words in Mahout.
I was recently struggling with the same problem and found a solution that worked well. Here is what you should do:

1. Create your own (customized) Lucene Analyzer by extending the Analyzer class and overriding its tokenStream method.
2. Package your custom analyzer in a jar file. Make sure the jar's MANIFEST.MF lists your Lucene jar file on its Class-Path.
3. Place the jar in mahout/examples/target/dependency. If you get a ClassNotFoundException in the next step, you may also want to copy both jar files (your analyzer jar and the Lucene jar) into hadoop/lib/. You can also try adding both jar files to the HADOOP_CLASSPATH and CLASSPATH environment variables.
4. Run your seq2sparse command, passing the fully qualified class name of your custom analyzer in the -a parameter.
5. Run your k-means command as you normally would.

Hope this helps. If you need the complete code for the custom analyzer, let me know.

Thanks,
Abbas
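For reference, here is an illustrative sketch of what the analyzer in step 1 does to the text; it is not the analyzer referred to above and it does not use Lucene. Assuming a Lucene 3.x release, a stop-word analyzer typically builds a chain inside tokenStream(String fieldName, Reader reader): a StandardTokenizer, then a LowerCaseFilter, then a StopFilter. The plain-Java class below (hypothetical name StopWordChainSketch, with an example stop list you would replace) models those three stages, so you can preview which tokens your stop list removes before wiring the real analyzer into seq2sparse.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical plain-Java model of a tokenize -> lowercase -> stop-filter chain.
 * Not part of Lucene or Mahout; it only mimics what such an analyzer emits.
 */
public final class StopWordChainSketch {

    // Example stop list (an assumption); substitute your own words.
    private static final Set<String> STOP_WORDS = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to")));

    private StopWordChainSketch() {}

    /** Stage 1: split on runs of non-letter, non-digit characters (stands in for a tokenizer). */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t); // skip the empty string produced by a leading delimiter
            }
        }
        return tokens;
    }

    /** Stages 2 and 3: lowercase each token, then drop it if it is a stop word. */
    public static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text)) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                out.add(lower);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [cat, hat, part, 2, 3]
        System.out.println(analyze("The Cat and the Hat, part 2 of 3"));
    }
}
```

Lowercasing happens before the stop-word check, mirroring the usual LowerCaseFilter-then-StopFilter order, so "The" and "the" are both removed by a lowercase stop list.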
