I want to crawl and index Thai-language pages using nutch-1.0.  (Thai has no
spaces between words!)

I looked at plugins/lib-lucene-analyzers, which contains ThaiAnalyzer. So I
tried setting the plugin.includes property in nutch-site.xml as below.

<property>
  <name>plugin.includes</name>
  <value>language-identifier|nutch-extensionpoints|lib-lucene-analyzers|scoring-opic|protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
  <description>Plugin</description>
</property>


This does not work. It cannot index. So I checked hadoop.log. It shows
something like

2010-05-27 12:21:07,029 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2010-05-27 12:21:07,030 INFO  plugin.PluginRepository - Registered Plugins:
2010-05-27 12:21:07,030 INFO  plugin.PluginRepository -         the nutch core extension points (nutch-extensionpoints)
2010-05-27 12:21:07,030 INFO  plugin.PluginRepository -         Lucene Analysers (lib-lucene-analyzers)
2010-05-27 12:21:07,030 INFO  plugin.PluginRepository -         Language Identification Parser/Filter (language-identifier)


I don't know if this means the plugin was loaded.
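
To read the log mechanically, here is a minimal Python sketch that pulls the
plugin ids out of the "Registered Plugins:" section. It assumes each entry ends
with the plugin id in parentheses, as in the excerpt, and that the section ends
at a "Registered Extension-Points:" line if one is present. The function name
registered_plugin_ids is only illustrative. Note that it reports which plugins
the repository registered, not whether ThaiAnalyzer is actually applied at
indexing time.

```python
import re

# Sample taken from the hadoop.log excerpt in this message.
# Entries that wrap onto a second line are tolerated.
LOG_EXCERPT = """\
2010-05-27 12:21:07,029 INFO  plugin.PluginRepository - Plugin
Auto-activation mode: [true]
2010-05-27 12:21:07,030 INFO  plugin.PluginRepository - Registered Plugins:
2010-05-27 12:21:07,030 INFO  plugin.PluginRepository -         the nutch core
extension points (nutch-extensionpoints)
2010-05-27 12:21:07,030 INFO  plugin.PluginRepository -         Lucene Analysers
(lib-lucene-analyzers)
2010-05-27 12:21:07,030 INFO  plugin.PluginRepository -         Language
Identification Parser/Filter (language-identifier)
"""

# A plugin id is the parenthesised token at the end of a line.
PLUGIN_ID = re.compile(r"\(([A-Za-z0-9_.-]+)\)\s*$", re.MULTILINE)


def registered_plugin_ids(log_text):
    """Return the ids listed after the 'Registered Plugins:' marker."""
    marker = "Registered Plugins:"
    start = log_text.find(marker)
    if start == -1:
        return []
    start += len(marker)
    # Assumption: the plugin list stops where the extension-point list begins.
    end = log_text.find("Registered Extension-Points:", start)
    if end == -1:
        end = len(log_text)
    return PLUGIN_ID.findall(log_text[start:end])


if __name__ == "__main__":
    ids = registered_plugin_ids(LOG_EXCERPT)
    print(ids)
    print("lib-lucene-analyzers registered:", "lib-lucene-analyzers" in ids)
```

To run it against the real file, read hadoop.log into a string and pass it to
the function instead of LOG_EXCERPT.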


How can I make use of Thai analysis? Is the property tag above correct?
And how can I check whether the crawler uses ThaiAnalyzer for indexing?


I need your help. I've been stuck on this problem for many days.

Thank you.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-can-I-use-multi-language-analyzer-tp847380p847380.html
Sent from the Nutch - User mailing list archive at Nabble.com.
