Is there an API doc or design doc that I can read to
understand where you are? Is the language plugin architecture
already in the main trunk?
The only available document is
http://wiki.apache.org/nutch/MultiLingualSupport
and sometimes I maintain this page
http://wiki.apache.org/nutch/JeromeCharron
Here are some issues that I've been worried about:
* Support of multilingual plugin?
** If one plugin can support more than one language,
the language needs to be passed at each analysis.
I don't understand your need.
But if you have an analysis plugin that can handle many languages, you
can simply define several implementations in your plugin.xml, for example:
<extension id="org.apache.nutch.analysis.cjk"
name="CJKAnalyzer"
point="org.apache.nutch.analysis.NutchAnalyzer">
<implementation id="org.apache.nutch.analysis.cn.ChineseAnalyzer"
class="org.apache.nutch.analysis.cjk.CJKAnalyzer">
<parameter name="lang" value="cn"/>
</implementation>
<implementation id="org.apache.nutch.analysis.kr.KoreanAnalyzer"
class="org.apache.nutch.analysis.cjk.CJKAnalyzer">
<parameter name="lang" value="kr"/>
</implementation>
<implementation id="org.apache.nutch.analysis.jp.JapaneseAnalyzer"
class="org.apache.nutch.analysis.cjk.CJKAnalyzer">
<parameter name="lang" value="jp"/>
</implementation>
</extension>
** This assumes language identification is done before
analysis. Is that the case?
Yes.
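To illustrate the idea, here is a minimal Java sketch of how one analyzer class could be registered under several language codes and then looked up once the language has been identified. This is not the actual Nutch plugin code: TextAnalyzer, CjkLikeAnalyzer, DefaultAnalyzer and AnalyzerRegistry are hypothetical names made up for the example, and the codes "cn", "kr" and "jp" are simply copied from the plugin.xml fragment above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an analyzer; NOT the real NutchAnalyzer API.
interface TextAnalyzer {
    String name();
}

// One class that serves several languages, like CJKAnalyzer in the plugin.xml.
final class CjkLikeAnalyzer implements TextAnalyzer {
    public String name() { return "CJKAnalyzer"; }
}

// Assumed fallback for languages that have no dedicated analyzer.
final class DefaultAnalyzer implements TextAnalyzer {
    public String name() { return "DefaultAnalyzer"; }
}

// Hypothetical registry: maps the "lang" parameter to an analyzer instance.
final class AnalyzerRegistry {
    private final Map<String, TextAnalyzer> byLang = new HashMap<>();
    private final TextAnalyzer fallback = new DefaultAnalyzer();

    void register(String lang, TextAnalyzer analyzer) {
        byLang.put(lang, analyzer);
    }

    // Called after language identification; unknown or missing languages
    // get the fallback analyzer.
    TextAnalyzer forLanguage(String lang) {
        if (lang == null) {
            return fallback;
        }
        return byLang.getOrDefault(lang, fallback);
    }

    // Mirrors the three <implementation> entries: same class, three languages.
    static AnalyzerRegistry cjkExample() {
        AnalyzerRegistry registry = new AnalyzerRegistry();
        TextAnalyzer cjk = new CjkLikeAnalyzer();
        for (String lang : new String[] {"cn", "kr", "jp"}) {
            registry.register(lang, cjk);
        }
        return registry;
    }
}
```

With this sketch, AnalyzerRegistry.cjkExample().forLanguage("kr") returns the shared CJK analyzer, while a language such as "fr" falls back to the default one. The language is used only to select the analyzer, so it does not have to be passed again at each analysis.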
* Support of a different analyzer for query than index
** Analyzer for query may need to behave differently than
analyzer for indexing. Can your architecture
specify different analyzers for indexing and query?
No. To avoid adding a QueryAnalyser extension point,
the Query uses the same Analyzer implementation as the one
used for document analysis.
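A small Java sketch of why sharing the analyzer is convenient: when the same tokenization is applied to documents and to queries, query terms are normalized exactly like the indexed terms. The classes SharedAnalyzer and TinyIndex are hypothetical and only illustrate the principle. They are not Nutch or Lucene classes, and the tokenization rule (lowercase, split on anything that is not a letter or digit) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical analyzer: lowercases the text and splits it on any run of
// characters that are neither letters nor digits.
final class SharedAnalyzer {
    List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }
}

// Hypothetical toy index that uses ONE analyzer for both indexing and querying.
final class TinyIndex {
    private final SharedAnalyzer analyzer;
    private final Set<String> terms = new HashSet<>();

    TinyIndex(SharedAnalyzer analyzer) {
        this.analyzer = analyzer;
    }

    // Index time: analyze the document and store its terms.
    void index(String document) {
        terms.addAll(analyzer.tokens(document));
    }

    // Query time: analyze the query with the same analyzer, then check that
    // every query term was indexed.
    boolean matchesAll(String query) {
        return terms.containsAll(analyzer.tokens(query));
    }
}
```

For example, after indexing "Nutch Multi-Lingual Support", the queries "multi lingual" and "NUTCH" both match, because the query text goes through the same lowercasing and splitting as the document. The trade-off is that a query cannot be analyzed differently from the indexed text.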
Jérôme
--
http://motrech.free.fr/
http://www.frutch.org/
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers