Thanks, qi wu. After the modification, Nutch invoked my plug-in. I will try
modifying the LanguageIdentifier plug-in next.
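
One way such a modification could start is a simple script-based check. This is only a sketch under my own assumptions: the class HanLangGuess, its guess method and the 0.5 threshold are hypothetical names and values, not part of Nutch or of the LanguageIdentifier plug-in. It counts the share of Han characters among the letters of a text and returns "zh" when that share reaches the threshold. Note that Han characters also occur in Japanese text, so this cannot tell the two apart.

```java
// Hypothetical helper, not part of Nutch: guesses "zh" from the share of Han letters.
final class HanLangGuess {
    private HanLangGuess() {}

    /** Returns "zh" if at least minRatio of the letters are Han, otherwise fallback. */
    static String guess(String text, double minRatio, String fallback) {
        int letters = 0;
        int han = 0;
        // Walk by code point so that supplementary characters are counted once.
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!Character.isLetter(cp)) {
                continue; // skip digits, punctuation and whitespace
            }
            letters++;
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                han++;
            }
        }
        if (letters == 0) {
            return fallback; // nothing to judge by
        }
        return ((double) han / letters) >= minRatio ? "zh" : fallback;
    }

    public static void main(String[] args) {
        System.out.println(guess("\u4e2d\u6587", 0.5, "en"));        // two Han letters
        System.out.println(guess("Nutch search engine", 0.5, "en")); // Latin letters only
    }
}
```

The returned string would then be the value stored in the lang property of the doc, which is what the analyzer selection described below keys on.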
2007/4/3, qi wu <[EMAIL PROTECTED]>:

The analyzer is selected by the "lang" property of the doc object. That
property is set by the LanguageIdentifier plug-in. Unfortunately,
LanguageIdentifier does not support Chinese at the moment. If you only need
to deal with Chinese and English documents, you can hardcode the lang
property of the doc to "zh". In "Indexer.java", modify the code as below:
// NutchAnalyzer analyzer = factory.get(doc.get("lang"));
  NutchAnalyzer analyzer = factory.get("zh");
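
A softer variant of the hardcoding above is to fall back to "zh" only when the doc carries no lang value; the "(null)" at the end of the Indexer log lines further down suggests that this was the case here. The sketch below is self-contained and does not use the Nutch classes: AnalyzerLookupSketch, register and select are hypothetical names, and plain strings stand in for analyzer instances.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a lang-keyed analyzer lookup; hypothetical, not the Nutch AnalyzerFactory.
final class AnalyzerLookupSketch {
    private final Map<String, String> byLang = new HashMap<>();
    private final String defaultAnalyzer;

    AnalyzerLookupSketch(String defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    /** Associates an analyzer name with a language code such as "zh". */
    void register(String lang, String analyzerName) {
        byLang.put(lang, analyzerName);
    }

    /** Uses docLang if present, otherwise fallbackLang; unknown languages get the default. */
    String select(String docLang, String fallbackLang) {
        String lang = (docLang == null || docLang.isEmpty()) ? fallbackLang : docLang;
        String analyzer = byLang.get(lang);
        return analyzer != null ? analyzer : defaultAnalyzer;
    }

    public static void main(String[] args) {
        AnalyzerLookupSketch lookup = new AnalyzerLookupSketch("NutchDocumentAnalyzer");
        lookup.register("zh", "ChineseAnalyzer");
        System.out.println(lookup.select(null, "zh")); // lang missing: fallback applies
        System.out.println(lookup.select("en", "zh")); // lang set but not registered
    }
}
```

In Indexer.java the same idea would amount to checking doc.get("lang") for null before passing it to factory.get, so that documents with a detected language keep their own analyzer.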

----- Original Message -----
From: "zhao xiuwen" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Tuesday, April 03, 2007 12:33 AM
Subject: Replace CJK lanaguage analyzer in nutch


> NutchAnalysis segments CJK terms word-by-word. In order to make Nutch
> support Chinese well, I developed a plug-in for Chinese.
> I placed ChineseAnalyzer.class (extends NutchAnalyzer) and
> ChineseTokenizer.class in a jar, nutch-0.8.1/plugins/analysis-zh.jar, and
> configured plugin.xml and nutch-site.xml. I expected Nutch to
> replace NutchDocumentAnalyzer with ChineseAnalyzer, but it didn't.
> What's wrong?
> plugin.xml configuration:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <plugin
>   id="analysis-zh"
>   name="Chinese Analysis Plug-in"
>   version="1.0.0"
>   provider-name="org.apache.nutch">
>
>   <runtime>
>      <library name="analysis-zh.jar">
>         <export name="*"/>
>      </library>
>   </runtime>
>
>   <requires>
>      <import plugin="nutch-extensionpoints" />
>   </requires>
>
>   <extension id="org.apache.nutch.analysis.zh"
>              name="ChineseAnalyzer"
>              point="org.apache.nutch.analysis.NutchAnalyzer">
>
>      <implementation id="ChineseAnalyzer"
>                      class="org.apache.nutch.analysis.zh.ChineseAnalyzer">
>        <parameter name="lang" value="zh" />
>      </implementation>
>
>   </extension>
>
> </plugin>
>
> Here are some excerpts from nutch-site.xml:
>
> <property>
>  <name>plugin.includes</name>
>  <value>protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|analysis-zh</value>
>  <description> indexing and search plugins.
>  </description>
> </property>
>
> Here are some excerpts from the hadoop log:
>
> 2007-04-02 21:35:32,687 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
> 2007-04-02 21:35:32,687 INFO  plugin.PluginRepository - Registered Plugins:
> 2007-04-02 21:35:32,687 INFO  plugin.PluginRepository -  CyberNeko HTML Parser (lib-nekohtml)
> 2007-04-02 21:35:32,687 INFO  plugin.PluginRepository -  Site Query Filter (query-site)
> 2007-04-02 21:35:32,687 INFO  plugin.PluginRepository -  Html Parse Plug-in (parse-html)
> 2007-04-02 21:35:32,687 INFO  plugin.PluginRepository -  Regex URL Filter Framework (lib-regex-filter)
> 2007-04-02 21:35:32,687 INFO  plugin.PluginRepository -  Basic Indexing Filter (index-basic)
> 2007-04-02 21:35:32,687 INFO  plugin.PluginRepository -  Basic Summarizer Plug-in (summary-basic)
> 2007-04-02 21:35:32,687 INFO  plugin.PluginRepository -  Text Parse Plug-in (parse-text)
> 2007-04-02 21:35:32,687 INFO  plugin.PluginRepository -  JavaScript Parser (parse-js)
> 2007-04-02 21:35:32,687 INFO  plugin.PluginRepository -  Regex URL Filter (urlfilter-regex)
> 2007-04-02 21:35:32,687 INFO  plugin.PluginRepository -  Basic Query Filter (query-basic)
> 2007-04-02 21:35:32,687 INFO  plugin.PluginRepository -  HTTP Framework (lib-http)
> 2007-04-02 21:35:32,687 INFO  plugin.PluginRepository -  URL Query Filter (query-url)
> 2007-04-02 21:35:32,687 INFO  plugin.PluginRepository -  Chinese Analysis Plug-in (analysis-zh)
>
> ......
>
> 2007-04-02 21:36:26,234 INFO  indexer.Indexer -  Indexing [http://2008.163.com/] with analyzer [EMAIL PROTECTED] (null)
> 2007-04-02 21:36:26,359 INFO  indexer.Indexer -  Indexing [http://auto.163.com/] with analyzer [EMAIL PROTECTED] (null)
>
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
