Sorry for bringing this question up again.

I modified the plugin.xml under the nutch_home/src/plugin/parse-js
directory to be:

  <extension id="com.p2r.nutch.filtering.P2RHtmlFilter"
              name="P2R Html Filter"
              point="org.apache.nutch.parse.HtmlParseFilter">
      <implementation id="JSParseFilter"
         class="com.p2r.nutch.filtering.P2RHtmlFilter">
      </implementation>
   </extension>

Please note that I kept only the implementation id as JSParseFilter
and changed all the other attributes to point to my implementation
class of the HtmlParseFilter. With this change, the P2RHtmlFilter code
was called successfully.

However, if I change the implementation id to "P2RHtmlFilter" and define
the extension in my own plugin.xml, shown here, the filter is not called:

<plugin 
        id="p2r-plugins" 
        name="P2R Plugins for Nutch"
    version="0.0.1" 
    provider-name="p2r.com">

   <runtime>
     <library name="p2r-plugins.jar">
       <export name="*"/>
     </library>
   </runtime>
   
   <requires>
      <import plugin="nutch-extensionpoints"/>
   </requires>
   
   <extension id="com.p2r.nutch.filtering.P2RHtmlFilter"
              name="P2R Html Filter"
              point="org.apache.nutch.parse.HtmlParseFilter">
      <implementation id="P2RHtmlFilter"
         class="com.p2r.nutch.filtering.P2RHtmlFilter">
      </implementation>
   </extension>
   

</plugin>

My nutch-site.xml already contains the plugin.includes property which
looks like this:

<property>
  <name>plugin.includes</name>
  <value>nutch-extensionpoints|p2r-plugins|parse-html|protocol-http|urlfilter-regex|parse-(pdf|js)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
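
As a sanity check on the property above, here is a minimal sketch that tests whether a plugin id is covered by a plugin.includes value. It assumes Nutch treats the value as a Java regular expression and requires the whole plugin id to match. That is my understanding of the plugin loader, not something confirmed in this thread. The class name PluginIncludesCheck and the method isIncluded are made up for illustration and are not part of Nutch.

```java
import java.util.regex.Pattern;

public class PluginIncludesCheck {

    // Assumption: a plugin is activated only when its entire id matches
    // the plugin.includes regular expression.
    public static boolean isIncluded(String includesRegex, String pluginId) {
        return Pattern.compile(includesRegex).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // The plugin.includes value from nutch-site.xml, joined into one string.
        String includes = "nutch-extensionpoints|p2r-plugins|parse-html|protocol-http|"
                + "urlfilter-regex|parse-(pdf|js)|index-(basic|anchor)|"
                + "query-(basic|site|url)|response-(json|xml)|summary-basic|"
                + "scoring-opic|urlnormalizer-(pass|regex|basic)";

        // Print whether each plugin id of interest is selected by the regex.
        for (String id : new String[] {"p2r-plugins", "parse-js", "parse-html"}) {
            System.out.println(id + " -> " + isIncluded(includes, id));
        }
    }
}
```

If an id that should be active is reported as not included, the regex is the first thing to look at. Note that in a Java regex a literal line break inside the value becomes part of the neighbouring alternative, so a value that is physically wrapped in the file may not match the ids next to the break.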


Does anyone know what I have missed?

Thanks a lot!


On Tue, 2010-07-13 at 09:43 +0100, Julien Nioche wrote:
> Hi Jeff,
> 
> > 1) I don't see the implementation id "JSParseFilter" used in the
> > parse-plugins.xml file under the $NUTCH_HOME\conf folder. Then how does
> > Nutch know that this filter function should be called?
> >
> 
> parse-plugins.xml lists Parsers, whereas JSParseFilter is an HtmlParseFilter.
> HtmlParseFilters get a DOM representation of the documents from the HTML or
> TikaParser.
> Nutch will automatically load the JSParseFilter along with other
> HtmlParseFilters, provided that you list the corresponding plugin in
> plugin.includes.
> 
> 
> > 2) I want to replace this filter with my own filter, and I wrote the
> > following code:
> >
> > <extension id="com.mycompany.nutch.parse.MyParseFilter"
> >              name="Parse JS Filter"
> >              point="org.apache.nutch.parse.HtmlParseFilter">
> >      <implementation id="MyParseFilter"
> >         class="com.mycompany.nutch.parse.MyParseFilter">
> >      </implementation>
> >   </extension>
> >
> > and put it into the plugin.xml file under $NUTCH_HOME\src\myplugin
> > directory. But my filter is never called. Any ideas?
> >
> 
> Have a look at the wiki page, e.g.
> http://wiki.apache.org/nutch/WritingPluginExample-0.9
> 
> 
> 
> 

