Dear nutchers,

I extended the ParseFilter extension point:

public Parse filter(String url, WebPage page, Parse parse,
    HTMLMetaTags metaTags, DocumentFragment doc) {

From what I understand, plugin parse-html should populate the
DocumentFragment doc.

Unfortunately, doc is always null. I tried this with my own plugin, as
well as with the nutch-shipped plugin microformats-reltag, which extends
the same point.

Both plugins are working and are being called. I attached my debugger,
and in both my own plugin and the reltag plugin, doc is always null.

I checked parse-plugins.xml: parse-html is called, and my MIME types are
the ones mapped to parse-html
(extension-id="org.apache.nutch.parse.html.HtmlParser").

What am I missing?

Thanks,
Martin
