Hi Martin,
I am struggling to understand how the DocumentFragment (populated either by
private methods parseTagSoup or parseNeko depending on your config in
nutch-site.xml) is null!
You don't mention any other problem or error; are you seeing any?
I can't debug the code tonight, but I am interested to see what is up here.
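In the meantime, a guard along these lines inside your filter would at least
make it obvious in the logs when doc arrives null. This is only a rough
sketch: DocGuard and describe are hypothetical names, not part of Nutch, and
it uses only the JDK's org.w3c.dom types so that it runs on its own.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;

// Hypothetical stand-alone helper, not a Nutch class.
public class DocGuard {

    // Summarise what the filter received, without dereferencing a null doc.
    static String describe(DocumentFragment doc) {
        if (doc == null) {
            return "doc is null";
        }
        return "doc has " + doc.getChildNodes().getLength() + " top-level node(s)";
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny fragment by hand to exercise the non-null branch.
        Document owner = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        DocumentFragment frag = owner.createDocumentFragment();
        frag.appendChild(owner.createElement("html"));

        System.out.println(describe(null)); // prints "doc is null"
        System.out.println(describe(frag)); // prints "doc has 1 top-level node(s)"
    }
}
```

Calling something like describe(doc) as the first line of your filter method
and logging the result would show whether doc is null for every URL or only
for some of them.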
Lewis

On Thursday, May 23, 2013, Martin Aesch <[email protected]> wrote:
> Dear nutchers,
>
> I extended the ParseFilter extension point
>
> public Parse filter(String url, WebPage page, Parse parse,
>     HTMLMetaTags metaTags, DocumentFragment doc) {
>
> From what I understand, plugin parse-html should populate the
> DocumentFragment doc.
>
> Unfortunately, doc is always null. I tried this with my own plugin, as
> well as with the nutch-shipped plugin microformats-reltag, which extends
> the same point.
>
> Both plugins are working, and they are called. I attached my debugger,
> and both for my own plugin as well as for the reltag-plugin, doc is
> always null.
>
> I checked parse-plugins.xml, yes, parse-html is called and my mime-types
> are those which call parse-html
> (extension-id="org.apache.nutch.parse.html.HtmlParser").
>
> What am I missing?
>
> Thanks,
> Martin
>
>

-- 
*Lewis*
