Re: Multiple parsers

Ferdy Galema Thu, 08 Mar 2012 12:30:03 -0800

Thanks for correcting me on this one.

The endpoint is org.apache.nutch.parse.ParseFilter in Nutchgora.
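The arrangement discussed in this thread (one parser per document, then every registered parse filter applied to its output) can be sketched in plain Java. This is a simplified, self-contained model and not Nutch's real API. `Parse`, `Filter`, `UrlHostFilter` and `applyFilters` are hypothetical names, and the real HtmlParseFilter/ParseFilter methods take different, richer arguments. The example filter addresses the original question by adding a field taken from the URL (assumed here to be the host) to each document that reaches the filter chain, such as documents handled by the HTML or Tika parser.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model (not Nutch's real API): a single parser has
 * already produced a Parse, and every registered filter is then applied to it
 * in order.
 */
public class ParseFilterChainSketch {

  /** Stand-in for a parse result: extracted text plus extra metadata fields. */
  static final class Parse {
    final String text;
    final Map<String, String> fields = new LinkedHashMap<>();

    Parse(String text) {
      this.text = text;
    }
  }

  /** Hypothetical filter contract, loosely inspired by HtmlParseFilter/ParseFilter. */
  interface Filter {
    Parse filter(String url, Parse parse);
  }

  /** Example filter: stores the URL's host (if it has one) in a "host" field. */
  static final class UrlHostFilter implements Filter {
    @Override
    public Parse filter(String url, Parse parse) {
      String host = URI.create(url).getHost();
      if (host != null) {
        parse.fields.put("host", host);
      }
      return parse;
    }
  }

  /** Applies each filter, in list order, to the output of the one parser. */
  static Parse applyFilters(String url, Parse parse, List<? extends Filter> filters) {
    Parse current = parse;
    for (Filter f : filters) {
      current = f.filter(url, current);
    }
    return current;
  }

  public static void main(String[] args) {
    List<Filter> filters = new ArrayList<>();
    filters.add(new UrlHostFilter());
    // Assume the HTML or Tika parser already produced this parse.
    Parse parsed = new Parse("extracted text");
    Parse result = applyFilters("http://example.com/docs/page.pdf", parsed, filters);
    System.out.println(result.fields); // prints {host=example.com}
  }
}
```

Because the filter runs after parsing, the URL-based extraction lives in one place instead of being copied into each parser, which is what the original poster was hoping to avoid.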


On Thu, Mar 8, 2012 at 10:33 AM, Julien Nioche <[email protected]> wrote:

> >
> > Do you mean running multiple parsers in a single parse action? That is
> > currently only possible for html types. Take a look at HtmlParseFilter for
> > that. You can chain multiple parsers for a single url, in addition to
> > regular html parsing. For other types it's not possible.
> >
>
> Not only for html docs, mind you. The Tika parser produces a normalised
> XHTML representation of the docs, which is then passed on to the
> HtmlParseFilter implementations (I think I renamed the endpoint in
> Nutchgora some time ago).
>
>
>
> >
> > If this is about running a parse implementation on all urls regardless of
> > mimetype, you have to change the parser mappings in parse-plugins.xml
> > and the parser's plugin.xml. But again there is only support for running
> > one Parser on a single document.
> >
> > Ferdy.
> >
> > On Wed, Mar 7, 2012 at 2:34 PM, [email protected] <[email protected]> wrote:
> >
> > > Hi
> > > I've looked at Nutch's code in ParseUtil and it seems that it was
> > > designed so that only one parser is eventually activated on a single url.
> > > What's the reason for this?
> > > What should I do if I want, in addition to the existing parsers, to add a
> > > parser that will get a certain field out of the url, and run this
> > > behaviour on all the urls?
> > > Do I have to add this code to all the parsers?
> > >
> > >
> > > thanks.
> > >
> > >
> > > --
> > > View this message in context:
> > > http://lucene.472066.n3.nabble.com/Multiple-parsers-tp3806721p3806721.html
> > > Sent from the Nutch - User mailing list archive at Nabble.com.
> > >
> >
>
>
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>
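The quoted explanation that only one Parser runs on a single document can also be modelled in a few lines. In this hypothetical sketch, loosely modelled on how Nutch's ParseUtil walks the parsers registered for a content type, candidates are looked up by mimetype (falling back to a catch-all `*` entry, in the spirit of parse-plugins.xml) and tried in order; the first one that succeeds wins and the rest never run. The class, interface and method names, and the mapping itself, are illustrative assumptions rather than Nutch's actual ParserFactory/ParseUtil code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical model of single-parser selection (not Nutch's real
 * ParserFactory/ParseUtil): candidates for a mimetype are tried in order and
 * the first successful parse is returned, so later candidates never run.
 */
public class SingleParserSelectionSketch {

  /** Hypothetical parser: returns parsed text, or empty if it fails. */
  interface Parser {
    Optional<String> parse(String content);
  }

  /** Test double whose success or failure is fixed at construction time. */
  static final class FixedParser implements Parser {
    private final String id;
    private final boolean succeeds;

    FixedParser(String id, boolean succeeds) {
      this.id = id;
      this.succeeds = succeeds;
    }

    @Override
    public Optional<String> parse(String content) {
      return succeeds ? Optional.of(id + ":" + content) : Optional.empty();
    }
  }

  /** Parsers mapped to the mimetype, else those mapped to the catch-all "*". */
  static List<Parser> candidates(Map<String, List<Parser>> mapping, String mimeType) {
    List<Parser> specific = mapping.get(mimeType);
    if (specific != null && !specific.isEmpty()) {
      return specific;
    }
    return mapping.getOrDefault("*", Collections.<Parser>emptyList());
  }

  /** Tries candidates in order and stops at the first one that succeeds. */
  static Optional<String> parseWithFirstSuccess(
      Map<String, List<Parser>> mapping, String mimeType, String content) {
    for (Parser p : candidates(mapping, mimeType)) {
      Optional<String> result = p.parse(content);
      if (result.isPresent()) {
        return result;
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    Map<String, List<Parser>> mapping = new HashMap<>();
    List<Parser> html = new ArrayList<>();
    html.add(new FixedParser("parse-html", true));
    html.add(new FixedParser("parse-tika", true)); // never reached for text/html
    mapping.put("text/html", html);
    mapping.put("*", Collections.<Parser>singletonList(new FixedParser("parse-tika", true)));

    System.out.println(parseWithFirstSuccess(mapping, "text/html", "doc"));       // prints Optional[parse-html:doc]
    System.out.println(parseWithFirstSuccess(mapping, "application/pdf", "doc")); // prints Optional[parse-tika:doc]
  }
}
```

The plugin ids in `main` are only labels for the illustration; which parsers are actually mapped to which mimetypes is decided by the real parse-plugins.xml and each plugin's plugin.xml.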
