>
> Do you mean running multiple parsers in a single parse action? That is
> currently only possible for html types. Take a look at HtmlParseFilter for
> that. You can chain multiple parsers for a single url, in addition to
> regular html parsing. For other types it's not possible.
>

Not only for HTML docs, mind you. The Tika parser produces a normalised
XHTML representation of the docs, which is then passed on to the
HtmlParseFilter implementations (I think I renamed that endpoint in
Nutchgora some time ago).
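In other words, the extension point comes after parsing. Whichever single parser handles a document (e.g. the Tika parser), its output is passed to every registered filter, so extracting a field from the url belongs in a filter rather than being copied into each parser. A minimal Java sketch of that flow follows. Every type in it (`ParseResult`, `Parser`, `ParseFilter`, `UrlParamFilter`) is a hypothetical stand-in, not the Nutch API. The real HtmlParseFilter (and its Nutchgora counterpart) has a different signature and also receives the parsed DOM, so check the Nutch javadoc before writing an actual plugin.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the parse step: exactly one parser
// produces a result, then every registered filter runs over that result.
// None of these types are the real Nutch classes.
final class ParseFilterChainSketch {

    // Hypothetical parse result: extracted text plus key/value metadata.
    static final class ParseResult {
        final String text;
        final Map<String, String> metadata = new LinkedHashMap<>();

        ParseResult(String text) {
            this.text = text;
        }
    }

    // Hypothetical stand-in for the one parser chosen for a document.
    interface Parser {
        ParseResult parse(String url, byte[] content);
    }

    // Hypothetical stand-in for an HtmlParseFilter-style extension point.
    interface ParseFilter {
        ParseResult filter(String url, ParseResult result);
    }

    // Example filter: copies one query parameter from the url into the metadata.
    // Deliberately naive: no URL decoding and no handling of '#' fragments.
    static final class UrlParamFilter implements ParseFilter {
        private final String param;

        UrlParamFilter(String param) {
            this.param = param;
        }

        @Override
        public ParseResult filter(String url, ParseResult result) {
            int q = url.indexOf('?');
            if (q < 0) {
                return result; // no query string, nothing to extract
            }
            for (String pair : url.substring(q + 1).split("&")) {
                String[] kv = pair.split("=", 2);
                if (kv.length == 2 && kv[0].equals(param)) {
                    result.metadata.put(param, kv[1]);
                }
            }
            return result;
        }
    }

    // Runs the single chosen parser, then every filter in registration order.
    static ParseResult parse(String url, byte[] content, Parser parser, List<ParseFilter> filters) {
        ParseResult result = parser.parse(url, content); // exactly one parser runs
        for (ParseFilter filter : filters) {
            result = filter.filter(url, result); // every filter runs, whatever the mime type
        }
        return result;
    }

    public static void main(String[] args) {
        Parser plainText = (url, content) -> new ParseResult(new String(content, StandardCharsets.UTF_8));
        List<ParseFilter> filters = new ArrayList<>();
        filters.add(new UrlParamFilter("id"));
        ParseResult r = parse("http://example.com/page?id=42&lang=en",
                "hello".getBytes(StandardCharsets.UTF_8), plainText, filters);
        System.out.println(r.text + " " + r.metadata); // hello {id=42}
    }
}
```

The design point is that the filter never needs to know which parser ran, so one filter covers every url without touching the parsers themselves.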



>
> If this is about running a parse implementation on all urls regardless of
> mimetype, you have to change the parser mappings in parse-plugins.xml
> and the parser's plugin.xml. But again there is only support for running
> one Parser on a single document.
>
> Ferdy.
>
> On Wed, Mar 7, 2012 at 2:34 PM, [email protected] <[email protected]> wrote:
>
> > Hi,
> > I've looked at Nutch's code in ParseUtil and it seems that it was designed
> > so only one parser is eventually activated on a single url.
> > What's the reason for this?
> > What should I do if I want, in addition to the existing parsers, to add a
> > parser that will get a certain field out of the url, and run this behaviour
> > on all the urls?
> > Do I have to add this code to all the parsers?
> >
> >
> > thanks.
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Multiple-parsers-tp3806721p3806721.html
> > Sent from the Nutch - User mailing list archive at Nabble.com.
> >
>
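Ferdy's point that only one Parser runs per document, and that sending all urls to a particular parser means changing the mappings in parse-plugins.xml, can be pictured as a lookup table from mime type to a single parser id. The Java sketch below is a hypothetical model, not Nutch's ParserFactory or its XML format: the class and method names are invented for illustration, and the `"*"` catch-all entry is an assumption used to show how a mapping can cover every mime type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a mime-type -> parser mapping: each document gets
// exactly one parser id, and a "*" entry (assumed here) acts as a catch-all
// for mime types with no specific mapping. Not the real Nutch classes.
final class ParserMappingSketch {
    private final Map<String, String> mappings = new LinkedHashMap<>();

    // Registers (or replaces) the single parser id for a mime type.
    void map(String mimeType, String parserId) {
        mappings.put(mimeType, parserId);
    }

    // Returns the one parser id for this mime type, falling back to "*";
    // returns null if neither a specific nor a catch-all mapping exists.
    String parserFor(String mimeType) {
        String id = mappings.get(mimeType);
        return id != null ? id : mappings.get("*");
    }

    public static void main(String[] args) {
        ParserMappingSketch m = new ParserMappingSketch();
        m.map("text/html", "parse-html");
        m.map("*", "parse-tika");
        System.out.println(m.parserFor("text/html"));       // parse-html
        System.out.println(m.parserFor("application/pdf")); // parse-tika
    }
}
```

Because the lookup yields one id, a custom parser mapped this way replaces the existing one for those types instead of running alongside it, which is why a filter is the better fit for per-url extraction.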



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
