> Do you mean running multiple parsers in a single parse action? That is
> currently only possible for html types. Take a look at HtmlParseFilter for
> that. You can chain multiple parsers for a single url, in addition to
> regular html parsing. For other types it's not possible.
>
Not only for HTML docs, mind you. The Tika parser produces a normalised XHTML
representation of the docs, which is then passed on to the HtmlParseFilter
implementations (I think I renamed the endpoint in Nutchgora some time ago).

> If this is about running a parse implementation on all urls regardless of
> mimetype, you have to change the parser mappings in parse-plugins.xml
> and the parser's plugin.xml. But again there is only support for running
> one Parser on a single document.
>
> Ferdy.
>
> On Wed, Mar 7, 2012 at 2:34 PM, [email protected] <[email protected]> wrote:
>
> > Hi
> > I've looked at nutch's code in ParseUtil and it seems that it was designed
> > so only one parser is eventually activated on a single url.
> > What's the reason for this?
> > What should I do if I want, in addition to the existing parsers, to add a
> > parser that will get a certain field out of the url, and run this behaviour
> > on all the urls?
> > Do I have to add this code to all the parsers?
> >
> > thanks.
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Multiple-parsers-tp3806721p3806721.html
> > Sent from the Nutch - User mailing list archive at Nabble.com.

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
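PS: for the "get a certain field out of the url" part, here is a minimal sketch of the extraction logic, kept separate from any parser so it only has to be written once. It assumes the field is a query parameter; the class name UrlFieldExtractor, the method queryParam and the example URL are hypothetical and not part of Nutch. The Nutch wiring is deliberately left out: the idea would be to call something like this from a single HtmlParseFilter implementation rather than from every parser, and the exact filter signature should be checked against the API of the Nutch version in use.

```java
import java.net.URI;
import java.util.Optional;

/** Hypothetical helper, not part of Nutch: pulls one query parameter out of a URL. */
public class UrlFieldExtractor {

    /**
     * Returns the raw (not percent-decoded) value of the query parameter
     * called name, or an empty Optional if the URL is malformed or the
     * parameter is absent.
     */
    public static Optional<String> queryParam(String url, String name) {
        String query;
        try {
            query = URI.create(url).getRawQuery();
        } catch (IllegalArgumentException e) {
            // URI.create rejects syntactically invalid URLs
            return Optional.empty();
        }
        if (query == null) {
            return Optional.empty();
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            if (key.equals(name)) {
                // A parameter without '=' is treated as having an empty value
                return Optional.of(eq < 0 ? "" : pair.substring(eq + 1));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String url = "http://example.com/page?id=42&lang=en";
        System.out.println(queryParam(url, "id").orElse("(none)"));
    }
}
```

Keeping the extraction in one plain method means the same behaviour applies to every document that reaches the filter chain, and it can be tested without a running crawl.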

