Thanks for correcting me on this one. The endpoint is org.apache.nutch.parse.ParseFilter in Nutchgora.
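
For the original question (pulling a field out of the url for every document), the filter approach discussed below could look roughly like this. This is only a sketch of the pattern, not Nutch code: the Filter interface, HostFieldFilter and runFilters are hypothetical stand-ins I made up for illustration, and the real extension point has a different signature, so check the javadoc before copying anything. The idea is that one parser still runs per document, and every registered filter then gets a chance to add fields to its output.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParseFilterChainSketch {

    /** Hypothetical stand-in for a parse filter; this is NOT the Nutch interface. */
    interface Filter {
        void filter(String url, Map<String, String> metadata);
    }

    /** Example filter: extracts the host from the url and stores it as a field. */
    static class HostFieldFilter implements Filter {
        @Override
        public void filter(String url, Map<String, String> metadata) {
            String host = URI.create(url).getHost();
            if (host != null) {
                metadata.put("host", host);
            }
        }
    }

    /** Runs every registered filter, in order, on the output of the single parser. */
    static Map<String, String> runFilters(String url,
                                          Map<String, String> parsed,
                                          List<Filter> filters) {
        // Copy so the parser's own output is left untouched.
        Map<String, String> metadata = new LinkedHashMap<>(parsed);
        for (Filter f : filters) {
            f.filter(url, metadata);
        }
        return metadata;
    }

    public static void main(String[] args) {
        List<Filter> filters = new ArrayList<>();
        filters.add(new HostFieldFilter());

        // Pretend this map is what the one selected parser produced.
        Map<String, String> parsed = new LinkedHashMap<>();
        parsed.put("title", "Example");

        Map<String, String> out =
                runFilters("http://www.example.com/a/b.html", parsed, filters);
        System.out.println(out);
    }
}
```

The point of the design is that the url-field logic lives in one filter instead of being copied into every parser.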

On Thu, Mar 8, 2012 at 10:33 AM, Julien Nioche <[email protected]> wrote:

> > Do you mean running multiple parsers in a single parse action? That is
> > currently only possible for html types. Take a look at HtmlParseFilter for
> > that. You can chain multiple parsers for a single url, in addition to
> > regular html parsing. For other types it's not possible.
>
> Not only for html docs mind you. The tika parser produces a normalised
> XHTML representation of the docs which is then passed on to the
> HTMLParseFilter implementations (I think I renamed the endpoint in
> Nutchgora some time ago)
>
> > If this is about running a parse implementation on all urls regardless of
> > mimetype, you have to change the parser mappings in parse-plugins.xml
> > and the parser's plugin.xml. But again there is only support for running
> > one Parser on a single document.
> >
> > Ferdy.
> >
> > On Wed, Mar 7, 2012 at 2:34 PM, [email protected] <[email protected]> wrote:
> >
> > > Hi
> > > I've looked at nutch's code in ParseUtil and it seems that it was designed
> > > so only one parser is eventually activated on a single url.
> > > What's the reason for this?
> > > What should I do if I want, in addition to the existing parsers, to add a
> > > parser that will get a certain field out of the url, and run this behaviour
> > > on all the urls?
> > > Do I have to add this code to all the parsers?
> > >
> > > thanks.
> > >
> > > --
> > > View this message in context:
> > > http://lucene.472066.n3.nabble.com/Multiple-parsers-tp3806721p3806721.html
> > > Sent from the Nutch - User mailing list archive at Nabble.com.
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble

