Hi,

1. The parsing is done by your parser implementation, which can be the tika-parser plugin.
2. That happens in ParseSegment, and also in Fetcher when parsing is enabled in the fetcher (not the default).
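
For the conditional check on links, here is a minimal, self-contained sketch of the kind of filter a focused crawler might apply to the links a parser extracts. It does not use the Nutch API: the class FocusedLinkFilter, the Link type, and the keepOnTopic method are hypothetical names made up for illustration, and the "surrounding text" check is simplified to a keyword match on the anchor text. In Nutch, logic like this would have to live in the parser plugin (or something hooked into it), before the links reach ParseSegment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class FocusedLinkFilter {

    /** Hypothetical stand-in for an extracted link: target URL plus anchor text. */
    public static final class Link {
        public final String url;
        public final String anchor;

        public Link(String url, String anchor) {
            this.url = url;
            this.anchor = anchor;
        }
    }

    /** Keeps only the links whose anchor text contains at least one keyword (case-insensitive). */
    public static List<Link> keepOnTopic(List<Link> links, List<String> keywords) {
        List<Link> kept = new ArrayList<>();
        for (Link link : links) {
            // Treat a missing anchor as empty text so it never matches.
            String text = link.anchor == null ? "" : link.anchor.toLowerCase(Locale.ROOT);
            for (String keyword : keywords) {
                if (text.contains(keyword.toLowerCase(Locale.ROOT))) {
                    kept.add(link);
                    break; // one matching keyword is enough
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Link> links = new ArrayList<>();
        links.add(new Link("http://example.org/a", "Apache Nutch crawler tutorial"));
        links.add(new Link("http://example.org/b", "Holiday photos"));

        // Only the first link mentions the topic keyword, so only its URL is printed.
        for (Link link : keepOnTopic(links, Arrays.asList("crawler"))) {
            System.out.println(link.url);
        }
    }
}
```

Only links that pass the check would then be handed on as outlinks, so off-topic URLs never get queued.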

Cheers,

On Friday 06 April 2012 15:05:06 amoum wrote:
> Dear all,
>
> I am trying to use Nutch as part of a Focused Crawler.
> In order to create this fc I need to find in which classes (source code of
> course) :
> (1) the actual parsing is done (A.HREF) and add set some conditional
> statements (need to check for example surrounding text)
> (2) the urls are added in the queue
>
> I would appreciate any help in this matter.
>
> Best,
> Anastasia
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Class-in-the-code-that-handles-parsing-of-html-files-and-selection-of-URLs-tp3890250p3890250.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

--
Markus Jelsma - CTO - Openindex

