Hi,

1. In your parser implementation, which can be the Tika parser plugin (parse-tika);
2. In ParseSegment, and also in Fetcher when the fetcher is configured to parse
(this is not the default).
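To give an idea of the kind of conditional check meant in point 1, here is a standalone Java sketch. It is not Nutch code. It uses the JDK's built-in Swing HTML parser instead of Tika, and the names AnchorFilterSketch, Link, extractLinks and keepRelevant are hypothetical. It collects each <a href> together with its anchor text and keeps a link only if the anchor text contains a topic keyword. In Nutch itself, logic like this would typically live in a plugin, for example an HtmlParseFilter that inspects the parsed outlinks.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class AnchorFilterSketch {

    /** Hypothetical link type: target URL plus the text inside the <a> element. */
    public record Link(String href, String anchorText) {}

    /** Collects every <a href="..."> and its anchor text, using the JDK HTML parser (not Tika). */
    public static List<Link> extractLinks(String html) {
        List<Link> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private String currentHref = null;           // href of the <a> we are inside, if any
            private final StringBuilder text = new StringBuilder();

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    currentHref = href == null ? null : href.toString();
                    text.setLength(0);
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (currentHref != null) {
                    text.append(data);                   // text inside the current link
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.A && currentHref != null) {
                    links.add(new Link(currentHref, text.toString().trim()));
                    currentHref = null;
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);          // not expected for a StringReader
        }
        return links;
    }

    /** Hypothetical focus condition: keep a link only if its anchor text mentions a keyword. */
    public static List<Link> keepRelevant(List<Link> links, List<String> keywords) {
        List<Link> kept = new ArrayList<>();
        for (Link link : links) {
            String anchor = link.anchorText().toLowerCase(Locale.ROOT);
            for (String kw : keywords) {
                if (anchor.contains(kw.toLowerCase(Locale.ROOT))) {
                    kept.add(link);
                    break;
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String html = "<html><body>"
            + "<p><a href=\"http://example.com/crawler\">Focused crawler tutorial</a></p>"
            + "<p><a href=\"http://example.com/recipes\">Cooking recipes</a></p>"
            + "</body></html>";
        for (Link link : keepRelevant(extractLinks(html), List.of("crawler"))) {
            System.out.println(link.href() + " <- " + link.anchorText());
        }
    }
}
```

Running main keeps only the link whose anchor text mentions "crawler". Anchor text alone is a crude relevance signal; checking the surrounding text would need a wider window around each tag.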
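For point 2, note that Nutch 1.x does not keep an in-memory queue: as far as I know, the outlinks written to the segment during parsing are merged into the CrawlDb by updatedb, and the next generate step selects the highest-scoring URLs. The hypothetical sketch below (BestFirstFrontier, offer and next are invented names) shows the same selection idea on a small scale as a best-first frontier: URLs below a relevance threshold or already queued are dropped, and the most relevant URL comes out first. The scores are assumed to come from some relevance check on the link.

```java
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;

public class BestFirstFrontier {

    /** A candidate URL with a relevance score; higher scores are fetched first. */
    public record Candidate(String url, double score) {}

    // Max-heap on score, so poll() returns the most relevant candidate.
    private final PriorityQueue<Candidate> queue =
        new PriorityQueue<>((a, b) -> Double.compare(b.score(), a.score()));
    private final Set<String> seen = new HashSet<>();   // URLs that were accepted once
    private final double minScore;

    public BestFirstFrontier(double minScore) {
        this.minScore = minScore;
    }

    /**
     * Queues a URL unless its score is below the threshold or it was accepted before.
     * A rejected low-score URL is not remembered, so it may be accepted later with a higher score.
     */
    public boolean offer(String url, double score) {
        if (score < minScore || !seen.add(url)) {
            return false;
        }
        return queue.add(new Candidate(url, score));
    }

    /** Returns the most relevant queued URL, or null when the frontier is empty. */
    public String next() {
        Candidate c = queue.poll();
        return c == null ? null : c.url();
    }

    public static void main(String[] args) {
        BestFirstFrontier frontier = new BestFirstFrontier(0.2);
        frontier.offer("http://example.com/crawler", 0.9);
        frontier.offer("http://example.com/recipes", 0.1);  // below threshold, dropped
        frontier.offer("http://example.com/search", 0.5);
        frontier.offer("http://example.com/crawler", 0.9);  // duplicate, dropped
        String url;
        while ((url = frontier.next()) != null) {
            System.out.println(url);
        }
    }
}
```

Here main prints the crawler URL first and then the search URL. In Nutch the equivalent knobs would be the scores stored in the CrawlDb and the generate step's selection, rather than a heap in memory.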

Cheers,

On Friday 06 April 2012 15:05:06 amoum wrote:
> Dear all,
> 
> 
> I am trying to use Nutch as part of a Focused Crawler.
> To build this focused crawler, I need to find the classes in the source code
> where:
> (1) the actual parsing of links (A HREF) is done, so that I can add some
> conditional statements there (for example, to check the surrounding text);
> (2) the URLs are added to the queue.
> I would appreciate any help in this matter.
> 
> 
> Best,
> Anastasia
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Class-in-the-code-that-handles-parsing-of-html-files-and-selection-of-URLs-tp3890250p3890250.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

-- 
Markus Jelsma - CTO - Openindex
