You can find the anchor text in the LinkDB.
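
To illustrate why the anchor text belongs there and not in the CrawlDatum: anchors are a property of the links pointing at a URL, so they are stored per inlink, keyed by the target URL. The sketch below is a toy model of that idea in plain Java, not Nutch code. The class `AnchorIndex`, its methods, and the keyword rule in `accept` are all hypothetical names made up for illustration. In Nutch 1.x itself the LinkDb can be inspected with the `readlinkdb` command.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only, NOT the Nutch API: anchor text is kept per inlink and
// keyed by the target URL, which is the idea behind a link database.
class AnchorIndex {
    // One inbound link: the parent page it came from and its anchor text.
    static final class Inlink {
        final String fromUrl;
        final String anchor;
        Inlink(String fromUrl, String anchor) {
            this.fromUrl = fromUrl;
            this.anchor = anchor;
        }
    }

    // target URL -> inbound links pointing at it
    private final Map<String, List<Inlink>> inlinksByTarget = new HashMap<>();

    // Record one outlink found while parsing the parent page fromUrl.
    void addLink(String fromUrl, String toUrl, String anchor) {
        inlinksByTarget.computeIfAbsent(toUrl, k -> new ArrayList<>())
                       .add(new Inlink(fromUrl, anchor));
    }

    // All anchor texts used by pages linking to url (empty if none known).
    List<String> getAnchors(String url) {
        List<String> anchors = new ArrayList<>();
        for (Inlink in : inlinksByTarget.getOrDefault(url, new ArrayList<>())) {
            anchors.add(in.anchor);
        }
        return anchors;
    }

    // Hypothetical filter rule: keep url only if at least one inbound
    // anchor contains the keyword (case-insensitive).
    boolean accept(String url, String keyword) {
        String needle = keyword.toLowerCase();
        for (String anchor : getAnchors(url)) {
            if (anchor.toLowerCase().contains(needle)) {
                return true;
            }
        }
        return false;
    }
}
```

An anchor-based filter would then look up the anchors for the URL it is deciding on, instead of expecting them in the map() arguments.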

On Friday 24 December 2010 14:00:45 Nobin Mathew wrote:
> Hi,
> 
> I am Nobin, and I am working on a search engine based on Nutch.
> 
> I have some questions regarding Nutch, and it would be very helpful
> if somebody could answer them.
> 
> 1) I am working on a plugin (an anchor-based URL filter) where I need
> the anchor text in CrawlDbFilter (Nutch 1.2). After going through the
> source, it seems that getting the anchor text in CrawlDbFilter will not
> be easy, because none of the parameters in
> 
> public void map(Text key, CrawlDatum value,
>     OutputCollector<Text, CrawlDatum> output, Reporter reporter)
> 
> stores the anchor text.
> 
> Is there any class through which I can access this anchor text?
> 
> 2) In Nutch 2.0 (nutchbase), I think there is a way to get this anchor
> text in
> 
> class GeneratorMapper
> 
> public void map(String reversedUrl, WebPage page, Context context)
> 
> through the WebPage class.
> 
> But there is a problem: I think this WebPage object is for this URL
> (the reverse of reversedUrl), not for its parent (the page containing
> this outlink), and only the parent contains the anchor text.
> 
> 3) What is the use of the reprUrl member in the WebPage class?
> 
> Thanks
> Nobin Mathew

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
