Hi,

I am Nobin, and I am working on a search engine based on Nutch.

I have some questions regarding Nutch, and it would be very helpful
if somebody could answer them.

1) I am working on a plugin (an anchor-based URL filter) for which I
need the anchor text in CrawlDbFilter (Nutch 1.2). After going through
the source, it seems that getting the anchor text in CrawlDbFilter will
not be easy, because none of the parameters of

public void map(Text key, CrawlDatum value,
        OutputCollector<Text, CrawlDatum> output, Reporter reporter)

stores the anchor text.

Is there any class through which I can access this anchor text?

2) In Nutch 2.0 (nutchbase), I think there is a way to get this anchor text in

class GeneratorMapper

public void map(String reversedUrl, WebPage page, Context context)

through the WebPage class.

But there is a problem: I think this WebPage object belongs to this URL
(the un-reversed form of reversedUrl), not to its parent (the page
containing this outlink), and only the parent contains the anchor text.
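
To make the concern concrete, here is a minimal sketch in plain Java of
the lookup I have in mind. It is not Nutch code: the class
AnchorLookupSketch, the Outlink type, and the anchorsByTarget method are
all hypothetical names made up for illustration. It assumes the anchor
text is recorded on the parent page's outlinks, and inverts that into a
map from each target URL to the anchors pointing at it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these types come from Nutch.
public class AnchorLookupSketch {

    /** One outlink found on a parent page: its target URL and anchor text. */
    static final class Outlink {
        final String toUrl;
        final String anchor;

        Outlink(String toUrl, String anchor) {
            this.toUrl = toUrl;
            this.anchor = anchor;
        }
    }

    /**
     * Inverts "parent URL -> outlinks" into "target URL -> anchor texts",
     * so that a filter looking at a target URL can see its anchors.
     */
    static Map<String, List<String>> anchorsByTarget(
            Map<String, List<Outlink>> outlinksByParent) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (List<Outlink> outlinks : outlinksByParent.values()) {
            for (Outlink link : outlinks) {
                result.computeIfAbsent(link.toUrl, k -> new ArrayList<>())
                      .add(link.anchor);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Outlink>> outlinksByParent = new LinkedHashMap<>();
        outlinksByParent.put("http://example.com/", Arrays.asList(
                new Outlink("http://example.com/about", "About us"),
                new Outlink("http://example.com/news", "News")));
        outlinksByParent.put("http://example.com/news", Arrays.asList(
                new Outlink("http://example.com/about", "Who we are")));

        // Anchors under which other pages link to /about.
        System.out.println(
                anchorsByTarget(outlinksByParent).get("http://example.com/about"));
    }
}
```

What I need inside the filter is the equivalent of the inverted map
above, keyed by the URL being filtered.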

3) What is the use of the reprUrl member in the WebPage class?

Thanks
Nobin Mathew
