You can find the anchor text in the LinkDB.

On Friday 24 December 2010 14:00:45 Nobin Mathew wrote:
> Hi,
>
> I am Nobin, and I am working on a search engine based on Nutch.
>
> I have some questions regarding Nutch, and it will be very helpful for me
> if somebody can answer.
>
> I am working on a plugin (an anchor-based URL filter) where I need to have
> the anchor text in CrawlDbFilter (Nutch 1.2), but after going through the
> source, it seems getting the anchor in CrawlDbFilter will not be easy,
> because none of the parameters in
>
> public void map(Text key, CrawlDatum value,
>     OutputCollector<Text, CrawlDatum> output, Reporter reporter)
>
> stores the anchor text.
>
> Is there any class through which I can access this anchor text?
>
> 2) In Nutch 2.0 (nutchbase) I think there is a way to get this anchor text
> in
>
> class GeneratorMapper
>
> public void map(String reversedUrl, WebPage page, Context context)
>
> through the WebPage class.
>
> But there is a problem: I think this WebPage object is for this URL
> (the reverse of reversedUrl), not its parent (the parent's WebPage, i.e.
> the page containing this outlink); only the parent contains the anchor text.
>
> 3) What is the use of the reprUrl member in the WebPage class?
>
> Thanks
> Nobin Mathew
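
To illustrate why the LinkDB is the right place to look: it is keyed by the target URL and holds the inlinks (source URL plus anchor text) pointing at it, so a filter working on a URL does not need the parent page itself. Below is a minimal, self-contained sketch of that idea only. It does not use any Nutch classes; `AnchorLookupSketch`, `addInlink`, `getAnchors` and `accept` are hypothetical names made up for this example, and the in-memory map merely stands in for the LinkDB. In a real Nutch 1.x job the lookup would go through the LinkDB on disk (for example with `LinkDbReader`) instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the LinkDB: NOT Nutch code, just the data model.
public class AnchorLookupSketch {

    // One inlink: the page that contains the link, and the link's anchor text.
    static final class Inlink {
        final String fromUrl;
        final String anchor;

        Inlink(String fromUrl, String anchor) {
            this.fromUrl = fromUrl;
            this.anchor = anchor;
        }
    }

    // Keyed by the TARGET url, like the LinkDB, so no parent lookup is needed.
    private final Map<String, List<Inlink>> inlinksByTarget = new HashMap<>();

    public void addInlink(String fromUrl, String toUrl, String anchor) {
        inlinksByTarget.computeIfAbsent(toUrl, k -> new ArrayList<>())
                       .add(new Inlink(fromUrl, anchor));
    }

    // All anchor texts of links pointing at the given url (empty if none).
    public List<String> getAnchors(String url) {
        List<String> anchors = new ArrayList<>();
        for (Inlink in : inlinksByTarget.getOrDefault(url, new ArrayList<>())) {
            anchors.add(in.anchor);
        }
        return anchors;
    }

    // Anchor-based filter: accept the url if any anchor contains the word.
    public boolean accept(String url, String requiredWord) {
        String needle = requiredWord.toLowerCase();
        for (String anchor : getAnchors(url)) {
            if (anchor.toLowerCase().contains(needle)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        AnchorLookupSketch db = new AnchorLookupSketch();
        db.addInlink("http://a.example/", "http://b.example/page", "Nutch tutorial");
        System.out.println(db.getAnchors("http://b.example/page"));
        System.out.println(db.accept("http://b.example/page", "nutch"));
    }
}
```

The point of the sketch is the keying: the anchor text is stored against the URL being linked to, which is the URL a CrawlDb-side filter already has in hand.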
-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

