Re: Nutch not passing latest CrawlDatum to IndexingFilter plugin

2013-06-18 Thread Sebastian Nagel
Hi Liaokz,

> After debugging, I could confirm that in CrawlDbReducer.java, Nutch really
> returns the latest CrawlDatum (at the line "output.collect(key, result);"
> the member "result" has the latest data). I suppose the latest CrawlDatum
> is written to CrawlDB. Isn't that right?

No, or only parti

Nutch not passing latest CrawlDatum to IndexingFilter plugin

2013-06-17 Thread liaokz
Hello all, I have written a plugin which implements the IndexingFilter interface. The plugin extracts some custom metadata fields from the CrawlDatum and adds them to the NutchDocument so that the fields are indexed by Solr. To make debugging easy, I configured Nutch to crawl my local website, so the crawled r