Hi Liaokz,
> After debugging, I could confirm that in CrawlDbReducer.java, Nutch really
> returns the latest CrawlDatum (at the line "output.collect(key, result);"
> the member "result" holds the latest data). I suppose the latest CrawlDatum
> is written to the CrawlDb. Is that right?
No, or only parti
Hello all,
I have written a plugin that implements the IndexingFilter interface. The
plugin extracts some custom metadata fields from the CrawlDatum and adds
them to the NutchDocument so that Solr indexes them.
To make debugging easy, I configured Nutch to crawl my local website, so the
crawled r