Hi everyone! I'm modifying every CrawlDatum and generating a new CrawlDB from a Hadoop job that I wrote. Reading it with a "SequenceFile.Reader", I can see that everything is there and correct, including the metadata.
My problem comes when I index this CrawlDB with Nutch's Indexer. The CrawlDatum that I get in the IndexFilter still has the previous metadata, and I found that it comes (or might come) from segments/crawl_fetch. That was the only file in the segments where I saw the previous metadata, so I think the CrawlDatum metadata is being taken from the segments.

To solve this, I created new segments from the generated CrawlDB using another Hadoop job, but that didn't work either: when I try to generate the index, the metadata is now null.

Do you have any other idea about how to do this? Or do you know what I'm doing wrong?

Best regards,
Luan Cestari

--
View this message in context: http://lucene.472066.n3.nabble.com/Metadata-is-from-segments-crawl-fetch-How-to-change-it-tp1860428p1860428.html
Sent from the Nutch - User mailing list archive at Nabble.com.

