Hi everyone!

I'm modifying all the CrawlDatum entries and generating a new CrawlDB with a
Hadoop job that I wrote. Reading it with a SequenceFile.Reader, I can see
that everything is there and correct, including the metadata.

My problem appears when I index this CrawlDB with Nutch's Indexer. The
CrawlDatum that I get in the indexing filter still has the previous
metadata, and I found that it comes (or might come) from
segments/crawl_fetch: that was the only file there in which I saw the
previous metadata, so I think the CrawlDatum metadata is taken from the
segments. To work around this, I created new segments from the generated
CrawlDB with another Hadoop job, but that didn't solve it: when I try to
generate the index, the metadata is now null.

Do you have any other ideas about how to do this? Or do you know what I'm
doing wrong?

Best regards,
Luan Cestari
