Hello, I've tried to use the protocol-smb plugin with Nutch. Nutch read and parsed the documents correctly, but afterward, during the CrawlDb update (crawl.CrawlDbReducer), I got a lot of 'crawl.CrawlDbReducer - Missing fetch and old value, signature=[B@34d0cdd0' warnings, which results in no documents getting indexed by Solr ...
Can anyone help me pinpoint what is going on? Thanks.

Here's the log file:

2012-08-29 13:54:52,641 INFO parse.ParseSegment - Parsing: smb://192.168.3.6/share/putusan/putusan_sidang_PUTUSAN 48-2011 TELAH baca.pdf
2012-08-29 13:54:53,576 INFO parse.ParseSegment - Parsing: smb://192.168.3.6/share/putusan/putusan_sidang_Putusan 55 PUU-2010-TELAH BACA.pdf
2012-08-29 13:54:53,612 INFO parse.ParseSegment - Parsing: smb://192.168.3.6/share/putusan/putusan_sidang_Putusan Sela 108 PHPU 2011.pdf
2012-08-29 13:54:53,930 INFO regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
2012-08-29 13:54:55,087 INFO parse.ParseSegment - ParseSegment: finished at 2012-08-29 13:54:55, elapsed: 00:00:28
2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: starting at 2012-08-29 13:54:55
2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: db: crawl/crawldb
2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: segments: [crawl/segments/20120829134849]
2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: additions allowed: true
2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: URL normalizing: true
2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: URL filtering: true
2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: 404 purging: false
2012-08-29 13:54:55,104 INFO crawl.CrawlDb - CrawlDb update: Merging segment data into db.
2012-08-29 13:54:55,584 INFO regex.RegexURLNormalizer - can't find rules for scope 'crawldb', using default
2012-08-29 13:54:55,765 INFO regex.RegexURLNormalizer - can't find rules for scope 'crawldb', using default
2012-08-29 13:54:56,121 INFO regex.RegexURLNormalizer - can't find rules for scope 'crawldb', using default
2012-08-29 13:54:56,160 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2012-08-29 13:54:56,160 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2012-08-29 13:54:56,160 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2012-08-29 13:54:56,198 WARN crawl.CrawlDbReducer - Missing fetch and old value, signature=[B@34d0cdd0
2012-08-29 13:54:56,199 WARN crawl.CrawlDbReducer - Missing fetch and old value, signature=[B@78782dc6
2012-08-29 13:54:56,199 WARN crawl.CrawlDbReducer - Missing fetch and old value, signature=[B@1a055ff4
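
To see how many entries each component logged (for example, how many CrawlDbReducer warnings there are compared with parsed documents), the log can be tallied per level and logger. The following is only a minimal sketch: the line layout is assumed from the excerpt above, and the names `LINE_RE`, `tally`, and `SAMPLE` are hypothetical helpers, not part of Nutch.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpt above (not from Nutch documentation):
# "<date> <time>,<millis> <LEVEL> <logger> - <message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+)\s+(\S+) - (.*)$"
)


def tally(lines):
    """Count log entries per (level, logger) pair; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            _, level, logger, _ = m.groups()
            counts[(level, logger)] += 1
    return counts


# Three lines copied from the log excerpt, used here only as sample input.
SAMPLE = [
    "2012-08-29 13:54:56,160 INFO crawl.AbstractFetchSchedule - maxInterval=7776000",
    "2012-08-29 13:54:56,198 WARN crawl.CrawlDbReducer - Missing fetch and old value, signature=[B@34d0cdd0",
    "2012-08-29 13:54:56,199 WARN crawl.CrawlDbReducer - Missing fetch and old value, signature=[B@78782dc6",
]

if __name__ == "__main__":
    for (level, logger), n in sorted(tally(SAMPLE).items()):
        print(level, logger, n)
```

To run it against a full log, pass an open file object to `tally` instead of `SAMPLE`; the file's location depends on the local setup.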

