What version of Nutch is this?

Lewis
On Wed, Aug 29, 2012 at 9:58 AM, xpow <[email protected]> wrote:
> Hello,
>
> I've tried to use the protocol-smb plugin with Nutch. Nutch read and
> parsed the documents correctly, but afterward, when it hit the crawldb
> (crawl.CrawlDbReducer), I got a lot of 'crawl.CrawlDbReducer - Missing
> fetch and old value, signature=[B@34d0cdd0' warnings, which causes no
> documents to get indexed with Solr ...
>
> Can anyone help me pinpoint what is going on?
>
> Thanks
>
> Here's the log file:
> 2012-08-29 13:54:52,641 INFO parse.ParseSegment - Parsing: smb://192.168.3.6/share/putusan/putusan_sidang_PUTUSAN 48-2011 TELAH baca.pdf
> 2012-08-29 13:54:53,576 INFO parse.ParseSegment - Parsing: smb://192.168.3.6/share/putusan/putusan_sidang_Putusan 55 PUU-2010-TELAH BACA.pdf
> 2012-08-29 13:54:53,612 INFO parse.ParseSegment - Parsing: smb://192.168.3.6/share/putusan/putusan_sidang_Putusan Sela 108 PHPU 2011.pdf
> 2012-08-29 13:54:53,930 INFO regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
> 2012-08-29 13:54:55,087 INFO parse.ParseSegment - ParseSegment: finished at 2012-08-29 13:54:55, elapsed: 00:00:28
> 2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: starting at 2012-08-29 13:54:55
> 2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: db: crawl/crawldb
> 2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: segments: [crawl/segments/20120829134849]
> 2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: additions allowed: true
> 2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: URL normalizing: true
> 2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: URL filtering: true
> 2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: 404 purging: false
> 2012-08-29 13:54:55,104 INFO crawl.CrawlDb - CrawlDb update: Merging segment data into db.
> 2012-08-29 13:54:55,584 INFO regex.RegexURLNormalizer - can't find rules for scope 'crawldb', using default
> 2012-08-29 13:54:55,765 INFO regex.RegexURLNormalizer - can't find rules for scope 'crawldb', using default
> 2012-08-29 13:54:56,121 INFO regex.RegexURLNormalizer - can't find rules for scope 'crawldb', using default
> 2012-08-29 13:54:56,160 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
> 2012-08-29 13:54:56,160 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
> 2012-08-29 13:54:56,160 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
> 2012-08-29 13:54:56,198 WARN crawl.CrawlDbReducer - Missing fetch and old value, signature=[B@34d0cdd0
> 2012-08-29 13:54:56,199 WARN crawl.CrawlDbReducer - Missing fetch and old value, signature=[B@78782dc6
> 2012-08-29 13:54:56,199 WARN crawl.CrawlDbReducer - Missing fetch and old value, signature=[B@1a055ff4
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Nutch-SMB-protocol-tp4003945.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

--
Lewis

