Sorry, I forgot to mention it: Nutch 1.5.
Matteo

2012/8/29 Lewis John Mcgibbney <[email protected]>:
> What version of Nutch is this?
>
> Lewis
>
> On Wed, Aug 29, 2012 at 9:58 AM, xpow <[email protected]> wrote:
>> Hello,
>>
>> I've tried to use the protocol-smb plugin with nutch. The nutch read and
>> parsed the documents correctly, but afterward, when it hit the crawldb,
>> crawl.CrawlDbReducer, i got a lot of 'crawl.CrawlDbReducer - Missing fetch
>> and old value, signature=[B@34d0cdd0', which causing no documents get
>> indexed with solr ...
>>
>> Can anyone help me to pinpoint what was going on??
>>
>> Thanks
>>
>> Here's the log file:
>> 2012-08-29 13:54:52,641 INFO parse.ParseSegment - Parsing:
>> smb://192.168.3.6/share/putusan/putusan_sidang_PUTUSAN 48-2011 TELAH
>> baca.pdf
>> 2012-08-29 13:54:53,576 INFO parse.ParseSegment - Parsing:
>> smb://192.168.3.6/share/putusan/putusan_sidang_Putusan 55 PUU-2010-TELAH
>> BACA.pdf
>> 2012-08-29 13:54:53,612 INFO parse.ParseSegment - Parsing:
>> smb://192.168.3.6/share/putusan/putusan_sidang_Putusan Sela 108 PHPU
>> 2011.pdf
>> 2012-08-29 13:54:53,930 INFO regex.RegexURLNormalizer - can't find rules
>> for scope 'outlink', using default
>> 2012-08-29 13:54:55,087 INFO parse.ParseSegment - ParseSegment: finished at
>> 2012-08-29 13:54:55, elapsed: 00:00:28
>> 2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: starting at
>> 2012-08-29 13:54:55
>> 2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: db:
>> crawl/crawldb
>> 2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: segments:
>> [crawl/segments/20120829134849]
>> 2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: additions
>> allowed: true
>> 2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: URL
>> normalizing: true
>> 2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: URL filtering:
>> true
>> 2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: 404 purging:
>> false
>> 2012-08-29 13:54:55,104 INFO crawl.CrawlDb - CrawlDb update: Merging
>> segment data into db.
>> 2012-08-29 13:54:55,584 INFO regex.RegexURLNormalizer - can't find rules
>> for scope 'crawldb', using default
>> 2012-08-29 13:54:55,765 INFO regex.RegexURLNormalizer - can't find rules
>> for scope 'crawldb', using default
>> 2012-08-29 13:54:56,121 INFO regex.RegexURLNormalizer - can't find rules
>> for scope 'crawldb', using default
>> 2012-08-29 13:54:56,160 INFO crawl.FetchScheduleFactory - Using
>> FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
>> 2012-08-29 13:54:56,160 INFO crawl.AbstractFetchSchedule -
>> defaultInterval=2592000
>> 2012-08-29 13:54:56,160 INFO crawl.AbstractFetchSchedule -
>> maxInterval=7776000
>> 2012-08-29 13:54:56,198 WARN crawl.CrawlDbReducer - Missing fetch and old
>> value, signature=[B@34d0cdd0
>> 2012-08-29 13:54:56,199 WARN crawl.CrawlDbReducer - Missing fetch and old
>> value, signature=[B@78782dc6
>> 2012-08-29 13:54:56,199 WARN crawl.CrawlDbReducer - Missing fetch and old
>> value, signature=[B@1a055ff4
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Nutch-SMB-protocol-tp4003945.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>
> --
> Lewis

