Can you point me to the plugin in the appropriate directory? I do not see the plugin in the plugins directory below:
http://svn.apache.org/repos/asf/nutch/tags/release-1.5-rc4/src/plugin/

This is the code that was released with the 1.5 distribution.

Lewis

On Wed, Aug 29, 2012 at 3:44 PM, Matteo Simoncini <[email protected]> wrote:
> I'm not so familiar with SVN. Is this what you mean?
>
> http://svn.apache.org/repos/asf/nutch/branches/branch-1.5/
>
> Matteo
>
> 2012/8/29 Lewis John Mcgibbney <[email protected]>
>
>> In the SVN area can you point me to the protocol plugin please?
>>
>> http://svn.apache.org/repos/asf/nutch/
>>
>> Thank you
>>
>> Lewis
>>
>> On Wed, Aug 29, 2012 at 3:22 PM, Matteo Simoncini <[email protected]> wrote:
>> > Sorry, I forgot it.
>> >
>> > 1.5
>> >
>> > Matteo
>> >
>> > 2012/8/29 Lewis John Mcgibbney <[email protected]>:
>> >> What version of Nutch is this?
>> >>
>> >> Lewis
>> >>
>> >> On Wed, Aug 29, 2012 at 9:58 AM, xpow <[email protected]> wrote:
>> >>> Hello,
>> >>>
>> >>> I've tried to use the protocol-smb plugin with Nutch. Nutch read and
>> >>> parsed the documents correctly, but afterward, when it hit the crawldb
>> >>> (crawl.CrawlDbReducer), I got a lot of 'crawl.CrawlDbReducer - Missing
>> >>> fetch and old value, signature=[B@34d0cdd0' warnings, and as a result
>> >>> no documents get indexed with Solr ...
>> >>>
>> >>> Can anyone help me pinpoint what is going on?
>> >>>
>> >>> Thanks
>> >>>
>> >>> Here's the log file:
>> >>> 2012-08-29 13:54:52,641 INFO parse.ParseSegment - Parsing: smb://192.168.3.6/share/putusan/putusan_sidang_PUTUSAN 48-2011 TELAH baca.pdf
>> >>> 2012-08-29 13:54:53,576 INFO parse.ParseSegment - Parsing: smb://192.168.3.6/share/putusan/putusan_sidang_Putusan 55 PUU-2010-TELAH BACA.pdf
>> >>> 2012-08-29 13:54:53,612 INFO parse.ParseSegment - Parsing: smb://192.168.3.6/share/putusan/putusan_sidang_Putusan Sela 108 PHPU 2011.pdf
>> >>> 2012-08-29 13:54:53,930 INFO regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
>> >>> 2012-08-29 13:54:55,087 INFO parse.ParseSegment - ParseSegment: finished at 2012-08-29 13:54:55, elapsed: 00:00:28
>> >>> 2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: starting at 2012-08-29 13:54:55
>> >>> 2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: db: crawl/crawldb
>> >>> 2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: segments: [crawl/segments/20120829134849]
>> >>> 2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: additions allowed: true
>> >>> 2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: URL normalizing: true
>> >>> 2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: URL filtering: true
>> >>> 2012-08-29 13:54:55,103 INFO crawl.CrawlDb - CrawlDb update: 404 purging: false
>> >>> 2012-08-29 13:54:55,104 INFO crawl.CrawlDb - CrawlDb update: Merging segment data into db.
>> >>> 2012-08-29 13:54:55,584 INFO regex.RegexURLNormalizer - can't find rules for scope 'crawldb', using default
>> >>> 2012-08-29 13:54:55,765 INFO regex.RegexURLNormalizer - can't find rules for scope 'crawldb', using default
>> >>> 2012-08-29 13:54:56,121 INFO regex.RegexURLNormalizer - can't find rules for scope 'crawldb', using default
>> >>> 2012-08-29 13:54:56,160 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
>> >>> 2012-08-29 13:54:56,160 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
>> >>> 2012-08-29 13:54:56,160 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
>> >>> 2012-08-29 13:54:56,198 WARN crawl.CrawlDbReducer - Missing fetch and old value, signature=[B@34d0cdd0
>> >>> 2012-08-29 13:54:56,199 WARN crawl.CrawlDbReducer - Missing fetch and old value, signature=[B@78782dc6
>> >>> 2012-08-29 13:54:56,199 WARN crawl.CrawlDbReducer - Missing fetch and old value, signature=[B@1a055ff4
>> >>>
>> >>> --
>> >>> View this message in context: http://lucene.472066.n3.nabble.com/Nutch-SMB-protocol-tp4003945.html
>> >>> Sent from the Nutch - User mailing list archive at Nabble.com.
>> >>
>> >> --
>> >> Lewis
>>
>> --
>> Lewis

--
Lewis

