I'm not very familiar with SVN. Is this what you mean?

http://svn.apache.org/repos/asf/nutch/branches/branch-1.5/

Matteo

2012/8/29 Lewis John Mcgibbney <[email protected]>

> In the SVN area, can you point me to the protocol plugin, please?
>
> http://svn.apache.org/repos/asf/nutch/
>
> Thank you
>
> Lewis
>
> On Wed, Aug 29, 2012 at 3:22 PM, Matteo Simoncini <[email protected]> wrote:
> > Sorry, I forgot to mention it.
> >
> > 1.5
> >
> > Matteo
> >
> > 2012/8/29 Lewis John Mcgibbney <[email protected]>:
> >> What version of Nutch is this?
> >>
> >> Lewis
> >>
> >> On Wed, Aug 29, 2012 at 9:58 AM, xpow <[email protected]> wrote:
> >>> Hello,
> >>>
> >>> I've tried to use the protocol-smb plugin with Nutch. Nutch read and
> >>> parsed the documents correctly, but afterward, when it reached the
> >>> crawldb update (crawl.CrawlDbReducer), I got a lot of these warnings:
> >>> 'crawl.CrawlDbReducer - Missing fetch and old value, signature=[B@34d0cdd0'
> >>> which caused no documents to be indexed in Solr...
> >>>
> >>> Can anyone help me pinpoint what is going on?
> >>>
> >>> Thanks
> >>>
> >>> Here's the log file:
> >>> 2012-08-29 13:54:52,641 INFO  parse.ParseSegment - Parsing: smb://192.168.3.6/share/putusan/putusan_sidang_PUTUSAN 48-2011 TELAH baca.pdf
> >>> 2012-08-29 13:54:53,576 INFO  parse.ParseSegment - Parsing: smb://192.168.3.6/share/putusan/putusan_sidang_Putusan 55 PUU-2010-TELAH BACA.pdf
> >>> 2012-08-29 13:54:53,612 INFO  parse.ParseSegment - Parsing: smb://192.168.3.6/share/putusan/putusan_sidang_Putusan Sela 108 PHPU 2011.pdf
> >>> 2012-08-29 13:54:53,930 INFO  regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
> >>> 2012-08-29 13:54:55,087 INFO  parse.ParseSegment - ParseSegment: finished at 2012-08-29 13:54:55, elapsed: 00:00:28
> >>> 2012-08-29 13:54:55,103 INFO  crawl.CrawlDb - CrawlDb update: starting at 2012-08-29 13:54:55
> >>> 2012-08-29 13:54:55,103 INFO  crawl.CrawlDb - CrawlDb update: db: crawl/crawldb
> >>> 2012-08-29 13:54:55,103 INFO  crawl.CrawlDb - CrawlDb update: segments: [crawl/segments/20120829134849]
> >>> 2012-08-29 13:54:55,103 INFO  crawl.CrawlDb - CrawlDb update: additions allowed: true
> >>> 2012-08-29 13:54:55,103 INFO  crawl.CrawlDb - CrawlDb update: URL normalizing: true
> >>> 2012-08-29 13:54:55,103 INFO  crawl.CrawlDb - CrawlDb update: URL filtering: true
> >>> 2012-08-29 13:54:55,103 INFO  crawl.CrawlDb - CrawlDb update: 404 purging: false
> >>> 2012-08-29 13:54:55,104 INFO  crawl.CrawlDb - CrawlDb update: Merging segment data into db.
> >>> 2012-08-29 13:54:55,584 INFO  regex.RegexURLNormalizer - can't find rules for scope 'crawldb', using default
> >>> 2012-08-29 13:54:55,765 INFO  regex.RegexURLNormalizer - can't find rules for scope 'crawldb', using default
> >>> 2012-08-29 13:54:56,121 INFO  regex.RegexURLNormalizer - can't find rules for scope 'crawldb', using default
> >>> 2012-08-29 13:54:56,160 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
> >>> 2012-08-29 13:54:56,160 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
> >>> 2012-08-29 13:54:56,160 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
> >>> 2012-08-29 13:54:56,198 WARN  crawl.CrawlDbReducer - Missing fetch and old value, signature=[B@34d0cdd0
> >>> 2012-08-29 13:54:56,199 WARN  crawl.CrawlDbReducer - Missing fetch and old value, signature=[B@78782dc6
> >>> 2012-08-29 13:54:56,199 WARN  crawl.CrawlDbReducer - Missing fetch and old value, signature=[B@1a055ff4
> >>>
> >>> --
> >>> View this message in context:
> >>> http://lucene.472066.n3.nabble.com/Nutch-SMB-protocol-tp4003945.html
> >>> Sent from the Nutch - User mailing list archive at Nabble.com.
> >>
> >>
> >>
> >> --
> >> Lewis
>
>
>
> --
> Lewis
>
