Sorry, wrong message. I made a mess.

I apologize.

Matteo

2012/8/29 Matteo Simoncini <[email protected]>

> I'm not so familiar with SVN. Is this what you mean?
>
> http://svn.apache.org/repos/asf/nutch/branches/branch-1.5/
>
> Matteo
>
> 2012/8/29 Lewis John Mcgibbney <[email protected]>
>
>> In the SVN area, can you point me to the protocol plugin, please?
>>
>> http://svn.apache.org/repos/asf/nutch/
>>
>> Thank you
>>
>> Lewis
>>
>> On Wed, Aug 29, 2012 at 3:22 PM, Matteo Simoncini <[email protected]> wrote:
>> > Sorry, I forgot it.
>> >
>> > 1.5
>> >
>> > Matteo
>> >
>> > 2012/8/29 Lewis John Mcgibbney <[email protected]>:
>> >> What version of Nutch is this?
>> >>
>> >> Lewis
>> >>
>> >> On Wed, Aug 29, 2012 at 9:58 AM, xpow <[email protected]> wrote:
>> >>> Hello,
>> >>>
>> >>> I've tried to use the protocol-smb plugin with Nutch. Nutch read and
>> >>> parsed the documents correctly, but afterwards, when it reached the
>> >>> CrawlDb update (crawl.CrawlDbReducer), I got a lot of
>> >>> 'crawl.CrawlDbReducer - Missing fetch and old value, signature=[B@34d0cdd0'
>> >>> warnings, and as a result no documents get indexed in Solr.
>> >>>
>> >>> Can anyone help me pinpoint what is going on?
>> >>>
>> >>> Thanks
>> >>>
>> >>> Here's the log file:
>> >>> 2012-08-29 13:54:52,641 INFO  parse.ParseSegment - Parsing: smb://192.168.3.6/share/putusan/putusan_sidang_PUTUSAN 48-2011 TELAH baca.pdf
>> >>> 2012-08-29 13:54:53,576 INFO  parse.ParseSegment - Parsing: smb://192.168.3.6/share/putusan/putusan_sidang_Putusan 55 PUU-2010-TELAH BACA.pdf
>> >>> 2012-08-29 13:54:53,612 INFO  parse.ParseSegment - Parsing: smb://192.168.3.6/share/putusan/putusan_sidang_Putusan Sela 108 PHPU 2011.pdf
>> >>> 2012-08-29 13:54:53,930 INFO  regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
>> >>> 2012-08-29 13:54:55,087 INFO  parse.ParseSegment - ParseSegment: finished at 2012-08-29 13:54:55, elapsed: 00:00:28
>> >>> 2012-08-29 13:54:55,103 INFO  crawl.CrawlDb - CrawlDb update: starting at 2012-08-29 13:54:55
>> >>> 2012-08-29 13:54:55,103 INFO  crawl.CrawlDb - CrawlDb update: db: crawl/crawldb
>> >>> 2012-08-29 13:54:55,103 INFO  crawl.CrawlDb - CrawlDb update: segments: [crawl/segments/20120829134849]
>> >>> 2012-08-29 13:54:55,103 INFO  crawl.CrawlDb - CrawlDb update: additions allowed: true
>> >>> 2012-08-29 13:54:55,103 INFO  crawl.CrawlDb - CrawlDb update: URL normalizing: true
>> >>> 2012-08-29 13:54:55,103 INFO  crawl.CrawlDb - CrawlDb update: URL filtering: true
>> >>> 2012-08-29 13:54:55,103 INFO  crawl.CrawlDb - CrawlDb update: 404 purging: false
>> >>> 2012-08-29 13:54:55,104 INFO  crawl.CrawlDb - CrawlDb update: Merging segment data into db.
>> >>> 2012-08-29 13:54:55,584 INFO  regex.RegexURLNormalizer - can't find rules for scope 'crawldb', using default
>> >>> 2012-08-29 13:54:55,765 INFO  regex.RegexURLNormalizer - can't find rules for scope 'crawldb', using default
>> >>> 2012-08-29 13:54:56,121 INFO  regex.RegexURLNormalizer - can't find rules for scope 'crawldb', using default
>> >>> 2012-08-29 13:54:56,160 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
>> >>> 2012-08-29 13:54:56,160 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
>> >>> 2012-08-29 13:54:56,160 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
>> >>> 2012-08-29 13:54:56,198 WARN  crawl.CrawlDbReducer - Missing fetch and old value, signature=[B@34d0cdd0
>> >>> 2012-08-29 13:54:56,199 WARN  crawl.CrawlDbReducer - Missing fetch and old value, signature=[B@78782dc6
>> >>> 2012-08-29 13:54:56,199 WARN  crawl.CrawlDbReducer - Missing fetch and old value, signature=[B@1a055ff4
>> >>>
>> >>> --
>> >>> View this message in context: http://lucene.472066.n3.nabble.com/Nutch-SMB-protocol-tp4003945.html
>> >>> Sent from the Nutch - User mailing list archive at Nabble.com.
>> >>
>> >>
>> >>
>> >> --
>> >> Lewis
>>
>>
>>
>> --
>> Lewis
>>
>
>
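The WARN lines in the log come from the CrawlDb update step. Based on a reading of the Nutch 1.x CrawlDbReducer (not checked line by line against 1.5), the reducer groups every record written for one URL key: the existing CrawlDb entry ("old"), the fetch status, any inlinks, and the parse signature. It logs "Missing fetch and old value" when a key ends up with no fetch record, no inlink to fall back on, and no existing CrawlDb entry, so there is nothing to write for that URL. The "[B@34d0cdd0" part is not the digest itself: the signature is a Java byte[], and concatenating it into the log message prints Java's default Object.toString() (the "[B" type code plus an identity hash). The Python sketch below is a hypothetical, simplified model of that branch logic for illustration only; the names Datum, reduce_url and the kind constants are invented here and are not Nutch's API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical, simplified stand-ins for the record kinds the CrawlDb update
# job groups under one URL key. These are NOT Nutch's real classes.
DB = "db"            # existing CrawlDb entry ("old")
FETCH = "fetch"      # fetch status written to the segment
LINKED = "linked"    # inlink discovered while parsing
SIGNATURE = "sig"    # content signature written by the parser


@dataclass
class Datum:
    kind: str
    signature: Optional[bytes] = None


def reduce_url(values: List[Datum], additions_allowed: bool = True) -> str:
    """Return which branch a simplified CrawlDbReducer-like step takes."""
    old = next((v for v in values if v.kind == DB), None)
    fetch = next((v for v in values if v.kind == FETCH), None)
    linked = [v for v in values if v.kind == LINKED]
    signature = next((v.signature for v in values if v.kind == SIGNATURE), None)

    if old is None and not additions_allowed:
        return "skip"            # unknown URL and additions are disabled
    if fetch is None and linked:
        fetch = linked[0]        # fall back to an inlink record
    if fetch is None:
        if old is not None:
            return "keep-old"    # nothing new: re-emit the existing entry
        # No CrawlDb entry, fetch record, or inlink for this key
        # (e.g. only the parse signature arrived): nothing to write.
        return f"warn: missing fetch and old value, signature={signature!r}"
    return "update"              # normal case: merge fetch result into the db
```

Under this model, each of the three warnings in the log means the reducer saw a signature for a URL key but no matching CrawlDb, fetch, or inlink record under that same key; why the keys failed to match is not established in this thread.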
