[ https://issues.apache.org/jira/browse/NUTCH-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699879#comment-13699879 ]
Hudson commented on NUTCH-1600:
---
Integrated in Nutch-trunk #2268 (See
[https://builds.apache.org/job/Nutch-trunk/2268/])
NUTCH-1600 Injector overwrite does not always work properly (Revision
1499684)
Result = SUCCESS
markus : http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1499684
Files :
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/crawl/Injector.java
> Injector overwrite does not always work properly
>
>
> Key: NUTCH-1600
> URL: https://issues.apache.org/jira/browse/NUTCH-1600
> Project: Nutch
> Issue Type: Bug
> Components: injector
> Affects Versions: 1.7
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Fix For: 1.8
>
> Attachments: NUTCH-1600-1.8.patch
>
>
> db.injector.update works as it should, but db.injector.overwrite does not
> always overwrite the record properly. This issue has existed for some time,
> and we've already fixed it in our distribution of Nutch.
> The record below has just been updated (interval):
> {code}
> Injector: starting at 2013-07-03 10:34:15
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: seeds
> Injector: Converting injected urls to crawl db entries.
> Injector: total number of urls rejected by filters: 0
> Injector: total number of urls injected after normalization and filtering: 9
> Injector: Merging injected urls into crawl db.
> Injector: finished at 2013-07-03 10:34:21, elapsed: 00:00:05
> URL: url
> Version: 7
> Status: 2 (db_fetched)
> Fetch time: Fri Jul 05 12:11:44 CEST 2013
> Modified time: Fri Jun 28 12:11:44 CEST 2013
> Retries since fetch: 0
> Retry interval: 604800 seconds (7 days)
> Score: 0.0
> Signature: ba29ef3e680323a6d0da74c156800e03
> Metadata: Content-Type: text/html_pst_: success(1), lastModified=0
> {code}
> If we now inject with overwrite enabled, nothing happens. With this patch
> applied, the record is overwritten as it should be, and the update and
> overwrite switches are logged to the console:
> {code}
> Injector: starting at 2013-07-03 10:36:30
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: seeds
> Injector: Converting injected urls to crawl db entries.
> Injector: total number of urls rejected by filters: 0
> Injector: total number of urls injected after normalization and filtering: 9
> Injector: Merging injected urls into crawl db.
> Injector: overwrite: true
> Injector: update: false
> Injector: finished at 2013-07-03 10:36:36, elapsed: 00:00:05
> URL: url
> Version: 7
> Status: 1 (db_unfetched)
> Fetch time: Wed Jul 03 10:36:30 CEST 2013
> Modified time: Thu Jan 01 01:00:00 CET 1970
> Retries since fetch: 0
> Retry interval: 14000 seconds (0 days)
> Score: 1.0
> Signature: null
> Metadata: fixedInterval: 14000.0
> {code}
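The difference between the two switches in the quoted description can be illustrated with a small, self-contained sketch. This is not the code from the patch or from Injector.java: the `Record` class and the `merge` method are hypothetical stand-ins, and the assumption that an update keeps the existing status and score while taking the injected interval and metadata is a guess based only on the sample output above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class InjectMergeSketch {

    /** Hypothetical stand-in for a crawl db entry; this is not Nutch's CrawlDatum. */
    static final class Record {
        final String status;
        final float score;
        final int intervalSeconds;
        final Map<String, String> metadata;

        Record(String status, float score, int intervalSeconds, Map<String, String> metadata) {
            this.status = status;
            this.score = score;
            this.intervalSeconds = intervalSeconds;
            this.metadata = new HashMap<>(metadata); // defensive copy
        }
    }

    /**
     * Decides which record survives when an injected URL may already exist.
     * Assumed semantics (not confirmed by the source): overwrite wins over update.
     */
    static Record merge(Record existing, Record injected, boolean overwrite, boolean update) {
        if (existing == null) {
            return injected; // new URL: nothing to merge
        }
        if (injected == null) {
            return existing; // URL was not part of this injection
        }
        if (overwrite) {
            return injected; // replace the old record completely
        }
        if (update) {
            // Assumption: keep status and score, take the injected interval and metadata.
            Map<String, String> merged = new HashMap<>(existing.metadata);
            merged.putAll(injected.metadata);
            return new Record(existing.status, existing.score, injected.intervalSeconds, merged);
        }
        return existing; // default: an existing record is left untouched
    }

    public static void main(String[] args) {
        // Values loosely modelled on the two readdb dumps in the issue description.
        Record existing = new Record("db_fetched", 0.0f, 604800,
                Collections.singletonMap("Content-Type", "text/html"));
        Record injected = new Record("db_unfetched", 1.0f, 14000,
                Collections.singletonMap("fixedInterval", "14000.0"));

        Record overwritten = merge(existing, injected, true, false);
        Record updated = merge(existing, injected, false, true);

        System.out.println("overwrite: " + overwritten.status + ", " + overwritten.intervalSeconds + " s");
        System.out.println("update:    " + updated.status + ", " + updated.intervalSeconds + " s");
    }
}
```

With overwrite enabled the sketch returns the injected record unchanged, which corresponds to the second dump above (db_unfetched, 14000 seconds, no signature). With update only, the existing status is kept and just the interval and metadata change.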