[jira] [Commented] (NUTCH-1600) Injector overwrite does not always work properly

2013-07-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699879#comment-13699879
 ] 

Hudson commented on NUTCH-1600:
---

Integrated in Nutch-trunk #2268 (See 
[https://builds.apache.org/job/Nutch-trunk/2268/])
NUTCH-1600 Injector overwrite does not always work properly (Revision 
1499684)

 Result = SUCCESS
markus : http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1499684
Files : 
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/crawl/Injector.java


> Injector overwrite does not always work properly
> 
>
> Key: NUTCH-1600
> URL: https://issues.apache.org/jira/browse/NUTCH-1600
> Project: Nutch
> Issue Type: Bug
> Components: injector
> Affects Versions: 1.7
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Fix For: 1.8
>
> Attachments: NUTCH-1600-1.8.patch
>
>
> db.injector.update works as it should, but db.injector.overwrite does not 
> always overwrite the record properly. This issue has existed for some time, 
> and we have already fixed it in our distribution of Nutch.
> This record has just been updated (interval).
> {code}
> Injector: starting at 2013-07-03 10:34:15
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: seeds
> Injector: Converting injected urls to crawl db entries.
> Injector: total number of urls rejected by filters: 0
> Injector: total number of urls injected after normalization and filtering: 9
> Injector: Merging injected urls into crawl db.
> Injector: finished at 2013-07-03 10:34:21, elapsed: 00:00:05
> URL: url
> Version: 7
> Status: 2 (db_fetched)
> Fetch time: Fri Jul 05 12:11:44 CEST 2013
> Modified time: Fri Jun 28 12:11:44 CEST 2013
> Retries since fetch: 0
> Retry interval: 604800 seconds (7 days)
> Score: 0.0
> Signature: ba29ef3e680323a6d0da74c156800e03
> Metadata: Content-Type: text/html_pst_: success(1), lastModified=0
> {code}
> If we now overwrite the record, nothing happens. With this patch installed, it 
> overwrites the record as it should and also logs the update and overwrite 
> switches to the console:
> {code}
> Injector: starting at 2013-07-03 10:36:30
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: seeds
> Injector: Converting injected urls to crawl db entries.
> Injector: total number of urls rejected by filters: 0
> Injector: total number of urls injected after normalization and filtering: 9
> Injector: Merging injected urls into crawl db.
> Injector: overwrite: true
> Injector: update: false
> Injector: finished at 2013-07-03 10:36:36, elapsed: 00:00:05
> URL: url
> Version: 7
> Status: 1 (db_unfetched)
> Fetch time: Wed Jul 03 10:36:30 CEST 2013
> Modified time: Thu Jan 01 01:00:00 CET 1970
> Retries since fetch: 0
> Retry interval: 14000 seconds (0 days)
> Score: 1.0
> Signature: null
> Metadata: fixedInterval: 14000.0
> {code}
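
For readers unfamiliar with the two switches, the difference between the outputs above can be sketched roughly as follows. This is only an illustration, not the code in Injector.java or in the attached patch: the class InjectMergeSketch, the Entry type and the merge method are hypothetical names, and the exact set of fields that update copies is an assumption. What it shows is the behaviour described in the issue: overwrite replaces the stored record with the injected one, while update keeps the stored record and only takes selected values from the injected one.

```java
import java.util.HashMap;
import java.util.Map;

public class InjectMergeSketch {

    /** Hypothetical, simplified stand-in for a CrawlDb record. */
    static final class Entry {
        final String status;
        final int intervalSeconds;
        final float score;
        final Map<String, String> metadata;

        Entry(String status, int intervalSeconds, float score,
              Map<String, String> metadata) {
            this.status = status;
            this.intervalSeconds = intervalSeconds;
            this.score = score;
            this.metadata = new HashMap<>(metadata); // defensive copy
        }
    }

    /**
     * Decides which record survives the merge step.
     * Assumption: overwrite takes precedence over update.
     */
    static Entry merge(Entry existing, Entry injected,
                       boolean overwrite, boolean update) {
        if (existing == null || overwrite) {
            // New URL, or overwrite requested: the injected record wins.
            return injected;
        }
        if (update) {
            // Assumption: keep status and score, take the injected
            // interval, and merge injected metadata over the old metadata.
            Map<String, String> merged = new HashMap<>(existing.metadata);
            merged.putAll(injected.metadata);
            return new Entry(existing.status, injected.intervalSeconds,
                             existing.score, merged);
        }
        // Neither switch set: the stored record is left untouched.
        return existing;
    }

    public static void main(String[] args) {
        Map<String, String> oldMeta = new HashMap<>();
        oldMeta.put("Content-Type", "text/html");
        Map<String, String> newMeta = new HashMap<>();
        newMeta.put("fixedInterval", "14000.0");

        Entry existing = new Entry("db_fetched", 604800, 0.0f, oldMeta);
        Entry injected = new Entry("db_unfetched", 14000, 1.0f, newMeta);

        Entry overwritten = merge(existing, injected, true, false);
        Entry updated = merge(existing, injected, false, true);

        System.out.println("overwrite: " + overwritten.status
            + ", interval " + overwritten.intervalSeconds);
        System.out.println("update:    " + updated.status
            + ", interval " + updated.intervalSeconds);
    }
}
```

The values in main mirror the two records quoted above (db_fetched with a 604800 second interval, then db_unfetched with a 14000 second interval); everything else is made up for the sketch.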

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (NUTCH-1600) Injector overwrite does not always work properly

2013-07-03 Thread lufeng (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699034#comment-13699034
 ] 

lufeng commented on NUTCH-1600:
---

Tested, works fine.
+1
