[jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again

2008-02-24 Thread Emmanuel Joke (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmanuel Joke updated NUTCH-578:


Attachment: NUTCH-578.patch

I've got the same error for page with an HTTP status code = 503.

I found the issue in the CrawlDbReduce class. The fetchtime was not refresh 
correctly according to the DB Status.
My patch fix this issue.

 URL fetched with 403 is generated over and over again
 -

 Key: NUTCH-578
 URL: https://issues.apache.org/jira/browse/NUTCH-578
 Project: Nutch
  Issue Type: Bug
  Components: generator
Affects Versions: 1.0.0
 Environment: Ubuntu Gutsy Gibbon (7.10) running on VMware server. I 
 have checked out the most recent version of the trunk as of Nov 20, 2007
Reporter: Nathaniel Powell
 Fix For: 1.0.0

 Attachments: crawl-urlfilter.txt, NUTCH-578.patch, nutch-site.xml, 
 regex-normalize.xml, urls.txt


 I have not changed the following parameter in the nutch-default.xml:
 property
   namedb.fetch.retry.max/name
   value3/value
   descriptionThe maximum number of times a url that has encountered
   recoverable errors is generated for fetch./description
 /property
 However, there is a URL which is on the site that I'm crawling, 
 www.teachertube.com, which keeps being generated over and over again for 
 almost every segment (many more times than 3):
 fetch of http://www.teachertube.com/images/ failed with: Http code=403, 
 url=http://www.teachertube.com/images/
 This is a bug, right?
 Thanks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again

2008-02-24 Thread Emmanuel Joke (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmanuel Joke updated NUTCH-578:


Attachment: NUTCH-578_v2.patch

Actually i just realised that the setPageRetrySchedule in AbstractSchedule was 
not correctly defined.

This patch fix this issue too.

 URL fetched with 403 is generated over and over again
 -

 Key: NUTCH-578
 URL: https://issues.apache.org/jira/browse/NUTCH-578
 Project: Nutch
  Issue Type: Bug
  Components: generator
Affects Versions: 1.0.0
 Environment: Ubuntu Gutsy Gibbon (7.10) running on VMware server. I 
 have checked out the most recent version of the trunk as of Nov 20, 2007
Reporter: Nathaniel Powell
 Fix For: 1.0.0

 Attachments: crawl-urlfilter.txt, NUTCH-578.patch, 
 NUTCH-578_v2.patch, nutch-site.xml, regex-normalize.xml, urls.txt


 I have not changed the following parameter in the nutch-default.xml:
 property
   namedb.fetch.retry.max/name
   value3/value
   descriptionThe maximum number of times a url that has encountered
   recoverable errors is generated for fetch./description
 /property
 However, there is a URL which is on the site that I'm crawling, 
 www.teachertube.com, which keeps being generated over and over again for 
 almost every segment (many more times than 3):
 fetch of http://www.teachertube.com/images/ failed with: Http code=403, 
 url=http://www.teachertube.com/images/
 This is a bug, right?
 Thanks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Nutch-trunk #369

2008-02-24 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/369/changes

--
[...truncated 4599 lines...]

copy-generated-lib:
 [copy] Copying 1 file to 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-regex
 

init:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix
 
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/classes
 
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/test
 

init-plugin:

deps-jar:

compile:
 [echo] Compiling plugin: urlfilter-suffix
[javac] Compiling 1 source file to 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/classes
 
[javac] Note: 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/src/plugin/urlfilter-suffix/src/java/org/apache/nutch/urlfilter/suffix/SuffixURLFilter.java
  uses unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

jar:
  [jar] Building jar: 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/urlfilter-suffix.jar
 

deps-test:

deploy:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-suffix
 
 [copy] Copying 1 file to 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-suffix
 

copy-generated-lib:
 [copy] Copying 1 file to 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-suffix
 

init:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator
 
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/classes
 
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/test
 

init-plugin:

deps-jar:

compile:
 [echo] Compiling plugin: urlfilter-validator
[javac] Compiling 1 source file to 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/classes
 

jar:
  [jar] Building jar: 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/urlfilter-validator.jar
 

deps-test:

deploy:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-validator
 
 [copy] Copying 1 file to 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-validator
 

copy-generated-lib:
 [copy] Copying 1 file to 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-validator
 

init:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic
 
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/classes
 
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/test
 

init-plugin:

deps-jar:

compile:
 [echo] Compiling plugin: urlnormalizer-basic
[javac] Compiling 1 source file to 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/classes
 

jar:
  [jar] Building jar: 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/urlnormalizer-basic.jar
 

deps-test:

deploy:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-basic
 
 [copy] Copying 1 file to 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-basic
 

copy-generated-lib:
 [copy] Copying 1 file to 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-basic
 

init:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass
 
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/classes
 
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/test
 

init-plugin:

deps-jar:

compile:
 [echo] Compiling plugin: urlnormalizer-pass
[javac] Compiling 1 source file to 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/classes
 

jar:
  [jar] Building jar: 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/urlnormalizer-pass.jar
 

deps-test:

deploy:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-pass
 
 [copy] Copying 1 file to