[jira] [Updated] (NUTCH-578) URL fetched with 403 is generated over and over again
[ https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-578: Priority: Critical (was: Major) URL fetched with 403 is generated over and over again - Key: NUTCH-578 URL: https://issues.apache.org/jira/browse/NUTCH-578 Project: Nutch Issue Type: Bug Components: generator Affects Versions: 1.0.0 Environment: Ubuntu Gutsy Gibbon (7.10) running on VMware server. I have checked out the most recent version of the trunk as of Nov 20, 2007 Reporter: Nathaniel Powell Assignee: Markus Jelsma Priority: Critical Fix For: 1.9 Attachments: NUTCH-578.patch, NUTCH-578_v2.patch, NUTCH-578_v3.patch, NUTCH-578_v4.patch, NUTCH-578_v5.patch, crawl-urlfilter.txt, nutch-site.xml, regex-normalize.xml, urls.txt I have not changed the following parameter in the nutch-default.xml: property namedb.fetch.retry.max/name value3/value descriptionThe maximum number of times a url that has encountered recoverable errors is generated for fetch./description /property However, there is a URL which is on the site that I'm crawling, www.teachertube.com, which keeps being generated over and over again for almost every segment (many more times than 3): fetch of http://www.teachertube.com/images/ failed with: Http code=403, url=http://www.teachertube.com/images/ This is a bug, right? Thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (NUTCH-578) URL fetched with 403 is generated over and over again
[ https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-578: -- Attachment: NUTCH-578_v5.patch URL fetched with 403 is generated over and over again - Key: NUTCH-578 URL: https://issues.apache.org/jira/browse/NUTCH-578 Project: Nutch Issue Type: Bug Components: generator Affects Versions: 1.0.0 Environment: Ubuntu Gutsy Gibbon (7.10) running on VMware server. I have checked out the most recent version of the trunk as of Nov 20, 2007 Reporter: Nathaniel Powell Assignee: Markus Jelsma Fix For: 1.6 Attachments: crawl-urlfilter.txt, NUTCH-578.patch, NUTCH-578_v2.patch, NUTCH-578_v3.patch, NUTCH-578_v4.patch, NUTCH-578_v5.patch, nutch-site.xml, regex-normalize.xml, urls.txt I have not changed the following parameter in the nutch-default.xml: property namedb.fetch.retry.max/name value3/value descriptionThe maximum number of times a url that has encountered recoverable errors is generated for fetch./description /property However, there is a URL which is on the site that I'm crawling, www.teachertube.com, which keeps being generated over and over again for almost every segment (many more times than 3): fetch of http://www.teachertube.com/images/ failed with: Http code=403, url=http://www.teachertube.com/images/ This is a bug, right? Thanks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (NUTCH-578) URL fetched with 403 is generated over and over again
[ https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-578: Fix Version/s: (was: 1.5) 1.6 URL fetched with 403 is generated over and over again - Key: NUTCH-578 URL: https://issues.apache.org/jira/browse/NUTCH-578 Project: Nutch Issue Type: Bug Components: generator Affects Versions: 1.0.0 Environment: Ubuntu Gutsy Gibbon (7.10) running on VMware server. I have checked out the most recent version of the trunk as of Nov 20, 2007 Reporter: Nathaniel Powell Assignee: Markus Jelsma Fix For: 1.6 Attachments: NUTCH-578.patch, NUTCH-578_v2.patch, NUTCH-578_v3.patch, NUTCH-578_v4.patch, crawl-urlfilter.txt, nutch-site.xml, regex-normalize.xml, urls.txt I have not changed the following parameter in the nutch-default.xml: property namedb.fetch.retry.max/name value3/value descriptionThe maximum number of times a url that has encountered recoverable errors is generated for fetch./description /property However, there is a URL which is on the site that I'm crawling, www.teachertube.com, which keeps being generated over and over again for almost every segment (many more times than 3): fetch of http://www.teachertube.com/images/ failed with: Http code=403, url=http://www.teachertube.com/images/ This is a bug, right? Thanks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (NUTCH-578) URL fetched with 403 is generated over and over again
[ https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-578: Fix Version/s: (was: 1.4) (was: 2.0) 1.5 URL fetched with 403 is generated over and over again - Key: NUTCH-578 URL: https://issues.apache.org/jira/browse/NUTCH-578 Project: Nutch Issue Type: Bug Components: generator Affects Versions: 1.0.0 Environment: Ubuntu Gutsy Gibbon (7.10) running on VMware server. I have checked out the most recent version of the trunk as of Nov 20, 2007 Reporter: Nathaniel Powell Assignee: Markus Jelsma Fix For: 1.5 Attachments: NUTCH-578.patch, NUTCH-578_v2.patch, NUTCH-578_v3.patch, NUTCH-578_v4.patch, crawl-urlfilter.txt, nutch-site.xml, regex-normalize.xml, urls.txt I have not changed the following parameter in the nutch-default.xml: property namedb.fetch.retry.max/name value3/value descriptionThe maximum number of times a url that has encountered recoverable errors is generated for fetch./description /property However, there is a URL which is on the site that I'm crawling, www.teachertube.com, which keeps being generated over and over again for almost every segment (many more times than 3): fetch of http://www.teachertube.com/images/ failed with: Http code=403, url=http://www.teachertube.com/images/ This is a bug, right? Thanks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira