[
https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris A. Mattmann updated NUTCH-578:
Fix Version/s: (was: 1.1)
- pushing this out per http://bit.ly/c7tBv9
URL fetched
[
https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Serykh Evgeniy updated NUTCH-578:
Attachment: NUTCH-578_v4.patch
URL fetched with 403 is generated over and over again
[
https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Serykh Evgeniy updated NUTCH-578:
Attachment: (was: NUTCH-578_v4.patch)
URL fetched with 403 is generated over and over again
[
https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Serykh Evgeniy updated NUTCH-578:
Attachment: NUTCH-578_v4.patch
URL fetched with 403 is generated over and over again
[
https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dmitry Lihachev updated NUTCH-578:
Attachment: NUTCH-578_v3.patch
Changes in CrawlDbReducer are already applied in trunk, so patch
[
https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Emmanuel Joke updated NUTCH-578:
Attachment: NUTCH-578.patch
I've got the same error for a page with an HTTP status code = 503.
I
[
https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Emmanuel Joke updated NUTCH-578:
Attachment: NUTCH-578_v2.patch
Actually I just realised that the setPageRetrySchedule in
[
https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nathaniel Powell updated NUTCH-578:
Attachment: nutch-site.xml
For your reference, this is how I have the nutch-site set up.
[
https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nathaniel Powell updated NUTCH-578:
Attachment: regex-normalize.xml
Another file I customized.
URL fetched with 403 is
[
https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nathaniel Powell updated NUTCH-578:
Attachment: crawl-urlfilter.txt
File I customized for this crawl.
URL fetched with 403 is
[
https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nathaniel Powell updated NUTCH-578:
Attachment: (was: nutch-site.xml)
URL fetched with 403 is generated over and over again
[
https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nathaniel Powell updated NUTCH-578:
Attachment: nutch-site.xml
This is the nutch-site that I used to run the crawl.
URL
[
https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nathaniel Powell updated NUTCH-578:
Attachment: nutch-site.xml
Use wget to download this file. This is the nutch-site.xml that I
[
https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nathaniel Powell updated NUTCH-578:
Attachment: (was: nutch-site.xml)
URL fetched with 403 is generated over and over again