[
https://issues.apache.org/jira/browse/NUTCH-824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925308#action_12925308
]
Markus Jelsma commented on NUTCH-824:
-
You're correct, no patch has been submitted and
[
https://issues.apache.org/jira/browse/NUTCH-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-824:
Affects Version/s: 2.0
1.3
1.2
Fix Version/s:
Hi,
I have a problem with the If-Modified-Since option in Nutch.
I want to crawl a web site every day, so I have set the property
db.fetch.interval.default accordingly in nutch-site.xml.
But I want to limit Nutch to fetching only pages that have changed,
using the If-Modified-Since header.
I found some
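For readers unfamiliar with the header mentioned above, here is a minimal sketch of a conditional GET in plain Python. It is not Nutch code and says nothing about how Nutch implements this; the function names `conditional_headers` and `fetch_if_modified` are made up for illustration. It only shows the HTTP mechanism: the client sends the time of its last fetch, and the server may answer 304 Not Modified instead of resending the page.

```python
from email.utils import formatdate
import urllib.error
import urllib.request


def conditional_headers(last_fetch_epoch: float) -> dict:
    """Build the request header for a conditional GET (hypothetical helper)."""
    # HTTP dates are expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    return {"If-Modified-Since": formatdate(last_fetch_epoch, usegmt=True)}


def fetch_if_modified(url: str, last_fetch_epoch: float):
    """Return the page body, or None if the server answers 304 Not Modified."""
    request = urllib.request.Request(url, headers=conditional_headers(last_fetch_epoch))
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError; treat it as "unchanged".
        if err.code == 304:
            return None
        raise
```

Whether a page is actually skipped depends on the server honouring the header; servers that ignore it simply return the full page with status 200.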
[
https://issues.apache.org/jira/browse/NUTCH-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925318#action_12925318
]
Markus Jelsma commented on NUTCH-901:
-
Applied patch and added Mattmann's test to
[
https://issues.apache.org/jira/browse/NUTCH-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-900:
Attachment: NUTCH-900-1.3.patch
This patch is for branch-1.3 and fixes a typo in http.content.limit
Hi Xiao,
FWIR there is adaptive refetch interval support in Nutch currently -
or are you looking for something different?
Regards,
-- Ken
On Oct 27, 2010, at 1:42am, xiao yang wrote:
I want to modify the crawler's schedule to make it more real-time.
Some web pages are frequently
[
https://issues.apache.org/jira/browse/NUTCH-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925543#action_12925543
]
Andrzej Bialecki commented on NUTCH-926:
-
bq. Nutch continues to crawl the WRONG
Sub pages are not getting crawled
-
Key: NUTCH-927
URL: https://issues.apache.org/jira/browse/NUTCH-927
Project: Nutch
Issue Type: Bug
Components: injector
Affects Versions: 2.0
See https://hudson.apache.org/hudson/job/Nutch-trunk/1289/
--
[...truncated 925 lines...]
A
src/plugin/parse-tika/src/java/org/apache/nutch/parse/tika/HTMLMetaProcessor.java
A
Segmentation
Key: NUTCH-928
URL: https://issues.apache.org/jira/browse/NUTCH-928
Project: Nutch
Issue Type: Bug
Components: injector
Affects Versions: 2.0
Reporter: Rameez Raja
I need to create
[
https://issues.apache.org/jira/browse/NUTCH-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rameez Raja updated NUTCH-928:
--
Description:
Is there any configuration needed to create segments for each URL rather than
for each