[
https://issues.apache.org/jira/browse/NUTCH-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861420#comment-13861420
]
Markus Jelsma commented on NUTCH-1693:
--
+1, but this should also be ported to trunk.
[
https://issues.apache.org/jira/browse/NUTCH-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1693:
-
Fix Version/s: 1.8
Assignee: Markus Jelsma
TextMD5Signature computed on textual content
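The idea in the issue title is to compute the page signature over the extracted text rather than over the raw fetched bytes. It can be sketched with the JDK's `MessageDigest`. This is a minimal, hypothetical illustration, not Nutch's own signature class. The class name `TextSignatureSketch`, the method `calculate(rawContent, parsedText)`, and the fallback to hashing the raw bytes when no text was extracted are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Hypothetical sketch (not Nutch's implementation): derive a page signature
 * from its extracted text so pages differing only in markup hash alike.
 */
final class TextSignatureSketch {

    private TextSignatureSketch() {}

    /** MD5 of the parsed text, or of the raw bytes when there is no text (assumed fallback). */
    static byte[] calculate(byte[] rawContent, String parsedText) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            if (parsedText == null || parsedText.isEmpty()) {
                // Assumption: with no extracted text, hash the raw fetched bytes instead.
                return md5.digest(rawContent);
            }
            return md5.digest(parsedText.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Lower-case hex rendering of a digest, for logging or comparison. */
    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Two pages whose markup differs but whose extracted text is identical get the same digest. That property is what makes a text-based signature useful for spotting near-duplicate pages.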
[
https://issues.apache.org/jira/browse/NUTCH-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1693:
-
Attachment: NUTCH-1693-trunk.patch
Patch for trunk. This patch works identically to the original
[
https://issues.apache.org/jira/browse/NUTCH-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861434#comment-13861434
]
Markus Jelsma commented on NUTCH-1693:
--
In any case, I think both 2.x and 1.x should
Hi, are they exact duplicates? If you inject http://nutch.apache.org/ a
thousand times, it is added only once, and crawled only once, until it is
scheduled to crawl again.
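This behaviour follows from the crawl database holding one record per URL. Below is a minimal in-memory sketch, assuming a map keyed by URL stands in for the CrawlDb. `InjectDedupSketch`, `inject`, `markFetched` and the status strings are hypothetical names, not Nutch APIs. The rule that re-injecting a known URL leaves its existing status untouched is also an assumption of the sketch. It shows why a thousand injections still leave a single entry.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a crawl database keyed by URL (not Nutch code). */
final class InjectDedupSketch {

    // One entry per URL: injecting the same URL again cannot create a second record.
    private final Map<String, String> statusByUrl = new LinkedHashMap<>();

    /** Adds each seed once; an existing entry keeps its current status (assumed behaviour). */
    void inject(List<String> seeds) {
        for (String url : seeds) {
            statusByUrl.putIfAbsent(url, "unfetched");
        }
    }

    /** Marks a known URL as fetched, as a crawl round would. */
    void markFetched(String url) {
        statusByUrl.computeIfPresent(url, (u, s) -> "fetched");
    }

    int size() {
        return statusByUrl.size();
    }

    String status(String url) {
        return statusByUrl.get(url);
    }
}
```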
-----Original message-----
From: Bin Wang <binwang...@gmail.com>
Sent: Thursday 2nd January 2014 23:13
To:
[
https://issues.apache.org/jira/browse/NUTCH-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861452#comment-13861452
]
Markus Jelsma commented on NUTCH-356:
-
Thanks, I have pushed it to our production
[
https://issues.apache.org/jira/browse/NUTCH-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861468#comment-13861468
]
Markus Jelsma commented on NUTCH-1691:
--
To test whether the -D override works, you have to
[
https://issues.apache.org/jira/browse/NUTCH-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861471#comment-13861471
]
Markus Jelsma commented on NUTCH-1647:
--
Hmm, http.redirect.max already works on the
[
https://issues.apache.org/jira/browse/NUTCH-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861502#comment-13861502
]
lufeng commented on NUTCH-1691:
---
Like the urlfilter-prefix plugin, we can move the WARN code to
Hi,
I tried to modify the code here to parse the Nutch content data...
http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/parse/ParseSegment.java?view=markup
At the end of this email is a prototype I have written that runs MapReduce
to calculate the HTML content length of
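The rest of the prototype is cut off, so what the content length was calculated "of" is unknown. The sketch below is a hedged, Hadoop-free illustration of the map and reduce steps only. It assumes the goal is a total HTML byte length per host. `ContentLengthSketch` and its `map`, `reduce` and `run` methods are hypothetical names. A real job would use Hadoop's `Mapper` and `Reducer` over segment data instead of in-memory lists.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a MapReduce content-length job (not Nutch code). */
final class ContentLengthSketch {

    private ContentLengthSketch() {}

    /** Map step: emit (host, UTF-8 byte length of the HTML) for one page. */
    static Map.Entry<String, Long> map(String url, String html) {
        String host = URI.create(url).getHost();
        long length = html.getBytes(StandardCharsets.UTF_8).length;
        return new SimpleEntry<>(host, length);
    }

    /** Reduce step: sum the lengths emitted for each host; TreeMap keeps keys sorted like a shuffle. */
    static Map<String, Long> reduce(List<Map.Entry<String, Long>> emitted) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, Long> e : emitted) {
            totals.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return totals;
    }

    /** Runs the map step over every (url, html) pair, then reduces the emitted pairs. */
    static Map<String, Long> run(Map<String, String> htmlByUrl) {
        List<Map.Entry<String, Long>> emitted = new ArrayList<>();
        for (Map.Entry<String, String> page : htmlByUrl.entrySet()) {
            emitted.add(map(page.getKey(), page.getValue()));
        }
        return reduce(emitted);
    }
}
```

Keying the map output by host is what gives the reduce step something to aggregate. Keying by URL alone would make the reduce a no-op unless the same URL appeared in several segments.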
[
https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862237#comment-13862237
]
Tejas Patil commented on NUTCH-1465:
Hi [~wastl-nagel],
Yes. I think that it should be