[ http://issues.apache.org/jira/browse/NUTCH-339?page=all ]
Andrzej Bialecki updated NUTCH-339:
Attachment: patch4-fixed.txt
Sorry, the patch was incomplete - please try patch4-fixed.txt instead.
Refactor nutch to allow fetcher improvements
I agree with you that documentation is vital, not just for extending the
current version but also for any plugins and patches created. I have been
spending almost two weeks trying to adapt Nutch to my project, but I spend
more time reading code and trying to understand what it does before I can
Did you ever browse this: http://wiki.media-style.com/display/nutchDocu/Home
Nothing big, but it will give you some ideas, also about plugins.
On 25.11.2006, at 06:32, Armel T. Nene wrote:
I agree with you that documentation is vital, not just for extending the
current version but also for
[ http://issues.apache.org/jira/browse/NUTCH-408?page=comments#action_12452610 ]
nutch.newbie commented on NUTCH-408:
Yes, I have gone through the media-style documentation and it is a good start,
and there are also some very good
[ http://issues.apache.org/jira/browse/NUTCH-409?page=all ]
Doug Cook updated NUTCH-409:
Attachment: shortcircuit.patch
Add short circuit notion to filters to speedup mixed site/subsite crawling
Done. See http://issues.apache.org/jira/browse/NUTCH-409
This is my first Nutch contribution, so hopefully I've got it right ;-) Any
suggestions/questions/feedback welcome.
Hope this is useful to others.
D
scott green wrote:
Hi Doug,
Your idea about PrefixURLFilter and
[ http://issues.apache.org/jira/browse/NUTCH-409?page=comments#action_12452617 ]
Doug Cook commented on NUTCH-409:
I should also note that this approach is still not optimal (though it is faster
for my usage pattern). I'm still running the