Sorry, I realized I needed to qualify that: the plugin framework is nice, but
I mean customizing fetcher behaviour that isn't exposed through an extension point.
k
On Tue, 23 Aug 2005 00:02:26 -0400, Kelvin Tan wrote:
> One of the areas the Nutch Crawler can use with improvement is in
> the fact that it's really difficult to extend
I've been working on some changes to crawling to facilitate its use as a
non-whole-web crawler, and would like to gauge interest on this list about
including it somewhere in the Nutch repo, hopefully before the map-red branch
gets merged in.
It is basically a partial re-write of the whole fetch
Check out this searchable archive: http://www.nabble.com/Nutch-f362.html, hosted
by Nabble. It archives all the Nutch mailing lists in a single searchable forum,
so you can search across all lists or drill down into one. The Nabble people
use Lucene for search.
--
Sent from the Nutch - Dev forum
Doug:
Thanks for the update and clarification. It certainly helps us see which
areas we can contribute to.
> Long term, Nutch is what we make it. Developers' needs drive the
> project, not a master plan.
>
I couldn't agree more.
--
Best Regards
Zaheed Haque
Zaheed Haque wrote:
1. How do you see the 0.7 version evolving besides maintenance updates?
Will it have a life of its own? I mean, 0.7 is very good for intranet
use or mid-size public sites. Why would you want to use the mapred version
when you don't need it? (Maybe I don't know enough :-)
Using MapRe
Hello all:
I came across the following while browsing the mailing list archive.
http://marc.theaimsgroup.com/?l=nutch-developers&m=111228583625203&w=4
I am interested in the current status of the tools and cleanup work.
I am not tech-savvy enough to read through the code and understand these
mys
Hello,
I am going to perform some manipulations on extracted text, represented as an
array of strings, and I need some advice. I need to retrieve the strings, store
them (some strings can be repeated several times in a file), sort them, calculate
statistics, store a sorted subset in another file, etc.
Which class is better de
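One common way to handle this in plain Java (a sketch, not something proposed in the thread) is a `TreeMap<String, Integer>`: it stores each distinct string once with its occurrence count and keeps the keys in sorted order, so counting duplicates and sorting happen in one pass. The class `StringStats`, its methods, and the sample tokens are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts repeated strings, keeps them sorted, writes a subset.
final class StringStats {

    /** Counts how often each string occurs; TreeMap keeps keys in natural (sorted) order. */
    static TreeMap<String, Integer> countOccurrences(String[] tokens) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum); // add 1, or start at 1 if unseen
        }
        return counts;
    }

    /** Returns the strings that occur at least minCount times, in sorted order. */
    static List<String> atLeast(TreeMap<String, Integer> counts, int minCount) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Writes one string per line to the given file, UTF-8 encoded. */
    static void writeLines(List<String> lines, Path out) throws IOException {
        Files.write(out, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String[] tokens = {"lucene", "nutch", "crawl", "nutch", "lucene", "nutch"};
        TreeMap<String, Integer> counts = countOccurrences(tokens);
        System.out.println(counts); // {crawl=1, lucene=2, nutch=3}
        System.out.println("distinct: " + counts.size() + ", total: " + tokens.length);
        Path out = Files.createTempFile("subset", ".txt");
        writeLines(atLeast(counts, 2), out); // writes "lucene" and "nutch"
    }
}
```

If the text is too large to hold in memory, the same counting idea still applies, but the counts would have to be spilled to disk or computed in chunks.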
crawl-urlfilter.txt is specific to "bin/nutch crawl". If you run each step
separately, you are in fact doing the "Whole Web crawling" from the tutorial,
so you need to modify regex-urlfilter.txt instead.
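To see why the contents of regex-urlfilter.txt matter, here is a simplified model of how its rules are evaluated, assuming the behaviour described in that file's header comments: each rule line is `+` (accept) or `-` (reject) followed by a regular expression, the first pattern found in the URL decides, and a URL that matches no pattern is rejected. `RegexRuleFilter` and the `example.com` domain are hypothetical; this is not Nutch's own filter class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, simplified model of regex-urlfilter.txt semantics (not Nutch's class).
final class RegexRuleFilter {

    private static final class Rule {
        final boolean accept;
        final Pattern pattern;
        Rule(boolean accept, Pattern pattern) { this.accept = accept; this.pattern = pattern; }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Parses "+regex" / "-regex" lines; blank lines and '#' comments are skipped. */
    RegexRuleFilter(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** Returns the URL if accepted, or null if rejected. */
    String filter(String url) {
        for (Rule r : rules) {
            if (r.pattern.matcher(url).find()) { // first rule whose regex occurs in the URL wins
                return r.accept ? url : null;
            }
        }
        return null; // no rule matched: reject (assumed default)
    }

    public static void main(String[] args) {
        RegexRuleFilter f = new RegexRuleFilter(Arrays.asList(
            "# skip file:, ftp:, and mailto: urls",
            "-^(file|ftp|mailto):",
            "# accept hosts in example.com (hypothetical domain)",
            "+^http://([a-z0-9]*\\.)*example.com/",
            "# skip everything else",
            "-."));
        System.out.println(f.filter("http://www.example.com/index.html")); // accepted
        System.out.println(f.filter("ftp://example.com/file.txt"));        // null
        System.out.println(f.filter("http://other.org/"));                 // null
    }
}
```

Because the first match wins, a broad rule such as `-.` or `+.` belongs at the end of the file; placed earlier, it would shadow every rule after it.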
Regards
Piotr
On 8/22/05, Michael Ji <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> When I use intra