Re: Fetcher for constrained crawls

2005-08-22 Thread Kelvin Tan
Sorry, I realized I needed to qualify: the plugin framework is nice, but I mean customizing non-extension-point fetcher behaviour. k On Tue, 23 Aug 2005 00:02:26 -0400, Kelvin Tan wrote: > One of the areas where the Nutch Crawler could use improvement is > the fact that it's really difficult to extend

Fetcher for constrained crawls

2005-08-22 Thread Kelvin Tan
I've been working on some changes to crawling to facilitate its use as a non-whole-web crawler, and would like to gauge interest on this list in including them somewhere in the Nutch repo, hopefully before the map-red branch gets merged in. It is basically a partial re-write of the whole fetch

Re: Searchable mailing lists on nutch.org?

2005-08-22 Thread Will (sent by Nabble.com)
Check out this searchable archive: http://www.nabble.com/Nutch-f362.html hosted by Nabble. It archives all Nutch mailing lists into a single searchable forum; you can cross-search all lists or drill down and search a single one. The Nabble people use Lucene for search. -- Sent from the Nutch - Dev forum

Re: Mapred/0.7

2005-08-22 Thread Zaheed Haque
Doug: Thanks for the update and clarification. It surely helps us see which areas we can contribute to. > Long term, Nutch is what we make it. Developers' needs drive the > project, not a master plan. > I couldn't agree more. -- Best Regards Zaheed Haque

Re: Mapred/0.7

2005-08-22 Thread Doug Cutting
Zaheed Haque wrote: 1. How do you see the 0.7 version evolving besides maintenance updates? Will it have a life of its own? I mean 0.7 is very good for intranet use or a mid-size public site. Why would you want to use the mapred version when you don't need it? (Maybe I don't know enough :-) Using MapRe

Mapred/0.7

2005-08-22 Thread Zaheed Haque
Hello all: I came across the following while browsing the mailing list archive. http://marc.theaimsgroup.com/?l=nutch-developers&m=111228583625203&w=4 I am interested to know the current status of tools and cleanup. I am not tech savvy enough to read through the code and understand these mys

Extracted Data Manipulation - org.apache.nutch.io, MapRed?

2005-08-22 Thread Fuad Efendi
Hello, I am going to perform some manipulations on extracted text presented as an array of strings, and I need some advice. I need to retrieve Strings, store them (some Strings can be repeated in a file a few times), sort, calculate statistics, store a sorted subset in another file, etc. Which class is better de

Re: crawl-urlfilter.txt mechanics

2005-08-22 Thread Piotr Kosiorowski
crawl-urlfilter.txt is "bin/nutch crawl" specific. If you want to use each step separately, you are in fact doing "Whole Web crawling" from the tutorial, so you need to modify regex-urlfilter.txt instead. Regards Piotr On 8/22/05, Michael Ji <[EMAIL PROTECTED]> wrote: > > Hi, > > When I use intra
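
For readers unfamiliar with these filter files, the following is a minimal Python sketch of how a "+regex" / "-regex" rule list of the kind found in regex-urlfilter.txt can be evaluated. It is not Nutch's Java implementation. The first-match-wins behaviour and the reject-when-nothing-matches default are assumptions of this sketch, and the host example.com, the names RULES, parse_rules, and accepts are all hypothetical.

```python
import re

# Hypothetical rules in the "+regex" / "-regex" style of regex-urlfilter.txt.
# example.com is a placeholder host, not taken from the thread.
RULES = r"""
# skip file:, ftp:, and mailto: URLs
-^(file|ftp|mailto):
# accept anything under the placeholder domain
+^http://([a-z0-9]*\.)*example\.com/
# reject everything else
-.
"""


def parse_rules(text):
    """Turn rule text into a list of (allow, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Assumed semantics: the first matching rule decides; no match rejects."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False


if __name__ == "__main__":
    rules = parse_rules(RULES)
    for url in ("http://www.example.com/index.html",
                "http://other.org/",
                "ftp://example.com/file"):
        print(url, accepts(rules, url))
```

Under these assumptions, restricting a step-by-step crawl to one site means placing the accepting "+" rule for that site ahead of the final catch-all "-." rule.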