On Tue, Mar 02, 2004 at 02:26:09PM -0800, Doug Cutting wrote:
> 
> They have different bugs.  Fetcher.java doesn't observe robots.txt, but 
> it is simple and fast.  RequestScheduler.java & friends (including 
> OutputThread.java) implement robots.txt plus lots of other politeness 
> options, but also frequently hang.  No one has yet fixed this, and the 
> fellow who wrote that code is no longer working on Nutch.  Where we go 
> depends on where contributors take us: we could add robots.txt support 
> to Fetcher.java, or someone could fix the hangs in RequestScheduler.  Or 
> someone could contribute an all new fetcher.
> 
> Until this is resolved we should probably maintain both.

I will create a new patch after trying Fetcher.java.
One other thing (besides Content-Type) I would like to add is
Last-Modified. I often find it indispensable.
What do you think? Can Google and other search engines search by
Last-Modified?

John


_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
