On Tue, Mar 02, 2004 at 02:26:09PM -0800, Doug Cutting wrote:
>
> They have different bugs. Fetcher.java doesn't observe robots.txt, but
> it is simple and fast. RequestScheduler.java & friends (including
> OutputThread.java) implement robots.txt plus lots of other politeness
> options, but also frequently hang. No one has yet fixed this, and the
> fellow who wrote that code is no longer working on Nutch. Where we go
> depends on where contributors take us: we could add robots.txt support
> to Fetcher.java, or someone could fix the hangs in RequestScheduler. Or
> someone could contribute an all new fetcher.
>
> Until this is resolved we should probably maintain both.

I will create a new patch after trying Fetcher.java.

One other thing (besides Content-Type) I would like to add is Last-Modified. On many occasions I find it essential. What do you think? Can Google and others search by Last-Modified?

John

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
