Dennis Kubes wrote:
> I am currently implementing a patch for the older 0.8 code that allows
> pages with crawl delay > x seconds to be ignored where the number of
> seconds is configurable. What do you think the best way to return
> from the HttpBase would be? Would it be to throw an HttpException or
> return a ProtocolStatus with say GONE or something like that?
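For illustration only, here is a minimal sketch of the "return a status instead of throwing" option from the question. It assumes the crawl delay has already been parsed from robots.txt into whole seconds, and it uses a hypothetical CrawlDelayPolicy class and a hypothetical Status enum. None of these names are Nutch APIs. A real patch would map the hypothetical CRAWL_DELAY_EXCEEDED outcome onto an actual ProtocolStatus code, such as GONE as suggested in the question, or another code.

```java
/**
 * Hypothetical sketch (not a Nutch API): decide whether to skip a page
 * whose robots.txt Crawl-delay exceeds a configurable maximum, returning
 * a status value rather than throwing an exception.
 */
public class CrawlDelayPolicy {

    /** Hypothetical outcome codes; a real patch would map these onto ProtocolStatus. */
    public enum Status { OK, CRAWL_DELAY_EXCEEDED }

    // Maximum acceptable crawl delay in seconds; assumed to come from configuration.
    private final long maxDelaySeconds;

    public CrawlDelayPolicy(long maxDelaySeconds) {
        if (maxDelaySeconds < 0) {
            throw new IllegalArgumentException("maxDelaySeconds must be >= 0");
        }
        this.maxDelaySeconds = maxDelaySeconds;
    }

    /**
     * @param crawlDelaySeconds delay parsed from robots.txt, in seconds
     * @return OK if the page may be fetched, CRAWL_DELAY_EXCEEDED if it should be skipped
     */
    public Status check(long crawlDelaySeconds) {
        return crawlDelaySeconds > maxDelaySeconds
                ? Status.CRAWL_DELAY_EXCEEDED
                : Status.OK;
    }

    public static void main(String[] args) {
        CrawlDelayPolicy policy = new CrawlDelayPolicy(30); // example threshold: 30 s
        for (long delay : new long[] {5, 30, 120}) {
            // Prints "5 s -> OK", "30 s -> OK", "120 s -> CRAWL_DELAY_EXCEEDED"
            System.out.println(delay + " s -> " + policy.check(delay));
        }
    }
}
```

One argument for returning a status here is that an over-long crawl delay is an expected, per-host outcome rather than an error. A status lets the fetcher branch on it the same way it handles other protocol results, instead of routing it through exception handling.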
In the latest patch in NUTCH-339 I added a ProtocolStatus.WOULDBLOCK code, and a section in Fetcher2 that is supposed to handle it. However, after I removed block/unblockAddr from lib-http, no code in that patch uses this status code.

--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
