Todd Lipcon wrote:
The issue I see with decreasing max crawl delay is that it essentially
blacklists those hosts. Even if I can only crawl these hosts 1/10th as fast,
I'd still like to have them in my index. I guess this is where the hostdb
will help once that jira is implemented, so this kind
One way is to enable debug logging in log4j so you can see the
headers that httpclient is passing back and forth with the webserver.
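
As a rough sketch of that suggestion, assuming the protocol-httpclient plugin is built on Commons HttpClient 3.x and that logging is configured through conf/log4j.properties (both are assumptions about this setup, not something confirmed in the thread), lines like these should make the request and response headers, cookies included, show up in the log:

```properties
# Assumption: Commons HttpClient 3.x logging routed through log4j.
# Logs only the HTTP headers sent and received on the wire.
log4j.logger.httpclient.wire.header=DEBUG
# Logs HttpClient's own internals (cookie handling, connection reuse, ...).
log4j.logger.org.apache.commons.httpclient=DEBUG
```

Setting the broader httpclient.wire logger to DEBUG would also dump message bodies, which is usually too noisy for a crawl, so the header-only logger is probably the better starting point.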
On Thu, Dec 11, 2008 at 10:29 AM, George Herlin [EMAIL PROTECTED] wrote:
I have read that if one sets the plugin.includes property to use
You can print the request headers to verify cookies.
I have looked at the source code. You can add some code to the file
src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/HttpResponse.java.
On Thu, 2008-12-11 at 16:29 +0100, George Herlin wrote:
I have read that if one sets the