Maybe your page size is bigger than the configured limit. See
conf/nutch-*.xml
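A rough sketch of what such an override might look like, assuming the limit in question is the `http.content.limit` property (check the name and default in your own conf/nutch-default.xml). The value is purely illustrative, and the property would go inside the root element of your site-specific override file:

```xml
<!-- Assumption: http.content.limit is the relevant setting; verify it
     against conf/nutch-default.xml for your Nutch version. -->
<property>
  <name>http.content.limit</name>
  <!-- Illustrative value in bytes; pick one larger than your biggest page. -->
  <value>1048576</value>
</property>
```

If a page is longer than the limit, its content is cut off at that point, so links that appear later in the page would never reach the parser.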
On 05.02.2006 at 18:39, Raghavendra Prabhu wrote:
Hi Stefan
One more thing I am seeing is that some outlinks are not parsed
properly.
I tried both HTML parsers (neko and tagsoup).
I know that this may not be due to protocol-http, but is there a
chance that this may also be due to the same reason?
Thanks for the answer.
Rgds
Prabhu
On 2/5/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
I personally prefer protocol-http.
On 05.02.2006 at 18:26, Raghavendra Prabhu wrote:
Hi Stefan
My bandwidth is limited.
But I am able to crawl other links on the same host (so I guess it is
not blocking me).
Is it because of protocol-httpclient (should I use protocol-http)?
Rgds
Prabhu
On 2/5/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
Is the host reachable from your web browser?
Does this host block your IP because it interprets nutch as a DoS
attack?
Is your bandwidth limited?
On 05.02.2006 at 18:17, Raghavendra Prabhu wrote:
Hi
I am running a crawl using protocol-httpclient.
I get a
java.io.IOException: java.net.SocketTimeoutException: Read timed out
Can someone tell me why I get this error?
After that the crawl hangs and simply stays in the same state.
Rgds
Prabhu
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general