Maybe your page size is bigger than the configured limit. See conf/nutch-*.xml
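If the limit truncates a page, outlinks that appear after the cut-off point are never seen by the parser. One way to raise the limit is an override in conf/nutch-site.xml, which takes precedence over conf/nutch-default.xml. The following is a minimal sketch, assuming the `http.content.limit` property (in bytes, 65536 by default) as described in your version's nutch-default.xml. The value 262144 is only an illustrative choice. Some versions also accept a negative value to disable truncation entirely, so check the property description in your own nutch-default.xml before relying on that.

```xml
<!-- conf/nutch-site.xml: place inside the file's existing root element -->
<property>
  <name>http.content.limit</name>
  <!-- illustrative value: 256 KB instead of the 64 KB default -->
  <value>262144</value>
  <description>The length limit for downloaded content, in bytes.
  Content longer than this is truncated.</description>
</property>
```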

On 05.02.2006 at 18:39, Raghavendra Prabhu wrote:

Hi Stefan

One more thing I am seeing is that some outlinks are not parsed properly.

I tried both HTML parsers (neko and tagsoup).

I know this may not be caused by protocol-http, but is there a chance it is
due to the same reason?

Thanks for the answer.

Rgds
Prabhu


On 2/5/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:

I personally prefer protocol-http.
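The protocol plugin is selected through the `plugin.includes` property, a regular expression over plugin IDs. This is a sketch, assuming an override in conf/nutch-site.xml. The value shown is an example only, because the full plugin list differs between releases. Copy the value from your own nutch-default.xml and change only protocol-httpclient to protocol-http.

```xml
<!-- conf/nutch-site.xml: place inside the file's existing root element -->
<property>
  <name>plugin.includes</name>
  <!-- example value: protocol-http instead of protocol-httpclient;
       keep the remaining plugins as listed in your own nutch-default.xml -->
  <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
</property>
```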

On 05.02.2006 at 18:26, Raghavendra Prabhu wrote:

Hi Stefan

My bandwidth is limited.

But I am able to crawl other links on the same host (so I guess it is not
blocking me).

Is it because of protocol-httpclient? (Should I use protocol-http instead?)

Rgds
Prabhu


On 2/5/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:

Is the host reachable in your web browser?
Does this host block your IP because it interprets Nutch as a DoS
attack?
Is your bandwidth limited?

On 05.02.2006 at 18:17, Raghavendra Prabhu wrote:

Hi

I am running a crawl using protocol-httpclient.

I get the following error:
java.io.IOException: java.net.SocketTimeoutException: Read timed out

Can someone tell me why I get this error?

After that, the crawl hangs and stays in the same state.

Rgds
Prabhu
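A `SocketTimeoutException: Read timed out` means a connection was established, but no data arrived within the socket read timeout. In Nutch this timeout is taken from the `http.timeout` property, given in milliseconds (10000 by default in nutch-default.xml). The following is a minimal sketch of raising it for slow hosts or a limited link, assuming an override in conf/nutch-site.xml. The value 30000 is illustrative rather than a recommendation, and a longer timeout does not by itself explain or fix the hang described above.

```xml
<!-- conf/nutch-site.xml: place inside the file's existing root element -->
<property>
  <name>http.timeout</name>
  <!-- illustrative value: 30 seconds instead of the 10-second default -->
  <value>30000</value>
  <description>The default network timeout, in milliseconds.</description>
</property>
```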







_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
