Hi Stefan

One more thing I am seeing is that some outlinks are not parsed properly.

I tried both HTML parsers (neko and tagsoup).

I know that this may not be due to protocol-http, but is there a chance that
it has the same cause?

Thanks for the answer.

Rgds
Prabhu


On 2/5/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
>
> I personally prefer protocol-http.
>
> On 05.02.2006 at 18:26, Raghavendra Prabhu wrote:
>
> > Hi Stefan
> >
> > My bandwidth is limited.
> >
> > But I am able to crawl other links on the same host (so I guess it is
> > not denying me).
> >
> > Is it because of protocol-httpclient? Should I use protocol-http?
> >
> > Rgds
> > Prabhu
> >
> >
> > On 2/5/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
> >>
> >> Is the host reachable from your web browser?
> >> Does this host block your IP because it interprets Nutch as a DoS
> >> attack?
> >> Is your bandwidth limited?
> >>
> >> On 05.02.2006 at 18:17, Raghavendra Prabhu wrote:
> >>
> >>> Hi
> >>>
> >>> I am running a crawl using protocol-httpclient.
> >>>
> >>> I get a
> >>> java.io.IOException: java.net.SocketTimeoutException: Read timed out
> >>>
> >>> Can someone tell me why I get this error?
> >>>
> >>> After that the crawl hangs and stays in the same state.
> >>>
> >>> Rgds
> >>> Prabhu
> >>
> >>
>
>
