Hi Stefan,

One more thing I am seeing is that some outlinks are not parsed properly.
I tried both HTML parsers (neko and tagsoup). I know this may not be due to protocol-http, but is there a chance that it is also due to the same reason?

Thanks for the answer.

Rgds
Prabhu

On 2/5/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
>
> I personally prefer protocol-http.
>
> On 05.02.2006 at 18:26, Raghavendra Prabhu wrote:
>
> > Hi Stefan
> >
> > My bandwidth is limited.
> >
> > But I am able to crawl other links on the same host (so I guess it is
> > not denying me).
> >
> > Is it because of protocol-httpclient (should I use protocol-http)?
> >
> > Rgds
> > Prabhu
> >
> >
> > On 2/5/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
> >>
> >> Is the host available in your web browser?
> >> Does this host block your IP because it interprets Nutch as a DoS
> >> attack?
> >> Is your bandwidth limited?
> >>
> >> On 05.02.2006 at 18:17, Raghavendra Prabhu wrote:
> >>
> >>> Hi
> >>>
> >>> I am running a crawl using protocol-httpclient.
> >>>
> >>> I get a
> >>> java.io.IOException: java.net.SocketTimeoutException: Read timed out
> >>>
> >>> Can someone tell me why I get this error?
> >>>
> >>> After that the crawl hangs and simply stays in the same state.
> >>>
> >>> Rgds
> >>> Prabhu
> >>
> >>
> >
