Since you mention ETags, I'm guessing this happens with servers that support
HTTP/1.1. When you send a Connection: Keep-Alive header to a server that
supports this feature, it keeps the connection open in the hope that it will
be reused for another request. But since your crawler never reuses it, the
connection is held open until the server times out, and your crawler blocks
on the socket until then. Servers that don't support the feature return
immediately. Try removing the Keep-Alive header and see if that helps, or
close the connection from your end once you have read Content-Length bytes
of the body (or the final chunk, if the response uses chunked encoding).
Bazuka wrote:
> This question is not related to Wget - so this newsgroup is probably not the
> right place to post this message. However, I am posting this msg here hoping
> that someone here might be able to help me.
>
> I have just written a bare-bones crawler in C++. It seems to run just fine
> except when getting data from some URLs. I noticed that the crawler took
> about 30 seconds to return these pages to me (all other retrievals take less
> than a second). The only thing common between these (30-sec) URLs is that
> their response header-field contains "Etag". I would be grateful if someone
> could help me with this problem : why do server responses with Etags take
> longer ? Am I missing out on something in the request ?
>
> This is the request I am sending out :
>
> GET / HTTP/1.0
> User-Agent:Tester
> Host: srds.cs.umn.edu
> Connection: Keep-Alive
>
> I have also tried to retrieve those URLs using Wget and they don't take long
> ...
> thanks