Since you mention ETags, I'm guessing this happens with servers that support
HTTP/1.1. When you send a Connection: Keep-Alive header to a server that
supports this feature, it keeps the connection open in the hope that it will
be reused for another request. But since your crawler never reuses it, the
connection is held open until the server times out, and your crawler blocks
on the socket until then. Servers that don't support the feature return
immediately. Try removing the Keep-Alive header and see if that helps, or
close the connection from your end once you have read Content-Length bytes
of the body (or the final chunk, if the response uses chunked encoding).
Bazuka wrote:
> This question is not related to Wget - so this newsgroup is probably not the
> right place to post this message. However, I am posting this msg here hoping
> that someone here might be able to help me.
>
> I have just written a bare-bones crawler in C++. It seems to run just fine
> except when getting data from some URLs. I noticed that the crawler took
> about 30 seconds to return these pages to me (all other retrievals take less
> than a second). The only thing common between these (30-sec) URLs is that
> their response header-field contains "Etag". I would be grateful if someone
> could help me with this problem : why do server responses with Etags take
> longer ? Am I missing out on something in the request ?
>
> This is the request I am sending out :
>
> GET / HTTP/1.0
> User-Agent:Tester
> Host: srds.cs.umn.edu
> Connection: Keep-Alive
>
> I have also tried to retrieve those URLs using Wget and they don't take long
> ...
> thanks