RE: Timeout problems with web crawling

Karl Wright Tue, 23 Apr 2013 05:18:28 -0700

Do you have the ability to use Wireshark or tcpdump on this machine? If
so, can you set up a crawl with only that URL, capture it, and compare
the crawler's fetch with curl's? There must be some key difference.
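
A minimal sketch of that comparison, assuming a Linux host with root
access, plain HTTP on port 80, and tcpdump's Linux-only "any" interface.
The helper names capture_filter, start_capture and stop_capture are
hypothetical and are not part of ManifoldCF or tcpdump:

```shell
# Hypothetical helpers for capturing one fetch at a time with tcpdump.
# Assumptions: Linux ("-i any"), root privileges, plain HTTP.

# Build a BPF filter matching TCP traffic to/from HOST ($1) on PORT ($2).
capture_filter() {
  printf 'host %s and tcp port %s' "$1" "$2"
}

# Write full packets (-s 0) matching FILTER ($2) to FILE ($1) in the
# background, and remember the PID so the capture can be stopped later.
start_capture() {
  tcpdump -i any -s 0 -w "$1" "$2" &
  CAPTURE_PID=$!
  sleep 1  # give tcpdump time to open the interface
}

# Stop the background capture and reap the tcpdump process.
stop_capture() {
  kill "$CAPTURE_PID"
  wait "$CAPTURE_PID" 2>/dev/null || true
}
```

For example: run start_capture crawler.pcap "$(capture_filter
www.ibsen.uio.no 80)", start the single-URL crawl job, wait for the fetch
(or the timeout), then call stop_capture; repeat with curl.pcap around
the curl command. Both files can then be opened side by side in
Wireshark, or dumped as text with tcpdump -A -r crawler.pcap, to compare
the request headers and see where the crawler's connection stalls.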


Karl

From: Erlend Garåsen
Sent: 4/23/2013 8:03 AM
To: [email protected]
Subject: Re: Timeout problems with web crawling
On 23.04.13 13.48, Erlend Garåsen wrote:

> -bash-3.2$ curl -vvv -H "User-Agent: Mozilla/5.0 (ApacheManifoldCFWebCrawler; [email protected])" \
>   "http://www.ibsen.uio.no/REGINFO_peAGa.xhtml?bokstav=G|1366644879398+299979"

There was a small typo in the URL; the correct command is:
curl -vvv -H "User-Agent: Mozilla/5.0 (ApacheManifoldCFWebCrawler; [email protected])" \
  "http://www.ibsen.uio.no/REGINFO_peAGa.xhtml?bokstav=G"

But the result is the same: an immediate response.
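
To put a number on "immediate", a sketch that prints curl's per-phase
timings for the same request. The fetch_timings helper name and the
60-second --max-time cap are illustrative assumptions; the %{...} names
are standard curl --write-out variables:

```shell
# Hypothetical helper: fetch URL ($2) with User-Agent ($1), discard the
# body, and print the status code plus connect, first-byte and total times.
fetch_timings() {
  curl -sS -o /dev/null --max-time 60 \
    -H "User-Agent: $1" \
    -w 'http_code=%{http_code} connect=%{time_connect}s first_byte=%{time_starttransfer}s total=%{time_total}s\n' \
    "$2"
}
```

Usage: fetch_timings "Mozilla/5.0 (ApacheManifoldCFWebCrawler;
[email protected])" "http://www.ibsen.uio.no/REGINFO_peAGa.xhtml?bokstav=G".
Small first_byte and total values would confirm that the server answers
curl promptly, which narrows the problem to how the crawler's request
differs.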

Erlend

-- 
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
