Trouble with Python retriever

Bill Nalen Wed, 05 Feb 2003 11:38:40 -0800

I hope this is appropriate for the dev list. I'm having trouble fetching internal pages from our web servers. Here's the error message I'm getting:
Retrieval failed: 404 -- [Errno socket error] (10035, 'The socket operation could not complete without blocking').
The error is raised in the open_http method of URLopener in urllib.py. It fails on the h.putrequest('GET', selector) line, which I believe is the first attempt to connect the socket. The thing is, I wouldn't mind if the call blocked.
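
For context, 10035 is the Winsock code WSAEWOULDBLOCK, which is reported when a socket in non-blocking mode is asked to do something that cannot finish immediately. The following is only a minimal sketch of that condition, assuming a current Python 3 and using socket.socketpair() rather than the urllib code path described above; the function name would_block_errno is hypothetical.

```python
import errno
import socket


def would_block_errno():
    """Return the errno raised when a non-blocking socket has nothing to read."""
    # socketpair() gives two already-connected sockets, so no server is needed.
    left, right = socket.socketpair()
    try:
        left.setblocking(False)  # switch this end to non-blocking mode
        try:
            left.recv(1)  # nothing has been sent, so this cannot complete
        except BlockingIOError as exc:
            # EAGAIN/EWOULDBLOCK on POSIX systems, 10035 on Windows
            return exc.errno
        return None
    finally:
        left.close()
        right.close()


if __name__ == "__main__":
    print("errno from non-blocking recv:", would_block_errno())
```

If the same call is made while the socket is in its default blocking mode, recv simply waits, which is the behaviour the message above says would be acceptable.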

I have a single page with 8-10 links on it. The first page plucks fine, but none of the links get plucked; they all return the message above.
The URLs all work if I use them individually from Plucker, i.e. if I specify a link that failed as the start URL, that page is fetched fine, but any links from that page then fail too. I can browse the page fine, and the links all work from the browser. Some of the pages that fail are static HTML pages, and some are dynamically generated from ASP pages. Plucking pages from outside sources works fine. The servers are all IIS, if that makes a difference. Our internal servers are authenticated with Autosocks (I'm assuming this is the problem). I'm plucking from Windows 2000.

I've tried searching the web for help with this problem. I ran across a timeout.py file that I applied to the urlopener, but that generated a different error (10061). I'm wondering whether the socket is not being closed properly and is then being reused.
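
One way to test the "socket not closed" theory is to fetch each link with an explicit timeout and close every response before moving to the next URL. The sketch below assumes a current Python 3 and its urllib.request module, not the urllib.py shipped with Plucker Desktop 1.2; the helper names fetch and fetch_all are hypothetical and are not part of Plucker.

```python
import urllib.request


def fetch(url, timeout=10.0):
    """Fetch one URL with a timeout and close the response before returning."""
    # The with-block closes the response (and its socket) even on error.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, timeout=10.0):
    """Fetch each URL in turn, recording either the body or the error."""
    results = {}
    for url in urls:
        try:
            results[url] = fetch(url, timeout)
        except OSError as exc:
            # URLError and socket timeouts are both OSError subclasses.
            results[url] = exc
    return results
```

Calling fetch_all with the start page's links and printing any stored exceptions would show whether the failure follows a particular URL or only appears on the second and later requests.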

Also, the Jython version of the Python parser works fine, as does the JPluck parser. I'm using the Python and the parser installed with Plucker Desktop 1.2.

My hope is that some Python guru could point me in the right direction, but any help would be appreciated.

Bill
