It's probably me, actually; I was hoping someone would spot my error.
I am attempting to use cookielib, and am running into difficulties.
I have been following this recipe - http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/302930
as an example, as the official documentation is a bit sparse, but it seems rather easy.
However, as my code will demonstrate -
>>> import re
>>> import urllib2
>>> import cookielib
>>>
>>> a = re.compile('href\=\"showthread.php\?s\=.+?pagenumber=(?P<pagenum>\d+?)\"', re.IGNORECASE)
>>>
>>> Jar = cookielib.MozillaCookieJar(filename = 'c:/cookies.txt')
>>> opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(Jar))
>>> urllib2.install_opener(opener)
Now, that's all by the recipe I linked to. No exceptions, so I figured it was good.
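One thing that may be worth checking at this step: as far as I can tell from the cookielib source, passing filename to MozillaCookieJar only records the path, and the file is read only when load() is called. Whether that explains the problem here is not certain. The sketch below is a minimal illustration under stated assumptions: the cookie file contents, the example.com domain and the abc123 value are all made up, and the try/except import is there only so the same code also runs on Python 3, where cookielib is named http.cookiejar.

```python
import os
import tempfile

try:
    import cookielib                      # Python 2, as used in this post
except ImportError:
    import http.cookiejar as cookielib    # same module under its Python 3 name

# Hypothetical cookie file in Netscape/Mozilla format. Tab-separated fields:
# domain, domain_specified, path, secure, expires, name, value.
COOKIE_FILE_TEXT = (
    "# Netscape HTTP Cookie File\n"
    ".example.com\tTRUE\t/\tFALSE\t2147483647\tsessionhash\tabc123\n"
)


def write_cookie_file(text):
    """Write text to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    return path


path = write_cookie_file(COOKIE_FILE_TEXT)
jar = cookielib.MozillaCookieJar(filename=path)

count_before_load = len(jar)   # the constructor has not read the file
jar.load()                     # this call reads the file given to the constructor
count_after_load = len(jar)
names = [cookie.name for cookie in jar]

os.remove(path)
print(count_before_load, count_after_load, names)
```

If the jar is empty before load(), any cookies seen later in the session would have come from the server's Set-Cookie headers rather than from cookies.txt.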
>>> f = urllib2.urlopen('http://www.gpforums.co.nz/forumdisplay.php?s=&forumid=7029')
>>> j = f.read()
>>> ww = a.finditer(j)
>>> print ww.next().group()
href="">
Now, that's an issue. When I'm in a cookied session in Firefox, that link would be
showthread.php?s=&threadid=267930&pagenumber=2
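To rule out the regular expression itself, the pattern can be tried offline against the link text quoted above. This is only a sketch: the surrounding anchor tag in sample_html is an assumption, and the pattern is the same as in the session apart from being a raw string with the dot in showthread.php escaped.

```python
import re

# Same idea as the pattern compiled earlier, written as a raw string.
link_pattern = re.compile(
    r'href="showthread\.php\?s=.+?pagenumber=(?P<pagenum>\d+?)"',
    re.IGNORECASE,
)

# Hypothetical HTML fragment built around the link text quoted above.
sample_html = '<a href="showthread.php?s=&threadid=267930&pagenumber=2">2</a>'

match = link_pattern.search(sample_html)
page_number = match.group("pagenum") if match else None
print(page_number)
```

If the pattern finds the page number here, then the difference lies in the HTML the server returns to the script, not in the pattern.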
Hmm... so I check by requesting a URL that needs a cookie to get into -
>>> f = urllib2.urlopen('http://www.gpforums.co.nz/newthread.php?s=&action="">')
>>> print f.read()
<lots snipped>
You are not logged in, or you do not have permission to access this page. This could be due to one of several reasons:
</lots>
Now, I'm using the exact same cookies.txt that Firefox uses, so I'm a little perplexed. I check to see if I've actually got a cookie -
>>> print Jar
<_MozillaCookieJar.MozillaCookieJar[<Cookie bblastvisit=1113481269 for .gpforums.co.nz/>, <Cookie sessionhash=f6cba21ed58837ab935a564e6b9c3b05 for .gpforums.co.nz/>, <Cookie bblastvisit=1113481269 for .www.gpforums.co.nz/>, <Cookie sessionhash=f6cba21ed58837ab935a564e6b9c3b05 for .www.gpforums.co.nz/>]>
Which is exactly how that cookie looks, both in my cookies.txt, and when I packet sniff it going out.
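The Cookie header that cookielib would attach can also be inspected without a packet sniffer, by asking the jar to decorate a Request directly. The sketch below is a minimal, offline illustration under stated assumptions: the example.com domains, the abc123 value and the make_cookie helper are all hypothetical. It stores the same cookie name for both a parent domain and a www subdomain, which is the situation the jar printout above shows.

```python
try:
    import cookielib                      # Python 2, as in the session above
    from urllib2 import Request
except ImportError:
    import http.cookiejar as cookielib    # the same module under Python 3
    from urllib.request import Request


def make_cookie(name, value, domain):
    """Build a simple version-0 session cookie (hypothetical helper)."""
    return cookielib.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


jar = cookielib.CookieJar()
for domain in (".example.com", ".www.example.com"):
    jar.set_cookie(make_cookie("sessionhash", "abc123", domain))

# Ask the jar which cookies it would send for this URL; no network is used.
request = Request("http://www.example.com/newthread.php")
jar.add_cookie_header(request)
cookie_header = request.get_header("Cookie")
print(cookie_header)
```

Both stored cookies match the www host, so the name appears twice in the header, much like the repeated bblastvisit and sessionhash pairs in the outgoing request.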
I also tried it the way shown in the recipe, including changing the User-Agent -
>>> txheaders = {'User-agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
>>> print txheaders
{'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
>>> theurl = 'http://www.gpforums.co.nz/newthread.php?s=&action=""'
>>> req = urllib2.Request(theurl, data = "", headers = txheaders)
>>> handle = urllib2.urlopen(req)
>>> g = handle.read()
>>> print g
<lots snipped>
You are not logged in, or you do not have permission to access this page. This could be due to one of several reasons:
</lots>
So yeah, I'm at a loss. No doubt my mistake is painfully obvious once pointed out, but any pointing would be greatly appreciated.
Regards,
Liam Clarke
<packet captures follow>
GET /newthread.php?s=&action="" HTTP/1.1\r\n
Request Method: GET
Request URI: /newthread.php?s=&action="">
Request Version: HTTP/1.1
Accept-Encoding: identity\r\n
Host: www.gpforums.co.nz\r\n
Cookie: bblastvisit=1113481269; sessionhash=f6cba21ed58837ab935a564e6b9c3b05; bblastvisit=1113481269; sessionhash=f6cba21ed58837ab935a564e6b9c3b05\r\n
Connection: close\r\n
User-agent: Python-urllib/2.4\r\n
\r\n
...and the response
Hypertext Transfer Protocol
HTTP/1.1 200 OK\r\n
Request Version: HTTP/1.1
Response Code: 200
Date: Thu, 14 Apr 2005 12:44:12 GMT\r\n
Server: Apache/2.0.46 (CentOS)\r\n
Accept-Ranges: bytes\r\n
X-Powered-By: PHP/4.3.2\r\n
Set-Cookie: sessionhash=43bcebcf4dba6878802b25cb126ed1f7; path=/; domain=gpforums.co.nz\r\n
Set-Cookie: sessionhash=43bcebcf4dba6878802b25cb126ed1f7; path=/; domain=www.gpforums.co.nz\r\n
Set-Cookie: sessionhash=43bcebcf4dba6878802b25cb126ed1f7; path=/; domain=gpforums.co.nz\r\n
Set-Cookie: sessionhash=43bcebcf4dba6878802b25cb126ed1f7; path=/; domain=www.gpforums.co.nz\r\n
--
'There is only one basic human right, and that is to do as you damn well please.
And with it comes the only basic human duty, to take the consequences.'
