Hello one and all,

I am using web2py to scrape a website, so web2py is acting as a client
here rather than a server, though it is running on CentOS under Apache.
At this point I am just trying to log in. To see what the browser does, I
trace the network traffic in Firefox: I load the login page, clear the
recorded transactions, fill out the form, and hit the submit button. The
first transaction is a JSON POST with its associated request headers,
cookies, and parameters (the parameters are also JSON).

I then try to replicate what the Firefox trace shows from a web2py
(2.14.3) controller, using Python 2.7 urllib2 with CookieJar,
build_opener, etc., as shown below. I cannot for the life of me figure
out why Part 2, which replicates the traced login request, returns an
HTTPError 500. Am I missing something? I don't know what else to look for
to replicate the browser's environment. Perhaps it is something to do
with XMLHttpRequest and cross-domain restrictions, but this isn't a
cross-domain request. Or HTTP authentication, but then why does Part 1
load fine and not Part 2? Any advice is appreciated.

Thanx in advance, Lucas

# (excerpt from inside a web2py controller action)
import json
import urllib2
from cookielib import CookieJar

# part 1
# establish connection and session cookie...
http = urllib2.HTTPHandler(debuglevel=1)
https = urllib2.HTTPSHandler(debuglevel=1)
cookies = CookieJar()
handlers = [http, https, urllib2.HTTPCookieProcessor(cookies)]
site = urllib2.build_opener(*handlers)
site.addheaders = [
    ('Host', "core.website.com"),
    ('User-agent', "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) "
                   "Gecko/20100101 Firefox/45.0"),
    ('Accept', "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),
    ('Accept-Language', "en-US,en;q=0.5"),
    ('Accept-Encoding', "gzip, deflate, br"),
    ('Connection', "keep-alive"),
]
urllib2.install_opener(site)
try:
    html1 = site.open('https://core.website.com/CoreCms.aspx')
except urllib2.HTTPError as e:
    return HTML(BODY(str(e.code)+": "+str(e)))
except urllib2.URLError as e:
    return HTML(BODY(str(e)))
html1.close()

# part 2
# replicate the login as from firefox...
postdata = {'username': '[email protected]', 'password': 'passwd'}
jsondata = json.dumps(postdata)
jsondata = jsondata.encode('utf-8')
site.add_header = ('Referer', "https://core.website.com/CoreCms.aspx")
site.add_header = ('X-Requested-With', "XMLHttpRequest")
site.add_header = ('Content-Type', "application/json; charset=utf-8")
site.add_header = ('Content-Length', len(jsondata))
try:
    json1 = site.open(
        'https://core.website.com/CoreWebSvc.asmx/InteractiveLogin',
        jsondata)
    data = json.loads(json1.read())
    return HTML(BODY("%s" % data))
except urllib2.HTTPError as e:
    return HTML(BODY(c2, BR(), "E1 HTTPError: "+str(e.code)+", "+str(e),
                     BR(), c4))
except urllib2.URLError as e:
    # URLError has no .code attribute; report .reason instead
    return HTML(BODY("E2 URLError: "+str(e.reason)))
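
One possible cause, offered as a guess rather than a confirmed diagnosis:
in urllib2, add_header is a method of Request, while an opener only has
the addheaders list. The site.add_header = (...) assignments in Part 2
therefore just set an unused attribute, and none of those four headers
are sent. With no Content-Type set, urllib2 labels a POST body as
application/x-www-form-urlencoded, which a JSON endpoint may well reject
with a 500. Below is a minimal sketch of attaching the headers to the
request itself. The helper name build_login_request is made up for
illustration, and the import fallback is only there so the sketch also
runs on Python 3.

```python
import json

try:  # Python 2.7, as used in the post
    import urllib2
except ImportError:  # Python 3 fallback so the sketch runs on either
    import urllib.request as urllib2

LOGIN_URL = 'https://core.website.com/CoreWebSvc.asmx/InteractiveLogin'
REFERER = 'https://core.website.com/CoreCms.aspx'


def build_login_request(username, password):
    """Hypothetical helper: build the JSON POST with per-request headers."""
    body = json.dumps({'username': username, 'password': password})
    body = body.encode('utf-8')
    headers = {
        'Referer': REFERER,
        'X-Requested-With': 'XMLHttpRequest',
        'Content-Type': 'application/json; charset=utf-8',
    }
    # Content-Length is omitted on purpose: urllib2 derives it from the body.
    # Passing a body makes this a POST.
    return urllib2.Request(LOGIN_URL, body, headers)
```

In the controller, Part 2 would then open the request through the same
opener as Part 1, so the session cookie is reused, e.g.
json1 = site.open(build_login_request(user, passwd)), where user and
passwd stand in for the real credentials.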