hello one and all,

i am using web2py to scrape a website, so web2py is the client in this case 
rather than the server, though it is running on centos and apache.  at this 
point i am just trying to log in.  to see what the browser does, i trace the 
network traffic in firefox: i load the login page, clear the recorded 
transactions, fill out the form and hit the submit button.  the first 
transaction is a POST with a json body, along with its associated request 
headers and cookies.

so i try to replicate everything from the firefox network trace in a web2py 
(2.14.3) controller using python 2.7 urllib2 with CookieJar, build_opener, 
etc., as shown below.  i can not for the life of me figure out why, when i 
replicate the firefox trace under Part 2, i get an HTTPError 500.  am i 
missing something?  i don't know what else to look for to replicate the 
environment.  perhaps it is something with the XMLHttpRequest cross-domain 
thing, but this isn't a cross-domain request?  or HTTP authentication, but 
then why does Part 1 load fine and not Part 2?  any advice is appreciated.  
thanx in advance, lucas
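
one way to get more detail out of a 500 like this: an HTTPError raised by urllib2 is also a file-like response, so whatever body the server sent along with the error can be read from it. the sketch below is only an illustration under a few assumptions: `describe_http_error` and `open_or_describe` are hypothetical helper names, the fake error at the end stands in for a real server reply, and the import fallback is there only so the same lines also run on python 3.

```python
import io

try:  # python 2, as in the post
    from urllib2 import HTTPError
except ImportError:  # python 3 name for the same class
    from urllib.error import HTTPError


def describe_http_error(err, limit=2000):
    """Return status, reason and the start of the body sent with the error."""
    body = err.read().decode('utf-8', 'replace')
    return "%s %s\n%s" % (err.code, err.msg, body[:limit])


def open_or_describe(opener, request):
    """Open the request; on an HTTP error return a readable report instead."""
    try:
        return opener.open(request).read()
    except HTTPError as err:
        return describe_http_error(err)


# stand-in for a real failure: the url path and message text are made up
fake = HTTPError('https://core.website.com/x', 500, 'Internal Server Error',
                 {}, io.BytesIO(b'{"Message": "hypothetical server message"}'))
report = describe_http_error(fake)
print(report)
```

in a controller, the report could be returned inside `HTML(BODY(...))` in place of `str(e)`, which only shows the status line.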


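    # imports used below (HTML, BODY and BR come from the web2py environment)
    import json
    import urllib2
    from cookielib import CookieJar

    #part 1
    #establish connection and session cookie...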
    http = urllib2.HTTPHandler(debuglevel=1)
    https = urllib2.HTTPSHandler(debuglevel=1)
    cookies = CookieJar()
    handlers = [http, https, urllib2.HTTPCookieProcessor(cookies)]
    site = urllib2.build_opener(*handlers)
    site.addheaders = [
        ('Host', "core.website.com"),
        ('User-agent', "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Firefox/45.0"),
        ('Accept', "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),
        ('Accept-Language', "en-US,en;q=0.5"),
        ('Accept-Encoding', "gzip, deflate, br"),
        ('Connection', "keep-alive"),
    ]
    urllib2.install_opener(site)
    try:
        html1 = site.open('https://core.website.com/CoreCms.aspx')
    except urllib2.HTTPError, e:
        return HTML(BODY(str(e.code)+": "+str(e)))
    except urllib2.URLError, e:
        return HTML(BODY(str(e)))
    html1.close()
    #part 2
    #replicate the login as from firefox...
    postdata = {'username':'[email protected]','password':'passwd'}
    jsondata = json.dumps(postdata)
    jsondata = jsondata.encode('utf-8')
    site.add_header = ('Referer', "https://core.website.com/CoreCms.aspx")
    site.add_header = ('X-Requested-With', "XMLHttpRequest")
    site.add_header = ('Content-Type', "application/json; charset=utf-8")
    site.add_header = ('Content-Length', len(jsondata))
    try:
        json1 = site.open('https://core.website.com/CoreWebSvc.asmx/InteractiveLogin', jsondata)
        data = json.loads(json1.read())
        return HTML(BODY("%s" % data))
    except urllib2.HTTPError, e:
        # c2 and c4 are not defined in this excerpt
        return HTML(BODY(c2, BR(), "E1 HTTPError: "+str(e.code)+", "+str(e), BR(), c4))
    except urllib2.URLError, e:
        # URLError has no .code attribute (only HTTPError does); use .reason
        return HTML(BODY("E2 URLError: "+str(e.reason)+", "+str(e)))
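
one thing that stands out in Part 2: in urllib2, `add_header` is a method of `urllib2.Request`, while the opener returned by `build_opener` only has the `addheaders` list. assigning a tuple to `site.add_header` just sets an attribute on the opener (each assignment overwriting the last), so none of those four headers is sent, and a POST with a body goes out with urllib2's default `Content-Type: application/x-www-form-urlencoded`. that may be one reason the service answers with a 500, though nothing here confirms it. a minimal sketch of putting the headers on the request itself follows; `build_login_request` and `login` are hypothetical names, the credentials are placeholders, and the import fallback is only there so the lines also run on python 3.

```python
import json

try:  # python 2, as in the post
    import urllib2 as urlrequest
    from cookielib import CookieJar
except ImportError:  # python 3 names for the same modules
    import urllib.request as urlrequest
    from http.cookiejar import CookieJar


def build_login_request(url, username, password, referer):
    """Build a JSON POST whose headers travel with the request itself."""
    body = json.dumps({'username': username, 'password': password})
    headers = {
        'Referer': referer,
        'X-Requested-With': 'XMLHttpRequest',
        'Content-Type': 'application/json; charset=utf-8',
    }
    # Content-Length is filled in by the library from the body.
    # No Accept-Encoding is set, since urllib2 does not decompress
    # gzip responses on its own.
    return urlrequest.Request(url, data=body.encode('utf-8'), headers=headers)


def login(opener, req):
    """Send the request through the cookie-aware opener and decode the reply."""
    response = opener.open(req)
    try:
        return json.loads(response.read().decode('utf-8'))
    finally:
        response.close()


cookies = CookieJar()
opener = urlrequest.build_opener(urlrequest.HTTPCookieProcessor(cookies))

req = build_login_request(
    'https://core.website.com/CoreWebSvc.asmx/InteractiveLogin',
    'user@example.com', 'passwd',  # placeholder credentials
    'https://core.website.com/CoreCms.aspx')
# data = login(opener, req)  # performs the actual network call
```

the same opener would first be used for the Part 1 page load so the session cookie is already in the jar when the login request is sent.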

