I haven't used Twill for screen scraping or parsing, but I do use it
extensively for functional testing (and unit testing, actually) of
web2py apps.
But for scraping, I can see how you could use Twill's Python API to
go to a page, log in, and capture the HTML for that page, which you'd
then hand to an XML parser (assuming the markup is well-formed, or
after running it through some sort of tidy step first).
So, assuming you have twill installed, from a web2py controller, you
would do something like:
from twill import get_browser
from twill.commands import *

# Go to a URL
go('http://en.wikipedia.org/wiki/Web2py')

# Log in with formvalue() and submit(); the form and field names
# below are site-specific, so substitute your own:
#   formvalue('login_form', 'username', 'me')
#   formvalue('login_form', 'password', 'secret')
#   submit()

# show() prints the page rather than returning it, so capture the
# HTML from the browser object instead
xhtml = get_browser().get_html()

# Send the variable to a DOM parser, or use regexps, or whatever you like
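Once the HTML is in a variable, pulling out a small clip to stick in a
view can be done with the standard library alone. Here's a minimal
sketch using a regexp; the function name and the sample string are just
illustrations, and for anything beyond a quick clip you'd want a real
parser (BeautifulSoup, lxml, etc.) rather than regexps:

```python
import re

def extract_first_paragraph(html):
    """Pull the contents of the first <p>...</p> block out of an
    HTML string, or return None if there is no paragraph.
    Fine for grabbing a quick clip; use a real HTML parser for
    anything more involved."""
    match = re.search(r'<p[^>]*>(.*?)</p>', html,
                      re.DOTALL | re.IGNORECASE)
    return match.group(1) if match else None

# Stand-in for whatever you captured from the twill browser:
sample = '<html><body><p>Web2py rocks.</p><p>More.</p></body></html>'
clip = extract_first_paragraph(sample)  # -> 'Web2py rocks.'
```

You could then return `dict(clip=XML(clip))` from the controller and
render it in a view.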
On Nov 11, 5:06 pm, David <[email protected]> wrote:
> Hey guys,
>
> I've been studying up on working with scraping/parsing and remote
> logins for sites that don't have APIs and I came across Twill.
>
> Have any of you used it to automate things like login and screen/html
> parsing?
>
> It would be nice to be able to login to a remote site via a model/
> controller and pull a small clip of html and stick it on a view
> somewhere.
>
> I've got it working nicely on the shell and it seems quite promising
> but it doesn't readily appear to me how I would use something like
> this from inside web2py.
>
> Are there any examples that I can have a look at while I am still
> learning about web2py?
>
> Thanks in advance!
>
> - David
You received this message because you are subscribed to the Google Groups
"web2py-users" group.