Thank you very much. This is helpful!
Actually, what I want to do is:
1) Read a *<table>DATA</table>* from an external page and insert it into the
database.
2) Use a cron job / scheduler to update the database table periodically if the
source web page has changed.
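For 2), one possible way to decide whether the page has changed is to keep a hash of the last fetched content and compare it on each scheduled run. This is only a sketch: the function names content_digest and needs_update are made up for illustration, and where the last digest is stored (a database field, a file) is left open.

```python
import hashlib


def content_digest(html):
    """Return a SHA-256 hex digest of the fetched page text."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def needs_update(html, last_digest):
    """True when the page differs from the version seen on the last run.

    last_digest is the digest saved by the previous run, or None on the
    first run (hypothetical storage, e.g. a one-row table).
    """
    return content_digest(html) != last_digest
```

Hashing the whole page would also react to unrelated changes (timestamps, ads), so it may be better to hash only the table markup after it has been extracted.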
Could you give me an idea of how to get going with 1)?
Say I have done this:
* fetched the rows (<td>CONTENT</td>) of the table into the elements
collection
* stripped unwanted parts off and separated the columns
Now how do I push the data into the database table?
Here's what I plan to do:
* create a model whose fields correspond to the table columns and data (all
strings and numbers)
* loop over the data sliced from the elements collection and assign it to
the database table fields.
Would you confirm this approach?
Is there a more efficient way?
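To make the plan concrete, here is a minimal standalone sketch that uses only the Python standard library (html.parser and sqlite3) rather than the web2py helpers, so it can be tried outside a controller. The table name station, its two columns, and the sample HTML are assumptions made up for illustration.

```python
import sqlite3
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without <td> cells, e.g. header rows
                self.rows.append(self._row)
            self._row = None


def parse_rows(html):
    """Return the table body as a list of rows of stripped cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def store_rows(conn, rows):
    """Create the (hypothetical) station table and insert all rows at once."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS station (name TEXT, temperature REAL)"
    )
    conn.executemany(
        "INSERT INTO station (name, temperature) VALUES (?, ?)",
        [(name, float(temp)) for name, temp in rows],
    )
    conn.commit()


# Made-up sample standing in for the fetched page.
SAMPLE = """<table>
<tr><th>Name</th><th>Temp</th></tr>
<tr><td>Berlin</td><td> 12.5 </td></tr>
<tr><td>Munich</td><td>9</td></tr>
</table>"""

rows = parse_rows(SAMPLE)
conn = sqlite3.connect(":memory:")
store_rows(conn, rows)
```

Inside web2py, the last step would presumably go through the DAL instead: a db.define_table model with matching fields, then db.station.insert(...) per row or db.station.bulk_insert(...) with a list of dicts.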
Thank you and kind regards,
Timmie
On Friday, May 3, 2013 13:48:14 UTC+2, Anthony wrote:
>
> That same code works in a controller -- it was merely being demonstrated
> in a shell. Instead of urllib.urlopen, you can now use fetch (which also
> works on GAE):
>
> from gluon.tools import fetch
> page = TAG(fetch('http://www.web2py.com'))
> page.elements('div')  # gives you a list of all DIV elements in the page
> # (as web2py DIV helper objects)
>
> Actually, at the moment, the above will generate an error because
> apparently there is an unbalanced <a> tag somewhere on the web2py.com page.
>
> Anthony
>
> On Friday, May 3, 2013 3:15:55 AM UTC-4, Timmie wrote:
>>
>> Hello,
>> is there an example of how to use this:
>>
>> scraping utils
>> https://groups.google.com/forum/?fromgroups=#!topic/web2py/skcc2ql3zOs
>>
>> in a controller?
>>
>> Especially the first lines (fetching the page and getting it into an
>> element) are what I am looking for.
>>
>>
>> The above example is written for shell access.
>>
>> Thanks and kind regards,
>> Timmie
>>
>>
>>