Re: [Python] Parsing the HTML page

Valerio Maggio Tue, 16 Jul 2013 05:35:12 -0700

On Jul 16, 2013, at 12:09 PM, Nicola Larosa <n...@teknico.net> wrote:


> Here are the docs for text_content:
> 
> <http://lxml.de/lxmlhtml.html#html-element-methods>
> 
> In the same place you will find the docs for:
> 
> - find_class (if you know the CSS class of the elements you are
>  interested in);
> - get_element_by_id (if you know the id of the element you are
>  interested in);
> - cssselect (to use CSS selectors, which are very powerful);
> - a brief mention of xpath, documented elsewhere
>  <http://lxml.de/xpathxslt.html#xpath>, which is also very powerful.
> 
> The example uses find_class <http://lxml.de/lxmlhtml.html#examples>.
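
For reference, here is a minimal sketch of the lxml.html methods listed above. The HTML snippet, the ids, and the class names are invented for illustration, and it assumes lxml is installed. cssselect is left out of the runnable part because it needs the separate cssselect package.

```python
import lxml.html

# Hypothetical page content, made up for this example.
HTML = """
<html><body>
  <div id="main">
    <p class="intro">Hello, <b>world</b>!</p>
    <p class="note">First note</p>
    <p class="note">Second note</p>
  </div>
</body></html>
"""

doc = lxml.html.fromstring(HTML)

# get_element_by_id: look up a single element when you know its id.
main = doc.get_element_by_id("main")

# find_class: all elements carrying a given CSS class.
notes = [el.text_content() for el in doc.find_class("note")]

# text_content: the text of an element, including the text of its children.
intro_text = doc.find_class("intro")[0].text_content()

# xpath: the same "note" paragraphs, selected with an XPath expression.
xpath_notes = doc.xpath('//p[@class="note"]/text()')

# With the cssselect package installed, doc.cssselect("p.note") would
# select the same elements using a CSS selector.

print(main.tag, notes, intro_text, xpath_notes)
```

Which of these to use mostly depends on what you know about the page: an id or a class is enough for the first two, while xpath (or cssselect) helps when the structure matters.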


Alternatively, for this kind of web scraping work I have always used
BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/)

Btw:
[…]
Beautiful Soup sits on top of popular Python parsers like lxml and html5lib, 
allowing you to try out different parsing strategies or trade speed for 
flexibility.
[…]
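
A rough sketch of what that looks like in practice, assuming Beautiful Soup 4 (the bs4 package) is installed. The HTML snippet is made up for the example. The second argument picks the underlying parser: "html.parser" is the one from the standard library, and "lxml" or "html5lib" can be passed instead if those packages are available.

```python
from bs4 import BeautifulSoup

# Hypothetical page content, made up for this example.
HTML = '<div id="main"><p class="note">First</p><p class="note">Second</p></div>'

# The parser name is the knob for trading speed against flexibility.
soup = BeautifulSoup(HTML, "html.parser")

# All <p> elements with CSS class "note" (class_ avoids the Python keyword).
notes = [p.get_text() for p in soup.find_all("p", class_="note")]

# A single element looked up by its id attribute.
main = soup.find(id="main")

print(notes, main.name)
```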

--
Valerio

_______________________________________________
Python mailing list
Python@lists.python.it
http://lists.python.it/mailman/listinfo/python
