[web2py] Re: parsehtml

mdipierro Mon, 24 May 2010 13:27:51 -0700

Good suggestion. Now you can do

    >>> from gluon.html import web2pyHTMLParser
    >>> tree = web2pyHTMLParser('hello<div a="b">world</div>').tree
    >>> tree.element(_a='b')['_c']=5
    >>> str(tree)
    'hello<div a="b" c="5">world</div>'


works great!
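
On the point raised below about the standard library parser, here is a minimal sketch for checking cases (1) and (2) from the quoted list. It is not the web2py code. It assumes Python 3, where the module is `html.parser` (on Python 2 it was `HTMLParser`), and `TagCollector` and `collect` are hypothetical names used only for illustration. Case (3) is left out on purpose, since HTML itself has no backslash escaping inside attribute values.

```python
from html.parser import HTMLParser  # Python 3 module name (assumption)


class TagCollector(HTMLParser):
    """Hypothetical helper: records parse events as (kind, name, attrs) tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag, {}))

    def handle_startendtag(self, tag, attrs):
        # Called for self-closing tags such as <input />
        self.events.append(("startend", tag, dict(attrs)))

    def handle_data(self, data):
        self.events.append(("data", data, {}))


def collect(html):
    """Feed an HTML string to the parser and return the recorded events."""
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.events


if __name__ == "__main__":
    print(collect('<input />'))                          # case (1)
    print(collect('<a onclick="if(a>b)alert()">x</a>'))  # case (2)
```

As far as I can tell, the first call reports a single self-closing event for `input`, and the second keeps `if(a>b)alert()` intact as the `onclick` value, so the `>` inside the quotes does not end the tag early.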



On May 24, 5:11 am, Iceberg <[email protected]> wrote:
> I have not tried it, but I assume the built-in Python module HTMLParser
> already handles at least (1) tags like <input />; I am not sure about (2)
> and (3).
>
> On May 24, 4:32 am, mdipierro <[email protected]> wrote:
>
> > hmmm.... somehow I did not save comments in the file.
>
> > This does not handle well:
>
> > 1) tags like <input />
> > 2) attributes that contain > in quotes <a onclick="if(a>b)alert()">
> > 3) attributes that contain escaped quotes <a onclick="var a=\"x\"">
>
> > On May 23, 10:46 am, Massimo Di Pierro <[email protected]>
> > wrote:
>
> > > Anybody interested in helping with this?
>
> > > It scrapes an HTML file and converts it into a tree hierarchy of web2py
> > > helpers
>
> > > '<div>xxx</div>' -> DIV('xxx')
>
> > > It mostly works but fails on the three cases described in the file.
>
> > > Massimo
>
> > >  parsehtml.py
