Good suggestion. Now you can do
>>> from gluon.html import web2pyHTMLParser
>>> tree = web2pyHTMLParser('hello<div a="b">world</div>').tree
>>> tree.element(_a='b')['_c'] = 5
>>> str(tree)
'hello<div a="b" c="5">world</div>'
works great!
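
On the HTMLParser question quoted below, here is a rough sketch of how the three problem cases could be checked against the standard library parser. This is not the parsehtml.py code. It assumes Python 3, where the module is html.parser, and the names EventRecorder and parse_events are made up for illustration. Case 3 is checked with &quot; rather than backslash-escaped quotes, since HTML itself does not treat a backslash as an escape character.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper: records start/end tag events as tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; values arrive unescaped
        self.events.append(('start', tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(('end', tag))


def parse_events(html):
    """Feed one HTML snippet to a fresh parser and return its tag events."""
    parser = EventRecorder()
    parser.feed(html)
    parser.close()
    return parser.events


if __name__ == '__main__':
    samples = [
        '<input />',                           # case 1: self-closing tag
        '<a onclick="if(a>b)alert()">',        # case 2: > inside quotes
        '<a onclick="var a=&quot;x&quot;">',   # case 3, using &quot;
    ]
    for sample in samples:
        print(sample, '->', parse_events(sample))
```

A self-closing tag is reported as a start event followed by an end event, because the default handle_startendtag calls both handlers.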
On May 24, 5:11 am, Iceberg <[email protected]> wrote:
> I did not try, but I assume the builtin Python module HTMLParser
> already handles at least (1) tags like <input />; not sure about (2)
> and (3).
>
> On May 24, 4:32 am, mdipierro <[email protected]> wrote:
>
> > hmmm.... somehow I did not save comments in the file.
>
> > This does not handle well:
>
> > 1) tags like <input />
> > 2) attributes that contain > in quotes <a onclick="if(a>b)alert()">
> > 3) attributes that contain escaped quotes <a onclick="var a=\"x\"">
>
> > On May 23, 10:46 am, Massimo Di Pierro <[email protected]>
> > wrote:
>
> > > Anybody interested in helping with this?
>
> > > It scrapes an HTML file and converts it into a tree hierarchy of web2py
> > > helpers
>
> > > '<div>xxx</div>' -> DIV('xxx')
>
> > > It kind of works but fails in the three cases described in the file.
>
> > > Massimo
>
> > > parsehtml.py (1K)