[web2py] Re: TAG helper can not parse html

Anthony Tue, 20 May 2014 06:05:35 -0700

No, TAG is only a basic parser and is not robust against errors in the HTML. 
You should probably use a more sophisticated tool, such as Beautiful Soup 
(which can use the lxml or html5lib parsers under the hood). The standard 
library also includes the HTMLParser module, but you may run into similar 
problems with that.
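One possible workaround is to repair the markup before handing it to TAG. Below is a minimal sketch that assumes Python 3, where the HTMLParser module is named html.parser. That parser does not raise on a stray end tag. It just passes the tag to handle_endtag. So a small subclass can track the open elements, drop end tags that close nothing, and close anything left open at the end. The names TagBalancer and balance_html are hypothetical, written only for this illustration.

```python
from html import escape
from html.parser import HTMLParser

# Void elements never have an end tag, so they are never pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TagBalancer(HTMLParser):
    """Hypothetical helper: re-serialize HTML, dropping end tags that close nothing."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # serialized fragments
        self.stack = []  # currently open (non-void) elements, innermost last

    def handle_starttag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # bare attribute such as <input disabled>
                parts.append(name)
            else:
                parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s>" % " ".join(parts))
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag (e.g. an extra </span>): drop it
        # Close everything opened inside this element, innermost first.
        while True:
            open_tag = self.stack.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.stack and self.stack[-1] in ("script", "style"):
            self.out.append(data)  # raw script/style text: keep as-is
        else:
            self.out.append(escape(data, quote=False))

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def balance_html(html):
    """Return html with stray end tags removed and unclosed tags closed."""
    parser = TagBalancer()
    parser.feed(html)
    parser.close()
    # Close anything still open at the end of the input, innermost first.
    while parser.stack:
        parser.out.append("</%s>" % parser.stack.pop())
    return "".join(parser.out)
```

For example, balance_html("<div><span>a</span></span>b</div>") drops the extra </span>. You could then pass the cleaned string to web2py, e.g. TAG(balance_html(html)). That is only an assumption and has not been tested against TAG here. Also, html.parser does not apply the HTML5 implied-end-tag rules, so Beautiful Soup with html5lib remains the more thorough option for badly broken pages.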


Anthony

On Tuesday, May 20, 2014 8:14:37 AM UTC-4, yamandu wrote:
>
> I am trying to parse HTML fetched from a URL with urllib, using the TAG 
> helper.
> The HTML is broken in some places: it has closing span tags without 
> matching opening span tags.
>
> The TAG helper gives the error: unable to balance span tag.
>
> I tested it. Open tags that are never closed are parsed, but closing tags 
> without a matching open tag are not.
>
> Is there a workaround for this?
>
