[web2py] Re: TAG helper can not parse html

2014-05-20 Thread Anthony
No, TAG is only a basic parser and is not robust against errors in the HTML. 
You should probably use a more sophisticated tool, such as Beautiful Soup 
(which can use the lxml or html5lib parser as its backend). The standard 
library also includes the HTMLParser module, but you may run into similar 
problems with that.
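
For anyone who would rather stay with the standard library, here is a minimal 
sketch of one possible workaround: drop the stray end tags before handing the 
markup to a stricter parser. It assumes Python 3 (html.parser); the names 
StrayEndTagFilter and drop_stray_end_tags are made up for this example and 
are not part of web2py or the standard library. Tag names in the emitted end 
tags come out lowercased.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class StrayEndTagFilter(HTMLParser):
    """Hypothetical helper: re-emit HTML, dropping unmatched end tags."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.open_tags = []  # stack of currently open tag names
        self.pieces = []     # output fragments, joined at the end

    def handle_starttag(self, tag, attrs):
        self.pieces.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: emit as-is, nothing to balance.
        self.pieces.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag with no matching start tag: drop it
        # Close any tags left open inside the one being closed.
        while self.open_tags:
            top = self.open_tags.pop()
            self.pieces.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        self.pieces.append(data)

    def handle_entityref(self, name):
        self.pieces.append("&%s;" % name)

    def handle_charref(self, name):
        self.pieces.append("&#%s;" % name)

    def handle_comment(self, data):
        self.pieces.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.pieces.append("<!%s>" % decl)


def drop_stray_end_tags(html):
    """Return html with unmatched end tags removed and open tags closed."""
    parser = StrayEndTagFilter()
    parser.feed(html)
    parser.close()
    # Close whatever is still open at the end of the input.
    while parser.open_tags:
        parser.pieces.append("</%s>" % parser.open_tags.pop())
    return "".join(parser.pieces)
```

With this sketch, drop_stray_end_tags('<div>a</span>b</div>') returns 
'<div>ab</div>'. The cleaned string could then be passed to TAG, though 
whether that is enough depends on what else is wrong with the page.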

Anthony

On Tuesday, May 20, 2014 8:14:37 AM UTC-4, yamandu wrote:

 I am trying to parse HTML fetched from a URL with urllib, using the TAG 
 helper.
 The HTML is broken in places: it has closing span tags without matching 
 opening span tags.

 The TAG helper raises an error: unable to balance span tag.

 I tested it. Opening tags that are never closed parse fine, but closing 
 tags without a matching opening tag do not.

 Is there a workaround for this?


-- 
Resources:
- http://web2py.com
- http://web2py.com/book (Documentation)
- http://github.com/web2py/web2py (Source code)
- https://code.google.com/p/web2py/issues/list (Report Issues)
--- 
You received this message because you are subscribed to the Google Groups 
web2py-users group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to web2py+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [web2py] Re: TAG helper can not parse html

2014-05-20 Thread Carlos Costa
Yeah, the error is thrown by HTMLParser; TAG is built on top of it.
I will try some other tools like Beautiful Soup.
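
A rough sketch of what the Beautiful Soup route might look like, assuming the 
third-party beautifulsoup4 package is installed and using its "html.parser" 
backend (lxml or html5lib can be named instead if they are installed). The 
function name span_texts and the sample markup are invented for illustration.

```python
try:
    from bs4 import BeautifulSoup  # third-party package "beautifulsoup4"
except ImportError:
    BeautifulSoup = None  # the sketch below cannot run without it


def span_texts(html):
    """Hypothetical helper: text of every <span> in possibly broken HTML."""
    if BeautifulSoup is None:
        raise RuntimeError("beautifulsoup4 is not installed")
    # Beautiful Soup builds a tree even when end tags have no start tag.
    soup = BeautifulSoup(html, "html.parser")
    return [span.get_text() for span in soup.find_all("span")]


# Sample input with one stray </span>, similar to the page in question.
broken = "<div><span>one</span></span><span>two</span></div>"
```

Calling span_texts(broken) should give the text of the two well-formed spans 
and ignore the stray end tag instead of raising an error.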

Thanks.


-- 
Regards,

Carlos J. Costa
Computer Scientist
Specialist in Telecom Management

EL MELECH NEEMAN!
אָמֵן
