[issue1486713] HTMLParser : A auto-tolerant parsing mode

2011-11-16 Thread kxroberto
kxroberto kxrobe...@users.sourceforge.net added the comment: Well, in many browsers, for example, there is an internal warning and error log (window), which does not (need to) claim to be an official W3C checker. It has a positive effect on web stabilization. For example just looking now I see

[issue1486713] HTMLParser : A auto-tolerant parsing mode

2011-11-16 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: The strict/tolerant mode mainly works by using either a strict or a tolerant regex. If the markup is invalid, the strict regex doesn't match and it gives an error. The tolerant regex will match both valid and invalid markup at the same
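
To illustrate the two-regex approach described in this comment, here is a minimal sketch. The patterns `STRICT_TAG` and `TOLERANT_TAG` and the helper `parse_tag` are hypothetical and heavily simplified; they are not the expressions the stdlib parser uses. The input with the missing space between attributes is the kind of case mentioned elsewhere in the thread.

```python
import re

# Hypothetical, simplified patterns for illustration only; the stdlib's
# real expressions are more involved.
# Strict: attributes must be preceded by whitespace and have quoted values.
STRICT_TAG = re.compile(
    r'<[a-zA-Z][a-zA-Z0-9]*'
    r'(?:\s+[a-zA-Z_][-.\w]*\s*=\s*"[^"]*")*'
    r'\s*/?>'
)
# Tolerant: after the tag name, accept anything up to the closing '>'.
TOLERANT_TAG = re.compile(r'<[a-zA-Z][^>]*>')


def parse_tag(text, strict=True):
    """Return the matched start tag, or raise ValueError if nothing matches."""
    pattern = STRICT_TAG if strict else TOLERANT_TAG
    match = pattern.match(text)
    if match is None:
        raise ValueError("malformed start tag: %r" % text)
    return match.group(0)


valid = '<a href="x" title="y">'
invalid = '<a href="x"title="y">'  # no whitespace between the attributes

print(parse_tag(valid))                  # both patterns accept valid markup
print(parse_tag(invalid, strict=False))  # only the tolerant one accepts this
try:
    parse_tag(invalid)                   # the strict pattern does not match
except ValueError as exc:
    print("strict mode error:", exc)
```

Because the tolerant pattern matches valid and invalid input alike, a match alone cannot say whether the markup was well formed.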

[issue1486713] HTMLParser : A auto-tolerant parsing mode

2011-11-16 Thread kxroberto
kxroberto kxrobe...@users.sourceforge.net added the comment: The old patch already warned about the majority of real cases, except the missing white space between attributes. The tolerant regex will match both: locatestarttagend_tolerant: The main and frequent issue on the web here is the

[issue1486713] HTMLParser : A auto-tolerant parsing mode

2011-11-16 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: Note that the regex and the way the parser considers the commas changed in 16ed15ff0d7c (it now considers them as the name of a value-less attribute), so adding a group for the comma is no longer doable. In theory, the approach you
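
A small way to observe the behaviour described here, assuming a Python 3 `html.parser` that includes this change. The class name `AttrCollector` is made up for the example; only `HTMLParser`, `feed`, `close` and `handle_starttag` come from the stdlib.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper that records every attribute pair it is handed."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None when the
        # attribute has no value.
        self.attrs.extend(attrs)


parser = AttrCollector()
parser.feed('<a href="x", title="y">')  # stray comma between attributes
parser.close()
print(parser.attrs)
```

With the behaviour described above, the stray comma should show up in the list as an attribute named `,` with no value, between `href` and `title`.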

[issue1486713] HTMLParser : A auto-tolerant parsing mode

2011-11-16 Thread kxroberto
kxroberto kxrobe...@users.sourceforge.net added the comment: 16ed15ff0d7c was not in current stable py3.2 so I missed it. When the comma is now raised as an attribute name, the problem is moved to the higher level anyway, and is/can be handled easily there by usual methods. (still
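
One possible reading of "handled at the higher level" is a subclass that inspects the attribute names it receives and records a warning for implausible ones. This is only a sketch: `WarningParser`, its `warnings` list and the `PLAUSIBLE_NAME` rule are assumptions made for the example, not part of `html.parser`; `getpos()` is the stdlib method that reports the current line and offset.

```python
import re
from html.parser import HTMLParser

# Hypothetical rule for what counts as a plausible attribute name.
PLAUSIBLE_NAME = re.compile(r'[a-zA-Z_:][-\w:.]*\Z')


class WarningParser(HTMLParser):
    """Collects warnings instead of raising; not a stdlib feature."""

    def __init__(self):
        super().__init__()
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not PLAUSIBLE_NAME.match(name):
                line, offset = self.getpos()
                # Record where the tag started and which name looked wrong.
                self.warnings.append((line, offset, tag, name))


parser = WarningParser()
parser.feed('<a href="x", title="y">')
parser.close()
for line, offset, tag, name in parser.warnings:
    print("line %d, offset %d: odd attribute %r in <%s>" % (line, offset, name, tag))
```

The parser itself stays tolerant; the decision about what to report lives entirely in the subclass.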

[issue1486713] HTMLParser : A auto-tolerant parsing mode

2011-11-16 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment:
> 16ed15ff0d7c was not in current stable py3.2 so I missed it..
It's also in 3.2 and 2.7 (but it's quite recent, so if you didn't pull recently you might have missed it).
> When the comma is now raised as attribute name, then the problem is

[issue1486713] HTMLParser : A auto-tolerant parsing mode

2011-11-15 Thread kxroberto
kxroberto kxrobe...@users.sourceforge.net added the comment: I looked at the new patch http://hg.python.org/lookup/r86952 for Py3 (regarding the extended tolerance and local backporting to Python2.7): What I miss are the calls of a kind of self.warning(msg,i,k) function in non-strict/tolerant

[issue1486713] HTMLParser : A auto-tolerant parsing mode

2011-11-15 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: The HTMLParser is not suitable for validation, even the strict mode allows some non valid markup (and it might be removed soon). Also I don't think it's easy to call a self.warnings() without trying the strict mode first. The tolerant

[issue1486713] HTMLParser : A auto-tolerant parsing mode

2010-12-03 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: A note for the curious: I changed the keyword name from 'tolerant' to 'strict' because the stdlib has other examples of 'strict' as a keyword, but the word 'tolerant' appears nowhere in the documentation and certainly not as a keyword.

[issue1486713] HTMLParser : A auto-tolerant parsing mode

2010-12-02 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: I have committed a version of this patch, without the warnings, using the keyword 'strict=True' as the default, and with a couple added heuristics from other similar issues, in r86952. kxroberto, if you want to supply your full name,

[issue1486713] HTMLParser : A auto-tolerant parsing mode

2010-11-20 Thread Neil Muller
Neil Muller drnlmuller+b...@gmail.com added the comment: #975556 and #1046092 look like they should also be superseded by this. -- nosy: +Neil Muller ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue1486713

[issue1486713] HTMLParser : A auto-tolerant parsing mode

2010-09-04 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: See also issue 1058305, which may be a duplicate.

[issue1486713] HTMLParser : A auto-tolerant parsing mode

2010-08-27 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: For anyone who does want to work on this (and I do, but it will be quite a while before I can) see also issue 6191.

[issue1486713] HTMLParser : A auto-tolerant parsing mode

2010-08-26 Thread kxroberto
kxroberto kxrobe...@users.sourceforge.net added the comment: I'm not working with Py3 and don't know how much that module differs in 3. Unless it's going into a py2 version, I'll leave the FR so far to the py3 community.

[issue1486713] HTMLParser : A auto-tolerant parsing mode

2010-08-24 Thread kxroberto
kxroberto kxrobe...@users.sourceforge.net added the comment: For me, a parser which cannot be fed with HTML from outside (which I cannot edit myself) has not much use at all. Attached is my current patch (vs. py26), with many changes meanwhile, and a test case. I've put the default to strict mode,

[issue1486713] HTMLParser : A auto-tolerant parsing mode

2010-08-24 Thread kxroberto
Changes by kxroberto kxrobe...@users.sourceforge.net: -- versions: +Python 2.6, Python 2.7 Added file: http://bugs.python.org/file18624/test_htmlparser_tolerant.patch

[issue1486713] HTMLParser : A auto-tolerant parsing mode

2010-08-24 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: 2.6 is now in security-fix-only mode. Since this is a new feature, it can only go into 3.2. Can you provide a patch against py3k trunk? I've only glanced at the patch briefly, but one thing that concerns me is 'warning file'. I

[issue1486713] HTMLParser : A auto-tolerant parsing mode

2010-08-24 Thread Terry J. Reedy
Terry J. Reedy tjre...@udel.edu added the comment: I agree that a tolerant mode would be good (and often requested). String encoding and decoding also have strict and forgiving modes, so this seems close to a policy. Unit tests with example snippets that properly fail strict mode and pass the
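
The codec precedent mentioned in this comment can be seen with the `errors` argument of `bytes.decode`. This is a minimal illustration with made-up sample data; `'strict'` (the default) and `'replace'` are standard error handlers.

```python
data = b'caf\xe9'  # Latin-1 encoded bytes, not valid UTF-8

# Strict mode (the default) rejects the malformed input.
try:
    data.decode('utf-8')  # same as errors='strict'
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print("strict mode rejected the input:", exc.reason)

# A forgiving handler substitutes U+FFFD for the undecodable byte.
lenient = data.decode('utf-8', errors='replace')
print(repr(lenient))
```

The same input fails under one setting and passes under the other, which is the shape of unit test the comment asks for.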

[issue1486713] HTMLParser : A auto-tolerant parsing mode

2010-08-22 Thread Mark Lawrence
Mark Lawrence breamore...@yahoo.co.uk added the comment: I think this should be closed as have other similar requests in the last few days. -- nosy: +BreamoreBoy, fdrake

[issue1486713] HTMLParser : A auto-tolerant parsing mode

2010-08-22 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: I disagree (and might disagree with those other closings but I haven't noticed them I guess). BeautifulSoup does *not* cover this ground, it is broken in 3.x because of the lack of a tolerant HTML parser in the stdlib (it used to use

[issue1486713] HTMLParser : A auto-tolerant parsing mode

2010-08-08 Thread Terry J. Reedy
Terry J. Reedy tjre...@udel.edu added the comment: This needs to be checked for applicability to 3.x. Do beautifulsoup and other programs cover this ground (tolerant parsing of junk html)? -- nosy: +terry.reedy versions: +Python 3.2 -Python 2.7, Python 3.1

[issue1486713] HTMLParser : A auto-tolerant parsing mode

2009-03-20 Thread Daniel Diniz
Changes by Daniel Diniz aja...@gmail.com: -- stage: - test needed type: - feature request versions: +Python 2.7, Python 3.1 -Python 2.4