kxroberto kxrobe...@users.sourceforge.net added the comment:
Well, many browsers, for example, have an internal warning and error log
(window). It does not (need to) claim to be an official W3C checker, yet it
has a positive effect on web stabilization.
For example, just looking now I see
Ezio Melotti ezio.melo...@gmail.com added the comment:
The strict/tolerant mode mainly works by using either a strict or a tolerant
regex. If the markup is invalid, the strict regex doesn't match and it gives
an error. The tolerant regex will match both valid and invalid markup at the same
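A minimal sketch of the mechanism described above, in which the mode only selects which regex is tried. The two patterns and the `match_starttag` helper are hypothetical, heavily simplified stand-ins, not the actual regexes used by HTMLParser.

```python
import re

# Hypothetical, simplified start-tag patterns (NOT the real HTMLParser regexes).
# Strict: every attribute must be preceded by whitespace and have a well-formed value.
STRICT_STARTTAG = re.compile(
    r"<([a-zA-Z][-.a-zA-Z0-9:_]*)"                          # group 1: tag name
    r"(?:\s+[a-zA-Z_][-.:a-zA-Z0-9_]*"                      # whitespace + attribute name
    r"(?:\s*=\s*(?:\"[^\"]*\"|'[^']*'|[^\s\"'=<>`]+))?)*"   # optional value
    r"\s*/?>"
)
# Tolerant: anything between the tag name and the first '>' is accepted.
TOLERANT_STARTTAG = re.compile(r"<([a-zA-Z][-.a-zA-Z0-9:_]*)[^>]*>")


def match_starttag(tag, strict=True):
    """Return the lowercased tag name, or raise ValueError if the chosen regex fails."""
    regex = STRICT_STARTTAG if strict else TOLERANT_STARTTAG
    m = regex.fullmatch(tag)
    if m is None:
        raise ValueError("malformed start tag: %r" % tag)
    return m.group(1).lower()


valid = '<a href="x" title="y">'
broken = '<a href="x"title="y">'   # no whitespace between attributes
print(match_starttag(valid))                 # a
print(match_starttag(broken, strict=False))  # a
```

With `strict=True`, `match_starttag(broken)` raises `ValueError`, because the strict pattern requires whitespace before `title`.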
kxroberto kxrobe...@users.sourceforge.net added the comment:
The old patch already warned about the majority of real cases, except for the
missing whitespace between attributes.
> The tolerant regex will match both:
locatestarttagend_tolerant: The main and frequent issue on the web here is the
Ezio Melotti ezio.melo...@gmail.com added the comment:
Note that the regex and the way the parser considers the commas changed in
16ed15ff0d7c (it now considers them as the name of a value-less attribute), so
adding a group for the comma is no longer doable.
In theory, the approach you
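The comma handling described above can be observed with the current `html.parser` module. This sketch assumes a recent Python 3, where a stray comma comes back as an attribute named `','` with no value; exact attribute lists may differ between versions. `AttrCollector` is an illustrative name.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Records the (tag, attrs) pair of every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples; value is None when absent
        self.tags.append((tag, attrs))


p = AttrCollector()
p.feed('<div class="a", id="b">')
p.close()
tag, attrs = p.tags[0]
print(tag, attrs)
```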
kxroberto kxrobe...@users.sourceforge.net added the comment:
16ed15ff0d7c was not in the current stable py3.2, so I missed it.
Now that the comma is reported as an attribute name, the problem is moved to the
higher level anyway, where it can easily be handled by the usual methods.
(still
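A sketch of handling the problem at the higher level, assuming commas arrive as attribute names: a subclass inspects the attributes in `handle_starttag`, drops names that fail a hypothetical `VALID_ATTR_NAME` check, and warns with the position reported by `getpos()`. This is one possible policy for client code, not something HTMLParser does itself.

```python
import re
import warnings
from html.parser import HTMLParser

# Hypothetical rule for acceptable attribute names; ',' and similar junk fail it.
VALID_ATTR_NAME = re.compile(r"[a-zA-Z_:][-.a-zA-Z0-9_:]*")


class CleaningParser(HTMLParser):
    """Drops attributes with bogus names and warns about each one."""

    def __init__(self):
        super().__init__()
        self.tags = []      # (tag, cleaned attrs) for every start tag
        self.problems = []  # messages describing dropped attributes

    def handle_starttag(self, tag, attrs):
        clean = []
        for name, value in attrs:
            if VALID_ATTR_NAME.fullmatch(name):
                clean.append((name, value))
            else:
                line, offset = self.getpos()
                msg = "line %d, col %d: dropped bogus attribute %r in <%s>" % (
                    line, offset, name, tag)
                self.problems.append(msg)
                warnings.warn(msg)
        self.tags.append((tag, clean))


p = CleaningParser()
p.feed('<div class="a", id="b">')
p.close()
print(p.tags)
print(p.problems)
```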
Ezio Melotti ezio.melo...@gmail.com added the comment:
> 16ed15ff0d7c was not in current stable py3.2 so I missed it.
It's also in 3.2 and 2.7 (but it's quite recent, so if you didn't pull recently
you might have missed it).
> When the comma is now raised as attribute name, then the problem is
kxroberto kxrobe...@users.sourceforge.net added the comment:
I looked at the new patch http://hg.python.org/lookup/r86952 for Py3 (regarding
the extended tolerance and a local backport to Python 2.7).
What I miss are calls to some kind of self.warning(msg,i,k) function in
non-strict/tolerant
Ezio Melotti ezio.melo...@gmail.com added the comment:
The HTMLParser is not suitable for validation: even the strict mode allows some
invalid markup (and it might be removed soon).
Also, I don't think it's easy to call a self.warnings() without trying the
strict mode first. The tolerant
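To illustrate why warnings require trying the strict mode first: a parser that wants to warn must attempt the strict regex and, only on failure, fall back to the tolerant one, so every malformed tag costs two matches. The patterns and `match_starttag_warn` are hypothetical simplifications, not HTMLParser code; `warn` is injectable so the messages can be collected.

```python
import re
import warnings

# Hypothetical, simplified patterns (NOT the real HTMLParser regexes).
STRICT_STARTTAG = re.compile(
    r"<([a-zA-Z][-.a-zA-Z0-9:_]*)"
    r"(?:\s+[a-zA-Z_][-.:a-zA-Z0-9_]*"
    r"(?:\s*=\s*(?:\"[^\"]*\"|'[^']*'|[^\s\"'=<>`]+))?)*"
    r"\s*/?>"
)
TOLERANT_STARTTAG = re.compile(r"<([a-zA-Z][-.a-zA-Z0-9:_]*)[^>]*>")


def match_starttag_warn(tag, warn=warnings.warn):
    """Return the lowercased tag name, warning if only the tolerant regex matches."""
    m = STRICT_STARTTAG.fullmatch(tag)
    if m is not None:
        return m.group(1).lower()
    # The strict regex failed: retry tolerantly and report the problem.
    m = TOLERANT_STARTTAG.fullmatch(tag)
    if m is None:
        raise ValueError("unparseable start tag: %r" % tag)
    warn("malformed start tag tolerated: %r" % tag)
    return m.group(1).lower()


messages = []
print(match_starttag_warn('<a href="x" title="y">', warn=messages.append))  # a
print(match_starttag_warn('<a href="x"title="y">', warn=messages.append))   # a
print(len(messages))  # 1
```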
R. David Murray rdmur...@bitdance.com added the comment:
A note for the curious: I changed the keyword name from 'tolerant' to 'strict'
because the stdlib has other examples of 'strict' as a keyword, but the word
'tolerant' appears nowhere in the documentation and certainly not as a keyword.
R. David Murray rdmur...@bitdance.com added the comment:
I have committed a version of this patch, without the warnings, using the
keyword 'strict=True' as the default, and with a couple added heuristics from
other similar issues, in r86952.
kxroberto, if you want to supply your full name,
Neil Muller drnlmuller+b...@gmail.com added the comment:
#975556 and #1046092 look like they should also be superseded by this.
--
nosy: +Neil Muller
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1486713
R. David Murray rdmur...@bitdance.com added the comment:
See also issue 1058305, which may be a duplicate.
R. David Murray rdmur...@bitdance.com added the comment:
For anyone who does want to work on this (and I do, but it will be quite a
while before I can) see also issue 6191.
kxroberto kxrobe...@users.sourceforge.net added the comment:
I'm not working with Py3 and don't know how much that module differs in 3.
Unless it's going into a Py2 version, I'll leave the FR to the Py3 community for
now.
kxroberto kxrobe...@users.sourceforge.net added the comment:
For me, a parser that cannot be fed HTML from outside (which I cannot edit
myself) is of little use.
Attached is my current patch (vs. py26), with many changes since then, and a
test case.
I've put the default to strict mode,
Changes by kxroberto kxrobe...@users.sourceforge.net:
--
versions: +Python 2.6, Python 2.7
Added file: http://bugs.python.org/file18624/test_htmlparser_tolerant.patch
R. David Murray rdmur...@bitdance.com added the comment:
2.6 is now in security-fix-only mode. Since this is a new feature, it can only
go into 3.2.
Can you provide a patch against py3k trunk?
I've only glanced at the patch briefly, but one thing that concerns me is
'warning file'. I
Terry J. Reedy tjre...@udel.edu added the comment:
I agree that a tolerant mode would be good (and often requested). String
encoding and decoding also have strict and forgiving modes, so this seems close
to a policy.
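For comparison, the strict and forgiving modes of text decoding are selected through the `errors` argument of `bytes.decode()` (and `str.encode()`). This small illustration is unrelated to HTMLParser itself.

```python
data = b"caf\xe9"  # Latin-1 encoded "café", which is invalid UTF-8

# errors="strict" is the default: invalid input raises an exception.
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict mode rejected the input:", exc.reason)

# Forgiving modes: substitute U+FFFD, or silently drop the bad byte.
replaced = data.decode("utf-8", errors="replace")
print(replaced == "caf\ufffd")                # True
print(data.decode("utf-8", errors="ignore"))  # caf
```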
Unit tests with example snippets that properly fail strict mode and pass the
Mark Lawrence breamore...@yahoo.co.uk added the comment:
I think this should be closed, as other similar requests have been in the last
few days.
--
nosy: +BreamoreBoy, fdrake
R. David Murray rdmur...@bitdance.com added the comment:
I disagree (and might disagree with those other closings, but I guess I haven't
noticed them). BeautifulSoup does *not* cover this ground; it is broken in 3.x
because of the lack of a tolerant HTML parser in the stdlib (it used to use
Terry J. Reedy tjre...@udel.edu added the comment:
This needs to be checked for applicability to 3.x.
Do beautifulsoup and other programs cover this ground (tolerant parsing of junk
html)?
--
nosy: +terry.reedy
versions: +Python 3.2 -Python 2.7, Python 3.1
Changes by Daniel Diniz aja...@gmail.com:
--
stage: - test needed
type: - feature request
versions: +Python 2.7, Python 3.1 -Python 2.4