[issue11113] html.entities mapping dicts need updating?

2012-06-23 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: Attached another file with a dict that contains the 2231 HTML5 entities listed at http://www.w3.org/TR/html5/named-character-references.html The dict is like: html5namedcharref = { 'Aacute;': '\xc1', 'Aacute': '\xc1',

[issue11113] html.entities mapping dicts need updating?

2012-06-23 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: Here is a proper patch, still using the html5namedcharref name. HTMLParser should also be updated to use this dict. -- keywords: +patch stage: patch review -> commit review Added file: http://bugs.python.org/file26109/issue3.diff

[issue11113] html.entities mapping dicts need updating?

2012-06-23 Thread Ezio Melotti
Changes by Ezio Melotti ezio.melo...@gmail.com: Removed file: http://bugs.python.org/file26109/issue3.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue11113 ___

[issue11113] html.entities mapping dicts need updating?

2012-06-23 Thread Ezio Melotti
Changes by Ezio Melotti ezio.melo...@gmail.com: Added file: http://bugs.python.org/file26110/issue3.diff

[issue11113] html.entities mapping dicts need updating?

2012-06-23 Thread Martin v . Löwis
Martin v. Löwis mar...@v.loewis.de added the comment: How about calling it just html5, or HTML5? That it is about entities already follows from the module name.

[issue11113] html.entities mapping dicts need updating?

2012-06-23 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: Here's a new patch that uses the html5 name for the dict; if there aren't other comments I'll commit it. -- Added file: http://bugs.python.org/file26113/issue3-2.diff

[issue11113] html.entities mapping dicts need updating?

2012-06-23 Thread Roundup Robot
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset 2b54e25d6ecb by Ezio Melotti in branch 'default': #11113: add a new html5 dictionary containing the named character references defined by the HTML5 standard and the equivalent Unicode character(s) to the
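
For readers following along, the committed dictionary can be inspected roughly as below. This is a minimal sketch, assuming a Python version that ships html.entities.html5 (3.3 or later); the variable names are illustrative only.

```python
from html.entities import html5

# Keys are stored with the trailing ';' when the reference requires it.
with_semicolon = html5['Aacute;']

# A subset of legacy names is also stored without the ';'.
without_semicolon = html5['Aacute']

# 'alpha' is only present in its ';'-terminated form.
alpha_forms = ('alpha;' in html5, 'alpha' in html5)

print(repr(with_semicolon), repr(without_semicolon), alpha_forms)
```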

[issue11113] html.entities mapping dicts need updating?

2012-06-23 Thread Ezio Melotti
Changes by Ezio Melotti ezio.melo...@gmail.com: -- resolution: -> fixed stage: commit review -> committed/rejected status: open -> closed

[issue11113] html.entities mapping dicts need updating?

2012-06-23 Thread Éric Araujo
Éric Araujo mer...@netwok.org added the comment: The ';' is not part of the entity name but an SGML delimiter, like '&'; the strings in the dict should not include it (the keys in the other dict don’t).

[issue11113] html.entities mapping dicts need updating?

2012-06-23 Thread Éric Araujo
Éric Araujo mer...@netwok.org added the comment: BTW in the doc you may point to collections.ChainMap to explain to people how to make one dict with HTML 4 and HTML 5 entities. (Note that I assume there are two dicts, but I only skimmed the diff.) --
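
The ChainMap suggestion could look roughly like the following. This is only a sketch, not text from the patch or the docs; the name all_entities is made up for illustration. Note that the two dicts use different key conventions (html5 keys mostly end in ';', entitydefs keys never do), so a lookup succeeds with whichever spelling either dict knows.

```python
from collections import ChainMap
from html.entities import entitydefs, html5

# Lookups try html5 first, then fall back to the HTML 4 dict.
all_entities = ChainMap(html5, entitydefs)

# 'alpha;' comes from html5; bare 'alpha' is only found via entitydefs.
from_html5 = all_entities['alpha;']
from_html4 = all_entities['alpha']

print(from_html5, from_html4)
```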

[issue11113] html.entities mapping dicts need updating?

2012-06-23 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: The problem is that the standard allows some charrefs to end without a ';', but not all of them. So both &Eacuteric and &Eacute;ric will be parsed as Éric, but only &alpha;centauri will result in αcentauri -- &alphacentauri will be returned
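
The behaviour described here can be checked with html.unescape(). That function is an assumption relative to this thread: it was added later (Python 3.4) and is not part of the patch under discussion. A small sketch:

```python
from html import unescape
from html.entities import html5

# 'Eacute' is stored both with and without ';', so both spellings resolve.
eric_no_semicolon = unescape('&Eacuteric')
eric_semicolon = unescape('&Eacute;ric')

# 'alpha' is stored only as 'alpha;', so the unterminated form stays as-is.
alpha_semicolon = unescape('&alpha;centauri')
alpha_no_semicolon = unescape('&alphacentauri')

print(eric_no_semicolon, eric_semicolon, alpha_semicolon, alpha_no_semicolon)
```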

[issue11113] html.entities mapping dicts need updating?

2012-06-23 Thread Éric Araujo
Éric Araujo mer...@netwok.org added the comment: The explanations make sense, don’t change anything.

[issue11113] html.entities mapping dicts need updating?

2011-11-29 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: http://www.w3.org/TR/html5/named-character-references.html lists 2152 HTML 5 entities (see also attached file for a dict generated from that table). Currently html.entities only has 252 entities, organized in 3 dicts: 1) name -> intvalue

[issue11113] html.entities mapping dicts need updating?

2011-11-29 Thread Martin v . Löwis
Martin v. Löwis mar...@v.loewis.de added the comment: 1) the current approach of having a dict with name -> intvalue doesn't work anymore, and a name -> valuelist should be used instead; 2) the reverse dict for this would have to use tuples as keys, but I'm not sure how useful would that
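
The reason a name -> intvalue mapping no longer suffices is that some HTML5 references expand to more than one code point. A sketch using the html5 dict as it was eventually committed (it maps names to strings rather than to lists of integers); the variable names are illustrative:

```python
from html.entities import html5, name2codepoint

# The HTML 4 dict maps each name to a single integer code point.
single = name2codepoint['eacute']

# Some HTML5 references map to a string of two code points.
multi = {name: value for name, value in html5.items() if len(value) > 1}
not_equal_tilde = [hex(ord(ch)) for ch in html5['NotEqualTilde;']]

print(hex(single), len(multi), not_equal_tilde)
```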

[issue11113] html.entities mapping dicts need updating?

2011-11-28 Thread Ezio Melotti
Changes by Ezio Melotti ezio.melo...@gmail.com: -- assignee: -> ezio.melotti

[issue11113] html.entities mapping dicts need updating?

2011-07-20 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: Having them in different mappings would be good, but I expect that for most real world applications a single mapping that includes them all is the way to go. If I'm parsing a supposedly HTML page that contains an &apos; I'd rather have it

[issue11113] html.entities mapping dicts need updating?

2011-06-15 Thread Éric Araujo
Éric Araujo mer...@netwok.org added the comment: Ah, this changes the situation. I suppose it’s too late to stop pretending that HTML and XHTML are nearly the same thing (IOW change the doc), so apos needs to be defined for XHTML. IMO, we need a way to have the right entity references for

[issue11113] html.entities mapping dicts need updating?

2011-06-14 Thread Éric Araujo
Éric Araujo mer...@netwok.org added the comment: I just closed #12329 as a duplicate of this bug. It requested the addition of the apos named entity reference. TTBOMK, the html module (or htmlentitydefs in 2.x) doesn’t claim to support XHTML; an XML parser should be used for XHTML.

[issue11113] html.entities mapping dicts need updating?

2011-06-14 Thread Hans Peter de Koning
Hans Peter de Koning h...@xs4all.nl added the comment: The reason I raised #12329 was that the v2.7.1 documentation in http://docs.python.org/library/htmllib.html#module-htmlentitydefs says: ... The definition provided here contains all the entities defined by XHTML 1.0 ... The only diff

[issue11113] html.entities mapping dicts need updating?

2011-06-14 Thread Hans Peter de Koning
Hans Peter de Koning h...@xs4all.nl added the comment: BTW, the HTMLParser module (as well as html.parser in 3.x) does claim to parse both HTML and XHTML, see http://docs.python.org/library/htmlparser.html#module-HTMLParser .

[issue11113] html.entities mapping dicts need updating?

2011-06-14 Thread Ezio Melotti
Changes by Ezio Melotti ezio.melo...@gmail.com: -- nosy: +ezio.melotti

[issue11113] html.entities mapping dicts need updating?

2011-02-06 Thread Eric Smith
Eric Smith e...@trueblade.com added the comment: I don't see the need for a parameter to support different sets of entities. Just supporting the ones from HTML 5 seems like the right thing. -- nosy: +eric.smith

[issue11113] html.entities mapping dicts need updating?

2011-02-06 Thread Éric Araujo
Éric Araujo mer...@netwok.org added the comment: To make my intent explicit: an updated mapping could generate references invalid for 4.01.

[issue11113] html.entities mapping dicts need updating?

2011-02-06 Thread Eric Smith
Eric Smith e...@trueblade.com added the comment: Ah. I hadn't thought of generating them, only parsing them. In that case, then yes, it's an issue for generation.
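
To illustrate the generation concern: code that emits named references for an HTML 4.01 document has to restrict itself to the HTML 4 names. A minimal sketch, where encode_html4 is a hypothetical helper (not a standard library function) built on codepoint2name:

```python
from html.entities import codepoint2name


def encode_html4(text):
    """Replace characters with HTML 4 named references where one exists.

    Hypothetical helper for illustration; characters without an HTML 4
    name (for example the apostrophe) are passed through unchanged.
    """
    parts = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        parts.append('&%s;' % name if name is not None else ch)
    return ''.join(parts)


encoded = encode_html4("caf\xe9 & 'bar'")
print(encoded)
```

Because it only consults the HTML 4 table, this helper never produces &apos; or any HTML5-only name, which is the property a 4.01 generator would need.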

[issue11113] html.entities mapping dicts need updating?

2011-02-04 Thread Martin v . Löwis
Martin v. Löwis mar...@v.loewis.de added the comment: Supporting the ones in HTML 5 would be fine with me. Supporting those of xml-entity-names would be inappropriate - it's not clear (to me, at least) that all of them are really meant for use in HTML. -- nosy: +loewis

[issue11113] html.entities mapping dicts need updating?

2011-02-04 Thread Éric Araujo
Éric Araujo mer...@netwok.org added the comment: Agreed with Martin. I wonder if we should provide a means to use only HTML 4.01 entity references (say with a function parameter html5 defaulting to True) or we should just update the mapping. -- nosy: +eric.araujo stage: -> needs

[issue11113] html.entities mapping dicts need updating?

2011-02-03 Thread Brian Jones
New submission from Brian Jones bkjo...@gmail.com: In Python 3.2b2, html.entities.codepoint2name and name2codepoint only support the 252 HTML entity names defined in the HTML 4 spec from 1997. I'm wondering if there's a reason not to support W3C Recommendation 'XML Entity Definitions for
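
The limitation reported here is easy to confirm from an interpreter. A small sketch (variable names are illustrative); the apos check relates to the duplicate report mentioned elsewhere in the thread:

```python
from html.entities import codepoint2name, name2codepoint

# Both HTML 4 dicts cover the same 252 entities.
sizes = (len(name2codepoint), len(codepoint2name))

# 'apos' is defined by XML/XHTML but not by HTML 4, so it is absent here.
has_apos = 'apos' in name2codepoint

print(sizes, has_apos)
```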