[issue843590] 'macintosh' encoding alias for 'mac_roman'
Ned Deily n...@acm.org added the comment:

Martin, the typo was fixed subsequently by r84231.

--
nosy: +ned.deily
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue843590
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue843590] 'macintosh' encoding alias for 'mac_roman'
Martin von Gagern martin.vgag...@gmx.net added the comment:

Maybe I'm missing something here, but r84229 looks to me like it aliases 'macintosh' to itself instead of to 'mac_roman'. 'csmacintosh' and 'mac' are not included at all, with no comment as to why they were omitted. That makes me wonder why my issue843590_alias.patch wasn't applied as it is, but recreated instead.
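For reference, one way to check what a given name resolves to on a particular interpreter is to look at both the alias table and the codec registry. This is only a minimal sketch and not part of either patch; `describe` is a hypothetical helper, and what it reports for each name depends on the Python version in use. An alias that points at itself would probably show up as a target equal to the name with no codec found.

```python
import codecs
from encodings.aliases import aliases


def describe(name):
    """Return (alias table target, canonical codec name) for an encoding name.

    Either element is None when the name is not known in that place.
    """
    # The alias table is keyed by normalized names (lowercase, underscores).
    key = name.lower().replace("-", "_")
    target = aliases.get(key)
    try:
        canonical = codecs.lookup(name).name
    except LookupError:
        canonical = None
    return target, canonical


if __name__ == "__main__":
    for candidate in ("macintosh", "mac", "csmacintosh", "mac_roman"):
        print(candidate, describe(candidate))
```

On an interpreter where the alias is in place, 'macintosh' and 'mac_roman' should report the same canonical codec name.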
[issue843590] 'macintosh' encoding alias for 'mac_roman'
Marc-Andre Lemburg m...@egenix.com added the comment:

Benjamin Peterson wrote:
> r84229

Thanks, Benjamin!
[issue843590] 'macintosh' encoding alias for 'mac_roman'
Changes by Amaury Forgeot d'Arc amaur...@gmail.com:

--
keywords: +easy
resolution: -> accepted
[issue843590] 'macintosh' encoding alias for 'mac_roman'
Benjamin Peterson benja...@python.org added the comment:

r84229

--
nosy: +benjamin.peterson
status: open -> closed
[issue843590] 'macintosh' encoding alias for 'mac_roman'
Marc-Andre Lemburg m...@egenix.com added the comment:

Mark Lawrence wrote:
> @Marc-Andre as there's no comments since your last post would you like to take this forward, cheers.

I'm fine with adding the alias, but I currently don't have any cycles left to actually do the checkins, add the Misc/NEWS entry, update the docs, etc.
[issue843590] 'macintosh' encoding alias for 'mac_roman'
Mark Lawrence breamore...@yahoo.co.uk added the comment:

@Marc-Andre as there's no comments since your last post would you like to take this forward, cheers.

--
nosy: +BreamoreBoy
stage: -> patch review
versions: +Python 3.2
[issue843590] 'macintosh' encoding alias for 'mac_roman'
Marc-Andre Lemburg m...@egenix.com added the comment:

Here's another reference I found:

http://developer.apple.com/legacy/mac/library/documentation/mac/Text/Text-30.html

It appears that the macintosh encoding is the same as the MacRoman one, but without the characters D9-FF. The document also suggests that it's a really old encoding.

Here's a comparison of various Mac Roman mappings:

http://www.haible.de/bruno/charsets/conversion-tables/Mac-Roman.html

These include the macintosh charset name as well.

For all practical purposes, it appears to be safe to alias macintosh to mac-roman and also add the other suggested aliases from the IANA registry.
[issue843590] 'macintosh' encoding alias for 'mac_roman'
Martin von Gagern martin.vgag...@gmx.net added the comment:

Find attached (issue843590_rfc.patch) an implementation of the macintosh encoding as the RFC defines it. I don't suggest its inclusion; I would prefer the alias over this implementation, but either one is better than no 'macintosh' encoding at all. So if you really want that, here it is.

--
keywords: +patch
Added file: http://bugs.python.org/file15896/issue843590_rfc.patch
[issue843590] 'macintosh' encoding alias for 'mac_roman'
Martin von Gagern martin.vgag...@gmx.net added the comment:

And this patch (issue843590_alias.patch) is the alternative: 'macintosh' as an alias to 'mac_roman' as originally requested, along with a bunch of aliases registered with IANA. I'd prefer this approach over the preceding one, and hope someone will review this for inclusion.

--
Added file: http://bugs.python.org/file15897/issue843590_alias.patch
[issue843590] 'macintosh' encoding alias for 'mac_roman'
Martin von Gagern martin.vgag...@gmx.net added the comment:

I did some further investigation here. Apple doesn't seem likely to offer any authoritative reference for the macintosh encoding, because all they ever seem to talk about is Roman. The only source for macintosh I could find is RFC 1345, with the listed differences.

The RFC cites the Unicode 1.0 standard as its source. Yesterday I went to the library and thumbed through that volume. It, too, talks about the different Macintosh encodings, one of which is called Roman and matches the one from current Unicode standards, except for 0xdb, which was the currency sign back then but is the euro sign now.

On 2009-02-09 I also tried to ask Keld Simonsen, the author of the RFC, about this whole issue. I have had no reply so far.

On the whole, I get the impression that the macintosh encoding from RFC 1345 is pretty much without actual use. I see no real-world application that uses it as defined; most users intend it as the IANA-registered name for mac-roman.

Python has two options, I believe. We could either do this by the book and implement the encoding as it was defined, even though there is no known real-world application of that exact charset. Or we could be pragmatic and postulate that the RFC is simply wrong and that every real-world occurrence of macintosh intends to refer to mac-roman, in which case an alias would be appropriate. I would say, let's be pragmatic.

When converting from unicode to macintosh, it might be possible to accommodate both mappings and in this way avoid unmappable characters. As this doesn't deal well with the switched dashes, I guess I'd rather not do this, so that subtle issues do not go undetected. It might be a good idea, however, to map both the currency sign and the euro sign to the same byte, and choose one when mapping back to unicode.

I don't think I can contribute much more information to this issue, and seeing as it has been open for years without much input, I take it neither will others. So I guess it is time to make a choice based on the information available. By the book, or pragmatic?
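The idea of mapping both the currency sign and the euro sign to the same byte could be sketched roughly as follows. This is not taken from either patch: `encode_folding_currency` is a hypothetical helper, and the sketch assumes that Python's existing 'mac_roman' codec maps byte 0xDB to U+20AC (EURO SIGN) and has no byte for U+00A4 (CURRENCY SIGN).

```python
# Assumption: 'mac_roman' decodes 0xDB as U+20AC and cannot encode U+00A4.
# Fold the currency sign onto the euro sign before encoding, so both
# characters end up as the same byte.
_FOLD = {0x00A4: 0x20AC}


def encode_folding_currency(text):
    """Encode text as mac_roman, treating CURRENCY SIGN like EURO SIGN."""
    return text.translate(_FOLD).encode("mac_roman")


def decode_choosing_euro(data):
    """Decode mac_roman bytes; byte 0xDB always comes back as EURO SIGN."""
    return data.decode("mac_roman")


if __name__ == "__main__":
    for ch in ("\u00a4", "\u20ac"):
        raw = encode_folding_currency(ch)
        print(hex(ord(ch)), raw, hex(ord(decode_choosing_euro(raw))))
```

The round trip is lossy by design: a currency sign goes in and a euro sign comes out, which is the "choose one when mapping back" trade-off described above.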
[issue843590] 'macintosh' encoding alias for 'mac_roman'
Martin von Gagern martin.vgag...@gmx.net added the comment:

My first indication to use macintosh rather than mac_roman came from Wikipedia

http://en.wikipedia.org/wiki/Mac_OS_Roman

which states that the charset part of a MIME content-type specification should be macintosh. I'm not quoting this as any kind of authority, but rather to point out that people are likely to use this name.

I did a comparison of http://tools.ietf.org/rfc/rfc1345.txt (RFC) and ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ROMAN.TXT (UNI) using the attached perl script. The results:

3 codepoints unused in RFC but defined in UNI: f0, f6, f7
1 codepoint unused in UNI but defined in RFC: 7f
2 codepoints with slightly different character names, same meaning
9 codepoints with actually different definitions:

a5: rfc 2219 BULLET OPERATOR
    uni 2022 BULLET
c4: rfc e023 DUTCH GUILDER SIGN (IBM437 159)
    uni 0192 LATIN SMALL LETTER F WITH HOOK
c6: rfc 0394 GREEK CAPITAL LETTER DELTA
    uni 2206 INCREMENT
c9: rfc 22ef MIDLINE HORIZONTAL ELLIPSIS
    uni 2026 HORIZONTAL ELLIPSIS
d0: rfc 2014 EM DASH
    uni 2013 EN DASH
d1: rfc 2013 EN DASH
    uni 2014 EM DASH
d7: rfc 25c6 BLACK DIAMOND
    uni 25ca LOZENGE
db: rfc 00a4 CURRENCY SIGN
    uni 20ac EURO SIGN
f8: rfc 203e OVERLINE
    uni 00af MACRON

a5 and c6 could be different interpretations of symbols that look pretty much the same. The introduction of the euro sign instead of the generic currency sign seems to be a recent modification documented in UNI. The swapped order of the dashes seems really confusing.

Notice also this line in the RFC:

rem source: The Unicode Standard ver1.0, ISBN 0-201-56788-1, Oct 1991

So it looks like the RFC used the Unicode definition as its source. Which part of it I'm not sure, and where the differences come from I'm even less sure.

My next steps:
* Look for further references, e.g. from Apple, and compare them as well
* Try some things out on a Mac, see how it behaves in real life
* Compare all this to the current Python implementation
* Write a patch to either provide an alias or a new charset macintosh

Help welcome.

--
nosy: +gagern
Added file: http://bugs.python.org/file12982/compare.pl
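As a rough illustration of the "compare all this to the current Python implementation" step, the nine differing entries listed above can be checked against Python's 'mac_roman' codec. This is a Python sketch, not the attached compare.pl; the two dictionaries are copied by hand from the list in this message, and `diff_tables` and `codec_table` are hypothetical helper names.

```python
# Byte -> code point, copied from the nine differing entries listed above.
RFC = {0xA5: 0x2219, 0xC4: 0xE023, 0xC6: 0x0394, 0xC9: 0x22EF, 0xD0: 0x2014,
       0xD1: 0x2013, 0xD7: 0x25C6, 0xDB: 0x00A4, 0xF8: 0x203E}
UNI = {0xA5: 0x2022, 0xC4: 0x0192, 0xC6: 0x2206, 0xC9: 0x2026, 0xD0: 0x2013,
       0xD1: 0x2014, 0xD7: 0x25CA, 0xDB: 0x20AC, 0xF8: 0x00AF}


def diff_tables(a, b):
    """Return {byte: (a_cp, b_cp)} for bytes in both tables that disagree."""
    return {byte: (a[byte], b[byte])
            for byte in sorted(a.keys() & b.keys())
            if a[byte] != b[byte]}


def codec_table(name, byte_values):
    """Decode each byte with a Python codec to obtain its code point."""
    return {b: ord(bytes([b]).decode(name)) for b in byte_values}


if __name__ == "__main__":
    python_table = codec_table("mac_roman", UNI)
    print("RFC vs UNI:", len(diff_tables(RFC, UNI)), "differences")
    print("UNI vs Python:", len(diff_tables(UNI, python_table)), "differences")
    print("RFC vs Python:", len(diff_tables(RFC, python_table)), "differences")
```

For these nine bytes, the codec shipped with Python should agree with the UNI column, which would be consistent with treating macintosh as another name for mac_roman rather than as the RFC table.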