[issue843590] 'macintosh' encoding alias for 'mac_roman'

2010-09-02 Thread Ned Deily

Ned Deily <n...@acm.org> added the comment:

Martin, the typo was fixed subsequently by r84231.

--
nosy: +ned.deily

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue843590
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue843590] 'macintosh' encoding alias for 'mac_roman'

2010-08-30 Thread Martin von Gagern

Martin von Gagern <martin.vgag...@gmx.net> added the comment:

Maybe I'm missing something here, but r84229 looks to me like aliasing 
'macintosh' to itself, instead of to 'mac_roman'. 'csmacintosh' and 'mac' are 
not included at all, without any comment as to why they have been omitted. 
Makes me wonder why my issue843590_alias.patch wasn't applied as it is, but 
recreated instead.
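
To make the concern concrete: an alias table maps an alternative name to a codec module name, so an entry that points at itself resolves nothing. The following is only a sketch; `self_aliases` is a hypothetical helper and the two tables are toy data, not the actual contents of r84229.

```python
def self_aliases(table):
    """Return alias names that map to themselves (hypothetical helper)."""
    return sorted(name for name, target in table.items() if name == target)


# Toy tables for illustration only; not copied from any revision.
broken = {"macroman": "mac_roman", "macintosh": "macintosh"}
fixed = {"macroman": "mac_roman", "macintosh": "mac_roman"}

print(self_aliases(broken))  # the self-referential entry is reported
print(self_aliases(fixed))   # nothing to report
```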

--




[issue843590] 'macintosh' encoding alias for 'mac_roman'

2010-08-21 Thread Marc-Andre Lemburg

Marc-Andre Lemburg <m...@egenix.com> added the comment:

Benjamin Peterson wrote:
> 
> Benjamin Peterson <benja...@python.org> added the comment:
> 
> r84229

Thanks, Benjamin !

--




[issue843590] 'macintosh' encoding alias for 'mac_roman'

2010-08-20 Thread Amaury Forgeot d'Arc

Changes by Amaury Forgeot d'Arc <amaur...@gmail.com>:


--
keywords: +easy
resolution:  -> accepted




[issue843590] 'macintosh' encoding alias for 'mac_roman'

2010-08-20 Thread Benjamin Peterson

Benjamin Peterson <benja...@python.org> added the comment:

r84229

--
nosy: +benjamin.peterson
status: open -> closed




[issue843590] 'macintosh' encoding alias for 'mac_roman'

2010-08-19 Thread Marc-Andre Lemburg

Marc-Andre Lemburg <m...@egenix.com> added the comment:

Mark Lawrence wrote:
> 
> Mark Lawrence <breamore...@yahoo.co.uk> added the comment:
> 
> @Marc-Andre as there's no comments since your last post would you like to 
> take this forward, cheers.

I'm fine with adding the alias, but currently don't have any cycles
left to actually do the checkins, add the Misc/NEWS entry, update
the docs, etc.

--




[issue843590] 'macintosh' encoding alias for 'mac_roman'

2010-08-18 Thread Mark Lawrence

Mark Lawrence <breamore...@yahoo.co.uk> added the comment:

@Marc-Andre as there's no comments since your last post would you like to take 
this forward, cheers.

--
nosy: +BreamoreBoy
stage:  -> patch review
versions: +Python 3.2




[issue843590] 'macintosh' encoding alias for 'mac_roman'

2010-01-18 Thread Marc-Andre Lemburg

Marc-Andre Lemburg <m...@egenix.com> added the comment:

Here's another reference I found:

http://developer.apple.com/legacy/mac/library/documentation/mac/Text/Text-30.html

It appears that the macintosh encoding is the same as the MacRoman one, but 
without the characters D9-FF. The document also suggests that it's a really old 
encoding.

Here's a comparison of various Mac Roman mappings:

http://www.haible.de/bruno/charsets/conversion-tables/Mac-Roman.html

These include the macintosh charset name as well.

For all practical purposes, it appears to be safe to alias macintosh to 
mac-roman and also add the other suggested aliases from the IANA registry.
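
As a rough illustration of what the alias means in practice, the sketch below asks the codec registry how a few spellings resolve. It only uses `codecs.lookup` from the standard library; whether 'macintosh' resolves at all depends on the interpreter version, so the lookup is guarded.

```python
import codecs

# Ask the codec registry how each spelling resolves. Names that are not
# registered in the running interpreter raise LookupError.
for name in ("macintosh", "mac_roman", "mac-roman"):
    try:
        info = codecs.lookup(name)
    except LookupError:
        print(name, "-> not registered in this interpreter")
    else:
        print(name, "->", info.name)
```

If the alias is in place, all three spellings should report the same canonical codec name.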

--




[issue843590] 'macintosh' encoding alias for 'mac_roman'

2010-01-15 Thread Martin von Gagern

Martin von Gagern <martin.vgag...@gmx.net> added the comment:

Find attached (issue843590_rfc.patch) an implementation of the macintosh 
encoding as the RFC defines it. I don't suggest its inclusion; I would prefer 
the alias over this implementation, but either one is better than no 'macintosh' 
encoding at all. So if you really want that, here it is.

--
keywords: +patch
Added file: http://bugs.python.org/file15896/issue843590_rfc.patch




[issue843590] 'macintosh' encoding alias for 'mac_roman'

2010-01-15 Thread Martin von Gagern

Martin von Gagern <martin.vgag...@gmx.net> added the comment:

And this patch (issue843590_alias.patch) is the alternative, 'macintosh' as an 
alias to 'mac_roman' as originally requested, along with a bunch of aliases 
registered with IANA. I'd prefer this approach over the preceding one, and hope 
someone will maybe review this for inclusion.

--
Added file: http://bugs.python.org/file15897/issue843590_alias.patch




[issue843590] 'macintosh' encoding alias for 'mac_roman'

2009-02-26 Thread Martin von Gagern

Martin von Gagern <martin.vgag...@gmx.net> added the comment:

I did some further investigations here. Apple doesn't seem likely to
offer any authoritative reference for the macintosh encoding, because
all they ever seem to talk about is Roman. The only source for
macintosh I could find is this RFC 1345, with the listed differences.
The RFC states the Unicode 1.0 standard as its source. Yesterday I went
to the library and thumbed through that volume. That, too, talks about
the different macintosh encodings, one of which is called Roman and
matches the one from current Unicode standards, except for 0xdb which
used to be the currency sign back then but is euro now. On 2009-02-09 I
also tried to ask Keld Simonsen, the author of the RFC, about this whole
issue. I got no reply so far.

On the whole, I get the impression that the macintosh encoding from
RFC 1345 is pretty much without actual use. I see no real world
application which actually uses it as it is defined, as most users
intend it as the IANA-registered name for mac-roman.

Python has two options, I believe. We could either do this by the book,
and implement an encoding as it was defined, even though there is no
known real world application of that exact charset. Or we could be
pragmatic, and postulate that the RFC is simply wrong, and every real
world occurrence of macintosh intends to refer to mac-roman, in which
case an alias would be appropriate. I would say, let's be pragmatic.

When converting from unicode to macintosh, it might be possible to
accommodate both mappings, and in this way avoid unmappable characters.
As this doesn't deal well with the switched dashes, I guess I'd rather
not do this, in order to avoid subtle issues from going undetected. It
might be a good idea, however, to map both currency sign and euro to the
same byte, and choose one when mapping back to unicode.
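
The currency/euro idea could look roughly like the sketch below. It is not taken from either attached patch: `build_lenient_table` and `encode_lenient` are hypothetical names, and the table is derived from whatever mac_roman codec the running interpreter ships. Both U+00A4 and U+20AC encode to byte 0xDB, while decoding stays with the stock codec and so picks a single character.

```python
def build_lenient_table():
    """Character -> byte table from the stock mac_roman codec (hypothetical helper)."""
    table = {}
    for byte in range(256):
        try:
            table[bytes([byte]).decode("mac_roman")] = byte
        except UnicodeDecodeError:
            continue  # skip any byte the codec leaves undefined
    # Assumption from the discussion: also accept CURRENCY SIGN at 0xDB.
    table["\u00a4"] = 0xDB
    return table


LENIENT = build_lenient_table()


def encode_lenient(text):
    """Encode text, treating CURRENCY SIGN and EURO SIGN as the same byte."""
    out = bytearray()
    for ch in text:
        if ch not in LENIENT:
            raise ValueError("unmappable character %r" % ch)
        out.append(LENIENT[ch])
    return bytes(out)


print(encode_lenient("\u00a4"), encode_lenient("\u20ac"))
print(ascii(b"\xdb".decode("mac_roman")))  # the one character chosen on the way back
```

The dashes are deliberately left alone here, for the reason given above: silently accepting both orders would hide real mismatches.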

I don't think I can contribute much more information to this issue, and
seeing as it has been open for years without much input, I take it
neither will others. So I guess it is time to make a choice based on the
information available. By the book, or pragmatic?




[issue843590] 'macintosh' encoding alias for 'mac_roman'

2009-02-08 Thread Martin von Gagern

Martin von Gagern <martin.vgag...@gmx.net> added the comment:

My first indication to use macintosh rather than mac_roman came from
Wikipedia http://en.wikipedia.org/wiki/Mac_OS_Roman
which states that the charset part of a MIME content-type specification
should be macintosh. I'm not quoting this as any kind of authority, but
rather to point out that it is likely for people to use this.

I did a comparison of http://tools.ietf.org/rfc/rfc1345.txt (RFC) and
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ROMAN.TXT (UNI)
using the attached perl script. The results:
3 codepoints unused in RFC but defined in UNI: f0, f6, f7
1 codepoint unused in UNI but defined in RFC: 7f
2 codepoints with slightly different character names, same meaning
9 codepoints with actually different definitions:

 a5: rfc 2219 BULLET OPERATOR
     uni 2022 BULLET
 c4: rfc e023 DUTCH GUILDER SIGN (IBM437 159)
     uni 0192 LATIN SMALL LETTER F WITH HOOK
 c6: rfc 0394 GREEK CAPITAL LETTER DELTA
     uni 2206 INCREMENT
 c9: rfc 22ef MIDLINE HORIZONTAL ELLIPSIS
     uni 2026 HORIZONTAL ELLIPSIS
 d0: rfc 2014 EM DASH
     uni 2013 EN DASH
 d1: rfc 2013 EN DASH
     uni 2014 EM DASH
 d7: rfc 25c6 BLACK DIAMOND
     uni 25ca LOZENGE
 db: rfc 00a4 CURRENCY SIGN
     uni 20ac EURO SIGN
 f8: rfc 203e OVERLINE
     uni 00af MACRON

a5 and c6 could be different interpretations of symbols that look pretty
much the same. The introduction of the euro sign instead of the generic
currency sign seems to be a recent modification documented in UNI. The
change of the order of the dashes seems really confusing.
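
One way to see which side Python's existing mac_roman codec takes on these nine bytes is sketched below. The `DIFFS` table is transcribed from the list above; `decode_byte` is a hypothetical helper, and the outcome depends on the codec shipped with the running interpreter.

```python
import unicodedata

# byte: (RFC 1345 code point, Apple ROMAN.TXT code point), from the list above
DIFFS = {
    0xA5: (0x2219, 0x2022),
    0xC4: (0xE023, 0x0192),
    0xC6: (0x0394, 0x2206),
    0xC9: (0x22EF, 0x2026),
    0xD0: (0x2014, 0x2013),
    0xD1: (0x2013, 0x2014),
    0xD7: (0x25C6, 0x25CA),
    0xDB: (0x00A4, 0x20AC),
    0xF8: (0x203E, 0x00AF),
}


def decode_byte(byte):
    """Code point that the interpreter's mac_roman codec assigns to one byte."""
    return ord(bytes([byte]).decode("mac_roman"))


matches_rfc = [b for b, (rfc, uni) in DIFFS.items() if decode_byte(b) == rfc]
matches_uni = [b for b, (rfc, uni) in DIFFS.items() if decode_byte(b) == uni]

for byte in sorted(DIFFS):
    cp = decode_byte(byte)
    name = unicodedata.name(chr(cp), "<unnamed>")
    print("%02x: python %04x %s" % (byte, cp, name))

print("agrees with RFC:", ["%02x" % b for b in matches_rfc])
print("agrees with UNI:", ["%02x" % b for b in matches_uni])
```

If the codec was generated from Apple's table, all nine bytes should land in the UNI list, which would support treating 'macintosh' as an alias rather than a separate charset.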

Notice also this line in the RFC:
rem source: The Unicode Standard ver1.0, ISBN 0-201-56788-1, Oct 1991
So it looks like the RFC used the unicode definition as its source. What
part of it I'm not sure, and where the differences come from I'm even less sure.

My next steps:
* Look for further references, e.g. from apple, and compare them as well
* Try some things out on a mac, see how it behaves in real life
* Compare all this to the current python implementation
* Write a patch to either provide an alias or a new charset macintosh
Help welcome.

--
nosy: +gagern
Added file: http://bugs.python.org/file12982/compare.pl
