[issue6058] Add cp65001 to encodings/aliases.py

2011-10-26 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

I added a cp65001 codec to Python 3.3: see issue #13216.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6058
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6058] Add cp65001 to encodings/aliases.py

2010-11-07 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

Different tests proved that cp65001 can *not* be set as an alias to utf-8, and 
that's why I'm closing this issue.

Anyway, I don't think that cp65001 is configured by default on any Windows 
setup. It is only set by the user, using the chcp command, to try to display 
Unicode characters in the Windows console, but it is not possible to display 
any Unicode character in this console (see issue #1602). The chcp command 
should not be used in the Windows console because it does not only change the 
ANSI code page: it also changes the console code page, which is wrong (the 
console still expects text encoded to the previous code page).

It is possible to implement a codec for cp65001 using the existing utf-8 codec 
in surrogatepass mode, or by using MultiByteToWideChar() / WideCharToMultiByte() 
with codepage=CP_UTF8. But I don't think that we need cp65001 at all.
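
To illustrate the first option (utf-8 in surrogatepass mode), here is a minimal sketch, assuming Python 3. The codec name "cp65001_sketch" and the helper functions are hypothetical, chosen so that the example does not collide with any codec Python already ships. It is an illustration only, not a claim about how any real cp65001 codec is implemented, and it ignores the caller's error handler for brevity.

```python
import codecs

# Hypothetical name, so the sketch never shadows a real codec.
SKETCH_NAME = "cp65001_sketch"


def _encode(text, errors="strict"):
    # Always use surrogatepass, so lone surrogates are encoded
    # instead of raising; the caller's error handler is ignored.
    return text.encode("utf-8", "surrogatepass"), len(text)


def _decode(data, errors="strict"):
    raw = bytes(data)
    return raw.decode("utf-8", "surrogatepass"), len(raw)


def _search(name):
    # Codec search functions receive the normalized encoding name.
    if name == SKETCH_NAME:
        return codecs.CodecInfo(name=SKETCH_NAME, encode=_encode, decode=_decode)
    return None


codecs.register(_search)

encoded = "\ud800".encode(SKETCH_NAME)
print(encoded)                             # three bytes for the lone surrogate
print(encoded.decode(SKETCH_NAME) == "\ud800")
```

The strict utf-8 codec would refuse the lone surrogate U+D800; the sketch round-trips it.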

If you need cp65001 for a good reason and you would like to implement a cp65001 
Python codec, open a new issue.

If you consider that we should use _O_U8TEXT or _O_U16TEXT, open another new 
issue.

_O_U8TEXT or _O_U16TEXT might improve Unicode support if Python output is 
redirected to a pipe, but I don't think that it would help to display Unicode 
characters in the Windows console. I also fear that it would break existing 
code and any function not aware of this special mode.

--
resolution:  -> invalid
status: open -> closed




[issue6058] Add cp65001 to encodings/aliases.py

2010-11-03 Thread David Sankel

Changes by David Sankel cam...@gmail.com:


--
nosy: +David.Sankel




[issue6058] Add cp65001 to encodings/aliases.py

2010-11-03 Thread Michael Foord

Changes by Michael Foord mich...@voidspace.org.uk:


--
nosy:  -michael.foord




[issue6058] Add cp65001 to encodings/aliases.py

2010-10-23 Thread David-Sarah Hopwood

David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

This problem causes {{{os.getcwdu()}}} to fail when the console code page is 
set to 65001 (always, I think):
{{{
t:\>ver

Microsoft Windows [Version 6.0.6002]

t:\>chcp
Active code page: 65001

t:\>python -c "import os; print os.getcwdu()"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
LookupError: unknown encoding: cp65001

t:\>chcp 1252
Active code page: 1252

t:\>python -c "import os; print os.getcwdu()"
t:\
}}}

Incidentally, I don't agree that this codepage needs to be distinguished from 
UTF-8. The deviations in the Microsoft codec are just their bugs. There is only 
one correct way to encode/decode UTF-8, and cp65001 is supposed to be UTF-8 
according to Microsoft (e.g. 
http://msdn.microsoft.com/en-us/library/86hf4sb8%28en-US,VS.80%29.aspx ).
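
The LookupError in the transcript above comes from Python's codec registry not knowing the name. A minimal sketch, assuming Python 3, of how a script could test for a codec before relying on it; the helper name codec_is_known is hypothetical and not part of any library:

```python
import codecs


def codec_is_known(name):
    """Return True if Python's codec registry can resolve this name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


for candidate in ("utf-8", "cp1252", "cp65001"):
    # The result for cp65001 depends on the Python version and platform.
    print(candidate, codec_is_known(candidate))
```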

--
nosy: +davidsarah




[issue6058] Add cp65001 to encodings/aliases.py

2010-10-23 Thread David-Sarah Hopwood

David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

I said: "There is only one correct way to encode/decode UTF-8." This is true 
modulo differences in the treatment of initial byte order marks.

--




[issue6058] Add cp65001 to encodings/aliases.py

2010-10-23 Thread David-Sarah Hopwood

David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

I meant to say that the os.getcwdu() test in msg119440 was done with Windows 
native Python 2.6.2.

--




[issue6058] Add cp65001 to encodings/aliases.py

2010-10-23 Thread David-Sarah Hopwood

David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

Oops, false alarm. python -c "import os; print repr(os.getcwdu())" works as 
expected, so the exception is part of issue 1602.

(My comment about there being no need to distinguish this codepage from UTF-8 
stands.)

--




[issue6058] Add cp65001 to encodings/aliases.py

2010-07-09 Thread Terry J. Reedy

Changes by Terry J. Reedy tjre...@udel.edu:


--
versions:  -Python 2.6, Python 2.7, Python 3.1




[issue6058] Add cp65001 to encodings/aliases.py

2010-05-21 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

Would it be possible to implement a cp65001 codec in Python using 
MultiByteToWideChar() / WideCharToMultiByte() with codepage=CP_UTF8?

--
nosy: +haypo




[issue6058] Add cp65001 to encodings/aliases.py

2010-01-13 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

I created two scripts for exporting the IronPython findings and checking them 
in CPython.

These are the results:

Checking code Page 28591 against encoding 'iso-8859-1' using file 
'iso-8859-1.map'

0 errors

Checking code Page 28592 against encoding 'iso-8859-2' using file 
'iso-8859-2.map'

0 errors

Checking code Page 28593 against encoding 'iso-8859-3' using file 
'iso-8859-3.map'

0 errors

Checking code Page 28594 against encoding 'iso-8859-4' using file 
'iso-8859-4.map'

0 errors

Checking code Page 28595 against encoding 'iso-8859-5' using file 
'iso-8859-5.map'

0 errors

Checking code Page 1201 against encoding 'utf-16-be' using file 'utf-16-be.map'

2048 errors

Checking code Page 1200 against encoding 'utf-16-le' using file 'utf-16-le.map'

2048 errors

Checking code Page 65000 against encoding 'utf-7' using file 'utf-7.map'

21 errors

Checking code Page 65001 against encoding 'utf-8' using file 'utf-8.map'

2048 errors

Result:

We can add aliases for the various ISO mappings, but not for the UTF ones. .NET 
encodes the surrogates differently than Python's codecs and it also produces 
different results for UTF-7 than Python's codec.
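
The figure of 2048 equals the size of the surrogate range U+D800-U+DFFF. The sketch below, assuming Python 3 semantics, counts the code points in the Basic Multilingual Plane that the strict utf-8 codec refuses to encode. Linking this count to the 2048 mismatches reported above is an inference from the remark about surrogates, not something the check script's output states; the function name is hypothetical.

```python
def count_rejected_by_strict_utf8(code_points=range(0x10000)):
    """Count code points that Python 3's strict UTF-8 encoder refuses."""
    rejected = 0
    for cp in code_points:
        try:
            chr(cp).encode("utf-8")
        except UnicodeEncodeError:
            rejected += 1
    return rejected


SURROGATES = range(0xD800, 0xE000)  # 55296..57343 inclusive

print(len(SURROGATES))                  # size of the surrogate range
print(count_rejected_by_strict_utf8())  # code points rejected in the BMP
```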

--




[issue6058] Add cp65001 to encodings/aliases.py

2010-01-13 Thread Marc-Andre Lemburg

Changes by Marc-Andre Lemburg m...@egenix.com:


Added file: http://bugs.python.org/file15858/export-encodings.py




[issue6058] Add cp65001 to encodings/aliases.py

2010-01-13 Thread Marc-Andre Lemburg

Changes by Marc-Andre Lemburg m...@egenix.com:


Added file: http://bugs.python.org/file15859/check-encodings.py




[issue6058] Add cp65001 to encodings/aliases.py

2010-01-13 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

What we could do is add new codecs based on the .NET tables for cp65000 et al.

However, before doing this, I'd like to know where these code page settings can 
occur and what exact names are used for them. If they only appear in .NET and 
IronPython, I don't think it's worth adding extra codecs for the MS UTF 
variants.

--




[issue6058] Add cp65001 to encodings/aliases.py

2010-01-12 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
priority:  -> high
stage:  -> patch review




[issue6058] Add cp65001 to encodings/aliases.py

2009-12-22 Thread Stefan Krah

Stefan Krah stefan-use...@bytereef.org added the comment:

I wrote a small C application that converts all possible
wchar_t to multibyte strings, using code page 65001.

Usage:

cl.exe gen65001.c
python check65001.py

Except for the newline character and the sequence 55296-57343 (the
surrogate range U+D800-U+DFFF), this code page matches UTF-8.


Note, however, that cp65001 is a pseudo code page:

http://www.postgresql.org/docs/faqs.FAQ_windows.html#2.6


For instance, setlocale will not work:

http://blogs.msdn.com/michkap/archive/2006/03/13/550191.aspx

--
nosy: +skrah
Added file: http://bugs.python.org/file15661/gen65001.c




[issue6058] Add cp65001 to encodings/aliases.py

2009-12-22 Thread Stefan Krah

Changes by Stefan Krah stefan-use...@bytereef.org:


Added file: http://bugs.python.org/file15662/check65001.py




[issue6058] Add cp65001 to encodings/aliases.py

2009-12-22 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

This report is really about the issues reported in #1602 and #7441, i.e.
where console output fails if the terminal encoding is 65001. Rather
than adding the alias, I would prefer to find out why terminal output
fails in that code page.

--




[issue6058] Add cp65001 to encodings/aliases.py

2009-12-22 Thread Χρήστος Γεωργίου (Christos Georgiou)

Χρήστος Γεωργίου (Christos Georgiou) t...@users.sourceforge.net added the 
comment:

re Martin's question, I can offer the indirect wisdom of Michael Kaplan
in this blog post:

http://blogs.msdn.com/michkap/archive/2008/03/18/8306597.aspx

where he mentions that the easiest way to output unicode text in the
Windows console, is:

#include <fcntl.h>
#include <io.h>
#include <stdio.h>

int main(void) {
    _setmode(_fileno(stdout), _O_U16TEXT);
    wprintf(L"\x043a\x043e\x0448\x043a\x0430 \x65e5\x672c\x56fd\n");
    return 0;
}

_setmode being the special call needed.

I haven't tested with _O_U8TEXT (if such a thing exists); I don't do
Windows anymore, so I can't provide a patch.

It also seems that Python —when stdin/stdout/stderr is under control of
a Windows console— doesn't use plain *printf functions. The example code
I offered in one of the other issues (dumb stdout doing plain .write as
UTF-8) runs and displays fine.
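
For readers without access to the other issue, the idea of a "dumb" stdout that simply writes UTF-8 bytes can be sketched as below, assuming Python 3. This is not the code posted in that issue; the class name Utf8Writer is hypothetical, and whether the console then renders the bytes correctly depends on the console itself.

```python
import io


class Utf8Writer:
    """Minimal text-like stream that writes everything as UTF-8 bytes."""

    def __init__(self, raw):
        # raw: any binary stream offering write() and flush()
        self.raw = raw

    def write(self, text):
        # "replace" keeps lone surrogates from raising an exception
        self.raw.write(text.encode("utf-8", "replace"))
        return len(text)

    def flush(self):
        self.raw.flush()


# Demonstration against an in-memory buffer instead of a real console.
buffer = io.BytesIO()
writer = Utf8Writer(buffer)
writer.write("\u043a\u043e\u0448\u043a\u0430\n")
print(buffer.getvalue())
```

In a real script such an object could be assigned to sys.stdout, wrapping sys.stdout.buffer.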

--




[issue6058] Add cp65001 to encodings/aliases.py

2009-12-22 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

I also wonder whether stdin/stdout/stderr should be streams on Windows
that use WriteConsole instead of WriteFile. Then the entire issue of
console CP would go away for Unicode output.

--




[issue6058] Add cp65001 to encodings/aliases.py

2009-12-21 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

(I tried running your script under IronPython 2.6 with Mono but I got a
bunch of errors; since I don't know IronPython at all I can't really
investigate)

--
nosy: +pitrou




[issue6058] Add cp65001 to encodings/aliases.py

2009-12-07 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

Could you provide some official reference defining the alias?

Thanks.

--




[issue6058] Add cp65001 to encodings/aliases.py

2009-12-07 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

Never mind, I found this reference:

http://msdn.microsoft.com/en-us/library/system.text.encoding(VS.80).aspx

Looks like we could add a few more aliases for other encodings as well.

--




[issue6058] Add cp65001 to encodings/aliases.py

2009-12-07 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

> http://msdn.microsoft.com/en-us/library/system.text.encoding(VS.80).aspx
>
> Looks like we could add a few more aliases for other encodings as well.

I wouldn't trust this table. Microsoft is on record of implementing the
code pages with slight variations compared to other references for some
encodings (in particular the Asian ones). So unless there is an actual
documented need for a certain alias (and preferably a demonstration that
Microsoft's interpretation of the code page is the same as Python's),
I would advise against adding such aliases.

--




[issue6058] Add cp65001 to encodings/aliases.py

2009-12-07 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

Martin v. Löwis wrote:
>
> Martin v. Löwis mar...@v.loewis.de added the comment:
>
>> http://msdn.microsoft.com/en-us/library/system.text.encoding(VS.80).aspx
>>
>> Looks like we could add a few more aliases for other encodings as well.
>
> I wouldn't trust this table. Microsoft is on record of implementing the
> code pages with slight variations compared to other references for some
> encodings (in particular the Asian ones). So unless there is an actual
> documented need for a certain alias (and preferably a demonstration that
> Microsoft's interpretation of the code page is the same as Python's),
> I would advise against adding such aliases.

Fair enough.

Could someone with some IronPython/.NET foo check whether the
code pages are the same as the Python codecs?

The above page has some sample code to get started and IronPython
provides easy access to both the .NET codecs and the Python ones.

Thanks,
Marc-Andre Lemburg

--




[issue6058] Add cp65001 to encodings/aliases.py

2009-12-07 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
nosy: +michael.foord




[issue6058] Add cp65001 to encodings/aliases.py

2009-12-07 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

Here's a script for IronPython 2.6 that checks a few encoders.

Since IronPython doesn't appear to come with the full set of Python
codecs and it's also not clear whether the implemented codecs actually
match the default Python ones, I'm not sure how reliable this output is.

It's probably better to dump the encoded data to a file and compare
against a CPython run.
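
A minimal sketch of that dump-and-compare idea, assuming Python 3. The one-line-per-code-point file format and both function names are hypothetical illustrations; they are not the scripts attached to this issue.

```python
import io


def export_mapping(encoding, out, code_points=range(0x10000)):
    """Write 'CODEPOINT<TAB>hex bytes' per code point; 'ERROR' if unencodable."""
    for cp in code_points:
        try:
            encoded = chr(cp).encode(encoding).hex()
        except UnicodeError:
            encoded = "ERROR"
        out.write(f"{cp:04X}\t{encoded}\n")


def count_mismatches(encoding, mapping):
    """Re-encode every code point in a dump and count the differences."""
    errors = 0
    for line in mapping:
        cp_hex, expected = line.rstrip("\n").split("\t")
        try:
            actual = chr(int(cp_hex, 16)).encode(encoding).hex()
        except UnicodeError:
            actual = "ERROR"
        if actual != expected:
            errors += 1
    return errors


# Dump on one side (here an in-memory file), check on the other.
dump = io.StringIO()
export_mapping("iso-8859-1", dump, range(256))
dump.seek(0)
print(count_mismatches("iso-8859-1", dump), "errors")
```

In practice the dump would be written by one interpreter and checked by the other, so the two codec implementations never have to run in the same process.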

Anyway, here's the output:

Code Page 65000 vs. encoding 'utf-7'

0 errors

Code Page 65001 vs. encoding 'utf-8'

0 errors

Code Page 1200 vs. encoding 'utf-16-le'

0 errors

Code Page 1201 vs. encoding 'utf-16-be'

0 errors

Code Page 28591 vs. encoding 'iso-8859-1'

0 errors

--
Added file: http://bugs.python.org/file15477/testnetcodecs.py




[issue6058] Add cp65001 to encodings/aliases.py

2009-12-05 Thread flox

Changes by flox la...@yahoo.fr:


--
versions: +Python 2.6, Python 3.1, Python 3.2




[issue6058] Add cp65001 to encodings/aliases.py

2009-05-26 Thread Χρήστος Γεωργίου (Christos Georgiou)

New submission from Χρήστος Γεωργίου (Christos Georgiou) 
t...@users.sourceforge.net:

Add 'cp65001' (Microsoft term for UTF-8) as an alias to 'utf_8'

--
components: Library (Lib), Unicode
files: alias_cp65001.diff
keywords: patch
messages: 88060
nosy: tzot
severity: normal
status: open
title: Add cp65001 to encodings/aliases.py
type: feature request
versions: Python 2.7
Added file: http://bugs.python.org/file14013/alias_cp65001.diff




[issue6058] Add cp65001 to encodings/aliases.py

2009-05-26 Thread Χρήστος Γεωργίου (Christos Georgiou)

Changes by Χρήστος Γεωργίου (Christos Georgiou) t...@users.sourceforge.net:


--
components: +Windows




[issue6058] Add cp65001 to encodings/aliases.py

2009-05-26 Thread Χρήστος Γεωργίου (Christos Georgiou)

Changes by Χρήστος Γεωργίου (Christos Georgiou) t...@users.sourceforge.net:


Removed file: http://bugs.python.org/file14013/alias_cp65001.diff




[issue6058] Add cp65001 to encodings/aliases.py

2009-05-26 Thread Χρήστος Γεωργίου (Christos Georgiou)

Changes by Χρήστος Γεωργίου (Christos Georgiou) t...@users.sourceforge.net:


Added file: http://bugs.python.org/file14014/alias_cp65001.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6058
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6058] Add cp65001 to encodings/aliases.py

2009-05-19 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
nosy: +lemburg, loewis




[issue6058] Add cp65001 to encodings/aliases.py

2009-05-18 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
nosy: +ezio.melotti
