Benjamin Peterson benja...@python.org added the comment:
I'm just going to close this and say use 3.3.
--
nosy: +benjamin.peterson
resolution:  -> out of date
status: open -> closed
___
Python tracker rep...@bugs.python.org
STINNER Victor victor.stin...@haypocalc.com added the comment:
This issue has been fixed in Python 3.3 thanks to PEP 393.
--
nosy: +haypo
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10521
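
For readers checking the claim above, here is a minimal sketch, assuming Python 3.3 or later, where PEP 393 makes every code point a single str element. The variable names are illustrative only; the fill character is the one from the original report.

```python
import sys

# Assumption: Python 3.3+ (PEP 393). Earlier narrow builds behave differently.
assert sys.version_info >= (3, 3)

fill = '\U00100140'  # non-BMP code point taken from the original report

# All three padding methods accept the non-BMP fill character.
centered = 'xyz'.center(20, fill)
left = 'xyz'.ljust(20, fill)
right = 'xyz'.rjust(20, fill)

# The format mini-language accepts it as a fill character as well.
formatted = format('xyz', fill + '^20')

print(len(fill))       # one code point, so one str element
print(len(centered))   # padded to the requested width
```

On such an interpreter none of these calls should raise the TypeError shown in the original submission.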
Ezio Melotti ezio.melo...@gmail.com added the comment:
It can still be fixed on 2.7/3.2 though.
--
versions: +Python 2.7
Ezio Melotti ezio.melo...@gmail.com added the comment:
I agree that s.center(n, char).encode('utf-8') should be the same on both
builds -- even if their len() will be different -- for the following reasons:
1) the string will eventually be encoded, and if the result is the same on
Terry J. Reedy tjre...@udel.edu added the comment:
After reading the additional messages here and on a similar issue Alexander
opened after this, I see the point of wanting to make the difference between
the two types of builds as transparent as sensibly possible. From that
viewpoint,
Terry J. Reedy tjre...@udel.edu added the comment:
As a practical matter, I think that for at least the next decade, people are at
least as likely to want to fill with a composed, multi-BMP-codepoint 'char'
(grapheme) as with a non-BMP char. So to me, failure with the latter is no
worse than
Alexander Belopolsky belopol...@users.sourceforge.net added the comment:
On Fri, Nov 26, 2010 at 6:37 PM, Terry J. Reedy rep...@bugs.python.org wrote:
Terry J. Reedy tjre...@udel.edu added the comment:
As a practical matter, I think that for at least the next decade, people are
at least as
Eric Smith e...@trueblade.com added the comment:
I think these macros would be a reasonable approach. I think str.center, etc.
should support non-BMP chars, because to not do so can raise an exception.
Supporting composed graphemes seems like another problem altogether. And while
we could fix
New submission from Alexander Belopolsky belopol...@users.sourceforge.net:
>>> 'xyz'.center(20, '\U00100140')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: The fill character must be exactly one character long
str.ljust and str.rjust are similarly affected.
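
The error message makes more sense once the fill character is viewed as UTF-16 code units, which is how narrow builds before Python 3.3 stored str. The following is only a sketch of that view, written so that it runs on any Python 3.2+ interpreter; it does not reproduce the narrow-build internals, and the variable names are illustrative.

```python
fill = '\U00100140'  # the fill character from the report above

# Encode as UTF-16 (big-endian, no BOM) and count the 16-bit code units.
utf16 = fill.encode('utf-16-be')
code_units = len(utf16) // 2

# Split the encoded bytes into the two surrogate code units.
high = int.from_bytes(utf16[0:2], 'big')
low = int.from_bytes(utf16[2:4], 'big')

print(code_units)           # 2
print(hex(high), hex(low))  # 0xdbc0 0xdd40
```

A check that requires exactly one code unit therefore rejects this character, even though it is a single code point.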
Antoine Pitrou pit...@free.fr added the comment:
The question is, what should it do with such an input? Pretend it's a single
char (but other chars in the source string won't get the same treatment)? Treat
it as a two-char string (but then center() and friends should logically be
extended to
Eric Smith e...@trueblade.com added the comment:
str.__format__ and friends (int, float, complex) also have this same problem.
For example, when they're computing the fill character:
>>> format('', 'x^')
''
>>> format('', '\U00100140^')
Traceback (most recent call last):
  File "<stdin>", line 1, in
Alexander Belopolsky belopol...@users.sourceforge.net added the comment:
On Wed, Nov 24, 2010 at 10:33 AM, Antoine Pitrou rep...@bugs.python.org wrote:
..
The question is, what should it do with such an input?
I think the rule for such functions should be that if
input.encode('utf-8') is the
Changes by Ezio Melotti ezio.melo...@gmail.com:
--
nosy: +ezio.melotti
Marc-Andre Lemburg m...@egenix.com added the comment:
Alexander Belopolsky wrote:
New submission from Alexander Belopolsky belopol...@users.sourceforge.net:
>>> 'xyz'.center(20, '\U00100140')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: The fill character
Alexander Belopolsky belopol...@users.sourceforge.net added the comment:
On Wed, Nov 24, 2010 at 3:37 PM, Marc-Andre Lemburg
rep...@bugs.python.org wrote:
..
I don't think we should change that for the formatting methods.
That's a reasonable position. What about
'Lo'
'\N{OLD ITALIC LETTER
Alexander Belopolsky belopol...@users.sourceforge.net added the comment:
On Wed, Nov 24, 2010 at 3:37 PM, Marc-Andre Lemburg
rep...@bugs.python.org wrote:
..
I don't think we should change that for the formatting methods.
That's a reasonable position. What about
unicodedata.category('\N{OLD
Alexander Belopolsky belopol...@users.sourceforge.net added the comment:
Here is another str method not ready for non-BMP chars:
>>> u = '\U00010140'
>>> u.translate({ord(u):ord('A')})
'ŀ'
(expected 'A')
>>> u = 'B'
>>> u.translate({ord(u):ord('A')})
'A'
--
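
As a point of comparison, the same call can be tried on a current interpreter. This is a minimal sketch, assuming Python 3.3 or later; the name `result` is illustrative.

```python
u = '\U00010140'  # the non-BMP character from the example above

# The translation table is keyed by code point, so ord(u) is 0x10140.
result = u.translate({ord(u): ord('A')})

print(hex(ord(u)))  # 0x10140
print(result)       # A
```

With one str element per code point, the table lookup matches and the expected 'A' is returned.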
Ezio Melotti ezio.melo...@gmail.com added the comment:
I think that methods like str.isalpha can and should be fixed. Since
_PyUnicode_IsAlpha now accepts a Py_UCS4, the body of unicode_isalpha can be
changed to convert normal chars and surrogate pairs to a Py_UCS4 before
calling
Alexander Belopolsky belopol...@users.sourceforge.net added the comment:
Here is another proof of concept patch for the isalpha issue that introduces a
higher level abstraction macro - Py_UNICODE_NEXT. It should be possible to
reuse this macro in all isxyz methods and other places where
Changes by Ezio Melotti ezio.melo...@gmail.com:
--
nosy: +amaury.forgeotdarc
Amaury Forgeot d'Arc amaur...@gmail.com added the comment:
issue9200 already proposes a similar change to str.is* methods.
--
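
The idea discussed in this thread for the str.is* methods, combining a surrogate pair into one code point before classifying it, can be illustrated in Python. This is only a sketch of the concept, not the C macro or patch from the tracker; `iter_code_points` and `utf16_units` are hypothetical helpers, and U+10300 is a sample non-BMP letter chosen for this example.

```python
def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints (hypothetical helper)."""
    data = s.encode('utf-16-le')
    return [int.from_bytes(data[i:i + 2], 'little') for i in range(0, len(data), 2)]


def iter_code_points(units):
    """Yield code points, joining each high/low surrogate pair (hypothetical helper)."""
    i = 0
    n = len(units)
    while i < n:
        cp = units[i]
        i += 1
        # A high surrogate followed by a low surrogate encodes one code point.
        if 0xD800 <= cp <= 0xDBFF and i < n and 0xDC00 <= units[i] <= 0xDFFF:
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        yield cp


text = 'a\U00010300'  # 'a' followed by a non-BMP letter (assumption: sample input)
units = utf16_units(text)
points = list(iter_code_points(units))

# Classify whole code points rather than individual code units.
all_alpha = all(chr(cp).isalpha() for cp in points)
print([hex(cp) for cp in points], all_alpha)
```

Classifying the joined code point gives the same answer regardless of how many code units the character occupies, which is the behaviour the proposals in this thread aim for.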