On Tue, Mar 17, 2009 at 5:52 AM, Victor Stinner <victor.stin...@haypocalc.com> wrote:
> I realised with issue #3446 that getarg('c') (get a byte) accepts not only
> a byte string of 1 byte, but also a unicode string of 1 character (if the
> character code is in [0; 255]). I don't think that it's a good idea to accept
> unicode here. Example: b"x".center(5, "\xe9") should be a TypeError.
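To make the proposed rule concrete, here is a minimal Python sketch of what a strict 'c' conversion would accept. The name parse_c_strict is hypothetical and not a CPython API. The real conversion lives in C inside CPython's argument parser; this only models its intended accept/reject behaviour.

```python
def parse_c_strict(arg):
    """Hypothetical model of a strict getarg('c'): accept exactly one byte.

    Returns the byte value as an int in [0, 255]. A str is rejected even
    when its single character has a code point in that range.
    """
    if isinstance(arg, (bytes, bytearray)) and len(arg) == 1:
        return arg[0]  # indexing bytes yields an int
    raise TypeError(
        "expected a byte string of length 1, got %s" % type(arg).__name__
    )


# A 1-byte bytes object is accepted; a 1-character str is not.
print(parse_c_strict(b"\xe9"))  # 233
try:
    parse_c_strict("\xe9")
except TypeError as exc:
    print("TypeError:", exc)
```

Under this rule, b"x".center(5, "\xe9") would fail because the fill argument is a str. Passing b"\xe9" would still work.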
Agreed.

> The "C" format (get a character) has the opposite problem: it accepts both
> byte and unicode, whereas byte should be rejected. Example:
> mmap.write_byte('é') should be a TypeError.

Yeah, mmap should be defined exclusively in terms of bytes.

> The problem was already discussed in the email thread "What type of object
> mmap.read_byte should return on py3k?" started by Hirokazu Yamamoto related
> to issue #5391.
>
> Short history:
> - r55109: Guido changes 'c' format to accept unicode (struni branch).
>   getarg('c') => char accepts byte and character
> - r56044: walter.doerwald changes the 'c' format to return an int (a
>   unicode character) for datetime.datetime.isoformat().
>   getarg('c') => int accepts byte and character
> - r56140: reverts r56044 and creates 'C' format
>   getarg('c') => char accepts byte and character
>   getarg('C') => int accepts byte and character
>
> So we have:
> - getarg('c') -> one byte (integer in [0; 255])
> - getarg('C') -> one character (code in [0; INTMAX])
> Note: Why not use Py_UNICODE instead of int?
>
> Usage of "C" format:
> datetime.datetime.isoformat(sep)
> array.array(type, data): type
>
> Usage of "c" format:
> msvcrt.putch(char)
> msvcrt.ungetch(char)

ISTM that putch() and ungetch() are text operations so should use 'C'.

> <mmap object>.write_byte(char)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)