New submission from Alexander Pyhalov:

When Python 2.6 (or 2.7) is compiled with _XOPEN_SOURCE=600 on illumos, string.lowercase and string.uppercase contain garbage when a UTF-8 locale is used (OpenIndiana bug report: https://www.illumos.org/issues/4411). The reason is that in a UTF-8 locale, islower()/isupper() and similar functions are not expected to work with non-ASCII symbols. So, code like

    n = 0;
    for (c = 0; c < 256; c++) {
        if (islower(c))
            buf[n++] = c;
    }

is expected to fail, because it calls islower() on illegal UTF-8 symbols (those with codes 128-255). It should be converted to something like

    n = 0;
    for (c = 0; c < 256; c++) {
        if (isascii(c) && islower(c))
            buf[n++] = c;
    }

or to

    n = 0;
    for (c = 0; c < 128; c++) {
        if (islower(c))
            buf[n++] = c;
    }

Before doing this you should check whether the locale is UTF-8. However, almost all non-C locales on illumos are UTF-8.

Example of incorrect behavior:

Python 2.6.9 (unknown, Nov 12 2013, 13:54:48)
[GCC 4.7.3] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import string
>>> string.lowercase
'abcdefghijklmnopqrstuvwxyz\xaa\xb5\xba\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>> string.uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xde'
>>>

----------
components: Unicode
messages: 206786
nosy: Alexander.Pyhalov, ezio.melotti, haypo
priority: normal
severity: normal
status: open
title: string.lowercase and string.uppercase can contain garbage
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20049>
_______________________________________
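
The claim that codes 128-255 are illegal as standalone UTF-8 symbols can be checked with a short Python 3 sketch. It is only an illustration, not part of the proposed fix: the helper name is_valid_utf8 is made up here, and the byte string is the non-ASCII tail of the string.lowercase output quoted in the report. Decoding that tail as Latin-1 is an added observation, not something the report states.

```python
# Non-ASCII tail of string.lowercase, copied from the session in the report.
LOWER_TAIL = (
    b"\xaa\xb5\xba\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb"
    b"\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc"
    b"\xfd\xfe\xff"
)


def is_valid_utf8(data):
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Every byte value in 128..255 fails to decode when taken on its own,
# because it is either a lead byte missing its continuation bytes,
# a stray continuation byte, or a byte UTF-8 never uses.
invalid_single_bytes = [b for b in range(128, 256)
                        if not is_valid_utf8(bytes([b]))]

# The same bytes do decode under a single-byte encoding such as Latin-1.
latin1_view = LOWER_TAIL.decode("latin-1")

if __name__ == "__main__":
    print(len(invalid_single_bytes), "of 128 high bytes are invalid alone")
    print("tail is valid UTF-8:", is_valid_utf8(LOWER_TAIL))
    print("tail read as Latin-1:", latin1_view)
```

Read as Latin-1, the tail consists of characters such as ß and à, which suggests the ctype tables are classifying single bytes as if the locale used a single-byte encoding. That reading is an assumption drawn from the output alone.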