On Sun, Nov 28, 2010 at 6:43 PM, Steven D'Aprano <st...@pearwood.info> wrote:
..
>> is more important than to assure users that once their program
>> accepted some text as a number, they can assume that the text is
>> ASCII.
>
> Seems like a pretty foolish assumption, if you ask me, pretty much akin to
> assuming that if string.isalpha() returns true that string is ASCII.
>

It is not a foolish assumption to the 99.9% of Python users whose code
is written for 2.x. Their strings are byte strings, and
string.isdigit() does imply ASCII even if string.isalpha() does not in
many locales.

..
> The fact that this is (apparently) only being raised now means that it isn't
> actually a problem in real life. I'd even say that it's a feature, and that
> if Python didn't support non-Arabic numerals, it should.
>

I raised this problem because I found a bug that is related to this
feature. The bug is also a regression from 2.x. In 2.7:

>>> float(u'1234\xa1')
..
ValueError: invalid literal for float(): 1234?

The last character is lost, but the error message is still meaningful.
In 3.x, however:

>>> float('1234\xa1')
..
ValueError

See http://bugs.python.org/issue10557

While investigating this issue, I found that by the time the string
gets to the number parser (_Py_dg_strtod), all non-ASCII characters
have been dropped by PyUnicode_EncodeDecimal(), so the parser cannot
produce a meaningful diagnostic. Of course, PyUnicode_EncodeDecimal()
can be fixed by making it pass non-ASCII characters through as UTF-8
bytes, but I was wondering whether preserving the ability to parse
exotic numerals is worth the effort.
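
For anyone who wants to try the behaviour under discussion, here is a
minimal sketch. It assumes Python 3, where str is Unicode. The sample
string and the is_ascii() helper are hypothetical names of my own,
chosen only for illustration; they do not come from the bug report.

```python
# Sketch only: assumes Python 3 str semantics. The sample string and
# the is_ascii() helper are illustrative, not part of the bug report.
arabic_indic = "\u0661\u0662\u0663\u0664"  # ARABIC-INDIC DIGITS ONE..FOUR


def is_ascii(text):
    """Return True if every code point in text is below U+0080."""
    return all(ord(ch) < 128 for ch in text)


# isdigit() is true for non-ASCII decimal digits, so a positive result
# does not by itself tell you the text is ASCII.
print(arabic_indic.isdigit(), is_ascii(arabic_indic))

# float() accepts the same digits.
print(float(arabic_indic))

# A non-ASCII character that is not a digit is rejected; the wording of
# the message depends on the interpreter version.
try:
    float("1234\xa1")
except ValueError as exc:
    print("ValueError:", exc)
```

Code that needs the "accepted as a number implies ASCII" guarantee
could run a check like is_ascii() before calling float().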