Terry Reedy wrote:
> On 11/29/2010 10:19 AM, M.-A. Lemburg wrote:
>> Nick Coghlan wrote:
>>> On Mon, Nov 29, 2010 at 9:02 PM, M.-A. Lemburg<m...@egenix.com> wrote:
>>>> If we would go down that road, we would also have to disable other
>>>> Unicode features based on locale, e.g. whether to apply non-ASCII
>>>> case mappings, what to consider whitespace, etc.
>>>>
>>>> We don't do that for a good reason: Unicode is supposed to be
>>>> universal and not limited to a single locale.
>>>
>>> Because parsing numbers is about more than just the characters used
>>> for the individual digits. There are additional semantics associated
>>> with digit ordering (for any number) and decimal separators and
>>> exponential notation (for floating point numbers) and those vary by
>>> locale. We deliberately chose to make the builtin numeric parsers
>>> unaware of all of those things, and assuming that we can simply parse
>>> other digits as if they were their ASCII equivalents and otherwise
>>> assume a C locale seems questionable.
>>
>> Sure, and those additional semantics are locale dependent, even
>> between ASCII-only locales. However, that does not apply to the
>> basic building blocks, the decimal digits themselves.
>>
>>> If the existing semantics can be adequately defined, documented and
>>> defended, then retaining them would be fine. However, the language
>>> reference needs to define the behaviour properly so that other
>>> implementations know what they need to support and what can be chalked
>>> up as being just an implementation accident of CPython. (As a point in
>>> the plus column, both decimal.Decimal and fractions.Fraction were able
>>> to handle the '١٢٣٤.٥٦' example in a manner consistent with the int
>>> and float handling)
>>
>> The support is built into the C API, so there's not really much
>> surprise there.
>>
>> Regarding documentation, we'd just have to add that numbers may
>> be made up of an Unicode code point in the category "Nd".
>>
>> See http://www.unicode.org/versions/Unicode5.2.0/ch04.pdf, section
>> 4.6 for details....
>>
>> """
>> Decimal digits form a large subcategory of numbers consisting of those
>> digits that can be used to form decimal-radix numbers. They include
>> script-specific digits, but exclude characters such as Roman numerals
>> and Greek acrophonic numerals. (Note that <1, 5> = 15 = fifteen, but
>> <I, V> = IV = four.) Decimal digits also exclude the compatibility
>> subscript or superscript digits to prevent simplistic parsers from
>> misinterpreting their values in context.
>> """
>>
>> int(), float() and long() (in Python2) are such simplistic
>> parsers.
>
> Since you are the knowledgable advocate of the current behavior, perhaps
> you could open an issue and propose a doc patch, even if not .rst
> formatted.

Good suggestion. I tried to collect as much context as possible:

http://bugs.python.org/issue10610

I'll leave the rst-magic to someone else, but will certainly help
if you have more questions about the details.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Dec 02 2010)

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
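
For readers following the thread, here is a minimal sketch of the behaviour
being discussed, written for Python 3 (so there is no long()). It is only an
illustration, not part of the proposed doc patch: the helper names is_nd()
and try_int() are made up for this example, and the sample strings are the
Arabic-Indic digits quoted above plus a superscript two and a Roman numeral
character chosen to show what falls outside category "Nd".

```python
import unicodedata
from decimal import Decimal


def is_nd(ch):
    """Return True if ch is in Unicode general category Nd (decimal digit)."""
    return unicodedata.category(ch) == "Nd"


def try_int(text):
    """Return int(text), or None if the builtin parser rejects the string."""
    try:
        return int(text)
    except ValueError:
        return None


# Arabic-Indic digits, as in the example quoted in the thread.
arabic_indic_int = "١٢٣٤"
arabic_indic_float = "١٢٣٤.٥٦"  # same script, with an ASCII '.' separator

# Every digit character in the sample is a category Nd code point.
all_nd = all(is_nd(ch) for ch in arabic_indic_int)

# The builtin parsers and decimal.Decimal treat Nd digits like ASCII digits.
parsed_int = int(arabic_indic_int)
parsed_float = float(arabic_indic_float)
parsed_decimal = Decimal(arabic_indic_float)

# Superscript two (U+00B2) and the Roman numeral four (U+2163) are numeric
# characters, but not in category Nd, so int() does not accept them.
superscript_two = "\u00b2"
roman_four = "\u2163"
rejected = [try_int(superscript_two), try_int(roman_four)]

print(all_nd, parsed_int, parsed_float, parsed_decimal, rejected)
```

The point of the sketch is the split the Unicode text describes: the parsers
accept any "Nd" digit as a decimal digit, while separators, exponent markers
and non-decimal numerals are not given any script-specific treatment.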