Re: [Python-Dev] Python and the Unicode Character Database

2010-12-07 Thread Vlastimil Brom
2010/12/7 Alexander Belopolsky alexander.belopol...@gmail.com: On Sat, Dec 4, 2010 at 5:58 PM, Martin v. Löwis mar...@v.loewis.de wrote: I actually wonder if Python's re module can claim to provide even Basic Unicode Support. Do you really wonder? Most definitely it does not. Were you more

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-07 Thread Alexander Belopolsky
On Tue, Dec 7, 2010 at 8:02 AM, Vlastimil Brom vlastimil.b...@gmail.com wrote: .. It seems, e.g. in Perl, there are some omissions too http://perldoc.perl.org/perlunicode.html#Unicode-Regular-Expression-Support-Level Do you know of any re engine fully complying with tr18, even at the first

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-07 Thread Martin v. Löwis
On 07.12.2010 04:03, Alexander Belopolsky wrote: On Sat, Dec 4, 2010 at 5:58 PM, Martin v. Löwis mar...@v.loewis.de wrote: I actually wonder if Python's re module can claim to provide even Basic Unicode Support. Do you really wonder? Most definitely it does not. Were you more optimistic

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-07 Thread Alexander Belopolsky
On Tue, Dec 7, 2010 at 8:02 AM, Vlastimil Brom vlastimil.b...@gmail.com wrote: .. Do you know of any re engine fully complying with tr18, even at the first level: Basic Unicode Support? ICU Regular Expressions conform to Unicode Technical Standard #18, Unicode Regular Expressions, level 1,

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-07 Thread Vlastimil Brom
2010/12/7 Alexander Belopolsky alexander.belopol...@gmail.com: On Tue, Dec 7, 2010 at 8:02 AM, Vlastimil Brom vlastimil.b...@gmail.com wrote: .. Do you know of any re engine fully complying with tr18, even at the first level: Basic Unicode Support? ICU Regular Expressions conform to

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-06 Thread Alexander Belopolsky
On Sat, Dec 4, 2010 at 5:58 PM, Martin v. Löwis mar...@v.loewis.de wrote: I actually wonder if Python's re module can claim to provide even Basic Unicode Support. Do you really wonder? Most definitely it does not. Were you more optimistic four years ago?

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-04 Thread Stephen J. Turnbull
Antoine Pitrou writes: On Friday, 3 December 2010 at 13:58 +0900, Stephen J. Turnbull wrote: Antoine Pitrou writes: The legacy format argument looks like a red herring to me. When converting from a format to another it is the programmer's job to do his/her job right.

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-04 Thread Antoine Pitrou
On Saturday, 4 December 2010 at 17:13 +0900, Stephen J. Turnbull wrote: Antoine Pitrou writes: On Friday, 3 December 2010 at 13:58 +0900, Stephen J. Turnbull wrote: Antoine Pitrou writes: The legacy format argument looks like a red herring to me. When converting

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-04 Thread Alexander Belopolsky
On Fri, Dec 3, 2010 at 12:10 AM, Alexander Belopolsky alexander.belopol...@gmail.com wrote: .. I don't think decimal module should support non-European decimal digits.  The only place where it can make some sense is in int() because here we have a fighting chance of producing a reasonable

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-04 Thread Martin v. Löwis
I actually wonder if Python's re module can claim to provide even Basic Unicode Support. Do you really wonder? Most definitely it does not. Regards, Martin

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-03 Thread Neil Hodgson
Stephen J. Turnbull: Will it accept Arabic on input?  (Han might be too much to ask for since Unicode considers Han digits to be impure.) I couldn't find a direct way to input Arabic digits into OO Calc, the normal use of Alt+number didn't work in Calc although it did in WordPad where

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-03 Thread M.-A. Lemburg
Alexander Belopolsky wrote: On Thu, Dec 2, 2010 at 5:58 PM, M.-A. Lemburg m...@egenix.com wrote: .. I will change my mind on this issue when you present a machine-readable file with Arabic-Indic numerals and a program capable of reading it and show that this program uses the same number

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-03 Thread Antoine Pitrou
On Friday, 3 December 2010 at 13:58 +0900, Stephen J. Turnbull wrote: Antoine Pitrou writes: The legacy format argument looks like a red herring to me. When converting from a format to another it is the programmer's job to do his/her job right. Uhmm, the argument *for* this

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Neil Hodgson
Stephen J. Turnbull: Here's why: '''print "%d" % some_integer''' doesn't now, and never will (unless Kristan gets his Python 2.8 <wink>), produce Arabic or Han numerals.  Not in any language I know of, not in Microsoft Excel, and definitely not in Python 2. While I don't have Excel to test

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Georg Brandl
On 01.12.2010 23:39, Martin v. Löwis wrote: As of today, What’s New In Python 3.2 [1] does not even mention the unicodedata upgrade to 6.0.0. One reason was that I was instructed not to change What's New a few years ago. Maybe all past, present and future whatsnew maintainers can agree on

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Lennart Regebro
2010/12/2 Stephen J. Turnbull step...@xemacs.org: Because that works, but print(T1234) doesn't (it prints ASCII).  You can't round-trip, but users will want/expect that. You should be able to round-trip, absolutely. I don't think you should expect print() to do that. str(56) possibly. :)
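The round-trip asymmetry discussed here can be sketched in current Python 3: int() accepts Arabic-Indic digits on input, while str() and print() always emit ASCII digits, so the output side needs an explicit mapping. The helper name `to_arabic_indic` is hypothetical and introduced only for illustration; it is not part of any library.

```python
# Translation table from ASCII digits to ARABIC-INDIC DIGIT ZERO..NINE
# (U+0660..U+0669).
ARABIC_INDIC = str.maketrans(
    "0123456789", "".join(chr(0x0660 + i) for i in range(10))
)


def to_arabic_indic(n: int) -> str:
    """Hypothetical helper: format an int, then map its ASCII digits."""
    return str(n).translate(ARABIC_INDIC)


text = to_arabic_indic(1234)
# Show the code points rather than relying on terminal rendering.
print(text.encode("unicode_escape").decode("ascii"))
# int() parses Unicode decimal digits, so the value survives the round trip.
print(int(text) == 1234)
# str() goes the other way only as ASCII.
print(str(int(text)))
```

The mapping is applied after formatting, which is the "do some mapping on output" approach rather than a change to print() itself.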

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Antoine Pitrou
On Wed, 1 Dec 2010 22:28:49 -0500 Alexander Belopolsky alexander.belopol...@gmail.com wrote: Both my personal observations when travelling from Turkey to India and Wikipedia say yes. When representing a number in Arabic, the lowest-valued position is placed on the right, so the order of

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Alexander Belopolsky
On Thu, Dec 2, 2010 at 8:36 AM, Antoine Pitrou solip...@pitrou.net wrote: On Wed, 1 Dec 2010 22:28:49 -0500 Alexander Belopolsky alexander.belopol...@gmail.com wrote: .. This matches my limited research on this topic as well.  However, I am not sure that when these codes are embedded in Arabic

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Antoine Pitrou
On Thursday, 2 December 2010 at 11:41 -0500, Alexander Belopolsky wrote: Note that my point is not to find the correct answer here, but to demonstrate that we as a group don't have the expertise to get parsing of Arabic text right. I don't understand why you think Arabic or Hebrew text is any

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Alexander Belopolsky
On Thu, Dec 2, 2010 at 11:56 AM, Antoine Pitrou solip...@pitrou.net wrote: On Thursday, 2 December 2010 at 11:41 -0500, Alexander Belopolsky wrote: Note that my point is not to find the correct answer here, but to demonstrate that we as a group don't have the expertise to get parsing of Arabic

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Antoine Pitrou
On Thursday, 2 December 2010 at 13:14 -0500, Alexander Belopolsky wrote: I don't understand why you think Arabic or Hebrew text is any different from Western text. Surely right-to-left isn't more conceptually complicated than left-to-right, is it? No, but a mix of LTR and RTL is
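One concrete reason mixed-direction text needs care is that the Unicode Character Database assigns different bidirectional classes to European digits, Arabic-Indic digits and Arabic letters. The sketch below only inspects those classes with `unicodedata.bidirectional`; it does not implement the Unicode Bidirectional Algorithm, and the helper name `bidi_classes` is an assumption made for illustration.

```python
import unicodedata


def bidi_classes(text: str) -> list:
    """Hypothetical helper: return the bidi class of each character."""
    return [unicodedata.bidirectional(ch) for ch in text]


# DIGIT ONE, ARABIC-INDIC DIGIT ONE, ARABIC LETTER ALEF, LATIN SMALL LETTER A
sample = "1\u0661\u0627a"
for ch, cls in zip(sample, bidi_classes(sample)):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {cls}")
```

The digits are stored most-significant first in memory in both scripts; the classes above only influence how a renderer orders them on screen.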

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Martin v. Löwis
On 02.12.2010 03:01, Ben Finney wrote: Stephen J. Turnbull step...@xemacs.org writes: Furthermore, he provided good *objective* reason (excessive cost, to which I can also testify, in several different input methods for Japanese) why numbers simply would not be input that way. What's

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Martin v. Löwis
Maybe all past, present and future whatsnew maintainers can agree on these rules, which I copied directly from whatsnew/3.2.rst? I don't think all past maintainers can (I'm pretty certain that AMK would disagree), but if that's the current policy, I can certainly try following it (I didn't know

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread M.-A. Lemburg
Martin v. Löwis wrote: Now, one may wonder what precisely a possibly signed floating point number is, but most likely, this refers to floatnumber ::= pointfloat | exponentfloat pointfloat::= [intpart] fraction | intpart . exponentfloat ::= (intpart | pointfloat) exponent intpart

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Georg Brandl
On 02.12.2010 20:40, Martin v. Löwis wrote: Maybe all past, present and future whatsnew maintainers can agree on these rules, which I copied directly from whatsnew/3.2.rst? I don't think all past maintainers can Yes, and the same goes for the future ones, since they may not even know yet

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Martin v. Löwis
Then these users should speak up and indicate their need, or somebody should speak up and confirm that there are users who actually want '١٢٣٤.٥٦' to denote 1234.56. To my knowledge, there is no writing system in which '١٢٣٤.٥٦e4' means 12345600.0. I'm not sure what you're after here. That
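The behaviour being questioned can be checked directly. This is a minimal sketch assuming a current CPython 3 interpreter, not the 3.2 alpha under discussion: float() maps Unicode decimal digits to ASCII before parsing, so the ASCII full stop and the ASCII exponent marker are accepted next to Arabic-Indic digits, while the Arabic decimal separator U+066B is rejected.

```python
# Arabic-Indic digits for 1234 and 56, joined by an ASCII full stop.
arabic = "\u0661\u0662\u0663\u0664.\u0665\u0666"

print(float(arabic))         # parsed as if the digits were ASCII
print(float(arabic + "e4"))  # the ASCII exponent marker is accepted as well

# ARABIC DECIMAL SEPARATOR (U+066B) is not a decimal digit, so the string
# a native writer might actually produce is refused.
try:
    float("\u0661\u0662\u0663\u0664\u066b\u0665\u0666")
except ValueError:
    print("rejected: U+066B is not understood as a decimal point")
```

This illustrates the objection in the thread: the digits are localized, but the surrounding syntax remains the ASCII one.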

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread M.-A. Lemburg
Martin v. Löwis wrote: [...] For direct entry by an interactive user, yes. Why are some people in this discussion thinking only of direct entry by an interactive user? Ultimately, somebody will have entered the data. I don't think you really believe that all data processed by a computer was

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Martin v. Löwis
Arabic numerals are being used a lot nowadays in Asian countries, but that doesn't mean that the native script versions are not being used anymore. I never claimed that people are not using their local scripts to enter numbers. However, none of your examples is about Chinese numerals using an

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Steven D'Aprano
Martin v. Löwis wrote: Then these users should speak up and indicate their need, or somebody should speak up and confirm that there are users who actually want '١٢٣٤.٥٦' to denote 1234.56. To my knowledge, there is no writing system in which '١٢٣٤.٥٦e4' means 12345600.0. I'm not sure what

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Alexander Belopolsky
On Thu, Dec 2, 2010 at 1:55 PM, Antoine Pitrou solip...@pitrou.net wrote: .. I don't think so.  str.split() and str.splitlines() are also defined in conformance to the SPEC, AFAIK.  They certainly try to. You are joking, right? Where exactly does Unicode specify something like this:

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Antoine Pitrou
On Thursday, 2 December 2010 at 16:34 -0500, Alexander Belopolsky wrote: On Thu, Dec 2, 2010 at 1:55 PM, Antoine Pitrou solip...@pitrou.net wrote: .. I don't think so. str.split() and str.splitlines() are also defined in conformance to the SPEC, AFAIK. They certainly try to. You are

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Martin v. Löwis
On 02.12.2010 22:30, Steven D'Aprano wrote: Martin v. Löwis wrote: Then these users should speak up and indicate their need, or somebody should speak up and confirm that there are users who actually want '١٢٣٤.٥٦' to denote 1234.56. To my knowledge, there is no writing system in which

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Alexander Belopolsky
On Thu, Dec 2, 2010 at 4:14 PM, M.-A. Lemburg m...@egenix.com wrote: .. Have you tried Google? I tried Google and could not find any plain text or HTML file that would use Arabic-Indic numerals. What was interesting, though, was that a search for quran unicode (without quotes) brought me to

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Mark Dickinson
On Thu, Dec 2, 2010 at 8:23 PM, Martin v. Löwis mar...@v.loewis.de wrote: In the case of number parsing, I think Python would be better if float() rejected non-ASCII strings, and any support for such parsing should be redone correctly in a different place (preferably along with printing of
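The stricter behaviour suggested here can be approximated in user code today. The wrapper below is a hypothetical sketch (the name `ascii_float` is not an existing API): it refuses any non-ASCII character and otherwise defers to the built-in float().

```python
def ascii_float(text: str) -> float:
    """Hypothetical strict parser: reject non-ASCII input, then call float()."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        # Hide the encoding error; callers only need to know parsing failed.
        raise ValueError(f"non-ASCII characters in {text!r}") from None
    return float(text)


print(ascii_float("1234.56"))
try:
    ascii_float("\u0661\u0662\u0663\u0664.\u0665\u0666")
except ValueError as exc:
    print("rejected:", exc)
```

Checking with `encode("ascii")` keeps the sketch usable on older Python 3 releases; on 3.7 and later `str.isascii()` would do the same job.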

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Eric Smith
On 12/2/2010 4:48 PM, Martin v. Löwis wrote: On 02.12.2010 22:30, Steven D'Aprano wrote: Martin v. Löwis wrote: Then these users should speak up and indicate their need, or somebody should speak up and confirm that there are users who actually want '١٢٣٤.٥٦' to denote 1234.56. To my

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread M.-A. Lemburg
Eric Smith wrote: The current behavior should go nowhere; it is not useful. Something very similar to the current behavior (but done correctly) should go into the locale module. I agree with everything Martin says here. I think the basic premise is: you won't find strings in the wild that

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread M.-A. Lemburg
Alexander Belopolsky wrote: On Thu, Dec 2, 2010 at 4:14 PM, M.-A. Lemburg m...@egenix.com wrote: .. Have you tried Google? I tried Google and could not find any plain text or HTML file that would use Arabic-Indic numerals. What was interesting, though, was that a search for quran unicode

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread M.-A. Lemburg
Terry Reedy wrote: On 11/29/2010 10:19 AM, M.-A. Lemburg wrote: Nick Coghlan wrote: On Mon, Nov 29, 2010 at 9:02 PM, M.-A. Lemburg m...@egenix.com wrote: If we would go down that road, we would also have to disable other Unicode features based on locale, e.g. whether to apply non-ASCII case

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Alexander Belopolsky
On Thu, Dec 2, 2010 at 5:58 PM, M.-A. Lemburg m...@egenix.com wrote: .. I will change my mind on this issue when you present a machine-readable file with Arabic-Indic numerals and a program capable of reading it and show that this program uses the same number parsing algorithm as Python's

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Martin v. Löwis
On 02.12.2010 23:43, M.-A. Lemburg wrote: Eric Smith wrote: The current behavior should go nowhere; it is not useful. Something very similar to the current behavior (but done correctly) should go into the locale module. I agree with everything Martin says here. I think the basic premise

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Eric Smith
On 12/2/2010 5:43 PM, M.-A. Lemburg wrote: Eric Smith wrote: The current behavior should go nowhere; it is not useful. Something very similar to the current behavior (but done correctly) should go into the locale module. I agree with everything Martin says here. I think the basic premise is:

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Martin v. Löwis
The point is that we support all of Unicode in Python, not just a fragment, and therefore the numeric constructors support all of Unicode. That conclusion is as false today as it was in Python 1.6, but only now people start caring about that. a) we don't support all of Unicode in numeric

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread M.-A. Lemburg
Eric Smith wrote: On 12/2/2010 5:43 PM, M.-A. Lemburg wrote: Eric Smith wrote: The current behavior should go nowhere; it is not useful. Something very similar to the current behavior (but done correctly) should go into the locale module. I agree with everything Martin says here. I think

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Alexander Belopolsky
On Thu, Dec 2, 2010 at 4:14 PM, M.-A. Lemburg m...@egenix.com wrote: .. Some examples: http://www.bdl.gov.lb/circ/intpdf/int123.pdf I looked at this one more closely. While I cannot understand what it says, it appears that Arabic numerals are used in dates. It looks like Python won't be able

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Steven D'Aprano
Stephen J. Turnbull wrote: Steven D'Aprano writes: With full respect to haiyang kang, hear-say from one person can hardly be described as strong evidence That's *disrespectful* nonsense. What Haiyang reported was not hearsay, it's direct observation of what he sees around him and

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Terry Reedy
On 12/2/2010 6:54 PM, Alexander Belopolsky wrote: On Thu, Dec 2, 2010 at 4:14 PM, M.-A. Lemburg m...@egenix.com wrote: .. Some examples: http://www.bdl.gov.lb/circ/intpdf/int123.pdf I looked at this one more closely. While I cannot understand what it says, it appears that Arabic numerals

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Stephen J. Turnbull
Lennart Regebro writes: 2010/12/2 Stephen J. Turnbull step...@xemacs.org: T1000 = float('一.◯◯◯') That was already discussed here, and it's clear that unicode does not consider these characters to be something you can use in a decimal number, and hence it's not broken. Huh? IOW,
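The distinction at issue is between the three numeric properties in the Unicode Character Database. The sketch below, using only the standard `unicodedata` module, shows that the Han numeral 一 has a numeric value but is neither a decimal digit nor a digit, which is why float() refuses it. The helper name `describe` is an assumption made for illustration.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect the numeric properties of ch; None means the property is absent."""
    return {
        "category": unicodedata.category(ch),
        "decimal": unicodedata.decimal(ch, None),
        "digit": unicodedata.digit(ch, None),
        "numeric": unicodedata.numeric(ch, None),
    }


# ASCII 1, ARABIC-INDIC DIGIT ONE, and the Han numeral for one (U+4E00).
for ch in ("1", "\u0661", "\u4e00"):
    print(f"U+{ord(ch):04X}", describe(ch))

# Only characters with the decimal property are accepted by float().
try:
    float("\u4e00.\u4e00")
except ValueError:
    print("float() rejects Han numerals")
```

The same test explains why `str.isnumeric()` is true for 一 while `str.isdecimal()` is false.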

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread haiyang kang
Furthermore, data can well originate from texts that were written hundreds or even thousands of years ago, so there is plenty of material available for processing. humm..., for this, i think we need a special tuned language processing system to handle this, and one subsystem for one language

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Stephen J. Turnbull
Neil Hodgson writes: While I don't have Excel to test with, OpenOffice.org Calc will display in Arabic or Han numerals using the NatNum format codes. Display is different from input, but at least this is concrete evidence. Will it accept Arabic on input? (Han might be too much to ask

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Stephen J. Turnbull
Antoine Pitrou writes: The legacy format argument looks like a red herring to me. When converting from a format to another it is the programmer's job to do his/her job right. Uhmm, the argument *for* this feature proposed by several people is that Python's numeric constructors do it

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Alexander Belopolsky
On Thu, Dec 2, 2010 at 4:57 PM, Mark Dickinson dicki...@gmail.com wrote: .. (the decimal spec requires that non-European digits be accepted). Mark, I think *requires* is too strong of a word to describe what the spec says. The decimal module documentation refers to two authorities: 1. IBM’s
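For reference, the behaviour of the decimal module that this exchange is about can be observed as follows. This is a small sketch assuming a current Python 3, where the Decimal constructor is documented to accept other Unicode decimal digits wherever a digit is expected.

```python
from decimal import Decimal

# Arabic-Indic digits for 1234 and 56, joined by an ASCII full stop.
d = Decimal("\u0661\u0662\u0663\u0664.\u0665\u0666")

print(d)                        # str() of a Decimal uses ASCII digits
print(d == Decimal("1234.56"))  # the value equals the ASCII spelling
```

Whether the specification requires this or merely permits it is exactly the point being debated; the sketch only shows what the implementation does.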

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread M.-A. Lemburg
Terry Reedy wrote: On 11/30/2010 10:05 AM, Alexander Belopolsky wrote: My general answers to the questions you have raised are as follows: 1. Each new feature release should use the latest version of the UCD as of the first beta release (or perhaps a week or so before). New chars are new

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread M.-A. Lemburg
Martin v. Löwis wrote: On 30.11.2010 21:24, Ben Finney wrote: haiyang kang corn...@gmail.com writes: I think it is a little ugly to have code like this: num = float("一.一"), expected result is: num = 1.1 That's a straw man, though. The string need not be a literal in the program; it can

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread M.-A. Lemburg
Terry Reedy wrote: On 11/30/2010 3:23 AM, Stephen J. Turnbull wrote: I see no reason not to make a similar promise for numeric literals. I see no good reason to allow compatibility full-width Japanese ASCII numerals or Arabic cursive numerals in for i in range(...) for example. I do not

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Steven D'Aprano
Martin v. Löwis wrote: On 30.11.2010 23:43, Terry Reedy wrote: On 11/30/2010 3:23 AM, Stephen J. Turnbull wrote: I see no reason not to make a similar promise for numeric literals. I see no good reason to allow compatibility full-width Japanese ASCII numerals or Arabic cursive numerals in

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Lennart Regebro
On Tue, Nov 30, 2010 at 09:23, Stephen J. Turnbull step...@xemacs.org wrote: Sure you can.  In Python program text, all keywords will be ASCII Yes, yes, sure, but not the contents of variables, I see no reason not to make a similar promise for numeric literals. Wait what, literals? The example

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Alexander Belopolsky
On Sun, Nov 28, 2010 at 5:48 PM, M.-A. Lemburg m...@egenix.com wrote: .. With Python 3.1: exec('\u0CF1 = 1') Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<string>", line 1 ೱ = 1 ^ SyntaxError: invalid character in identifier but with Python 3.2a4:
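Whether a given character is accepted as an identifier depends on the Unicode Character Database bundled with the interpreter, so the same check can give different answers on different Python versions. The sketch below tests the character from this example without assuming either outcome; the helper name `compiles_as_name` is hypothetical.

```python
import unicodedata

ch = "\u0CF1"  # KANNADA SIGN JIHVAMULIYA, the character quoted above

print("category:", unicodedata.category(ch))
print("isidentifier:", ch.isidentifier())


def compiles_as_name(name: str) -> bool:
    """Hypothetical check: does the compiler accept `name = 1`?"""
    try:
        compile(f"{name} = 1", "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


print("compiles:", compiles_as_name(ch))
```

Using compile() rather than exec() avoids actually binding the name while still exercising the tokenizer's identifier rules.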

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Terry Reedy
On 12/1/2010 12:55 PM, Alexander Belopolsky wrote: On Sun, Nov 28, 2010 at 5:48 PM, M.-A. Lemburg m...@egenix.com wrote: .. With Python 3.1: exec('\u0CF1 = 1') Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<string>", line 1 ೱ = 1 ^ SyntaxError: invalid

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Martin v. Löwis
And here, my observation stands: if they wanted to, they currently couldn't - at least not for real numbers (and also not for integers if they want to use grouping). So the presumed application of this feature doesn't actually work, despite the presence of the feature it was supposedly meant

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Martin v. Löwis
I think the OP (haiyang kang) already indicated that he finds it quite unlikely that anybody would possibly want to enter that. Who's talking about *entering* it into the program at a keyboard directly, though? Input to a program can come from all kinds of crazy sources. Just because it

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Martin v. Löwis
As of today, What’s New In Python 3.2 [1] does not even mention the unicodedata upgrade to 6.0.0. One reason was that I was instructed not to change What's New a few years ago. Regards, Martin

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Steven D'Aprano
Martin v. Löwis wrote: I think the OP (haiyang kang) already indicated that he finds it quite unlikely that anybody would possibly want to enter that. Who's talking about *entering* it into the program at a keyboard directly, though? Input to a program can come from all kinds of crazy sources.

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Alexander Belopolsky
On Wed, Dec 1, 2010 at 5:36 PM, Martin v. Löwis mar...@v.loewis.de wrote: .. Note that I'm not saying this is common. Nor am I saying it's a desirable situation. I'm saying it is a feasible use case, to be dismissed only if there is strong evidence that it's not used by existing Python code.

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Steven D'Aprano
Martin v. Löwis wrote: And here, my observation stands: if they wanted to, they currently couldn't - at least not for real numbers (and also not for integers if they want to use grouping). So the presumed application of this feature doesn't actually work, despite the presence of the feature it

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Stephen J. Turnbull
Lennart Regebro writes: On Tue, Nov 30, 2010 at 09:23, Stephen J. Turnbull step...@xemacs.org wrote: Sure you can.  In Python program text, all keywords will be ASCII Yes, yes, sure, but not the contents of variables, Irrelevant, you're not converting these to a string

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Alexander Belopolsky
On Wed, Dec 1, 2010 at 7:17 PM, Steven D'Aprano st...@pearwood.info wrote: .. we should continue to support the existing behaviour. None of the arguments against it seem convincing to me, particularly since the opponents of the current behaviour admit that there is a use-case for it, but they

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Stephen J. Turnbull
Steven D'Aprano writes: With full respect to haiyang kang, hear-say from one person can hardly be described as strong evidence That's *disrespectful* nonsense. What Haiyang reported was not hearsay, it's direct observation of what he sees around him and personal experience, plus

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Ben Finney
Stephen J. Turnbull step...@xemacs.org writes: Furthermore, he provided good *objective* reason (excessive cost, to which I can also testify, in several different input methods for Japanese) why numbers simply would not be input that way. What's left is copy/paste via the mouse. For direct

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Terry Reedy
On 12/1/2010 7:44 PM, Alexander Belopolsky wrote: it. The argument was that if there was a use case for parsing Eastern Arabic numerals, it would be better served by a module written by someone who speaks one of the Arabic languages and knows the details of how Eastern Arabic numerals are

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Alexander Belopolsky
On Wed, Dec 1, 2010 at 10:11 PM, Terry Reedy tjre...@udel.edu wrote: On 12/1/2010 7:44 PM, Alexander Belopolsky wrote: it.  The argument was that if there was a use case for parsing Eastern Arabic numerals, it would be better served by a module written by someone who speaks one of the Arabic

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Stephen J. Turnbull
Ben Finney writes: Input from an existing text file, as I said earlier. Or any other way of text data making its way into a Python program. Direct entry at the console is a red herring. I don't think it is. Not at all. Here's why: '''print "%d" % some_integer''' doesn't now, and never

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Lennart Regebro
On Sun, Nov 28, 2010 at 21:24, Alexander Belopolsky alexander.belopol...@gmail.com wrote: While we have little choice but to follow UCD in defining str.isidentifier(), I think Python can promise users more stability in what it treats as space or as a digit in its builtins. Why? I can see this

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Hagen Fürstenau
During PEP 3003 discussion, it was suggested to handle it on a case by case basis, but I don't see discussion of the upgrade to 6.0.0 in PEP 3003. It's covered by As the standard library is not directly tied to the language definition it is not covered by this moratorium. How is this

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Stephen J. Turnbull
Lennart Regebro writes: *I* think it is more important. In python 3, you can never ever assume anything is ASCII any more. Sure you can. In Python program text, all keywords will be ASCII (English, even, though it may be en_NL.UTF-8 <wink>) for the foreseeable future. I see no reason not to

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread haiyang kang
Hi, I agree with this. I have never seen anyone in China using Chinese number literals (there are at least two kinds: 一 and 壹, both meaning 1) in a Python program, except for UI output. They can do some mapping when they want to output these non-ASCII numbers. Example: if 1: print "一" I think it is a

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Steven D'Aprano
haiyang kang wrote: Hi, I agree with this. I have never seen anyone in China using Chinese number literals (there are at least two kinds: 一 and 壹, both meaning 1) in a Python program, except for UI output. They can do some mapping when they want to output these non-ASCII numbers. Example: if 1: print "一"

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Steven D'Aprano
Stephen J. Turnbull wrote: Lennart Regebro writes: *I* think it is more important. In python 3, you can never ever assume anything is ASCII any more. Sure you can. In Python program text, all keywords will be ASCII (English, even, though it may be en_NL.UTF-8 <wink>) for the foreseeable

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Antoine Pitrou
On Wed, 01 Dec 2010 00:23:22 +1100 Steven D'Aprano st...@pearwood.info wrote: But I think there is a good case for allowing the constructors int, float and complex to continue to accept numeric *strings* with non-ASCII digits. The code already exists, there's probably people out there who

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Alexander Belopolsky
On Tue, Nov 30, 2010 at 7:59 AM, Steven D'Aprano st...@pearwood.info wrote: .. But you should be able to write: text = input("Enter a number using your preferred digits: ") num = float(text) without caring whether the user enters 一.一 or 1.1 or something else. I find it ironic that people who

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread haiyang kang
But you should be able to write: text = input("Enter a number using your preferred digits: ") num = float(text) without caring whether the user enters 一.一 or 1.1 or something else. Yes, from a logical point of view, this can happen. But I really doubt that there are users who would

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Alexander Belopolsky
On Mon, Nov 29, 2010 at 4:13 PM, Martin v. Löwis mar...@v.loewis.de wrote: - Should Python documentation refer to the specific version of Unicode that it supports? You mean, mention it somewhere? Sure (although it would be nice if the documentation generator automatically extracted it
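Independently of what the documentation says, the Unicode version a given interpreter was built with can be read at runtime from the `unicodedata` module. A minimal sketch:

```python
import unicodedata

# Version of the Unicode Character Database compiled into this interpreter.
print(unicodedata.unidata_version)

# The frozen 3.2.0 database kept for applications that need it (e.g. IDNA).
print(unicodedata.ucd_3_2_0.unidata_version)
```

A documentation generator could extract the same string, which is the kind of automation suggested in the message.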

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Alexander Belopolsky
On Tue, Nov 30, 2010 at 9:56 AM, haiyang kang corn...@gmail.com wrote: But you should be able to write: text = input("Enter a number using your preferred digits: ") num = float(text) without caring whether the user enters 一.一 or 1.1 or something else. Yes, from a logical point of view, this

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Stefan Krah
Alexander Belopolsky alexander.belopol...@gmail.com wrote: On Tue, Nov 30, 2010 at 9:56 AM, haiyang kang corn...@gmail.com wrote: But you should be able to write: text = input("Enter a number using your preferred digits: ") num = float(text) without caring whether the user enters 一.一 or

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Alexander Belopolsky
On Mon, Nov 29, 2010 at 2:38 PM, Alexander Belopolsky alexander.belopol...@gmail.com wrote: .. Still, if it's not detrimental and it's not difficult to support, then why do you care? It is difficult to support.  A fix for issue10557 would be much simpler if we did not support non-European

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Michael Foord
On 30/11/2010 16:40, Alexander Belopolsky wrote: [snip...] And of course, unicodedata.digit('\U0001D7CE') 0 but int('\U0001D7CE') .. UnicodeEncodeError: 'decimal' codec can't encode character '\ud835' .. on a narrow Unicode build. (Note the character reported in the error message!) If
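The mismatch quoted above is specific to narrow Unicode builds, where a non-BMP character was stored as a surrogate pair. As a hedged sketch, assuming Python 3.3 or later (where strings no longer expose surrogate pairs), the same two calls can be compared directly. The variable names are illustrative only.

```python
import unicodedata

# U+1D7CE, MATHEMATICAL BOLD DIGIT ZERO, lies outside the Basic
# Multilingual Plane, which is why narrow builds split it in two.
bold_zero = "\U0001D7CE"

length = len(bold_zero)                      # one code point on Python 3.3+
digit_value = unicodedata.digit(bold_zero)   # digit value from the UCD
int_value = int(bold_zero)                   # int() accepts decimal digits

print(unicodedata.name(bold_zero))
print(length, digit_value, int_value)
```

On an interpreter that meets this assumption, unicodedata.digit() and int() should agree for this character, so the error reported in the message would not be expected to reproduce.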

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Alexander Belopolsky
On Tue, Nov 30, 2010 at 12:40 PM, Michael Foord fuzzy...@voidspace.org.uk wrote: .. If you think non-ASCII digits are not difficult to support, please contribute to the following tracker issues: Would moving this functionality to the locale module make the issues any easier to fix? Sure,

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Antoine Pitrou
Sure, if we code it in Python, supporting it will be much easier: def normalize_digits(s): digits = {m.group(1) for m in re.finditer('(\d)', s)} trtab = {ord(d): str(unicodedata.digit(d)) for d in digits} return s.translate(trtab) normalize_digits('١٢٣٤.٥٦') '1234.56' I
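The function quoted in this snippet is flattened onto one line by the archive. A runnable reconstruction might look like the sketch below; it assumes Python 3, where \d in a str pattern matches any Unicode decimal digit, and it adds the imports and a raw-string pattern that the snippet does not show.

```python
import re
import unicodedata


def normalize_digits(s):
    """Replace every Unicode decimal digit in s with its ASCII equivalent."""
    # Collect the distinct digit characters that occur in the string.
    digits = {m.group(0) for m in re.finditer(r"\d", s)}
    # Map each digit's code point to the ASCII form of its digit value.
    trtab = {ord(d): str(unicodedata.digit(d)) for d in digits}
    return s.translate(trtab)


# Arabic-Indic digits, as in the quoted message.
result = normalize_digits("\u0661\u0662\u0663\u0664.\u0665\u0666")
print(result)
print(float(result))
```

Characters that are not decimal digits, such as the decimal point here, pass through unchanged, so the result can be handed to float() or int() afterwards.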

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Alexander Belopolsky
On Tue, Nov 30, 2010 at 1:29 PM, Antoine Pitrou solip...@pitrou.net wrote: .. I am not sure this belongs to the locale module, however.  It seems to me, something like 'unicodealgo' for unicode algorithms would be more appropriate. It could simply be in unicodedata if you split the

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Martin v. Löwis
Am 30.11.2010 09:15, schrieb Hagen Fürstenau: During PEP 3003 discussion, it was suggested to handle it on a case by case basis, but I don't see discussion of the upgrade to 6.0.0 in PEP 3003. It's covered by As the standard library is not directly tied to the language definition it is not

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Martin v. Löwis
Would moving this functionality to the locale module make the issues any easier to fix? You could delegate it to the C library, so: yes. Regards, Martin

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Antoine Pitrou
Le mardi 30 novembre 2010 à 20:16 +0100, Martin v. Löwis a écrit : Would moving this functionality to the locale module make the issues any easier to fix? You could delegate it to the C library, so: yes. I hope you don't suggest delegating it to the C locale functions. Do you?

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Martin v. Löwis
Am 30.11.2010 20:23, schrieb Antoine Pitrou: Le mardi 30 novembre 2010 à 20:16 +0100, Martin v. Löwis a écrit : Would moving this functionality to the locale module make the issues any easier to fix? You could delegate it to the C library, so: yes. I hope you don't suggest delegating it to

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Antoine Pitrou
Le mardi 30 novembre 2010 à 20:40 +0100, Martin v. Löwis a écrit : Am 30.11.2010 20:23, schrieb Antoine Pitrou: Le mardi 30 novembre 2010 à 20:16 +0100, Martin v. Löwis a écrit : Would moving this functionality to the locale module make the issues any easier to fix? You could delegate

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Martin v. Löwis
Because we all know how locale is a pile of cr*p, both in specification and in implementations. Our unit tests for it are a clear proof of that. I wouldn't use expletives, but rather claim that the locale module is highly platform-dependent. Actually, I remember you saying that locale should

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Antoine Pitrou
Le mardi 30 novembre 2010 à 20:55 +0100, Martin v. Löwis a écrit : With respect to local number parsing, I think that the locale module would be way better than the nonsense that Python currently does. In the locale module, somebody at least has thought about what specifically constitutes a number.

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Ben Finney
haiyang kang corn...@gmail.com writes: I think it is a little ugly to have code like this: num = float("一.一"), expected result is: num = 1.1 That's a straw man, though. The string need not be a literal in the program; it can be input to the program. num =

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Terry Reedy
On 11/30/2010 3:23 AM, Stephen J. Turnbull wrote: I see no reason not to make a similar promise for numeric literals. I see no good reason to allow compatibility full-width Japanese ASCII numerals or Arabic cursive numerals in for i in range(...) for example. I do not think that anyone, at

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Martin v. Löwis
Am 30.11.2010 21:24, schrieb Ben Finney: haiyang kang corn...@gmail.com writes: I think it is a little ugly to have code like this: num = float("一.一"), expected result is: num = 1.1 That's a straw man, though. The string need not be a literal in the program; it can be input to the

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Martin v. Löwis
Am 30.11.2010 23:43, schrieb Terry Reedy: On 11/30/2010 3:23 AM, Stephen J. Turnbull wrote: I see no reason not to make a similar promise for numeric literals. I see no good reason to allow compatibility full-width Japanese ASCII numerals or Arabic cursive numerals in for i in range(...)

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Terry Reedy
On 11/30/2010 10:05 AM, Alexander Belopolsky wrote: My general answers to the questions you have raised are as follows: 1. Each new feature release should use the latest version of the UCD as of the first beta release (or perhaps a week or so before). New chars are new features and the beta
