Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Lennart Regebro
On Sun, Nov 28, 2010 at 21:24, Alexander Belopolsky alexander.belopol...@gmail.com wrote: While we have little choice but to follow UCD in defining str.isidentifier(), I think Python can promise users more stability in what it treats as space or as a digit in its builtins. Why? I can see this
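To make the distinction concrete: the str predicates below are all answered from the UCD tables compiled into the interpreter, so an upgrade of those tables can in principle change what a builtin treats as a digit or as whitespace. This is a minimal sketch for Python 3. The sample characters are illustrative choices, not taken from the thread.

```python
# Each str predicate consults the Unicode Character Database (UCD) bundled
# with the interpreter, so results can change when the UCD is upgraded.
samples = {
    "ASCII DIGIT SEVEN": "7",
    "ARABIC-INDIC DIGIT SEVEN": "\u0667",
    "SUPERSCRIPT TWO": "\u00b2",
    "CJK UNIFIED IDEOGRAPH-4E00 (one)": "\u4e00",
}

for name, ch in samples.items():
    # isdecimal: Numeric_Type=Decimal; isdigit: adds Numeric_Type=Digit;
    # isnumeric: adds Numeric_Type=Numeric
    print(f"{name}: decimal={ch.isdecimal()} digit={ch.isdigit()} numeric={ch.isnumeric()}")

# Whitespace is UCD-driven too: NO-BREAK SPACE and IDEOGRAPHIC SPACE both count.
print("\u00a0".isspace(), "\u3000".isspace())
```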

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Hagen Fürstenau
During PEP 3003 discussion, it was suggested to handle it on a case by case basis, but I don't see discussion of the upgrade to 6.0.0 in PEP 3003. It's covered by "As the standard library is not directly tied to the language definition it is not covered by this moratorium." How is this

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Stephen J. Turnbull
Lennart Regebro writes: *I* think it is more important. In Python 3, you can never ever assume anything is ASCII any more. Sure you can. In Python program text, all keywords will be ASCII (English, even, though it may be en_NL.UTF-8 <wink>) for the foreseeable future. I see no reason not to
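Stephen's distinction between keywords and other program text can be checked directly. This is a minimal sketch that assumes Python 3.7+ (for `str.isascii`). It shows that every reserved keyword is ASCII, while identifiers may use non-ASCII letters (PEP 3131) whose validity is decided from UCD properties.

```python
import keyword

# Reserved keywords are all spelled in ASCII (str.isascii needs Python 3.7+).
print(all(kw.isascii() for kw in keyword.kwlist))

# Identifiers, however, may contain non-ASCII letters (PEP 3131); validity is
# decided from UCD properties (XID_Start for the first character,
# XID_Continue for the rest).
for candidate in ["total", "变量", "x\u0661", "\u0661x"]:
    print(repr(candidate), candidate.isidentifier())
```

The last two candidates differ only in order: ARABIC-INDIC DIGIT ONE may continue an identifier but cannot start one.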

Re: [Python-Dev] python3k : imp.find_module raises SyntaxError

2010-11-30 Thread Sylvain Thénault
On 29 November 14:21, Ron Adam wrote: On 11/29/2010 01:22 PM, Brett Cannon wrote: Considering these semantics changed between Python 2 and 3 w/o a discernable benefit (I would consider it a negative as finding a module should not be impacted by syntactic correctness; the full act of importing
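For comparison with the behaviour discussed in this thread: `imp` was later deprecated (3.4) and removed in Python 3.12. The sketch below assumes a current Python 3 and locates a module with `importlib.machinery.PathFinder.find_spec`. It shows that the finder step on its own does not compile the source, so a syntax error in the file does not surface. The module name `broken_mod` is hypothetical.

```python
import importlib.machinery
import os
import tempfile


def find_module_spec(name, search_dir):
    """Locate a top-level module in search_dir without compiling or executing it."""
    return importlib.machinery.PathFinder.find_spec(name, [search_dir])


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "broken_mod.py")  # hypothetical module name
    with open(path, "w", encoding="utf-8") as f:
        f.write("def oops(:\n")  # deliberately invalid syntax
    spec = find_module_spec("broken_mod", tmp)
    # The spec is found; the SyntaxError would only appear on an actual import.
    print(spec.name, os.path.basename(spec.origin))
```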

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread haiyang kang
hi, I agree with this. I have never seen anyone in China use Chinese number literals (there are at least two kinds, 一 and 壹, both meaning 1) in a Python program, except in UI output. They can do some mapping when they want to output these non-ASCII numbers. Example: if 1: print 一 I think it is a
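The output-side mapping described here can be done with a single translation table. This is a minimal sketch. The helper name `to_cjk_digits` is hypothetical, and it produces a positional digit-by-digit rendering (二〇一〇), not the spoken form of the number.

```python
# Hypothetical helper: convert ASCII digits to CJK numerals only at output time,
# keeping ASCII digits everywhere inside the program.
ASCII_TO_CJK = str.maketrans("0123456789", "〇一二三四五六七八九")


def to_cjk_digits(text: str) -> str:
    """Replace each ASCII digit with the corresponding CJK numeral; other characters pass through."""
    return text.translate(ASCII_TO_CJK)


print(to_cjk_digits("2010-11-30"))  # 二〇一〇-一一-三〇
```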

Re: [Python-Dev] PEP 291 versus Python 3

2010-11-30 Thread Tarek Ziadé
On Tue, Nov 30, 2010 at 7:33 AM, Éric Araujo mer...@netwok.org wrote: Good morning python-dev, PEP 291 (Backward Compatibility for Standard Library) does not seem to take Python 3 into account.  Is this PEP only relevant for the 2.7 branch?*  If it’s supposed to apply to 3.x too, despite the

Re: [Python-Dev] python3k : imp.find_module raises SyntaxError

2010-11-30 Thread Emile Anclin
On Monday 29 November 2010 20:22:22 Brett Cannon wrote: Considering these semantics changed between Python 2 and 3 w/o a discernable benefit (I would consider it a negative as finding a module should not be impacted by syntactic correctness; the full act of importing should be the only thing

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Steven D'Aprano
haiyang kang wrote: hi, I agree with this. I have never seen anyone in China use Chinese number literals (there are at least two kinds, 一 and 壹, both meaning 1) in a Python program, except in UI output. They can do some mapping when they want to output these non-ASCII numbers. Example: if 1: print 一

Re: [Python-Dev] PEP 291 versus Python 3

2010-11-30 Thread Michael Foord
On 30/11/2010 06:33, Éric Araujo wrote: Good morning python-dev, PEP 291 (Backward Compatibility for Standard Library) does not seem to take Python 3 into account. Is this PEP only relevant for the 2.7 branch?* If it’s supposed to apply to 3.x too, despite the view that 3.0 was a clean break,

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Steven D'Aprano
Stephen J. Turnbull wrote: Lennart Regebro writes: *I* think it is more important. In Python 3, you can never ever assume anything is ASCII any more. Sure you can. In Python program text, all keywords will be ASCII (English, even, though it may be en_NL.UTF-8 <wink>) for the foreseeable

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Antoine Pitrou
On Wed, 01 Dec 2010 00:23:22 +1100 Steven D'Aprano st...@pearwood.info wrote: But I think there is a good case for allowing the constructors int, float and complex to continue to accept numeric *strings* with non-ASCII digits. The code already exists, there's probably people out there who
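What the constructors currently accept is easy to check. This sketch assumes CPython 3. Each character in category Nd is mapped to its decimal value before parsing, so digits from different scripts are accepted, even when they are mixed within one string.

```python
# int() and float() accept any Unicode decimal digits (category Nd).
examples = {
    "ASCII": "123",
    "Arabic-Indic": "\u0661\u0662\u0663",
    "Devanagari": "\u0967\u0968\u0969",
    "Fullwidth": "\uff11\uff12\uff13",
}

for script, text in examples.items():
    print(script, text, int(text))

# Digits are converted one character at a time, so mixing scripts also parses.
print(int("1\u0661"))                 # 11
print(float("\u0661\u0662.\u0665"))  # 12.5
```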

[Python-Dev] Module size

2010-11-30 Thread Antoine Pitrou
On Mon, 29 Nov 2010 22:46:33 -0500 Alexander Belopolsky alexander.belopol...@gmail.com wrote: In practical terms, UCD comes at a price. The unicodedata module size is over 700K on my machine. This is almost half the size of the python executable and by far the largest extension module.
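The size quoted here depends on the platform and the build. This is a minimal sketch for checking it on a given interpreter. The helper `unicodedata_file_size` is hypothetical. On builds where unicodedata is linked into the executable there is no separate file to measure, so the helper returns None.

```python
import os
import unicodedata


def unicodedata_file_size():
    """Return the on-disk size in bytes of the unicodedata extension, or None if it is built in."""
    path = getattr(unicodedata, "__file__", None)
    if path is None:  # statically linked into the interpreter on some builds
        return None
    return os.path.getsize(path)


print(unicodedata_file_size())
```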

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Alexander Belopolsky
On Tue, Nov 30, 2010 at 7:59 AM, Steven D'Aprano st...@pearwood.info wrote: .. But you should be able to write: text = input("Enter a number using your preferred digits: ") num = float(text) without caring whether the user enters 一.一 or 1.1 or something else. I find it ironic that people who
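You can check directly how float() handles the inputs in this example. The helper `try_float` in the sketch below is hypothetical. float() accepts any Unicode decimal digits (category Nd), such as Arabic-Indic or full-width digits. However, 一 (U+4E00) is a CJK ideograph that has a numeric value but is not a decimal digit, so current CPython rejects "一.一".

```python
def try_float(text):
    """Return float(text), or None if float() rejects the string."""
    try:
        return float(text)
    except ValueError:
        return None


for text in ["1.1", "\u0661.\u0661", "\uff11.\uff11", "\u4e00.\u4e00"]:
    print(repr(text), try_float(text))
```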

Re: [Python-Dev] Module size

2010-11-30 Thread Alexander Belopolsky
On Tue, Nov 30, 2010 at 8:38 AM, Antoine Pitrou solip...@pitrou.net wrote: On Mon, 29 Nov 2010 22:46:33 -0500 Alexander Belopolsky alexander.belopol...@gmail.com wrote: In practical terms, UCD comes at a price.  The unicodedata module size is over 700K on my machine.  This is almost half the

Re: [Python-Dev] Module size

2010-11-30 Thread Antoine Pitrou
On Tuesday 30 November 2010 at 09:32 -0500, Alexander Belopolsky wrote: On Tue, Nov 30, 2010 at 8:38 AM, Antoine Pitrou solip...@pitrou.net wrote: On Mon, 29 Nov 2010 22:46:33 -0500 Alexander Belopolsky alexander.belopol...@gmail.com wrote: In practical terms, UCD comes at a price. The

Re: [Python-Dev] Module size

2010-11-30 Thread Tim Lesher
On Tue, Nov 30, 2010 at 09:41, Antoine Pitrou solip...@pitrou.net wrote: That said, I don't think the size is very important. For any non-trivial Python application, the size of unicodedata will be negligible compared to the size of Python objects. That depends very much on the platform and

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread haiyang kang
But you should be able to write: text = input("Enter a number using your preferred digits: ") num = float(text) without caring whether the user enters 一.一 or 1.1 or something else. Yes, from a logical point of view, this can happen. But I really doubt whether there really are users who would

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Alexander Belopolsky
On Mon, Nov 29, 2010 at 4:13 PM, Martin v. Löwis mar...@v.loewis.de wrote: - Should Python documentation refer to the specific version of Unicode that it supports? You mean, mention it somewhere? Sure (although it would be nice if the documentation generator would automatically extract it
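For reference, Python 3 exposes the database version as `unicodedata.unidata_version`. It also keeps the frozen 3.2 tables (used for IDNA) as `unicodedata.ucd_3_2_0`. This minimal sketch shows how the two classify a character that was first assigned in Unicode 6.0. The choice of U+20B9 INDIAN RUPEE SIGN is illustrative.

```python
import unicodedata

# The UCD version compiled into this interpreter, e.g. "6.0.0" or later.
print(unicodedata.unidata_version)

# INDIAN RUPEE SIGN (U+20B9) was first assigned in Unicode 6.0, so its
# properties depend on which database answers the question.
rupee = "\u20b9"
print(unicodedata.category(rupee))            # 'Sc' with any UCD >= 6.0
print(unicodedata.ucd_3_2_0.category(rupee))  # 'Cn' (unassigned) in the frozen 3.2 tables
```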

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Alexander Belopolsky
On Tue, Nov 30, 2010 at 9:56 AM, haiyang kang corn...@gmail.com wrote: But you should be able to write: text = input("Enter a number using your preferred digits: ") num = float(text) without caring whether the user enters 一.一 or 1.1 or something else. Yes, from a logical point of view, this

Re: [Python-Dev] PEP 291 versus Python 3

2010-11-30 Thread Barry Warsaw
On Nov 30, 2010, at 01:09 PM, Michael Foord wrote: PEP 291 is very old and should probably be retired. I don't think anyone is maintaining standard libraries in py3k that are also compatible with Python 2.anything. (At least not in a single codebase.) I agree. I think we should change the

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Stefan Krah
Alexander Belopolsky alexander.belopol...@gmail.com wrote: On Tue, Nov 30, 2010 at 9:56 AM, haiyang kang corn...@gmail.com wrote: But you should be able to write: text = input("Enter a number using your preferred digits: ") num = float(text) without caring whether the user enters 一.一 or

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Alexander Belopolsky
On Mon, Nov 29, 2010 at 2:38 PM, Alexander Belopolsky alexander.belopol...@gmail.com wrote: .. Still, if it's not detrimental and it's not difficult to support, then why do you care? It is difficult to support.  A fix for issue10557 would be much simpler if we did not support non-European

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Michael Foord
On 30/11/2010 16:40, Alexander Belopolsky wrote: [snip...] And of course,

    >>> unicodedata.digit('\U0001D7CE')
    0

but

    >>> int('\U0001D7CE')
    ..
    UnicodeEncodeError: 'decimal' codec can't encode character '\ud835' ..

on a narrow Unicode build. (Note the character reported in the error message!) If
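The odd character in that error message comes from UTF-16. A narrow build stored U+1D7CE as a surrogate pair, and '\ud835' is its high half. Since PEP 393 (CPython 3.3) there are no narrow builds, so the same calls agree. This minimal sketch assumes CPython 3.5+ (for `bytes.hex`).

```python
import unicodedata

ch = "\U0001D7CE"  # MATHEMATICAL BOLD DIGIT ZERO, outside the Basic Multilingual Plane

print(unicodedata.name(ch))           # MATHEMATICAL BOLD DIGIT ZERO
print(len(ch))                        # 1 on 3.3+; a narrow build stored 2 code units
print(ch.encode("utf-16-be").hex())   # d835dfce: the surrogate pair, high half U+D835
print(unicodedata.digit(ch), int(ch)) # 0 0
```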

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Alexander Belopolsky
On Tue, Nov 30, 2010 at 12:40 PM, Michael Foord fuzzy...@voidspace.org.uk wrote: .. If you think non-ASCII digits are not difficult to support, please contribute to the following tracker issues: Would moving this functionality to the locale module make the issues any easier to fix? Sure,

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Antoine Pitrou
Sure, if we code it in Python, supporting it will be much easier:

    def normalize_digits(s):
        digits = {m.group(1) for m in re.finditer('(\d)', s)}
        trtab = {ord(d): str(unicodedata.digit(d)) for d in digits}
        return s.translate(trtab)

    >>> normalize_digits('١٢٣٤.٥٦')
    '1234.56'

I
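Below is a self-contained variant of the quoted `normalize_digits`, with the imports it needs. It replaces `unicodedata.digit` with `unicodedata.decimal`, which gives the same result for the Nd characters that `\d` matches. It also uses a raw string for the pattern, which avoids the invalid-escape warning newer Pythons emit for `'\d'`. Treat it as a sketch, not the code proposed in the thread.

```python
import re
import unicodedata


def normalize_digits(s: str) -> str:
    """Replace every Unicode decimal digit (category Nd) in s with its ASCII digit."""
    # On str patterns, \d matches any character in category Nd, not just 0-9.
    found = set(re.findall(r"\d", s))
    table = {ord(d): str(unicodedata.decimal(d)) for d in found}
    return s.translate(table)


print(normalize_digits("\u0661\u0662\u0663\u0664.\u0665\u0666"))  # 1234.56
```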

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Alexander Belopolsky
On Tue, Nov 30, 2010 at 1:29 PM, Antoine Pitrou solip...@pitrou.net wrote: .. I am not sure this belongs to the locale module, however.  It seems to me, something like 'unicodealgo' for unicode algorithms would be more appropriate. It could simply be in unicodedata if you split the

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Martin v. Löwis
On 30.11.2010 09:15, Hagen Fürstenau wrote: During PEP 3003 discussion, it was suggested to handle it on a case by case basis, but I don't see discussion of the upgrade to 6.0.0 in PEP 3003. It's covered by "As the standard library is not directly tied to the language definition it is not

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Martin v. Löwis
Would moving this functionality to the locale module make the issues any easier to fix? You could delegate it to the C library, so: yes. Regards, Martin

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Antoine Pitrou
On Tuesday 30 November 2010 at 20:16 +0100, Martin v. Löwis wrote: Would moving this functionality to the locale module make the issues any easier to fix? You could delegate it to the C library, so: yes. I hope you don't suggest delegating it to the C locale functions. Do you?

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Martin v. Löwis
On 30.11.2010 20:23, Antoine Pitrou wrote: On Tuesday 30 November 2010 at 20:16 +0100, Martin v. Löwis wrote: Would moving this functionality to the locale module make the issues any easier to fix? You could delegate it to the C library, so: yes. I hope you don't suggest delegating it to

Re: [Python-Dev] python3k : imp.find_module raises SyntaxError

2010-11-30 Thread Brett Cannon
On Mon, Nov 29, 2010 at 12:21, Ron Adam r...@ronadam.com wrote: On 11/29/2010 01:22 PM, Brett Cannon wrote: On Mon, Nov 29, 2010 at 03:53, Sylvain Thénault sylvain.thena...@logilab.fr wrote: On 25 November 11:22, Ron Adam wrote: On 11/25/2010 08:30 AM, Emile Anclin wrote: hello,

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Antoine Pitrou
On Tuesday 30 November 2010 at 20:40 +0100, Martin v. Löwis wrote: On 30.11.2010 20:23, Antoine Pitrou wrote: On Tuesday 30 November 2010 at 20:16 +0100, Martin v. Löwis wrote: Would moving this functionality to the locale module make the issues any easier to fix? You could delegate

Re: [Python-Dev] python3k : imp.find_module raises SyntaxError

2010-11-30 Thread Brett Cannon
On Tue, Nov 30, 2010 at 00:34, Sylvain Thénault sylvain.thena...@logilab.fr wrote: On 29 November 14:21, Ron Adam wrote: On 11/29/2010 01:22 PM, Brett Cannon wrote: Considering these semantics changed between Python 2 and 3 w/o a discernable benefit (I would consider it a negative as finding a

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Martin v. Löwis
Because we all know how locale is a pile of cr*p, both in specification and in implementations. Our unit tests for it are a clear proof of that. I wouldn't use expletives, but rather claim that the locale module is highly platform-dependent. Actually, I remember you saying that locale should

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Antoine Pitrou
On Tuesday 30 November 2010 at 20:55 +0100, Martin v. Löwis wrote: With respect to local number parsing, I think that the locale module would be way better than the nonsense that Python currently does. In the locale module, somebody at least has thought about what specifically constitutes a number.
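One caveat is worth checking against current CPython. `locale.atof` only removes the locale's thousands separator and swaps the locale's decimal point for '.', then hands the string to `float()`. It therefore inherits float()'s Unicode digit handling rather than replacing it. This minimal sketch assumes the always-available "C" locale. `atof_in_c_locale` is a hypothetical helper, and because `setlocale` changes process-wide state, it is not thread-safe.

```python
import locale


def atof_in_c_locale(text):
    """Parse text with locale.atof under the "C" numeric locale, then restore the previous setting."""
    previous = locale.setlocale(locale.LC_NUMERIC)  # query only, changes nothing
    try:
        locale.setlocale(locale.LC_NUMERIC, "C")
        return locale.atof(text)
    finally:
        locale.setlocale(locale.LC_NUMERIC, previous)


print(atof_in_c_locale("1234.5"))                # 1234.5
print(atof_in_c_locale("\u0661\u0662.\u0665"))  # 12.5: float() still accepts Nd digits
```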

Re: [Python-Dev] PEP 291 versus Python 3

2010-11-30 Thread Brett Cannon
On Tue, Nov 30, 2010 at 07:35, Barry Warsaw ba...@python.org wrote: On Nov 30, 2010, at 01:09 PM, Michael Foord wrote: PEP 291 is very old and should probably be retired. I don't think anyone is maintaining standard libraries in py3k that are also compatible with Python 2.anything. (At least not

Re: [Python-Dev] ICU

2010-11-30 Thread Antoine Pitrou
Oh, about ICU: Actually, I remember you saying that locale should ideally be replaced with a wrapper around the ICU library. By that, I stand - however, I have given up the hope that this will happen anytime soon. Perhaps this could be made a GSOC topic. Regards Antoine.

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Ben Finney
haiyang kang corn...@gmail.com writes: I think it is a little ugly to have code like this: num = float("一.一"), expected result is: num = 1.1 That's a straw man, though. The string need not be a literal in the program; it can be input to the program. num =

Re: [Python-Dev] PEP 291 versus Python 3

2010-11-30 Thread Barry Warsaw
On Nov 30, 2010, at 12:11 PM, Brett Cannon wrote: I will channel Neal: I decline and/or do not want to respond. =) PEP 291 updated. -Barry

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Terry Reedy
On 11/30/2010 3:23 AM, Stephen J. Turnbull wrote: I see no reason not to make a similar promise for numeric literals. I see no good reason to allow compatibility full-width Japanese ASCII numerals or Arabic cursive numerals in for i in range(...) for example. I do not think that anyone, at
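For reference, Python 3 already draws this line. The tokenizer accepts only ASCII digits in numeric literals, while `int()` accepts any decimal digits at run time. This is a minimal sketch. `compiles_as_literal` is a hypothetical helper.

```python
def compiles_as_literal(src):
    """Return True if src compiles as a Python expression, False on SyntaxError."""
    try:
        compile(src, "<literal>", "eval")
        return True
    except SyntaxError:
        return False


print(compiles_as_literal("12"))            # True
print(compiles_as_literal("\u0661\u0662"))  # False: Arabic-Indic digits are not a valid literal
print(int("\u0661\u0662"))                  # 12: but int() accepts them at run time
```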

Re: [Python-Dev] python3k : imp.find_module raises SyntaxError

2010-11-30 Thread Ron Adam
On 11/30/2010 01:41 PM, Brett Cannon wrote: On Mon, Nov 29, 2010 at 12:21, Ron Adam r...@ronadam.com wrote: On 11/29/2010 01:22 PM, Brett Cannon wrote: On Mon, Nov 29, 2010 at 03:53, Sylvain Thénault sylvain.thena...@logilab.fr wrote: On 25 November 11:22, Ron Adam wrote: On

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Martin v. Löwis
On 30.11.2010 21:24, Ben Finney wrote: haiyang kang corn...@gmail.com writes: I think it is a little ugly to have code like this: num = float("一.一"), expected result is: num = 1.1 That's a straw man, though. The string need not be a literal in the program; it can be input to the

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Martin v. Löwis
On 30.11.2010 23:43, Terry Reedy wrote: On 11/30/2010 3:23 AM, Stephen J. Turnbull wrote: I see no reason not to make a similar promise for numeric literals. I see no good reason to allow compatibility full-width Japanese ASCII numerals or Arabic cursive numerals in for i in range(...)

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Terry Reedy
On 11/30/2010 10:05 AM, Alexander Belopolsky wrote: My general answers to the questions you have raised are as follows: 1. Each new feature release should use the latest version of the UCD as of the first beta release (or perhaps a week or so before). New chars are new features and the beta

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Ben Finney
Martin v. Löwis mar...@v.loewis.de writes: On 30.11.2010 21:24, Ben Finney wrote: The string need not be a literal in the program; it can be input to the program. num = float(input_from_the_external_world) Does that change your assessment of whether non-ASCII digits are

[Python-Dev] I/O ABCs

2010-11-30 Thread Daniel Stutzbach
The documentation for the collections Abstract Base Classes (ABCs) [1] contains a table listing all of the collections ABCs, their parent classes, their abstract methods, and the methods they provide. This table makes it very easy to figure out which methods I must override when I derive from one
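Here is an illustration of the kind of information such a table would capture for io. A subclass of `io.RawIOBase` that implements `readinto()` (and reports `readable()`) inherits working `read()` and `readall()` from the base class. This is a minimal sketch. The `BytesSource` class is hypothetical.

```python
import io


class BytesSource(io.RawIOBase):
    """Hypothetical raw stream over an in-memory bytes object (illustration only)."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Copy up to len(b) bytes into the caller's buffer; return the count (0 at EOF).
        chunk = self._data[self._pos:self._pos + len(b)]
        n = len(chunk)
        b[:n] = chunk
        self._pos += n
        return n


raw = BytesSource(b"hello world")
print(raw.read(5))  # b'hello'  (read() is inherited and calls readinto)
print(raw.read())   # b' world' (read(-1) falls through to the inherited readall)
```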

Re: [Python-Dev] python3k : imp.find_module raises SyntaxError

2010-11-30 Thread Nick Coghlan
On Wed, Dec 1, 2010 at 8:48 AM, Ron Adam r...@ronadam.com wrote: * It almost seems like the concept of a sub-module (in a package) is flawed.  I'm not sure I can explain what causes me to feel that way at the moment though. It isn't flawed, it is just a *lot* more complicated than most people
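Nick's point about sub-modules already shows up in the lookup step. This sketch assumes Python 3.4+ and uses `importlib.util.find_spec`. As its documentation notes, finding a dotted name automatically imports the parent package, so the parent's `__init__.py` runs. The submodule's source is located but not compiled. The package name is hypothetical.

```python
import importlib.util
import os
import sys
import tempfile

PKG = "demo_pkg_sketch"  # hypothetical package name, created on the fly

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = os.path.join(tmp, PKG)
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w", encoding="utf-8") as f:
        f.write("PARENT_INITIALISED = True\n")
    with open(os.path.join(pkg_dir, "sub.py"), "w", encoding="utf-8") as f:
        f.write("def oops(:\n")  # deliberately invalid; never compiled below

    sys.path.insert(0, tmp)
    try:
        spec = importlib.util.find_spec(PKG + ".sub")
        # Locating the submodule required importing (executing) the parent package.
        parent_ran = getattr(sys.modules.get(PKG), "PARENT_INITIALISED", False)
    finally:
        sys.path.remove(tmp)
        sys.modules.pop(PKG, None)

print(spec.name, parent_ran)  # demo_pkg_sketch.sub True
```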

Re: [Python-Dev] python3k : imp.find_module raises SyntaxError

2010-11-30 Thread Nick Coghlan
On Wed, Dec 1, 2010 at 3:59 PM, Ron Adam r...@ronadam.com wrote: Yes, it's realising that it is a *lot* more *complicated*, that gets me. Flawed isn't the right word, it's rather a feeling things could have been simpler if perhaps some things were done differently. *That* feeling I can