[Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Martin Blais
Hi. Like a lot of people (or so I hear in the blogosphere...), I've been experiencing some friction in my code with unicode conversion problems. Even when being super extra careful with the types of str's or unicode objects that my variables can contain, there is always some case or oversight

Re: [Python-Dev] Tests and unicode

2005-10-03 Thread Reinhold Birkenfeld
Martin v. Löwis wrote: Reinhold Birkenfeld wrote: One problem is that no Unicode escapes can be used since compiling the file raises ValueErrors for them. Such strings would have to be produced using unichr(). You mean, in Unicode literals? There are various approaches, depending on

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Michael Hudson
Martin Blais writes: What if we could completely disable the implicit conversions between unicode and str? In other words, if you would ALWAYS be forced to call either .encode() or .decode() to convert between one and the other... wouldn't that help a lot deal with that
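
For readers wondering what the proposal would look like in practice: Python 3 later adopted essentially this separation, with `str` for text and `bytes` for binary data. A minimal sketch, assuming a Python 3 interpreter (the thread itself concerns Python 2, where the mixing below would trigger an implicit ASCII decode instead of an error):

```python
# Sketch under Python 3 semantics: text (str) and binary data (bytes)
# never convert implicitly; every conversion names its encoding.
text = "caf\u00e9"                 # text
data = text.encode("utf-8")        # explicit text -> bytes

try:
    text + data                    # mixing the two types is an error
    mixed_ok = True
except TypeError:
    mixed_ok = False

round_trip = data.decode("utf-8")  # explicit bytes -> text
```

The cost is the one raised in the replies: code that wants to accept both types has to decide on one and convert at its boundaries.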

Re: [Python-Dev] --disable-unicode (Tests and unicode)

2005-10-03 Thread M.-A. Lemburg
Reinhold Birkenfeld wrote: Martin v. Löwis wrote: Whether we think it should be supported depends on who we is, as with all these minor features: some think it is a waste of time, some think it should be supported if reasonably possible, and some think this a conditio sine qua non. It certainly

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Antoine Pitrou
Le lundi 03 octobre 2005 à 02:09 -0400, Martin Blais a écrit : What if we could completely disable the implicit conversions between unicode and str? This would be very annoying when dealing with some modules or libraries where the type (str / unicode) returned by a function depends on the

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Fredrik Lundh
Antoine Pitrou wrote: A good rule of thumb is to convert to unicode everything that is semantically textual and isn't pure ASCII. (anyone who is tempted to argue otherwise should benchmark their applications, both speed- and memorywise, and be prepared to come up with very strong arguments

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Antoine Pitrou
Le lundi 03 octobre 2005 à 14:59 +0200, Fredrik Lundh a écrit : Antoine Pitrou wrote: A good rule of thumb is to convert to unicode everything that is semantically textual and isn't pure ASCII. How can you be sure that something that is /semantically textual/ will always remain pure

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Martin Blais
On 10/3/05, M.-A. Lemburg wrote: I'm not sure it's a sensible default. Me neither, especially since this would make it impossible to write polymorphic code - e.g. ', '.join(list) wouldn't work anymore if list contains Unicode; ditto for u', '.join(list) with list

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Fredrik Lundh
Antoine Pitrou wrote: A good rule of thumb is to convert to unicode everything that is semantically textual and isn't pure ASCII. How can you be sure that something that is /semantically textual/ will always remain pure ASCII ? is != will always remain /F

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Jim Fulton
Martin Blais wrote: Hi. Like a lot of people (or so I hear in the blogosphere...), I've been experiencing some friction in my code with unicode conversion problems. Even when being super extra careful with the types of str's or unicode objects that my variables can contain, there is always

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Jim Fulton
M.-A. Lemburg wrote: Michael Hudson wrote: Martin Blais writes: What if we could completely disable the implicit conversions between unicode and str? In other words, if you would ALWAYS be forced to call either .encode() or .decode() to convert between one and the other...

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Fredrik Lundh
Jim Fulton wrote: I would argue that it's evil to change the default encoding in the first place, except in this case to disable implicit encoding or decoding. absolutely. unfortunately, all attempts to add such information to the sys module documentation seem to have failed... (last time I

[Python-Dev] Proposal for 2.5: Returning values from PEP 342 enhanced generators

2005-10-03 Thread Piet Delport
PEP 255 (Simple Generators) closes with: Q. Then why not allow an expression on "return" too? A. Perhaps we will someday. In Icon, "return expr" means both "I'm done", and "but I have one final useful value to return too, and this is it". At the start, and in the absence of compelling uses
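
The idea was eventually adopted: since Python 3.3 (PEP 380), a generator may `return` a value, which is carried on the `StopIteration` exception and is also the result of a `yield from` expression. A minimal sketch, assuming Python 3.3 or later; the function names are hypothetical and not taken from the proposal:

```python
def summing():
    """Yield each item, then return the running total as a final value."""
    total = 0
    for n in (1, 2, 3):
        total += n
        yield n
    return total  # carried on StopIteration.value


def delegate(results):
    # 'yield from' re-yields the items and evaluates to the return value.
    final = yield from summing()
    results.append(final)


results = []
items = list(delegate(results))
```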

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Josiah Carlson
Antoine Pitrou wrote: Le lundi 03 octobre 2005 à 14:59 +0200, Fredrik Lundh a écrit : Antoine Pitrou wrote: A good rule of thumb is to convert to unicode everything that is semantically textual and isn't pure ASCII. How can you be sure that something that

[Python-Dev] PEP 343 and __with__

2005-10-03 Thread Jason Orendorff
I'm -1 on PEP 343. It seems ...complex. And even with all the complexity, I *still* won't be able to type with self.lock: ... which I submit is perfectly reasonable, clean, and clear. Instead I have to type with locking(self.lock): ... where locking() is apparently either a new

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Fredrik Lundh
Josiah Carlson wrote: and isn't pure ASCII. How can you be sure that something that is /semantically textual/ will always remain pure ASCII ? That's contradictory, unless your software never goes out of the anglo-saxon world (and even...). Non-unicode text input widgets. Works

Re: [Python-Dev] PEP 343 and __with__

2005-10-03 Thread Phillip J. Eby
At 12:37 PM 10/3/2005 -0400, Jason Orendorff wrote: I'm -1 on PEP 343. It seems ...complex. And even with all the complexity, I *still* won't be able to type with self.lock: ... which I submit is perfectly reasonable, clean, and clear. Which is why it's proposed to add __enter__/__exit__
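
As a rough illustration of the `__enter__`/`__exit__` protocol mentioned here, written against the form of the `with` statement that shipped in Python 2.5 and later: an object can be used directly in a `with` statement if it defines both methods. The `Account` and `Guard` classes are hypothetical; `threading.Lock` itself supports the protocol in released Pythons.

```python
import threading


class Account:
    """Hypothetical class whose instances guard their state with a lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.balance = 0

    def deposit(self, amount):
        # The lock is acquired on entry and released on exit,
        # even if the body raises.
        with self.lock:
            self.balance += amount


class Guard:
    """Minimal hand-written context manager, for comparison."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.events.append("exit")
        return False  # do not swallow exceptions


account = Account()
account.deposit(10)

guard = Guard()
with guard:
    pass
```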

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread Antoine Pitrou
Hi, Josiah: How can you be sure that something that is /semantically textual/ will always remain pure ASCII ? That's contradictory, unless your software never goes out of the anglo-saxon world (and even...). Non-unicode text input widgets. You didn't understand my statement. I didn't

Re: [Python-Dev] PEP 343 and __with__

2005-10-03 Thread Guido van Rossum
For the record, I very much want PEPs 342 and 343 implemented. I haven't had the time to look at the patch and don't expect to find the time any time soon, but it's not for lack of desire to see this feature implemented. I don't like Jason's __with__ proposal and even less like his idea to drop

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread Fredrik Lundh
Antoine Pitrou wrote: Under the default encoding (and quite a few other encodings), that's true for plain ascii strings and Unicode strings. If I have an unicode string containing legal characters greater than 0x7F, and I pass it to a function which converts it to str, the conversion
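
The failure described can be reproduced explicitly. A sketch using Python 3 syntax, where the ASCII encode that Python 2's `str(u)` performed implicitly under the default encoding has to be spelled out:

```python
text = "na\u00efve"  # contains U+00EF, which is above 0x7F

try:
    text.encode("ascii")          # what the implicit conversion amounts to
    failed = False
except UnicodeEncodeError:
    failed = True

# Naming an encoding that can represent the character succeeds.
encoded = text.encode("utf-8")
```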

Re: [Python-Dev] PEP 343 and __with__

2005-10-03 Thread Phillip J. Eby
At 07:02 PM 10/3/2005 +0100, Michael Hudson wrote: Phillip J. Eby writes: Since the PEP is accepted and has patches for both its implementation and a good part of its documentation, a major change like this would certainly need a better rationale. Though given the

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread Antoine Pitrou
Hi, Le lundi 03 octobre 2005 à 20:37 +0200, Fredrik Lundh a écrit : If I have an unicode string containing legal characters greater than 0x7F, and I pass it to a function which converts it to str, the conversion fails. so? if it does that, it's not unicode safe. [...] what's that

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread Fredrik Lundh
Antoine Pitrou wrote: If I have an unicode string containing legal characters greater than 0x7F, and I pass it to a function which converts it to str, the conversion fails. so? if it does that, it's not unicode safe. [...] what's that has to do with my argument (which is that

Re: [Python-Dev] --disable-unicode (Tests and unicode)

2005-10-03 Thread Martin v. Löwis
M.-A. Lemburg wrote: Is the added complexity needed to support not having Unicode support compiled into Python really worth it ? If there are volunteers willing to maintain it, and the other volunteers are not affected: certainly. I know that Martin introduced this feature a long time ago,

Re: [Python-Dev] --disable-unicode (Tests and unicode)

2005-10-03 Thread M.-A. Lemburg
Martin v. Löwis wrote: M.-A. Lemburg wrote: Is the added complexity needed to support not having Unicode support compiled into Python really worth it ? If there are volunteers willing to maintain it, and the other volunteers are not affected: certainly. No objections there. I only see that

Re: [Python-Dev] bytes type

2005-10-03 Thread Guido van Rossum
On 10/3/05, Antoine Pitrou wrote: Could the bytes type be just the same as the current str type but without the implicit unicode conversion ? Or am I missing some desired functionality ? No. It will be a mutable array of bytes. It will intentionally resemble strings as little

Re: [Python-Dev] PEP 343 and __with__

2005-10-03 Thread Jason Orendorff
Phillip J. Eby writes: You didn't offer any reasons why this would be useful and/or good. It makes it dramatically easier to write Python classes that correctly support 'with'. I don't see any simple way to do this under PEP 343; the only sane thing to do is write a separate @contextmanager
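
For comparison, the generator-based style under discussion looks roughly like this with `contextlib.contextmanager`, the decorator that shipped in the standard library alongside the `with` statement. `locking` is the helper name used earlier in the thread; its body here is an assumption, not code from the PEP or the patch:

```python
import threading
from contextlib import contextmanager


@contextmanager
def locking(lock):
    """Hold `lock` for the duration of the with-block (illustrative)."""
    lock.acquire()
    try:
        yield lock
    finally:
        lock.release()  # runs even if the block raises


lock = threading.Lock()
with locking(lock):
    held_inside = lock.locked()
held_after = lock.locked()
```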

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread Martin Blais
On 10/3/05, Antoine Pitrou wrote: If that's how things were designed, then Python's entire standard library (not to mention third-party libraries) is not unicode safe - to quote your own words - since many functions may return 8-bit strings containing non-ascii

Re: [Python-Dev] PEP 343 and __with__

2005-10-03 Thread Phillip J. Eby
At 05:15 PM 10/3/2005 -0400, Jason Orendorff wrote: Phillip J. Eby writes: You didn't offer any reasons why this would be useful and/or good. It makes it dramatically easier to write Python classes that correctly support 'with'. I don't see any simple way to do this under PEP 343; the only

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread M.-A. Lemburg
Martin Blais wrote: On 10/3/05, Antoine Pitrou wrote: If that's how things were designed, then Python's entire standard library (not to mention third-party libraries) is not unicode safe - to quote your own words - since many functions may return 8-bit strings containing

Re: [Python-Dev] bytes type

2005-10-03 Thread Antoine Pitrou
Le lundi 03 octobre 2005 à 14:02 -0700, Guido van Rossum a écrit : On 10/3/05, Antoine Pitrou wrote: Could the bytes type be just the same as the current str type but without the implicit unicode conversion ? Or am I missing some desired functionality ? No. It will be a

Re: [Python-Dev] bytes type

2005-10-03 Thread Guido van Rossum
This would presumably support the (read-only part of the) buffer API so search would be covered. I don't see a use case for replace. Alternatively, you could always specify Latin-1 as the encoding and convert it that way -- I don't think there's any input that can cause Latin-1 decoding to fail.
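
The Latin-1 claim is easy to check: every byte value from 0x00 to 0xFF maps to the Unicode code point with the same number, so decoding cannot fail and encoding the result restores the original bytes. A quick sketch in Python 3 syntax:

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Latin-1 maps byte N to code point U+00NN, so this never raises.
as_text = all_bytes.decode("latin-1")

# The mapping is one-to-one, so encoding restores the original bytes.
restored = as_text.encode("latin-1")
```

Whether the resulting text is meaningful is a separate question; the round trip only guarantees that no information is lost.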

Re: [Python-Dev] bytes type

2005-10-03 Thread Antoine Pitrou
Le lundi 03 octobre 2005 à 17:42 -0700, Guido van Rossum a écrit : I don't see a use case for replace. Agreed. Alternatively, you could always specify Latin-1 as the encoding and convert it that way -- I don't think there's any input that can cause Latin-1 decoding to fail. You seem to be

Re: [Python-Dev] 64-bit bytecode compatibility (was Re: [PEAK] ez_setup on 64-bit linux problem)

2005-10-03 Thread Viren Shah
Phillip J. Eby wrote: At 09:49 AM 9/29/2005 -0400, Viren Shah wrote: [I sent this earlier without being a subscriber and it was sent to the moderation queue so I'm resending it after subscribing] Hi, I'm running a 64-bit Fedora Core 3 with python 2.3.4. I'm trying to install

Re: [Python-Dev] 64-bit bytecode compatibility (was Re: [PEAK] ez_setup on 64-bit linux problem)

2005-10-03 Thread Viren Shah
Phillip J. Eby wrote: At 12:14 PM 9/29/2005 -0400, Viren Shah wrote: File /root/svn-install-apps/setuptools-0.6a4/pkg_resources.py, line 949, in _get return self.loader.get_data(path) OverflowError: signed integer is greater than maximum Interesting. That looks like it might be

[Python-Dev] Unicode charmap decoders slow

2005-10-03 Thread Tony Nelson
Is there a faster way to transcode from 8-bit chars (charmaps) to utf-8 than going through unicode()? I'm writing a small card-file program. As a test, I use a 53 MB MBox file, in mac-roman encoding. My program reads and parses the file into messages in about 3 to 5 seconds (Wow! Go Python!),
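
The route described, decoding the 8-bit charmap data to Unicode and re-encoding it as UTF-8, can be sketched as follows in Python 3 syntax. `transcode` is a hypothetical helper; `mac_roman` is the standard library's codec name for the encoding mentioned. This shows the approach being asked about, not a faster alternative:

```python
def transcode(data, source_encoding="mac_roman"):
    """Convert 8-bit charmap-encoded bytes to UTF-8 via a Unicode string."""
    return data.decode(source_encoding).encode("utf-8")


sample = b"caf\x8e"          # 0x8E is e-acute in Mac Roman
utf8 = transcode(sample)
```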

Re: [Python-Dev] Proposal for 2.5: Returning values from PEP 342 enhanced generators

2005-10-03 Thread Christopher Armstrong
On 10/4/05, Piet Delport wrote: One system that could benefit from this change is Christopher Armstrong's defgen.py[1] for Twisted, which he recently reincarnated (as newdefgen.py) to use enhanced generators. The resulting code is much cleaner than before, and closer to the

Re: [Python-Dev] Unicode charmap decoders slow

2005-10-03 Thread jepler
As the OP suggests, decoding with a codec like mac-roman or iso8859-1 is very slow compared to encoding or decoding with utf-8. Here I'm working with 53k of data instead of 53 megs. (Note: this is a laptop, so it's possible that thermal or battery management features affected these numbers a

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread skip
Antoine wrote: If an stdlib function returns an 8-bit string containing non-ascii data, then this string used in unicode context incurs an implicit conversion, which fails. Such strings should be converted to Unicode at the point where they enter the application. That's

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread James Y Knight
On Oct 3, 2005, at 3:47 PM, Fredrik Lundh wrote: Antoine Pitrou wrote: If I have an unicode string containing legal characters greater than 0x7F, and I pass it to a function which converts it to str, the conversion fails. so? if it does that, it's not unicode safe. [...] what's