Hi.
Like a lot of people (or so I hear in the blogosphere...), I've been
experiencing some friction in my code with unicode conversion
problems. Even when being super extra careful with the types of str
or unicode objects that my variables can contain, there is always some
case or oversight
Martin v. Löwis wrote:
Reinhold Birkenfeld wrote:
One problem is that no Unicode escapes can be used since compiling
the file raises ValueErrors for them. Such strings would have to
be produced using unichr().
You mean, in Unicode literals? There are various approaches, depending on
Martin Blais [EMAIL PROTECTED] writes:
What if we could completely disable the implicit conversions between
unicode and str? In other words, if you would ALWAYS be forced to
call either .encode() or .decode() to convert between one and the
other... wouldn't that help a lot in dealing with that
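In hindsight, Martin's proposal is exactly what Python 3 eventually adopted: mixing str and bytes raises TypeError, and every conversion goes through an explicit .encode() or .decode(). A minimal sketch under modern Python 3 (the thread itself predates this):

```python
# In Python 3, the implicit str/bytes coercion debated here is gone:
# mixing the two types raises TypeError, so conversions must be explicit.
data = "héllo"                    # text (what 2.x called unicode)
raw = data.encode("utf-8")        # explicit encode -> bytes
assert isinstance(raw, bytes)

try:
    _ = "prefix-" + raw           # implicit mixing is an error now
    mixed_ok = True
except TypeError:
    mixed_ok = False
assert not mixed_ok

text = raw.decode("utf-8")        # explicit decode -> str
assert text == data
```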
Reinhold Birkenfeld wrote:
Martin v. Löwis wrote:
Whether we think it should be supported depends
on who we is, as with all these minor features: some think it is
a waste of time, some think it should be supported if reasonably
possible, and some think this a conditio sine qua non. It certainly
On Monday, 3 October 2005 at 02:09 -0400, Martin Blais wrote:
What if we could completely disable the implicit conversions between
unicode and str?
This would be very annoying when dealing with some modules or libraries
where the type (str / unicode) returned by a function depends on the
Antoine Pitrou wrote:
A good rule of thumb is to convert to unicode everything that is
semantically textual
and isn't pure ASCII.
(anyone who is tempted to argue otherwise should benchmark their
applications, both speed- and memorywise, and be prepared to come
up with very strong arguments
On Monday, 3 October 2005 at 14:59 +0200, Fredrik Lundh wrote:
Antoine Pitrou wrote:
A good rule of thumb is to convert to unicode everything that is
semantically textual
and isn't pure ASCII.
How can you be sure that something that is /semantically textual/ will
always remain pure
On 10/3/05, M.-A. Lemburg [EMAIL PROTECTED] wrote:
I'm not sure it's a sensible default.
Me neither, especially since this would make it impossible
to write polymorphic code - e.g. ', '.join(list) wouldn't
work anymore if list contains Unicode; ditto for u', '.join(list)
with list
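Under modern Python 3 the polymorphism concern plays out differently: str.join and bytes.join each demand a homogeneous sequence, and the silent promotion to unicode that 2.x performed is gone. A small Python 3 sketch (not code from the thread):

```python
# Each join requires a homogeneous sequence of its own type.
joined_text = ", ".join(["a", "b"])      # str items -> str result
joined_bytes = b", ".join([b"a", b"b"])  # bytes items -> bytes result
assert joined_text == "a, b"
assert joined_bytes == b"a, b"

try:
    ", ".join(["a", b"b"])               # mixed types: TypeError,
    promoted = True                      # no silent unicode promotion
except TypeError:
    promoted = False
assert not promoted
```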
Antoine Pitrou wrote:
A good rule of thumb is to convert to unicode everything that is
semantically textual
and isn't pure ASCII.
How can you be sure that something that is /semantically textual/ will
always remain pure ASCII ?
is != will always remain
/F
Martin Blais wrote:
Hi.
Like a lot of people (or so I hear in the blogosphere...), I've been
experiencing some friction in my code with unicode conversion
problems. Even when being super extra careful with the types of str
or unicode objects that my variables can contain, there is always
M.-A. Lemburg wrote:
Michael Hudson wrote:
Martin Blais [EMAIL PROTECTED] writes:
What if we could completely disable the implicit conversions between
unicode and str? In other words, if you would ALWAYS be forced to
call either .encode() or .decode() to convert between one and the
other...
Jim Fulton wrote:
I would argue that it's evil to change the default encoding
in the first place, except in this case to disable implicit
encoding or decoding.
absolutely. unfortunately, all attempts to add such information to the
sys module documentation seem to have failed...
(last time I
PEP 255 (Simple Generators) closes with:
Q. Then why not allow an expression on return too?
A. Perhaps we will someday. In Icon, "return expr" means both "I'm
done", and "but I have one final useful value to return too, and
this is it". At the start, and in the absence of compelling uses
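The question PEP 255 left open was eventually answered by PEP 380: since Python 3.3, "return expr" inside a generator is legal, and the value travels on the StopIteration exception. A sketch in modern Python:

```python
def gen():
    yield 1
    return 42    # legal since PEP 380 / Python 3.3

g = gen()
assert next(g) == 1
try:
    next(g)
    value = None
except StopIteration as stop:
    value = stop.value   # the "one final useful value"
assert value == 42
```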
Antoine Pitrou [EMAIL PROTECTED] wrote:
On Monday, 3 October 2005 at 14:59 +0200, Fredrik Lundh wrote:
Antoine Pitrou wrote:
A good rule of thumb is to convert to unicode everything that is
semantically textual
and isn't pure ASCII.
How can you be sure that something that
I'm -1 on PEP 343. It seems ...complex. And even with all the
complexity, I *still* won't be able to type
with self.lock: ...
which I submit is perfectly reasonable, clean, and clear. Instead I
have to type
with locking(self.lock): ...
where locking() is apparently either a new
Josiah Carlson wrote:
and isn't pure ASCII.
How can you be sure that something that is /semantically textual/ will
always remain pure ASCII ? That's contradictory, unless your software
never goes out of the anglo-saxon world (and even...).
Non-unicode text input widgets. Works
At 12:37 PM 10/3/2005 -0400, Jason Orendorff wrote:
I'm -1 on PEP 343. It seems ...complex. And even with all the
complexity, I *still* won't be able to type
with self.lock: ...
which I submit is perfectly reasonable, clean, and clear.
Which is why it's proposed to add __enter__/__exit__
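That addition did land: lock objects in the threading module now implement __enter__/__exit__, so "with self.lock:" works directly in modern Python, with no locking() wrapper. A sketch:

```python
import threading

lock = threading.Lock()
# threading.Lock implements __enter__/__exit__, so no wrapper is needed.
with lock:
    held = lock.locked()       # True while the block runs
released = not lock.locked()   # True again after __exit__ releases it
print(held, released)
```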
Hi,
Josiah:
How can you be sure that something that is /semantically textual/ will
always remain pure ASCII ? That's contradictory, unless your software
never goes out of the anglo-saxon world (and even...).
Non-unicode text input widgets.
You didn't understand my statement.
I didn't
For the record, I very much want PEPs 342 and 343 implemented. I
haven't had the time to look at the patch and don't expect to find the
time any time soon, but it's not for lack of desire to see this
feature implemented.
I don't like Jason's __with__ proposal and even less like his idea to
drop
Antoine Pitrou wrote:
Under the default encoding (and quite a few other encodings), that's
true for plain ascii strings and Unicode strings.
If I have a Unicode string containing legal characters greater than
0x7F, and I pass it to a function which converts it to str, the conversion
At 07:02 PM 10/3/2005 +0100, Michael Hudson wrote:
Phillip J. Eby [EMAIL PROTECTED] writes:
Since the PEP is accepted and has patches for both its implementation
and a good part of its documentation, a major change like this would
certainly need a better rationale.
Though given the
Hi,
On Monday, 3 October 2005 at 20:37 +0200, Fredrik Lundh wrote:
If I have a Unicode string containing legal characters greater than
0x7F, and I pass it to a function which converts it to str, the
conversion fails.
so? if it does that, it's not unicode safe.
[...]
what's that
Antoine Pitrou wrote:
If I have a Unicode string containing legal characters greater than
0x7F, and I pass it to a function which converts it to str, the
conversion fails.
so? if it does that, it's not unicode safe.
[...]
what's that has to do with
my argument (which is that
M.-A. Lemburg wrote:
Is the added complexity needed to support not having Unicode support
compiled into Python really worth it ?
If there are volunteers willing to maintain it, and the other volunteers
are not affected: certainly.
I know that Martin introduced this feature a long time ago,
Martin v. Löwis wrote:
M.-A. Lemburg wrote:
Is the added complexity needed to support not having Unicode support
compiled into Python really worth it ?
If there are volunteers willing to maintain it, and the other volunteers
are not affected: certainly.
No objections there. I only see that
On 10/3/05, Antoine Pitrou [EMAIL PROTECTED] wrote:
Could the bytes type be just the same as the current str type but
without the implicit unicode conversion ? Or am I missing some desired
functionality ?
No. It will be a mutable array of bytes. It will intentionally
resemble strings as little
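Guido's description matches what became bytearray (and bytes) in Python 3: an array of bytes whose elements are integers, not one-character strings, and mutable in the bytearray case. A sketch in modern Python:

```python
buf = bytearray(b"abc")
buf[0] = ord("A")            # mutable in place, unlike str or bytes
assert buf == bytearray(b"Abc")
assert buf[1] == ord("b")    # indexing yields ints, not 1-char strings
```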
Phillip J. Eby writes:
You didn't offer any reasons why this would be useful and/or good.
It makes it dramatically easier to write Python classes that correctly
support 'with'. I don't see any simple way to do this under PEP 343;
the only sane thing to do is write a separate @contextmanager
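The @contextmanager decorator mentioned here shipped in contextlib; a minimal modern example of the pattern under discussion (the tag() helper is illustrative, not from the thread):

```python
from contextlib import contextmanager

events = []

@contextmanager
def tag(name):
    events.append(f"<{name}>")    # runs on __enter__
    try:
        yield
    finally:
        events.append(f"</{name}>")  # runs on __exit__, even on error

with tag("b"):
    events.append("bold")
print(events)
```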
On 10/3/05, Antoine Pitrou [EMAIL PROTECTED] wrote:
If that's how things were designed, then Python's entire standard
library (not to mention third-party libraries) is not unicode safe -
to quote your own words - since many functions may return 8-bit strings
containing non-ascii
At 05:15 PM 10/3/2005 -0400, Jason Orendorff wrote:
Phillip J. Eby writes:
You didn't offer any reasons why this would be useful and/or good.
It makes it dramatically easier to write Python classes that correctly
support 'with'. I don't see any simple way to do this under PEP 343;
the only
Martin Blais wrote:
On 10/3/05, Antoine Pitrou [EMAIL PROTECTED] wrote:
If that's how things were designed, then Python's entire standard
library (not to mention third-party libraries) is not unicode safe -
to quote your own words - since many functions may return 8-bit strings
containing
On Monday, 3 October 2005 at 14:02 -0700, Guido van Rossum wrote:
On 10/3/05, Antoine Pitrou [EMAIL PROTECTED] wrote:
Could the bytes type be just the same as the current str type but
without the implicit unicode conversion ? Or am I missing some desired
functionality ?
No. It will be a
This would presumably support the (read-only part of the) buffer API so
search would be covered.
I don't see a use case for replace.
Alternatively, you could always specify Latin-1 as the encoding and
convert it that way -- I don't think there's any input that can cause
Latin-1 decoding to fail.
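Guido's observation holds: Latin-1 maps each byte value 0-255 directly to the code point with the same number, so decoding can never fail and the round-trip is lossless. A quick Python 3 check:

```python
blob = bytes(range(256))             # every possible byte value
text = blob.decode("latin-1")        # never fails: byte n -> U+00nn
assert len(text) == 256
assert text.encode("latin-1") == blob  # lossless round-trip
```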
On Monday, 3 October 2005 at 17:42 -0700, Guido van Rossum wrote:
I don't see a use case for replace.
Agreed.
Alternatively, you could always specify Latin-1 as the encoding and
convert it that way -- I don't think there's any input that can cause
Latin-1 decoding to fail.
You seem to be
Phillip J. Eby wrote:
At 09:49 AM 9/29/2005 -0400, Viren Shah wrote:
[I sent this earlier without being a subscriber and it was sent to the
moderation queue so I'm resending it after subscribing]
Hi,
I'm running a 64-bit Fedora Core 3 with python 2.3.4. I'm trying to install
Phillip J. Eby wrote:
At 12:14 PM 9/29/2005 -0400, Viren Shah wrote:
File "/root/svn-install-apps/setuptools-0.6a4/pkg_resources.py",
line 949, in _get
return self.loader.get_data(path)
OverflowError: signed integer is greater than maximum
Interesting. That looks like it might be
PEP 255 (Simple Generators) closes with:
Q. Then why not allow an expression on return too?
A. Perhaps we will someday. In Icon, "return expr" means both "I'm
done", and "but I have one final useful value to return too, and
this is it". At the start, and in the absence of compelling uses
Is there a faster way to transcode from 8-bit chars (charmaps) to utf-8
than going through unicode()?
I'm writing a small card-file program. As a test, I use a 53 MB MBox file,
in mac-roman encoding. My program reads and parses the file into messages
in about 3 to 5 seconds (Wow! Go Python!),
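The transcoding itself is a two-step decode/encode through unicode; in modern Python 3 it looks like this (the "café" sample is illustrative, not data from the thread):

```python
raw = "café".encode("mac_roman")     # 8-bit mac-roman bytes (é = 0x8E)
assert raw != "café".encode("utf-8") # different byte representations

# charmap bytes -> unicode -> utf-8 bytes
utf8 = raw.decode("mac_roman").encode("utf-8")
assert utf8.decode("utf-8") == "café"
```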
On 10/4/05, Piet Delport [EMAIL PROTECTED] wrote:
One system that could benefit from this change is Christopher Armstrong's
defgen.py[1] for Twisted, which he recently reincarnated (as newdefgen.py) to
use enhanced generators. The resulting code is much cleaner than before, and
closer to the
As the OP suggests, decoding with a codec like mac-roman or iso8859-1 is very
slow compared to encoding or decoding with utf-8. Here I'm working with 53k of
data instead of 53 megs. (Note: this is a laptop, so it's possible that
thermal or battery management features affected these numbers a
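The relative cost of charmap versus UTF-8 decoding is easy to measure with timeit; a rough modern-Python sketch (absolute numbers will vary by machine, as the poster notes):

```python
import timeit

blob = bytes(range(128)) * 512   # ~64 KB of ASCII-range data
results = {}
for codec in ("utf-8", "latin-1", "mac_roman"):
    # Time 100 full decodes of the buffer with each codec.
    results[codec] = timeit.timeit(lambda: blob.decode(codec), number=100)

for codec, t in results.items():
    print(f"{codec:10s} {t:.4f}s")
```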
Antoine If a stdlib function returns an 8-bit string containing
Antoine non-ascii data, then this string used in unicode context incurs
Antoine an implicit conversion, which fails.
Such strings should be converted to Unicode at the point where they enter
the application. That's
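Skip's rule, decode at the point where bytes enter the application and work with text everywhere inside, can be sketched like this in modern Python (to_text is a hypothetical helper name, not from the thread):

```python
# The single boundary where raw bytes become text; everything past
# this point in the application deals only in unicode text.
def to_text(raw: bytes, encoding: str = "utf-8") -> str:
    return raw.decode(encoding)

inside = to_text("size=10µm".encode("utf-8"))
assert inside == "size=10µm"
assert isinstance(inside, str)
```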
On Oct 3, 2005, at 3:47 PM, Fredrik Lundh wrote:
Antoine Pitrou wrote:
If I have a Unicode string containing legal characters greater than
0x7F, and I pass it to a function which converts it to str, the
conversion fails.
so? if it does that, it's not unicode safe.
[...]
what's