Adam Olsen wrote:
On 10/30/05, François Pinard [EMAIL PROTECTED] wrote:
All development is done in house by French people. All documentation,
external or internal, comments, identifier and function names,
everything is in French. Some of the developers here have had a long
programming life,
François Pinard wrote:
All development is done in house by French people. All documentation,
external or internal, comments, identifier and function names,
everything is in French.
There's nothing stopping you from creating your own
Frenchified version of Python that lets you use all
the
[Greg Ewing]
All development is done in house by French people. All documentation,
external or internal, comments, identifier and function names,
everything is in French.
There's nothing stopping you from creating your own Frenchified
version of Python that lets you use all the
[Martin von Löwis]
My canonical example is François Pinard, who keeps requesting it,
saying that local people were surprised they couldn't use accented
characters in Python. Perhaps that's because he actually is Quebecian
:-)
I presume I should comment a bit on this.
People here are
On 10/30/05, François Pinard [EMAIL PROTECTED] wrote:
All development is done in house by French people. All documentation,
external or internal, comments, identifier and function names,
everything is in French. Some of the developers here have had a long
programming life, while they only
At 11:43 2005-10-24 +0200, M.-A. Lemburg wrote:
Bengt Richter wrote:
Please bear with me for a few paragraphs ;-)
Please note that source code encoding doesn't really have
anything to do with the way the interpreter executes the
program - it's merely a way to tell the parser how to
convert
Neil Hodgson wrote:
M.-A. Lemburg:
Unicode has the concept of combining code points, e.g. you can
store an é (e with an accent) as e + '. Now if you slice
off the accent, you'll break the character that you encoded
using combining code points.
...
next_<indextype>(u, index) -> integer
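The slicing hazard M.-A. Lemburg describes can be reproduced on a modern Python 3. This is only a sketch, not the API proposed in the thread: the helper name strip_last_code_point is hypothetical, and the combining acute accent U+0301 stands in for the ' in the quoted example.

```python
import unicodedata

# "é" written as a base letter plus a combining accent: two code points.
decomposed = "e\u0301"  # 'e' + COMBINING ACUTE ACCENT
# NFC normalization folds the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

def strip_last_code_point(text):
    """Naive slice: drops the final code point, not the final character."""
    return text[:-1]

# Slicing the decomposed form cuts off the accent and leaves a bare "e".
broken = strip_last_code_point(decomposed)
```

Normalizing to NFC before slicing avoids the problem for this particular pair, but not every combining sequence has a precomposed form, so it is not a general fix.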
Bengt Richter wrote:
At 11:43 2005-10-24 +0200, M.-A. Lemburg wrote:
Bengt Richter wrote:
Please bear with me for a few paragraphs ;-)
Please note that source code encoding doesn't really have
anything to do with the way the interpreter executes the
program - it's merely a way to tell the
Bill Janssen wrote:
I just got mail this morning from a researcher who wants exactly what
Martin described, and wondered why the default MacPython 2.4.2 didn't
provide it by default. :-)
If all he wants is to represent Deseret, he can do so in a 16-bit
Unicode type, too: Python supports
I think he was more interested in the invariant Martin proposed, that
len(\U0001)
should always be the same and should always be 1.
Bill
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
On 10/25/05, Bill Janssen [EMAIL PROTECTED] wrote:
I think he was more interested in the invariant Martin proposed, that
len(\U0001)
should always be the same and should always be 1.
Yes but why? What does this invariant do for him?
--
--Guido van Rossum (home page:
Guido van Rossum wrote:
Yes but why? What does this invariant do for him?
I don't know about this person, but there are a few things that
don't work properly in UTF-16 mode:
- the Unicode character database fails to lookup things.
u"\U0001D670".isupper() gives false, but should give true
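For comparison, here is a small sketch of the same character on a modern Python 3 (3.3 or later), where strings are indexed by code point. It is not the Python 2.4 narrow build discussed in the thread, and the helper utf16_code_units is a hypothetical name used only for illustration.

```python
def utf16_code_units(text):
    """Number of 16-bit code units needed to store text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2

# MATHEMATICAL MONOSPACE CAPITAL A, a character outside the BMP.
ch = "\U0001D670"

code_points = len(ch)               # counted as one character
code_units = utf16_code_units(ch)   # stored as a surrogate pair in UTF-16
is_upper = ch.isupper()             # character database lookup succeeds
```

The gap between code_points and code_units is the source of the UTF-16 problems listed above: any operation that works per 16-bit unit sees two halves instead of one character.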
M.-A. Lemburg:
You mean a slice that slices out the next indextype ?
Yes.
This sounds a lot like you'd want iterators for the various
index types. Should be possible to implement on top of the
proposed APIs, e.g. itergraphemes(u), itercodepoints(u), etc.
Iterators may be helpful, but
Neil Hodgson wrote:
I'd like to more tightly define Unicode strings for Python 3000.
Currently, Unicode strings may be implemented with either 2 byte
(UCS-2) or 4 byte (UTF-32) elements. Python should allow strings to
contain any Unicode character and should be indexable yielding
Phillip J. Eby wrote:
I'm tempted to say it would be even better if there was a command line
option that could be used to force all binary opens to result in bytes, and
require all text opens to specify an encoding.
For Python 3000? -1. There shouldn't be command line switches that have
that
Martin v. Löwis:
That's very tricky. If you have multiple implementations, you make
usage at the C API difficult. If you make it either UTF-8 or UTF-32,
you make PythonWin difficult. If you make it UTF-16, you make indexing
difficult.
For Windows, the code will get a little uglier,
Neil Hodgson wrote:
Guido van Rossum:
Folks, please focus on what Python 3000 should do.
I'm thinking about making all character strings Unicode (possibly with
different internal representations a la NSString in Apple's Objective
C) and introduce a separate mutable bytes array data type. But
Bengt Richter wrote:
Please bear with me for a few paragraphs ;-)
Please note that source code encoding doesn't really have
anything to do with the way the interpreter executes the
program - it's merely a way to tell the parser how to
convert string literals (currently only the Unicode ones)
into
I'm thinking about making all character strings Unicode (possibly with
different internal representations a la NSString in Apple's Objective
C) and introduce a separate mutable bytes array data type. But I could
use some validation or feedback on this idea from actual
practitioners.
+1 from
Python should allow strings to
contain any Unicode character and should be indexable yielding
characters rather than half characters. Therefore Python strings
should appear to be UTF-32.
+1.
Bill
Neil Hodgson wrote:
For Windows, the code will get a little uglier, needing to perform
an allocation/encoding and deallocation more often than at present but
I don't think there will be a speed degradation as Windows is
currently performing a conversion from 8 bit to UTF-16 inside many
There are many design alternatives: one option would be to support
*three* internal representations in a single type, generating the
others from the one operation existing as needed. The default, initial
representation might be UTF-8, with UCS-4 only being generated when
indexing occurs, and
On 10/24/05, Martin v. Löwis [EMAIL PROTECTED] wrote:
Indeed. My guess is that indexing is more common than you think,
especially when iterating over the string. Of course, iteration
could also operate on UTF-8, if you introduced string iterator
objects.
Python's slice-and-dice model pretty
On 10/24/05, Martin v. Löwis [EMAIL PROTECTED] wrote:
Guido van Rossum wrote:
Changing the APIs would be much work, although perhaps not impossible
for Python 3000. For example, Raymond Hettinger's partition() API
doesn't refer to indices at all, and can replace many uses of find()
or
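To illustrate the point about partition() replacing index arithmetic, here is a sketch using str.partition() as it exists in later Python releases. The two function names and the header-splitting task are assumptions chosen for the example, not code from the thread.

```python
def split_header_find(line):
    """Index-based version: find() plus two slices."""
    i = line.find(":")
    if i == -1:
        return line, ""
    return line[:i], line[i + 1:]

def split_header_partition(line):
    """Index-free version: partition() returns (head, separator, tail)."""
    name, _sep, value = line.partition(":")
    return name, value
```

Both functions return the same pairs, but the second never exposes an index, so it would not care whether the underlying string representation supports O(1) indexing.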
M.-A. Lemburg:
Unicode has the concept of combining code points, e.g. you can
store an é (e with an accent) as e + '. Now if you slice
off the accent, you'll break the character that you encoded
using combining code points.
...
next_<indextype>(u, index) -> integer
Returns the
- yet others think: I want all of Unicode, with proper, efficient
indexing, so I want four bytes per char.
I doubt the last one though. Probably they really don't want efficient
indexing, they want to perform higher-level operations that currently
are only possible using efficient
On 10/24/05, Bill Janssen [EMAIL PROTECTED] wrote:
- yet others think: I want all of Unicode, with proper, efficient
indexing, so I want four bytes per char.
I doubt the last one though. Probably they really don't want efficient
indexing, they want to perform higher-level operations
Guido van Rossum wrote:
I think the API should reflect the representation *to some extent*,
namely it shouldn't claim to have operations that are typically
thought of as O(1) that can only be implemented as O(n).
Maybe a compromise could be reached by using a
btree of chunks or something, so
Guido van Rossum wrote:
Python's slice-and-dice model pretty much ensures that indexing is
common. Almost everything is ultimately represented as indices: regex
search results have the index in the API, find()/index() return
indices, many operations take a start and/or end index.
Maybe the
Guido writes:
Oh, I don't doubt that they want it. But often they don't *need* it,
and the higher-level goal they are trying to accomplish can be dealt
with better in a different way. (Sort of my response to people asking
for static typing in Python as well. :-)
I suppose that's true. But
-1 on keeping the source encoding of string literals. Python should
definitely decode them at compile time.
-1 on decoding implicitly as needed. This causes decoding to happen
late, in unpredictable places. Decodes can fail; they should happen
as early and as close to the data source as
On Oct 23, 2005, at 3:10 PM, Jason Orendorff wrote:
-1 on decoding implicitly as needed. This causes decoding to happen
late, in unpredictable places. Decodes can fail; they should happen
as early and as close to the data source as possible.
That's not necessarily true... Some codecs can't
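Jason Orendorff's argument for decoding close to the data source can be sketched as follows, in Python 3 syntax. The function name read_text and the sample byte strings are hypothetical; the only library behaviour relied on is that bytes.decode() raises UnicodeDecodeError on invalid input.

```python
def read_text(raw, encoding="utf-8"):
    """Decode at the boundary so bad input fails here, not deep in the program."""
    return raw.decode(encoding)  # raises UnicodeDecodeError on invalid bytes

# Valid UTF-8 input decodes immediately into text.
good = read_text(b"Fran\xc3\xa7ois")

# Latin-1 bytes are not valid UTF-8, so the failure surfaces at the source.
try:
    read_text(b"Fran\xe7ois")
    failed_early = False
except UnicodeDecodeError:
    failed_early = True
```

With lazy decoding, the same error would instead appear at whatever later point first touched the text, which is the unpredictability the -1 objects to.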
On Sunday 23 October 2005 18:10, Jason Orendorff wrote:
-1 on keeping the source encoding of string literals. Python should
definitely decode them at compile time.
-1 on decoding implicitly as needed. This causes decoding to happen
late, in unpredictable places. Decodes can fail; they
Folks, please focus on what Python 3000 should do.
I'm thinking about making all character strings Unicode (possibly with
different internal representations a la NSString in Apple's Objective
C) and introduce a separate mutable bytes array data type. But I could
use some validation or feedback on
On Oct 23, 2005, at 6:06 PM, Guido van Rossum wrote:
Folks, please focus on what Python 3000 should do.
I'm thinking about making all character strings Unicode (possibly with
different internal representations a la NSString in Apple's Objective
C) and introduce a separate mutable bytes array
At 06:06 PM 10/23/2005 -0700, Guido van Rossum wrote:
Folks, please focus on what Python 3000 should do.
I'm thinking about making all character strings Unicode (possibly with
different internal representations a la NSString in Apple's Objective
C) and introduce a separate mutable bytes array
Please bear with me for a few paragraphs ;-)
One aspect of str-type strings is the efficiency afforded when all the encoding
really is ascii. If the internal encoding were e.g. fixed utf-16le for strings,
maybe with today's computers it would still be efficient enough for most actual
string
Martin Blais wrote:
Yes. setdefaultencoding() is removed from sys by site.py. To get it
again you must reload sys.
Thanks.
Actually, I should take the opportunity to advise people that
setdefaultencoding doesn't really work. With the default default
encoding, strings and Unicode objects hash
On 10/15/05, Reinhold Birkenfeld [EMAIL PROTECTED] wrote:
Martin Blais wrote:
On 10/3/05, Michael Hudson [EMAIL PROTECTED] wrote:
Martin Blais [EMAIL PROTECTED] writes:
How hard would that be to implement?
import sys
reload(sys)
sys.setdefaultencoding('undefined')
Hmmm any
On 10/3/05, Michael Hudson [EMAIL PROTECTED] wrote:
Martin Blais [EMAIL PROTECTED] writes:
How hard would that be to implement?
import sys
reload(sys)
sys.setdefaultencoding('undefined')
Hmmm any particular reason for the call to reload() here?
Martin Blais wrote:
On 10/3/05, Michael Hudson [EMAIL PROTECTED] wrote:
Martin Blais [EMAIL PROTECTED] writes:
How hard would that be to implement?
import sys
reload(sys)
sys.setdefaultencoding('undefined')
Hmmm any particular reason for the call to reload() here?
Yes.
Hi.
Like a lot of people (or so I hear in the blogosphere...), I've been
experiencing some friction in my code with unicode conversion
problems. Even when being super extra careful with the types of str's
or unicode objects that my variables can contain, there is always some
case or oversight
Martin Blais [EMAIL PROTECTED] writes:
What if we could completely disable the implicit conversions between
unicode and str? In other words, if you would ALWAYS be forced to
call either .encode() or .decode() to convert between one and the
other... wouldn't that help a lot deal with that
Le lundi 03 octobre 2005 à 02:09 -0400, Martin Blais a écrit :
What if we could completely disable the implicit conversions between
unicode and str?
This would be very annoying when dealing with some modules or libraries
where the type (str / unicode) returned by a function depends on the
Antoine Pitrou wrote:
A good rule of thumb is to convert to unicode everything that is
semantically textual
and isn't pure ASCII.
(anyone who is tempted to argue otherwise should benchmark their
applications, both speed- and memorywise, and be prepared to come
up with very strong arguments
Le lundi 03 octobre 2005 à 14:59 +0200, Fredrik Lundh a écrit :
Antoine Pitrou wrote:
A good rule of thumb is to convert to unicode everything that is
semantically textual
and isn't pure ASCII.
How can you be sure that something that is /semantically textual/ will
always remain pure
On 10/3/05, M.-A. Lemburg [EMAIL PROTECTED] wrote:
I'm not sure it's a sensible default.
Me neither, especially since this would make it impossible
to write polymorphic code - e.g. ', '.join(list) wouldn't
work anymore if list contains Unicode; ditto for u', '.join(list)
with list
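The trade-off raised here can be observed in Python 3, which did eventually remove implicit conversion between text and bytes. The sketch below uses a hypothetical helper join_items and shows that a mixed list is rejected rather than silently decoded, so the caller has to convert explicitly.

```python
def join_items(items):
    """Join text items; in Python 3 a bytes element is rejected, not decoded."""
    try:
        return ", ".join(items)
    except TypeError:
        return None

all_text = join_items(["a", "\u00e9"])                 # text only: works
mixed = join_items(["a", b"b"])                        # str and bytes: rejected
explicit = join_items(["a", b"b".decode("ascii")])     # explicit decode: works
```

This is the loss of polymorphism the message warns about, traded for errors that appear at the point where the types are mixed.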
Martin Blais wrote:
Hi.
Like a lot of people (or so I hear in the blogosphere...), I've been
experiencing some friction in my code with unicode conversion
problems. Even when being super extra careful with the types of str's
or unicode objects that my variables can contain, there is always
M.-A. Lemburg wrote:
Michael Hudson wrote:
Martin Blais [EMAIL PROTECTED] writes:
What if we could completely disable the implicit conversions between
unicode and str? In other words, if you would ALWAYS be forced to
call either .encode() or .decode() to convert between one and the
other...
Jim Fulton wrote:
I would argue that it's evil to change the default encoding
in the first place, except in this case to disable implicit
encoding or decoding.
absolutely. unfortunately, all attempts to add such information to the
sys module documentation seem to have failed...
(last time I
Antoine Pitrou [EMAIL PROTECTED] wrote:
Le lundi 03 octobre 2005 à 14:59 +0200, Fredrik Lundh a écrit :
Antoine Pitrou wrote:
A good rule of thumb is to convert to unicode everything that is
semantically textual
and isn't pure ASCII.
How can you be sure that something that
Josiah Carlson wrote:
and isn't pure ASCII.
How can you be sure that something that is /semantically textual/ will
always remain pure ASCII ? That's contradictory, unless your software
never goes out of the anglo-saxon world (and even...).
Non-unicode text input widgets. Works