Hi all,
Thanks everybody for your comments on this topic. Our initial
motivation for doing that is to simplify RPython by getting rid of the
RPython unicode type. I think that the outcome of these mails is that
there is no single obvious answer as to whether the change would
benefit or hurt Pyth
On 23 January 2014 20:54, Steven D'Aprano wrote:
> On Thu, Jan 23, 2014 at 01:27:50PM +, Oscar Benjamin wrote:
>
>> Steven wrote:
>> > With a UTF-8 implementation, won't that mean that string indexing
>> > operations are O(N) rather than O(1)? E.g. how do you know which UTF-8
>> > byte(s) to l
On Thu, Jan 23, 2014 at 10:45:25PM +0200, Elefterios Stamatogiannakis wrote:
> >But having said all this, I know that using UTF-8 internally for strings
> >is quite common (e.g. Haskell does it, without even an index cache, and
> >documents that indexing operations can be slow). CPython's FSR has
>
On 23/1/2014 10:54 PM, Steven D'Aprano wrote:
> On Thu, Jan 23, 2014 at 01:27:50PM +, Oscar Benjamin wrote:
>> Steven wrote:
>> > With a UTF-8 implementation, won't that mean that string indexing
>> > operations are O(N) rather than O(1)? E.g. how do you know which UTF-8
>> > byte(s) to look at to get the cha
On Tue, Jan 21, 2014 at 11:01 PM, Johan Råde wrote:
> At the Leysin Sprint Armin outlined a new design of the PyPy 2 unicode
> class. He gave two versions of the design:
Why spend brain cycles on a PyPy unicode class, when you could just
move on to PyPy3? The majority of the Python community is
On Thu, Jan 23, 2014 at 01:27:50PM +, Oscar Benjamin wrote:
> Steven wrote:
> > With a UTF-8 implementation, won't that mean that string indexing
> > operations are O(N) rather than O(1)? E.g. how do you know which UTF-8
> > byte(s) to look at to get the character at index 42 without having to
Hi Oscar,
Thanks for explaining the caching in detail :-)
On Thu, Jan 23, 2014 at 2:27 PM, Oscar Benjamin
wrote:
> big saving. If the string comes from anything other than utf-8 the indexing
> cache can be built while decoding (and reencoding as utf-8 under the hood).
Actually, you need to walk
On 2014-01-22 08:01, Johan Råde wrote:
Next, would such a change break any existing Python 2 code on Windows?
Yes it will. For instance the following code for counting characters in
a string:
f = [0] * (1 << 16)
for c in s:
    f[ord(c)] += 1
I would like to qualify this statement.
Get
On Wed, Jan 22, 2014 at 06:56:32PM +0100, Armin Rigo wrote:
> Hi Johan,
>
> On Wed, Jan 22, 2014 at 8:01 AM, Johan Råde wrote:
> > (I hope this makes more sense than my ramblings on IRC last night.)
>
> All versions you gave make sense as far as I'm concerned :-) But this
> last one is the clea
On Wed, Jan 22, 2014 at 08:01:31AM +0100, Johan Råde wrote:
> At the Leysin Sprint Armin outlined a new design of the PyPy 2 unicode
> class. He gave two versions of the design:
>
> A: unicode with a UTF-8 implementation and a UTF-32 interface.
>
> B: unicode with a UTF-8 implementation, a UT
Hi Johan,
On Wed, Jan 22, 2014 at 8:01 AM, Johan Råde wrote:
> (I hope this makes more sense than my ramblings on IRC last night.)
All versions you gave make sense as far as I'm concerned :-) But this
last one is the clearest indeed.
It seems that Python 3 went that way anyway too, and exposes
At the Leysin Sprint Armin outlined a new design of the PyPy 2 unicode
class. He gave two versions of the design:
A: unicode with a UTF-8 implementation and a UTF-32 interface.
B: unicode with a UTF-8 implementation, a UTF-16 interface on Windows
and a UTF-32 interface on UNIX-like systems.
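
To make the difference between the two interfaces concrete, here is a minimal Python 3 sketch of how one string would look through each of them. The helper names utf32_units and utf16_units are hypothetical and written only for this illustration; they are not PyPy or RPython APIs, and the sketch says nothing about how PyPy would actually implement either design.

```python
# Illustrative sketch only: the helper names below are hypothetical,
# not part of PyPy, RPython, or the proposal itself.

def utf32_units(s):
    """Units seen through a UTF-32 interface: one per code point."""
    return [ord(ch) for ch in s]

def utf16_units(s):
    """Units seen through a UTF-16 interface: 16-bit code units,
    so a non-BMP character shows up as a surrogate pair."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]

# 'a' (ASCII), U+00E9 (two UTF-8 bytes), U+1F600 (outside the BMP)
s = "a\u00e9\U0001F600"

utf8_bytes = s.encode("utf-8")            # the shared internal storage
print(len(utf8_bytes))                    # 7 bytes: 1 + 2 + 4
print([hex(u) for u in utf32_units(s)])   # 3 units
print([hex(u) for u in utf16_units(s)])   # 4 units, last two are surrogates
```

Under the UTF-32 interface the non-BMP character is a single unit with a value above 0xFFFF; under the UTF-16 interface it is two surrogate units, each below 0x10000. Code that assumes every unit fits in 16 bits is therefore sensitive to which interface is exposed.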