On Tue, 10 Jun 2014 12:27:26 -0700, wxjmfauth wrote:

> On Saturday, June 7, 2014 04:20:22 UTC+2, Tim Chase wrote:
>> On 2014-06-06 09:59, Travis Griggs wrote:
>> > On Jun 4, 2014, at 4:01 AM, Tim Chase wrote:
>> > > If you use UTF-8 for everything
>> >
>> > It seems to me that, increasingly, other libraries (C, etc.) use
>> > UTF-8 as the preferred string interchange format.
>>
>> I definitely advocate UTF-8 for any streaming scenario: you're
>> iterating unidirectionally over the data anyway, so why use/transmit
>> more bytes than needed?  The only failing of UTF-8 that I've found in
>> the real world(*) is when you have the requirement of constant-time
>> indexing into strings.
>>
>> -tkc
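
To make Tim's indexing point concrete: in UTF-8 a character occupies
anywhere from one to four bytes, so reaching code point i in an encoded
byte string requires a linear scan, while indexing a decoded str does
not have to. A rough sketch (utf8_index is a made-up name for
illustration, not a real API):

def utf8_index(data, i):
    # O(n): we must scan from the start, because each character
    # occupies 1 to 4 bytes, so byte offsets don't map to indexes.
    count = 0
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:      # not a continuation byte: new char
            if count == i:
                # Find the end of this character and decode just it.
                end = pos + 1
                while end < len(data) and data[end] & 0xC0 == 0x80:
                    end += 1
                return data[pos:end].decode('utf-8')
            count += 1
    raise IndexError(i)

s = 'abc\u0fce def'
b = s.encode('utf-8')
assert utf8_index(b, 3) == s[3]  # s[3] is O(1); utf8_index(b, 3) is O(n)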
> 
> And once again, just an illustration:
> 
>>>> timeit.repeat("(x*1000 + y)", setup="x = 'abc'; y = 'z'")
> [0.9457552436453511, 0.9190932610143818, 0.9322044912393039]
>>>> timeit.repeat("(x*1000 + y)", setup="x = 'abc'; y = '\u0fce'")
> [2.5541921791045183, 2.52434366066052, 2.5337417948967413]
>>>> timeit.repeat("(x*1000 + y)", setup="x = 'abc'.encode('utf-8'); y = 'z'.encode('utf-8')")
> [0.9168235779232532, 0.8989583403075017, 0.8964204541650247]
>>>> timeit.repeat("(x*1000 + y)", setup="x = 'abc'.encode('utf-8'); y = '\u0fce'.encode('utf-8')")
> [0.9320969737165115, 0.9086006535332558, 0.9051715140790861]
>>>> sys.getsizeof('abc'*1000 + '\u0fce')
> 6040
>>>> sys.getsizeof(('abc'*1000 + '\u0fce').encode('utf-8'))
> 3020
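
A plausible explanation for the asymmetry above, assuming CPython
3.3+'s flexible string representation (PEP 393): 'abc'*1000 is stored
at one byte per character, but appending '\u0fce' forces the result
into two bytes per character, so the str concatenation has to widen a
copy of all 3000 characters, while the bytes concatenation just copies
raw bytes. A quick check (exact sizes include a fixed per-object
overhead and vary by build):

import sys

ascii_str = 'abc' * 1000             # pure ASCII: stored 1 byte/char
mixed_str = ascii_str + '\u0fce'     # widened: stored 2 bytes/char

print(sys.getsizeof(ascii_str))      # roughly 3000 + object overhead
print(sys.getsizeof(mixed_str))      # roughly 6002 + overhead (6040 above)

# bytes concatenation never re-encodes or widens anything
utf8 = ascii_str.encode('utf-8') + '\u0fce'.encode('utf-8')
print(sys.getsizeof(utf8))           # roughly 3003 + overhead (3020 above)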
> 
> But you know, that's not the problem.
> 
> When I see a core developer discussing benchmarking, and when the
> same application using non-ASCII chars becomes 1, 2, 5, 10, 20 times
> (if not more) slower compared to pure ASCII, I wonder whether there
> is not a serious problem somewhere.
> 
> (and also becoming slower than Py3.2)
> 
> BTW, very easy to explain.
> 
> I do not understand why the "free, open, what-you-wish-here, ..."
> software is so often pushing people toward the adoption of serious
> corporate products.
> 
> jmf

Your error reports always seem to revolve around benchmarks, despite
speed not being one of Python's prime objectives.

Computers store data using bytes.
An ASCII character can be stored in a single byte.
Most Unicode code points cannot fit in a single byte,
therefore handling Unicode will always be inherently slower than
handling pure ASCII.

Implementation details mean that some Unicode characters may be handled
more efficiently than others; why is this wrong?
Why should all Unicode operations be equally slow?
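
For one concrete example, CPython 3.3+ (PEP 393) stores each string at
the narrowest fixed width that fits its widest code point, which keeps
indexing O(1) for every string while letting ASCII-only text pay
nothing extra. The sample characters below are arbitrary picks from
each width class:

import sys

# One string per storage width class; the width is chosen per string
# from its widest code point (PEP 393, CPython 3.3+).
for ch in ('a',            # Latin-1 range: 1 byte per character
           '\u0fce',       # other BMP:     2 bytes per character
           '\U0001f40d'):  # astral plane:  4 bytes per character
    s = ch * 1000
    print('U+%04X' % ord(ch), sys.getsizeof(s))

# Sizes grow roughly 1000 -> 2000 -> 4000 bytes plus overhead, yet
# s[i] stays constant-time in all three cases.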



-- 
There isn't any problem