Re: [Python-Dev] The future of the wchar_t cache

Steve Dower Mon, 22 Oct 2018 07:50:06 -0700

On 22Oct2018 1007, Serhiy Storchaka wrote:

22.10.18 16:24, Steve Dower wrote:
Yes, that's true. But "should reduce ... footprint" is also an optimisation that deserves a benchmark by that standard. Also, I'm proposing keeping the 'kind' as UCS-2 when the string is created from UCS-2 data that is likely to be used as UCS-2. We would not create the UCS-1 version in this case, so it's not the same as prefilling the cache, but it would cost a bit of memory in exchange for CPU. If slicing and concatenation between matching kinds also preserved the kind, a lot of path handling code could avoid back-and-forth conversions.
Oh, I'm afraid this will considerably complicate the whole code of unicodeobject.c (and several other files) and can introduce a lot of subtle bugs.
For example, when you search a UCS2 string in a UCS1 string, the current code returns the result quickly, because a UCS1 string can't contain codes > 0xff, and a UCS2 string must contain at least one code > 0xff. And there are many such assumptions.

That doesn't change though, as we're only ever expanding the range. So searching a UCS2 string in a UCS2 string that doesn't contain any actual UCS2 characters is the only case that would be affected, and whether that case occurs more than the UCS2->UCS1->UCS2 conversion case is something we can measure (but I'd be surprised if substring searches occur more frequently than OS conversions).
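
To make the affected case concrete, here is a rough Python sketch. It is a toy model only, not the code in unicodeobject.c: ToyStr, min_kind and find_traced are hypothetical names, and the "kind" is just a stored integer. It shows the search shortcut firing for a UCS1 haystack, and the same search falling through to a full scan when the haystack is kept as UCS2 despite holding only Latin-1 characters. The answer is the same in both cases; only the amount of work differs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyStr:
    """Toy stand-in for a str object (hypothetical, for illustration only)."""
    text: str
    kind: int  # stored width in bytes per character: 1, 2 or 4


def min_kind(text):
    """Smallest width that can hold every code point in text."""
    top = max(map(ord, text), default=0)
    return 1 if top <= 0xFF else 2 if top <= 0xFFFF else 4


def find_traced(haystack, needle):
    """Return (index, branch) so the branch taken is visible."""
    # Shortcut: a narrower haystack cannot hold the needle's wide characters.
    if needle.kind > haystack.kind:
        return -1, "shortcut"
    # Otherwise a real scan is needed.
    return haystack.text.find(needle.text), "scan"


needle = ToyStr("\u0142", min_kind("\u0142"))  # genuinely needs UCS2
narrow_path = ToyStr("C:\\temp", 1)            # stored as UCS1
wide_path = ToyStr("C:\\temp", 2)              # same text, kept as UCS2

result_narrow = find_traced(narrow_path, needle)
result_wide = find_traced(wide_path, needle)
```

In this model result_narrow takes the shortcut branch and result_wide takes the scan branch, and both report "not found". Whether that extra scan matters more than the saved conversions is the part that would need measuring.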

Currently, unicode_compare_eq exits early when the kinds do not match, and that would be a problem (but is also easily fixable). But other string operations already handle mismatched kinds.
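
As a sketch of that comparison issue, again in a toy Python model rather than the real C function: ToyStr, compare_eq_early_exit and compare_eq_tolerant are hypothetical names, and the codecs only stand in for the 1-, 2- and 4-byte buffers. The first function mirrors the early exit on mismatched kinds; the second is one possible fix that falls back to comparing code points when the kinds differ.

```python
from dataclasses import dataclass

# Assumed stand-ins for the 1-, 2- and 4-byte internal buffers.
_CODECS = {1: "latin-1", 2: "utf-16-le", 4: "utf-32-le"}


@dataclass(frozen=True)
class ToyStr:
    """Toy stand-in for a str object (hypothetical, for illustration only)."""
    text: str
    kind: int  # stored width; may be wider than the contents need

    def buffer(self):
        # The text must fit the stored kind (no non-BMP characters in kind 2).
        return self.text.encode(_CODECS[self.kind])


def compare_eq_early_exit(a, b):
    """Equality that assumes equal strings always share a kind."""
    if len(a.text) != len(b.text):
        return False
    if a.kind != b.kind:
        return False  # wrong once a kind may be wider than necessary
    return a.buffer() == b.buffer()  # memcmp-style comparison


def compare_eq_tolerant(a, b):
    """Same fast path, plus a code point comparison for mixed kinds."""
    if len(a.text) != len(b.text):
        return False
    if a.kind == b.kind:
        return a.buffer() == b.buffer()
    return all(x == y for x, y in zip(a.text, b.text))


narrow = ToyStr("temp", 1)
wide = ToyStr("temp", 2)  # same characters, kept as UCS2
```

Here compare_eq_early_exit(narrow, wide) reports the strings as unequal while compare_eq_tolerant(narrow, wide) reports them as equal. The matching-kind fast path is untouched, so the extra cost is confined to mixed-kind comparisons.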


Cheers,
Steve
