Re: [Python-Dev] The future of the wchar_t cache

Steve Dower Mon, 22 Oct 2018 13:44:42 -0700

On 22Oct2018 1047, Steve Dower wrote:

On 22Oct2018 1007, Serhiy Storchaka wrote:
22.10.18 16:24, Steve Dower wrote:
Yes, that's true. But "should reduce ... footprint" is also an optimisation that deserves a benchmark by that standard. Also, I'm proposing keeping the 'kind' as UCS-2 when the string is created from UCS-2 data that is likely to be used as UCS-2. We would not create the UCS-1 version in this case, so it's not the same as prefilling the cache, but it would cost a bit of memory in exchange for CPU. If slicing and concatenation between matching kinds also preserved the kind, a lot of path handling code could avoid back-and-forth conversions.
Oh, I'm afraid this will complicate the whole code of unicodeobject.c (and several other files) a lot and can introduce a lot of subtle bugs.
For example, when you search a UCS2 string in a UCS1 string, the current code returns the result fast, because a UCS1 string can't contain codes > 0xff, and a UCS2 string should contain codes > 0xff. And there are many such assumptions.
That doesn't change though, as we're only ever expanding the range. So searching a UCS2 string in a UCS2 string that doesn't contain any actual UCS2 characters is the only case that would be affected, and whether that case occurs more than the UCS2->UCS1->UCS2 conversion case is something we can measure (but I'd be surprised if substring searches occur more frequently than OS conversions).
Currently, unicode_compare_eq exits early when the kinds do not match, and that would be a problem (but is also easily fixable). But other string operations already handle mismatched kinds.
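
A toy Python model of that equality issue, for illustration only. The Str class and both function names are hypothetical stand-ins, not the C code in unicodeobject.c; "kind" here is the number of bytes per code unit (1 for UCS1, 2 for UCS2), and the sketch assumes a UCS2-kind string may hold only code points <= 0xff, as proposed above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Str:
    """Hypothetical stand-in for a string object with an explicit kind."""
    kind: int                 # bytes per code unit: 1 (UCS1) or 2 (UCS2)
    units: Tuple[int, ...]    # the code points stored in the buffer


def compare_eq_strict(a: Str, b: Str) -> bool:
    # Models the early exit: different kinds are assumed to mean
    # different contents, so the buffers are never inspected.
    if a.kind != b.kind or len(a.units) != len(b.units):
        return False
    return a.units == b.units  # stands in for a single block comparison


def compare_eq_relaxed(a: Str, b: Str) -> bool:
    # Models one possible fix: only the length check may exit early.
    if len(a.units) != len(b.units):
        return False
    if a.kind == b.kind:
        return a.units == b.units  # same layout, block comparison still works
    # Mismatched kinds: fall back to a per-element loop.
    return all(x == y for x, y in zip(a.units, b.units))


# "C:" stored once as UCS1 and once as (non-canonical) UCS2.
narrow = Str(kind=1, units=(0x43, 0x3A))
wide = Str(kind=2, units=(0x43, 0x3A))
print(compare_eq_strict(narrow, wide), compare_eq_relaxed(narrow, wide))
```

The per-element fallback is the part that cannot use a block comparison, which is where any extra cost for mismatched kinds would come from.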

I made the changes (along with a somewhat expensive update to make __hash__ produce the same value for UCS1 and UCS2 strings) and it works just fine, but the speed difference seems to be fairly trivial. Equality time in particular is slower (highly optimized memcmp vs. plain-old for loop).
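
As a rough illustration of why the hash needs attention (a toy Python model, not the actual change): if a hash is computed over the raw buffer, the same text stored with two different kinds hashes different bytes. The helper names below are hypothetical, FNV-1a is only a stand-in for the real hash function, and widening every code unit to 4 bytes is just one assumed way of making the result independent of kind.

```python
import struct

_FORMATS = {1: "B", 2: "H"}  # struct codes for 1-byte and 2-byte code units


def raw_buffer(kind: int, units) -> bytes:
    """Lay out code points as a little-endian buffer of the given kind."""
    return struct.pack(f"<{len(units)}{_FORMATS[kind]}", *units)


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a, used here only as a simple stand-in hash."""
    h = 0xCBF29CE484222325
    for byte in data:
        h ^= byte
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h


def hash_kind_independent(kind: int, buf: bytes) -> int:
    # Decode the buffer back to code points, widen each to 4 bytes,
    # then hash. The extra pass is what makes this more expensive
    # than hashing the buffer directly.
    units = struct.unpack(f"<{len(buf) // kind}{_FORMATS[kind]}", buf)
    widened = struct.pack(f"<{len(units)}I", *units)
    return fnv1a_64(widened)


text = (0x43, 0x3A)           # "C:"
narrow = raw_buffer(1, text)  # 2 bytes
wide = raw_buffer(2, text)    # 4 bytes, same code points
print(narrow != wide)
print(hash_kind_independent(1, narrow) == hash_kind_independent(2, wide))
```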

That said, I didn't remove the wchar_t cache (though I tried some tricks to avoid it), so it's possible that once that's gone we'll see an avoidable regression here, but on its own this doesn't contribute much.


Cheers,
Steve