On 22Oct2018 1047, Steve Dower wrote:
On 22Oct2018 1007, Serhiy Storchaka wrote:
22.10.18 16:24, Steve Dower wrote:
Yes, that's true. But "should reduce ... footprint" is also an
optimisation that deserves a benchmark by that standard. Also, I'm
proposing keeping the 'kind' as UCS-2 when the string is created from
UCS-2 data that is likely to be used as UCS-2. We would not create
the UCS-1 version in this case, so it's not the same as prefilling
the cache, but it would cost a bit of memory in exchange for CPU. If
slicing and concatenation between matching kinds also preserved the
kind, a lot of path handling code could avoid back-and-forth
conversions.
Oh, I'm afraid this will complicate the whole code of unicodeobject.c
(and several other files) a lot and can introduce many subtle bugs.
For example, when you search a UCS2 string in a UCS1 string, the
current code returns the result fast, because a UCS1 string can't
contain codes > 0xff, and a UCS2 string should contain codes > 0xff.
And there are many such assumptions.
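To make the assumption above concrete, here is a rough sketch in Python. It is a toy model, not the code in unicodeobject.c: a string is a hypothetical (kind, code_points) pair, and find() and its canonical flag are names invented for illustration. It shows the shortcut only being valid when every string is stored in the narrowest kind that fits it.

```python
# Toy model, not CPython code: a string is (kind, code_points), where kind
# is the storage width in bytes per character (1, 2 or 4).

def min_kind(code_points):
    """Return the narrowest kind able to hold every code point."""
    top = max(code_points, default=0)
    if top <= 0xFF:
        return 1
    if top <= 0xFFFF:
        return 2
    return 4


def find(haystack, needle, canonical=True):
    """Return the index of needle in haystack, or -1 if absent.

    With canonical=True we assume every string uses its narrowest kind,
    so a needle of a wider kind must contain a code point that the
    haystack cannot store, and we can answer without scanning.
    """
    h_kind, h = haystack
    n_kind, n = needle
    if canonical and n_kind > h_kind:
        return -1  # fast path, only correct for canonical strings
    for i in range(len(h) - len(n) + 1):
        if h[i:i + len(n)] == n:
            return i
    return -1


haystack = (1, [ord(c) for c in "abc"])
wide_needle = (2, [ord("b")])  # UCS2 storage, but no code point above 0xff
print(find(haystack, wide_needle))                   # shortcut taken
print(find(haystack, wide_needle, canonical=False))  # full scan
```

In this model a needle kept as UCS2 without any code point above 0xff gets a wrong answer from the shortcut, which is the kind of subtle bug described above.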
That doesn't change though, as we're only ever expanding the range. So
searching a UCS2 string in a UCS2 string that doesn't contain any actual
UCS2 characters is the only case that would be affected, and whether
that case occurs more than the UCS2->UCS1->UCS2 conversion case is
something we can measure (but I'd be surprised if substring searches
occur more frequently than OS conversions).
Currently, unicode_compare_eq exits early when the kinds do not match,
and that would be a problem (but is also easily fixable). But other
string operations already handle mismatched kinds.
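As a rough illustration of the equality problem, again in a toy Python model rather than the real C code: pack(), eq_same_kind_only() and eq_any_kind() are hypothetical names, and comparing bytes objects stands in for a raw buffer comparison.

```python
import struct

# Toy model: a string is (kind, raw little-endian buffer).
_FORMATS = {1: "B", 2: "H", 4: "I"}


def pack(text, kind):
    """Store text with a fixed width of `kind` bytes per character."""
    return kind, struct.pack(f"<{len(text)}{_FORMATS[kind]}", *map(ord, text))


def unpack(kind, data):
    """Recover the code points from a raw buffer of the given kind."""
    return struct.unpack(f"<{len(data) // kind}{_FORMATS[kind]}", data)


def eq_same_kind_only(a, b):
    """Exit early when the kinds differ, else compare the raw buffers."""
    if a[0] != b[0]:
        return False
    return a[1] == b[1]


def eq_any_kind(a, b):
    """Compare raw buffers when kinds match, else go code point by code point."""
    if a[0] == b[0]:
        return a[1] == b[1]
    ca, cb = unpack(*a), unpack(*b)
    if len(ca) != len(cb):
        return False
    for x, y in zip(ca, cb):  # the slower element-wise path
        if x != y:
            return False
    return True


narrow = pack("C:\\temp", 1)
wide = pack("C:\\temp", 2)  # same text, kept as UCS2
print(eq_same_kind_only(narrow, wide), eq_any_kind(narrow, wide))
```

The early exit is only safe while equal text always implies equal kind; once a UCS2 string may hold purely Latin-1 content, the mixed-kind path is needed.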
I made the changes (along with a somewhat expensive update to make
__hash__ produce the same value for UCS1 and UCS2 strings) and it works
just fine, but the speed difference seems to be fairly trivial. Equality
time in particular is slower (highly optimized memcmp vs. plain-old for
loop).
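For a sense of why the hash needs attention, here is a minimal sketch, assuming the hash is computed over the raw storage buffer. FNV-1a is used purely as a stand-in (it is not what CPython uses), and encode(), hash_raw() and hash_kind_independent() are hypothetical names. Widening every character to 4 bytes before hashing is just one possible way to get a kind-independent value, not necessarily what the patch did.

```python
FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK = 0xFFFFFFFFFFFFFFFF


def fnv1a(data):
    """64-bit FNV-1a over a bytes object (stand-in hash function)."""
    h = FNV_OFFSET
    for byte in data:
        h = ((h ^ byte) * FNV_PRIME) & MASK
    return h


def encode(text, kind):
    """Raw little-endian buffer with `kind` bytes per character."""
    return b"".join(ord(ch).to_bytes(kind, "little") for ch in text)


def hash_raw(text, kind):
    """Hash the storage buffer directly, so the result depends on kind."""
    return fnv1a(encode(text, kind))


def hash_kind_independent(text, kind):
    """Widen each character to 4 bytes first, so kind no longer matters."""
    data = encode(text, kind)
    widened = b"".join(
        int.from_bytes(data[i:i + kind], "little").to_bytes(4, "little")
        for i in range(0, len(data), kind)
    )
    return fnv1a(widened)


print(hash_raw("abc", 1) == hash_raw("abc", 2))
print(hash_kind_independent("abc", 1) == hash_kind_independent("abc", 2))
```

The widening pass touches every character before hashing, which gives a feel for why such a change is somewhat expensive.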
That said, I didn't remove the wchar_t cache (though I tried some tricks
to avoid it), so it's possible that once that's gone we'll see an
avoidable regression here, but on its own this doesn't contribute much.
Cheers,
Steve