Re: [Python-Dev] The future of the wchar_t cache

2018-10-23 Thread Nick Coghlan
On Tue, 23 Oct 2018 at 00:50, Steve Dower wrote: > > On 22Oct2018 1007, Serhiy Storchaka wrote: > > 22.10.18 16:24, Steve Dower wrote: > >> Yes, that's true. But "should reduce ... footprint" is also an > >> optimisation that deserves a benchmark by that standard. Also, I'm > >> proposing keeping t

Re: [Python-Dev] The future of the wchar_t cache

2018-10-23 Thread Serhiy Storchaka
22.10.18 23:41, Steve Dower wrote: That said, I didn't remove the wchar_t cache (though I tried some tricks to avoid it), so it's possible that once that's gone we'll see an avoidable regression here, but on its own this doesn't contribute much. Could you please test PR 2599 on Windows? It make

Re: [Python-Dev] The future of the wchar_t cache

2018-10-22 Thread Steve Dower
On 22Oct2018 1047, Steve Dower wrote: On 22Oct2018 1007, Serhiy Storchaka wrote: 22.10.18 16:24, Steve Dower wrote: Yes, that's true. But "should reduce ... footprint" is also an optimisation that deserves a benchmark by that standard. Also, I'm proposing keeping the 'kind' as UCS-2 when the st

Re: [Python-Dev] The future of the wchar_t cache

2018-10-22 Thread Steve Dower
On 22Oct2018 1007, Serhiy Storchaka wrote: 22.10.18 16:24, Steve Dower wrote: Yes, that's true. But "should reduce ... footprint" is also an optimisation that deserves a benchmark by that standard. Also, I'm proposing keeping the 'kind' as UCS-2 when the string is created from UCS-2 data that i

Re: [Python-Dev] The future of the wchar_t cache

2018-10-22 Thread Serhiy Storchaka
22.10.18 16:24, Steve Dower wrote: Yes, that's true. But "should reduce ... footprint" is also an optimisation that deserves a benchmark by that standard. Also, I'm proposing keeping the 'kind' as UCS-2 when the string is created from UCS-2 data that is likely to be used as UCS-2. We would not c

Re: [Python-Dev] The future of the wchar_t cache

2018-10-22 Thread Serhiy Storchaka
22.10.18 11:09, Victor Stinner wrote: +1 to remove wchar_t cache. IMHO it wastes memory for no real performance gain. By the way, can we start to schedule the *removal* of the Py_UNICODE API? For example, decide when Py_DEPRECATED is used in the C API? Should we start to deprecate when Python 2 r

Re: [Python-Dev] The future of the wchar_t cache

2018-10-22 Thread Steve Dower
On 22Oct2018 0928, Victor Stinner wrote: Also, I'm proposing keeping the 'kind' as UCS-2 when the string is created from UCS-2 data that is likely to be used as UCS-2. Oh. That's a major change in the PEP 393 design. You would have to modify many functions in CPython. Currently, the PEP 393 req
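For readers less familiar with the 'kind' mentioned here: under PEP 393, CPython stores a string with 1, 2 or 4 bytes per character, depending on the widest code point it contains. The sketch below is only an illustration and is not taken from the thread. It assumes CPython 3.3 or later, and the helper name bytes_per_char is hypothetical.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate per-character storage for strings made only of `ch`.

    Comparing two lengths cancels out the fixed object header, so the
    result reflects the PEP 393 'kind' chosen by CPython (assumption:
    CPython 3.3+; other implementations may store strings differently).
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

widths = {
    "ASCII": bytes_per_char("a"),             # 1-byte kind
    "BMP": bytes_per_char("\u20ac"),          # 2-byte kind (UCS-2 range)
    "non-BMP": bytes_per_char("\U0001f600"),  # 4-byte kind (UCS-4)
}
print(widths)
```

Because the kind follows from the widest code point, a string built from UCS-2 data that happens to be pure ASCII ends up in the 1-byte form, which appears to be the behaviour the proposal above would change.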

Re: [Python-Dev] The future of the wchar_t cache

2018-10-22 Thread Serhiy Storchaka
20.10.18 16:01, Stefan Behnel wrote: But regarding the use under Windows, I wonder if there's interest in keeping it as a special Windows-only feature, e.g. to speed up the data exchange with the Win32 APIs. I guess it would have to provide a visible (performance?) advantage to justify such specia

Re: [Python-Dev] The future of the wchar_t cache

2018-10-22 Thread Victor Stinner
On Mon, 22 Oct 2018 at 15:24, Steve Dower wrote: > Yes, that's true. But "should reduce ... footprint" is also an > optimisation that deserves a benchmark by that standard. pyperformance has a mode to measure the memory usage (mostly the memory peak) if someone wants to have a look. > Also, I'

Re: [Python-Dev] The future of the wchar_t cache

2018-10-22 Thread Steve Dower
On 22Oct2018 0913, Victor Stinner wrote: On Mon, 22 Oct 2018 at 15:08, Steve Dower wrote: Agreed the cache is useless here, but since the listdir() result came in as wchar_t we could keep it that way (assuming we'd only be changing it to char), and then there wouldn't have to be a conversion

Re: [Python-Dev] The future of the wchar_t cache

2018-10-22 Thread Victor Stinner
On Mon, 22 Oct 2018 at 15:08, Steve Dower wrote: > Agreed the cache is useless here, but since the listdir() result came in > as wchar_t we could keep it that way (assuming we'd only be changing it > to char), and then there wouldn't have to be a conversion when we > immediately pass it back to

Re: [Python-Dev] The future of the wchar_t cache

2018-10-22 Thread Steve Dower
On 22Oct2018 0413, Victor Stinner wrote: For code like "for name in os.listdir(): open(name): " (replace listdir with scandir if you want to get file metadata), the cache is useless, since the fresh string has to be converted to wchar_t* anyway, and the cache is destroyed at the end of the lo
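The access pattern described here can be sketched as follows. This is an illustrative, runnable version of the quoted one-liner, not code from the thread, and the function name read_all is hypothetical. Each name comes back from os.listdir() as a fresh str, is handed to open() once, and becomes garbage on the next iteration, so any per-string cached representation is used at most once.

```python
import os
import tempfile

def read_all(directory):
    """Open every regular file in `directory` once; return total bytes read."""
    total = 0
    for name in os.listdir(directory):
        # `name` is a brand-new str object on every iteration, and the
        # joined path below is another one. Neither survives the loop body.
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            total += len(f.read())
    return total

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "example.txt"), "wb") as f:
            f.write(b"hello")
        print(read_all(d))
```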

Re: [Python-Dev] The future of the wchar_t cache

2018-10-22 Thread Victor Stinner
On Sat, 20 Oct 2018 at 18:02, Steve Dower wrote: > I don't have numbers, but my instinct says the most impacted operations > would be retrieving collections of strings from the OS (avoiding a > scan/conversion for each one), comparisons against these collections > (in-memory handling for hash/c

Re: [Python-Dev] The future of the wchar_t cache

2018-10-22 Thread Victor Stinner
Hi Serhiy, +1 to remove wchar_t cache. IMHO it wastes memory for no real performance gain. By the way, can we start to schedule the *removal* of the Py_UNICODE API? For example, decide when Py_DEPRECATED is used in the C API? Should we start to deprecate when Python 2 reaches its end of life? Or c

Re: [Python-Dev] The future of the wchar_t cache

2018-10-20 Thread Steve Dower
On 20Oct2018 0901, Stefan Behnel wrote: I'd be happy to get rid of it. But regarding the use under Windows, I wonder if there's interest in keeping it as a special Windows-only feature, e.g. to speed up the data exchange with the Win32 APIs. I guess it would have to provide a visible (performance

Re: [Python-Dev] The future of the wchar_t cache

2018-10-20 Thread Stefan Behnel
Serhiy Storchaka wrote on 20.10.2018 at 13:06: > Currently the PyUnicode object contains two caches: for UTF-8 > representation and for wchar_t representation. They are needed not for > optimization but for supporting C API which returns borrowed references for > such representations. > > The UT

Re: [Python-Dev] The future of the wchar_t cache

2018-10-20 Thread INADA Naoki
+1 to remove wchar_t cache. I hope we can remove it at Python 3.9 or 3.10.

[Python-Dev] The future of the wchar_t cache

2018-10-20 Thread Serhiy Storchaka
Currently the PyUnicode object contains two caches: for UTF-8 representation and for wchar_t representation. They are needed not for optimization but for supporting C API which returns borrowed references for such representations. The UTF-8 cache always was in unicode objects (but in Python 2