On 28/12/2020 02:07, Inada Naoki wrote:
On Sun, Dec 27, 2020 at 8:20 PM Ronald Oussoren via Python-Dev
<python-dev@python.org> wrote:
On 26 Dec 2020, at 18:43, Guido van Rossum <gu...@python.org> wrote:
On Sat, Dec 26, 2020 at 3:54 AM Phil Thompson via Python-Dev
<python-dev@python.org> wrote:
That wouldn’t be a solution for code using the PyUnicode_* APIs of
course, nor Python code explicitly checking for the str type.
In the end a new string “kind” (next to the 1, 2 and 4 byte variants)
where callbacks are used to provide data might be the most pragmatic.
That will still break code peeking directly into the PyUnicodeObject
struct, but anyone doing that should know that it is not a stable
API.
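A rough pure-Python model of the callback idea, assuming the length is known up front and the data is fetched from a provider callback on first use. `CallbackText` and `provider` are hypothetical names; the real proposal would be a C-level PyUnicodeObject kind, and this class is not a str subclass.

```python
from typing import Callable, Optional


class CallbackText:
    """Toy model of a string "kind" whose data comes from a callback.

    Hypothetical: the proposal is a C-level PyUnicodeObject kind; this class
    only mimics the idea in pure Python and is not a str subclass.
    """

    def __init__(self, length: int, provider: Callable[[int, int], str]) -> None:
        self._length = length        # the length is known up front
        self._provider = provider    # provider(start, stop) returns that slice
        self._cache: Optional[str] = None
        self.provider_calls = 0      # how many times the callback has run

    def __len__(self) -> int:
        # len() is answered without touching the provider.
        return self._length

    def materialize(self) -> str:
        # Fetch all the data once on first use, then reuse the cached str.
        if self._cache is None:
            self.provider_calls += 1
            self._cache = self._provider(0, self._length)
        return self._cache

    def __getitem__(self, index: int) -> str:
        return self.materialize()[index]

    def __str__(self) -> str:
        return self.materialize()


backing = "hello, world"
text = CallbackText(len(backing), lambda start, stop: backing[start:stop])
print(len(text), text.provider_calls)              # 12 0
print(text[0], str(text), text.provider_calls)     # h hello, world 1
```

A C version would also have to define what happens when the provider fails or runs arbitrary code, since every read of the string may now call back into foreign code.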
I had a similar idea for lazy loading or lazy decoding of Unicode
objects.
But I rejected the idea, and proposed deprecating
PyUnicode_READY() instead, after weighing the merits against the
complexity:
* Simplifying the Unicode object may open up more room for
optimization, because str is an essential type in Python. Since
Python is a dynamic language, far more str comparisons happen at
runtime than in statically typed languages like Java and Rust.
* Third parties may forget to check PyErr_Occurred() after APIs like
PyUnicode_Contains or PyUnicode_Compare when the author knows all
operands are exact Unicode objects.
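The ambiguity behind the PyErr_Occurred() point can be seen by calling the real PyUnicode_Compare through ctypes: it returns -1 both for "less than" and on failure. This sketch assumes CPython, where `ctypes.pythonapi` is a PyDLL that checks the Python error flag after each call and raises; C extension code has to make that check itself. The `compare` wrapper is a hypothetical helper.

```python
import ctypes

# ctypes.pythonapi exposes the running interpreter's C API. Because it is a
# PyDLL, ctypes checks the Python error flag after every call and raises if
# an exception is set; C code calling PyUnicode_Compare must check by hand.
_compare = ctypes.pythonapi.PyUnicode_Compare
_compare.argtypes = [ctypes.py_object, ctypes.py_object]
_compare.restype = ctypes.c_int


def compare(left: object, right: object) -> int:
    """Hypothetical wrapper: return -1, 0 or 1, as PyUnicode_Compare does."""
    return _compare(left, right)


# A genuine "less than" result is -1 ...
print(compare("apple", "banana"))   # -1

# ... and so is failure. In C the return value alone cannot tell these apart.
try:
    compare("apple", 42)
except TypeError as exc:
    print("error:", exc)
```

If a lazily populated str could fail during comparison, C code that skips the check after comparing two exact str objects would silently treat the failure as "less than".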
Additionally, if we introduce a customizable lazy str object, it
becomes very easy to release the GIL during basic Unicode operations.
Many third parties may assume PyUnicode_Compare doesn't release the
GIL if both operands are Unicode objects. That would produce bugs that
are hard to find and reproduce.
I would have no problem with the protocol stating that the GIL must not
be released by "foreign" unicode implementations.
So I'm +1 on making Unicode simpler by removing PyUnicode_READY(), and
-1 on making Unicode more complicated by adding customizable callbacks
for lazy population.
Anyway, I am OK with un-deprecating PyUnicode_READY() and making it a
no-op macro as of Python 3.12.
But I don't know how many third parties use it properly, because
legacy Unicode objects are already very rare.
For me lazy population might not be enough (as I'm not sure precisely
what you mean by it). I would like to be able to use my foreign unicode
thing as the storage itself.
For example (where text() returns a unicode object with a foreign
kind)...
    some_text = an_editor.text()
    more_text = another_editor.text()
    if some_text == more_text:
        print("The text is the same")
...would not involve any conversions at all. The following would require
a conversion...
    if some_text == "literal text":
        print("The text matches the literal")
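A pure-Python model of that behaviour, assuming a hypothetical `ForeignText` class whose storage is an array of UTF-16 code units (an illustrative stand-in for a toolkit string type). Comparing two foreign objects compares their storage directly; comparing against a str converts first. Unlike a real foreign kind implemented in C, this class would not pass `isinstance(x, str)`.

```python
from array import array


class ForeignText:
    """Hypothetical stand-in for a str whose storage belongs to another library.

    The storage is an array of UTF-16 code units; this is an assumption made
    for illustration only, not how any real toolkit or CPython works.
    """

    conversions = 0  # class-wide count of conversions to a Python str

    def __init__(self, text: str) -> None:
        self._units = array("H")
        self._units.frombytes(text.encode("utf-16-le"))

    def to_str(self) -> str:
        # Converting the foreign storage into a real str is the cost to avoid.
        ForeignText.conversions += 1
        return self._units.tobytes().decode("utf-16-le")

    def __eq__(self, other: object) -> bool:
        if isinstance(other, ForeignText):
            # Both operands use the foreign storage: compare it directly.
            return self._units == other._units
        if isinstance(other, str):
            # Mixed comparison: convert the foreign storage to a str first.
            return self.to_str() == other
        return NotImplemented


some_text = ForeignText("Hello")
more_text = ForeignText("Hello")
print(some_text == more_text, ForeignText.conversions)   # True 0
print(some_text == "Hello", ForeignText.conversions)     # True 1
```

Because `__eq__` is defined without `__hash__`, instances are unhashable; a real design would need a hash consistent with str.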
Phil
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/ZSPNNLM25FRIEK2KYN5JORIR76PZH22N/