Eric Snow wrote on Feb 4, 2022 at 17:35:
> On Fri, Feb 4, 2022 at 8:21 AM Stefan Behnel wrote:
>> Correct. We (intentionally) have our own way to intern strings and do not
>> depend on CPython's identifier framework.
>
> You're talking about __Pyx_StringTabEntry (and __Pyx_InitString())?

Yes, that's what we generate. The C code parsing is done here:
https://github.com/cython/cython/blob/79637b23da77732e753b1e1ab5669b3e29978be3/Cython/Compiler/Code.py#L531-L550
The deduplication is a bit complex on our side because it needs to handle
Python source encodings and also distinguish between identifiers (which
become 'str' in Py2), plain Unicode strings and byte strings. You don't
need most of that for plain C code, but it's done here:
https://github.com/cython/cython/blob/79637b23da77732e753b1e1ab5669b3e29978be3/Cython/Compiler/Code.py#L1009-L1088
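
To illustrate the idea only, here is a minimal sketch of deduplicating string
constants while keeping the three kinds apart. It is not the code behind the
link above: the class name, the kind labels and the generated C names are all
made up for this example, and source encodings are ignored entirely.

```python
# Hypothetical sketch: one shared C name per distinct (kind, value) pair.
# None of these names come from Cython itself.

KINDS = ("identifier", "unicode", "bytes")


class StringConstantPool:
    """Hands out one C-level name per distinct string constant."""

    def __init__(self):
        # Maps (kind, value) -> generated C name.
        self._names = {}

    def intern(self, kind, value):
        if kind not in KINDS:
            raise ValueError("unknown string kind: %r" % (kind,))
        # The kind is part of the key, so the identifier "abc" and the
        # plain Unicode string "abc" never share an entry.
        key = (kind, value)
        if key not in self._names:
            self._names[key] = "const_%s_%d" % (kind[0], len(self._names))
        return self._names[key]

    def entries(self):
        """Return (c_name, kind, value) tuples in creation order."""
        return [(name, kind, value)
                for (kind, value), name in self._names.items()]


pool = StringConstantPool()
first = pool.intern("unicode", "abc")
second = pool.intern("unicode", "abc")    # same constant, same name
ident = pool.intern("identifier", "abc")  # same text, different kind
```

Keying on the kind as well as the value is the whole point of the sketch: two
constants with the same text still get separate entries if one is an
identifier and the other a plain string.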
And then there's a whole bunch of code that helps in getting Unicode
character code points and arbitrary byte values in very long strings pushed
through C compilers, while keeping it mostly readable for interested users. :)
https://github.com/cython/cython/blob/master/Cython/Compiler/StringEncoding.py
You probably don't need that either, as long as you only deal with ASCII
strings.
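
For a rough idea of what is involved once you go beyond ASCII, here is a
minimal sketch of writing arbitrary bytes as a C string literal. It is not
taken from StringEncoding.py: the function name and the chunk size are
assumptions for this example. It relies on two C rules, namely that an octal
escape takes at most three digits and that adjacent string literals are
concatenated by the compiler.

```python
# Hypothetical sketch: render bytes as C string literal source text.

_SIMPLE_ESCAPES = {
    ord('"'): '\\"',
    ord("\\"): "\\\\",
    ord("\n"): "\\n",
    ord("\r"): "\\r",
    ord("\t"): "\\t",
}


def c_string_literal(data, chunk_size=16):
    """Return C source for a string literal holding exactly `data`."""
    pieces = []
    # Chunk the raw bytes before escaping, so an escape is never split.
    for start in range(0, len(data), chunk_size):
        out = []
        for byte in data[start:start + chunk_size]:
            if byte in _SIMPLE_ESCAPES:
                out.append(_SIMPLE_ESCAPES[byte])
            elif 0x20 <= byte < 0x7F and byte != ord("?"):
                # Printable ASCII stays readable. '?' is escaped below so
                # that no trigraph sequence can appear in the output.
                out.append(chr(byte))
            else:
                # Always three octal digits, so a following digit
                # character cannot be absorbed into the escape.
                out.append("\\%03o" % byte)
        pieces.append('"' + "".join(out) + '"')
    # Adjacent literals are joined by the C compiler.
    return " ".join(pieces) or '""'


literal = c_string_literal(b"caf\xc3\xa9 1\x002")
```

Splitting the output into several adjacent literals keeps each individual
literal short, which matters for very long strings.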
Anyway, have fun. Feel free to ask if I can help.
Stefan
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/QHJBAKIQUKFPIM6GZ7DYNJF3HDMDQQUH/