Eric Snow wrote on 04.02.22 at 17:35:
> On Fri, Feb 4, 2022 at 8:21 AM Stefan Behnel wrote:
>> Correct. We (intentionally) have our own way to intern strings and do not
>> depend on CPython's identifier framework.
>
> You're talking about __Pyx_StringTabEntry (and __Pyx_InitString())?

Yes, that's what we generate. The C code parsing is done here:

https://github.com/cython/cython/blob/79637b23da77732e753b1e1ab5669b3e29978be3/Cython/Compiler/Code.py#L531-L550

The deduplication is a bit complex on our side because it needs to handle Python source encodings and to distinguish between identifiers (which become 'str' in Py2), plain Unicode strings, and byte strings. You don't need most of that for plain C code. But it's done here:

https://github.com/cython/cython/blob/79637b23da77732e753b1e1ab5669b3e29978be3/Cython/Compiler/Code.py#L1009-L1088
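To give a rough idea of the deduplication described above, here is a minimal sketch in Python. It is not Cython's code: the class name StringTable, the kind labels, and the generated __str_const_N names are all made up for illustration. The only point it makes is that equal text is shared within one kind, while identifiers, Unicode strings, and byte strings stay apart.

```python
# Hypothetical sketch of a deduplicating string table; NOT Cython's implementation.
class StringTable:
    """Hand out one C-level name per distinct (kind, content) pair."""

    # Assumed kind labels, mirroring the three cases mentioned in the text.
    KINDS = ("identifier", "unicode", "bytes")

    def __init__(self):
        # Maps (kind, encoded bytes) -> generated constant name.
        self._entries = {}

    def intern(self, value, kind, encoding="utf-8"):
        if kind not in self.KINDS:
            raise ValueError("unknown string kind: %r" % (kind,))
        # Normalise text to bytes so the key does not depend on the Python type.
        data = value if isinstance(value, bytes) else value.encode(encoding)
        key = (kind, data)
        name = self._entries.get(key)
        if name is None:
            # First occurrence: allocate a new (made-up) constant name.
            name = "__str_const_%d" % len(self._entries)
            self._entries[key] = name
        return name

    def entries(self):
        # Insertion order is preserved, so names come out in allocation order.
        return list(self._entries.items())


table = StringTable()
first = table.intern("spam", "identifier")
again = table.intern("spam", "identifier")   # same kind, same text: shared
as_text = table.intern("spam", "unicode")    # different kind: separate entry
as_bytes = table.intern(b"spam", "bytes")    # different kind: separate entry
```

For plain C code with ASCII-only identifiers, the kind distinction and the encoding step could presumably be dropped, leaving a simple dict keyed by the string itself.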

And then there's a whole bunch of code that helps in getting Unicode character code points and arbitrary byte values in very long strings pushed through C compilers, while keeping it mostly readable for interested users. :)

https://github.com/cython/cython/blob/master/Cython/Compiler/StringEncoding.py

You probably don't need that either, as long as you only deal with ASCII strings.
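For readers curious what "pushing arbitrary bytes through a C compiler" can involve, here is a small sketch, again not taken from StringEncoding.py. The function name c_string_literal and the default chunk_size are assumptions chosen for illustration. It keeps printable ASCII readable, writes everything else as three-digit octal escapes, and splits the result into adjacent literals, since some compilers are said to limit the length of a single literal.

```python
# Hypothetical sketch; NOT the code in Cython/Compiler/StringEncoding.py.
def c_string_literal(data, chunk_size=2000):
    """Render bytes as one or more adjacent C string literals."""
    pieces = []
    for byte in data:
        ch = chr(byte)
        # Keep printable ASCII as-is, except characters that are special in C
        # literals: the quote, the backslash, and '?' (to avoid trigraphs).
        if 32 <= byte < 127 and ch not in '"\\?':
            pieces.append(ch)
        else:
            # Three-digit octal escapes cannot swallow a following digit.
            pieces.append("\\%03o" % byte)

    # Group the pieces into chunks without ever splitting an escape sequence.
    chunks = []
    current, length = [], 0
    for piece in pieces:
        if current and length + len(piece) > chunk_size:
            chunks.append("".join(current))
            current, length = [], 0
        current.append(piece)
        length += len(piece)
    chunks.append("".join(current))

    # Adjacent string literals are concatenated by the C compiler.
    return " ".join('"%s"' % chunk for chunk in chunks)


literal = c_string_literal("caf\u00e9".encode("utf-8"))
```

With ASCII-only input, none of the escaping branches trigger and the function reduces to adding quotes, which matches the remark above that this machinery is probably unnecessary in that case.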

Anyway, have fun. Feel free to ask if I can help.

Stefan

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/QHJBAKIQUKFPIM6GZ7DYNJF3HDMDQQUH/
Code of Conduct: http://python.org/psf/codeofconduct/