Marc-Andre Lemburg <[email protected]> added the comment:
I like the idea and thanks for putting work into this.
Some comments:
* when using macro variables, always put the variables in parens in the
expansion; this avoids precedence issues, weird syntax errors, etc. - even if
it may not be necessary
* a function would be cleaner, but since this code is very performance
sensitive, I'd opt for the macro version, unless someone can prove that a
function would be just as fast in benchmarks
* the macros should be documented in the unicodeobject.h header file and
clearly mention that ptr and end should be side-effect free and that ptr must
an lvalue
* please use the faster bitmask operators for joining surrogates, i.e. ucs4 =
((((high & 0x03FF) << 10) | (low & 0x03FF)) + 0x00010000);
* the Py_UNICODE_JOIN_SURROGATES() macro should use Py_UCS4 as prefix since it
returns Py_UCS4 values, i.e. Py_UCS4_JOIN_SURROGATES()
* same for the Py_UNICODE_NEXT() macro, i.e. Py_UCS4_NEXT()
* in order to make the macro easier to understand, please rename it to
Py_UCS4_READ_CODE_POINT(); that's a little more typing, but still a lot less
than without the macro :-)
* this version should be slightly faster and is also easier to read:
#define Py_UCS4_READ_CODE_POINT(ptr, end) \
((Py_UNICODE_ISHIGHSURROGATE((ptr)[0]) && \
(ptr) < (end) && \
Py_UNICODE_ISLOWSURROGATE((ptr)[1])) ? \
Py_UNICODE_JOIN_SURROGATES((ptr)++, (ptr)++) : \
(Py_UCS4)*(ptr)++)
I haven't tested it, but you get the idea.
BTW: You only focus on UCS2 builds. Please also make sure that these changes
work on UCS4 builds, e.g. Py_UCS2_READ_CODE_POINT() will also work on UCS4
builds and join code points there.
Note that UCS4 builds currently don't join surrogates, so a high and low
surrogates appear as two code points, which they are, but given the experience
with UCS2 builds, may not be what the user expects. So for the purpose of
consistency we should be careful with auto-joining surrogates in UCS2.
It does make sence for ord() and the various string methods, but should be done
with care in other cases.
In any case, we should clearly document where these macros are used and warn
about the implications of using them in the wrong places.
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue10542>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com