Alexander Belopolsky <belopol...@users.sourceforge.net> added the comment:
On Wed, Dec 29, 2010 at 7:19 AM, Marc-Andre Lemburg <rep...@bugs.python.org> wrote: .. > * The macros still need some more attention to enhance their performance. > Although I made your suggested change from '-' to '&', I seriously doubt that this would make any difference on modern CPUs. Why do you think these macros are performance critical? Users with lots of supplementary characters in their files are probably better off with a wide build where Py_UNICODE_NEXT() is just *ptr++ and can hardly be further optimized. Higher performance algorithms are possible, but those should probably do some loop unrolling and/or process more than one character at a time. At this point, however it is too soon to worry about optimization before we even know where these macros will be used. > * For consistency, I'd choose names Py_UNICODE_READ_NEXT() > and Py_UNICODE_WRITE_NEXT() instead of Py_UNICODE_NEXT() and > Py_UNICODE_PUT_NEXT(). > I would leave it for you and Raymond to reach a consensus. My understanding is that Raymond does not want "next" in the name, so your suggestion still conflicts with that. I would mildly prefer GET/PUT over READ/WRITE because the latter suggests multiple characters. As discussed before, the macro prefix does not imply the return value. Compare this to Py_UNICODE_ISSPACE() and friends or pretty much any other Py_UNICODE_ macro. Note that I added a leading underscore to Py_UNICODE_JOIN_SURROGATES and other new macros, so there is no immediate pressure to get the names perfect. > * The macros need to be carefully documented, both in unicodeobject.h > and the general docs. > I've added a description above _Py_UNICODE_*NEXT macros. I would really like to see these macros in private use for a while before they are published for general audience. I'll add a comment describing _Py_UNICODE_JOIN_SURROGATES. The remaining macros seem to be fairly self-explanatory (unlike, say Py_UNICODE_ISDIGIT or Py_UNICODE_ISTITLE which are not documented in unicodeobject.h.) Explicit downcasts would probably make sense, for example *(ptr)++ = (Py_UNICODE)ch instead of *(ptr)++ = ch, but I don't think we need explicit casts say in Py_UCS4 code = (ch) - 0x10000; where they can mask coding errors. I also looked for the use of casts elsewhere in unicodeobject.h and the following does not look right: #define Py_UNICODE_ISSPACE(ch) \ ((ch) < 128U ? _Py_ascii_whitespace[(ch)] : _PyUnicode_IsWhitespace(ch)) It looks like this won't work right if ch is a signed char. > * Same for your _Py_UNICODE_NEXT() to make sure that the return > value is indeed a Py_UNICODE value. > The return value of _Py_UNICODE_NEXT() is *not* Py_UNICODE, it is Py_UCS4 and as far as I can see, every conditional branch in narrow case has an explicit cast. In the wide case, I don't think we want an explicit cast because ptr should already be Py_UCS4* and if it is not, it may be a coding error that we don't want to mask. > * In general, we should probably be clear on the allowed input > and define the output types in the documentation. I agree. I'll add a note that ptr and end should be Py_UNICODE*. I am not sure what we should say about ch argument. If we add casts, the macro will accept anything, but we should probably document it as expecting Py_UCS4. ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue10542> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com