[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2012-01-05 Thread Benjamin Peterson
Benjamin Peterson benja...@python.org added the comment: Closing now. -- nosy: +benjamin.peterson resolution: -> out of date status: open -> closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10542

2011-09-29 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: PEP 393 has been accepted and merged into Python 3.3. Python 3.3 doesn't need the Py_UNICODE_NEXT macro anymore. But my macros (unicode_macros.patch) are still useful. -- versions: +Python 3.2 -Python 3.3

2011-09-29 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: Py_UNICODE_NEXT has been removed from 3.3 but it's still available and used in 2.7/3.2 (even if it's private). In order to fix #10521 on 2.7/3.2 the _Py_UNICODE_PUT_NEXT macro attached to this patch is required. -- versions:

2011-08-22 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: The attached patch adds the following 4 public macros to unicodeobject.h: Py_UNICODE_IS_SURROGATE(ch) Py_UNICODE_IS_HIGH_SURROGATE(ch) Py_UNICODE_IS_LOW_SURROGATE(ch) Py_UNICODE_JOIN_SURROGATES(high, low) and documents them. Since

2011-08-22 Thread Marc-Andre Lemburg
Marc-Andre Lemburg m...@egenix.com added the comment: Ezio Melotti wrote: Ezio Melotti ezio.melo...@gmail.com added the comment: The attached patch adds the following 4 public macros to unicodeobject.h: Py_UNICODE_IS_SURROGATE(ch) Py_UNICODE_IS_HIGH_SURROGATE(ch)

2011-08-22 Thread Roundup Robot
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset 77171f993bf2 by Ezio Melotti in branch 'default': #10542: Add 4 macros to work with surrogates: Py_UNICODE_IS_SURROGATE, Py_UNICODE_IS_HIGH_SURROGATE, Py_UNICODE_IS_LOW_SURROGATE, Py_UNICODE_JOIN_SURROGATES.
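
A minimal sketch of what the four macros compute, for readers following the discussion. It uses `uint32_t` in place of `Py_UCS4`; the `SKETCH_` names and the `sketch_join_checked()` helper are hypothetical stand-ins, not the definitions committed in 77171f993bf2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the four macros. Arguments are parenthesized
   so that an expression argument expands safely. */
#define SKETCH_IS_SURROGATE(ch)      (0xD800 <= (ch) && (ch) <= 0xDFFF)
#define SKETCH_IS_HIGH_SURROGATE(ch) (0xD800 <= (ch) && (ch) <= 0xDBFF)
#define SKETCH_IS_LOW_SURROGATE(ch)  (0xDC00 <= (ch) && (ch) <= 0xDFFF)

/* Combine a high (lead) and low (trail) surrogate into one code point:
   10 payload bits from each half, offset by 0x10000. */
#define SKETCH_JOIN_SURROGATES(high, low)                  \
    ((((uint32_t)(high) & 0x3FF) << 10 |                   \
      ((uint32_t)(low) & 0x3FF)) + 0x10000)

/* Hypothetical checked wrapper: JOIN assumes a valid pair was passed. */
static inline uint32_t sketch_join_checked(uint32_t high, uint32_t low)
{
    assert(SKETCH_IS_HIGH_SURROGATE(high) && SKETCH_IS_LOW_SURROGATE(low));
    return SKETCH_JOIN_SURROGATES(high, low);
}
```

For example, the pair U+D83D U+DE00 joins to U+1F600, and the extreme pairs D800/DC00 and DBFF/DFFF map to U+10000 and U+10FFFF.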

2011-08-18 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: I attached a patch to fix the str.is* methods on #9200 that also includes the macro. Since they are not public there, I don't see a reason to do 2 separate commits on 2.7/3.2 (one for the feature and one for the fix).

2011-08-17 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: On 17/08/2011 07:04, Ezio Melotti wrote: As I said in msg142175 I think the Py_UNICODE_IS{HIGH|LOW|}SURROGATE and Py_UNICODE_JOIN_SURROGATES can be committed without leading _ in 3.3 and with leading _ in 2.7/3.2. They

2011-08-17 Thread Marc-Andre Lemburg
Marc-Andre Lemburg m...@egenix.com added the comment: STINNER Victor wrote: STINNER Victor victor.stin...@haypocalc.com added the comment: On 17/08/2011 07:04, Ezio Melotti wrote: As I said in msg142175 I think the Py_UNICODE_IS{HIGH|LOW|}SURROGATE and Py_UNICODE_JOIN_SURROGATES can

2011-08-17 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: For Python 2.7 and 3.2, I would prefer to not touch a public header, and so add the macros in unicodeobject.c. Is there some reason for this? I think it's better if we have them in the same place rather than renaming and moving them in

2011-08-17 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: Ah yes, the correct prefix for functions working on Py_UNICODE characters/strings is Py_UNICODE, not PyUNICODE, sorry. For Python 2.7 and 3.2, I would prefer to not touch a public header, and so add the macros in unicodeobject.c.

2011-08-17 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: Ezio used two different naming schemes in his email. Please always use Py_UNICODE_... or _Py_UNICODE (not PyUNICODE_ or _PyUNICODE_). Indeed, that was a typo + copy/paste. I meant to say Py_UNICODE_* and _Py_UNICODE_*. Sorry about the

2011-08-17 Thread Marc-Andre Lemburg
Marc-Andre Lemburg m...@egenix.com added the comment: Ezio Melotti wrote: Ezio Melotti ezio.melo...@gmail.com added the comment: Ezio used two different naming schemes in his email. Please always use Py_UNICODE_... or _Py_UNICODE (not PyUNICODE_ or _PyUNICODE_). Indeed, that was a typo

2011-08-17 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: For bug fixes, you can put the macros straight into unicodeobject.c, but please leave unicodeobject.h untouched - otherwise people will mess around with these macros (even if they are private) and users will start to wonder about linker

2011-08-17 Thread Marc-Andre Lemburg
Marc-Andre Lemburg m...@egenix.com added the comment: Ezio Melotti wrote: Ezio Melotti ezio.melo...@gmail.com added the comment: For bug fixes, you can put the macros straight into unicodeobject.c, but please leave unicodeobject.h untouched - otherwise people will mess around with these

2011-08-17 Thread Eric V. Smith
Eric V. Smith e...@trueblade.com added the comment: On 8/17/2011 6:30 AM, Ezio Melotti wrote: OK, so in 2.7/3.2 I'll put them in unicodeobject.c, and in 3.3 I'll move them in unicodeobject.c. I believe the second file should be unicodeobject.h, correct?

2011-08-17 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: Correct.

2011-08-17 Thread Martin v . Löwis
Martin v. Löwis mar...@v.loewis.de added the comment: Also what about 3.2? Are you saying that we should fix the bug in 3.2/3.3 only and leave 2.x alone or that you don't want the bug to be fixed in all the bug-fix releases (i.e. 2.7/3.2)? Notice that the macros themselves don't fix any

2011-08-17 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: OK, so in 2.7/3.2 I'll put them in unicodeobject.c. It looks like #9200 only needs Py_UNICODE_NEXT, which can be implemented without the other Py_UNICODE_*SURROGATE* macros.

2011-08-16 Thread Marc-Andre Lemburg
Marc-Andre Lemburg m...@egenix.com added the comment: Martin v. Löwis wrote: A PEP 393 draft implementation is available at https://bitbucket.org/t0rsten/pep-393/ (branch pep-393); if this gets into 3.3, this issue will be outdated: there won't be narrow builds of Python anymore (nor

2011-08-16 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: I think the 4 macros: #define _Py_UNICODE_ISSURROGATE #define _Py_UNICODE_ISHIGHSURROGATE #define _Py_UNICODE_ISLOWSURROGATE #define _Py_UNICODE_JOIN_SURROGATES are quite straightforward and can avoid using the leading _. Since I would

2011-08-16 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: I think the 4 macros: #define _Py_UNICODE_ISSURROGATE #define _Py_UNICODE_ISHIGHSURROGATE #define _Py_UNICODE_ISLOWSURROGATE #define _Py_UNICODE_JOIN_SURROGATES are quite straightforward and can avoid using the leading _. I don't want

2011-08-16 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: All the other macros[0] follow the same convention, e.g. Py_UNICODE_ISLOWER and Py_UNICODE_TOLOWER. I agree that keeping the words separate makes them more readable though. [0]: Include/unicodeobject.h:328

2011-08-16 Thread Tom Christiansen
Tom Christiansen tchr...@perl.com added the comment: Ezio Melotti ezio.melo...@gmail.com added the comment: I think the 4 macros: #define _Py_UNICODE_ISSURROGATE #define _Py_UNICODE_ISHIGHSURROGATE #define _Py_UNICODE_ISLOWSURROGATE #define _Py_UNICODE_JOIN_SURROGATES are quite

2011-08-16 Thread Tom Christiansen
Tom Christiansen tchr...@perl.com added the comment: I now see there are lots of good things in the BOM FAQ that have come up lately regarding surrogates and other illegal characters, and about what can go in data streams. I quote a few of these from http://unicode.org/faq/utf_bom.html below:

2011-08-16 Thread Tom Christiansen
Tom Christiansen tchr...@perl.com added the comment: Antoine Pitrou rep...@bugs.python.org wrote on Tue, 16 Aug 2011 09:18:46 -: I think the 4 macros: #define _Py_UNICODE_ISSURROGATE #define _Py_UNICODE_ISHIGHSURROGATE #define _Py_UNICODE_ISLOWSURROGATE #define

2011-08-16 Thread Tom Christiansen
Tom Christiansen tchr...@perl.com added the comment: Ezio Melotti rep...@bugs.python.org wrote on Tue, 16 Aug 2011 09:23:50 -: All the other macros[0] follow the same convention, e.g. Py_UNICODE_ISLOWER and Py_UNICODE_TOLOWER. I agree that keeping the words separate makes them more

2011-08-16 Thread Marc-Andre Lemburg
Marc-Andre Lemburg m...@egenix.com added the comment: Tom Christiansen wrote: So keeping your preamble bits, I might have considered doing it this way if it were me doing it: #define _Py_UNICODE_IS_SURROGATE #define _Py_UNICODE_IS_LEAD_SURROGATE #define

2011-08-16 Thread Tom Christiansen
Tom Christiansen tchr...@perl.com added the comment: Marc-Andre Lemburg rep...@bugs.python.org wrote on Tue, 16 Aug 2011 12:11:22 -: The reasoning behind e.g. ISSURROGATE is that those names originate from and are consistent with the already existing ISLOWER/ISUPPER/ISTITLE macros

2011-08-16 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: I'm reposting my patch from #12751. I think that it's simpler than belopolsky's patch: it doesn't add public macros in unicodeobject.h and doesn't add the complex Py_UNICODE_NEXT() macro. My patch only adds private macros in

2011-08-16 Thread Marc-Andre Lemburg
Marc-Andre Lemburg m...@egenix.com added the comment: STINNER Victor wrote: STINNER Victor victor.stin...@haypocalc.com added the comment: I'm reposting my patch from #12751. I think that it's simpler than belopolsky's patch: it doesn't add public macros in unicodeobject.h and doesn't

2011-08-16 Thread Marc-Andre Lemburg
Marc-Andre Lemburg m...@egenix.com added the comment: Marc-Andre Lemburg wrote: Marc-Andre Lemburg m...@egenix.com added the comment: STINNER Victor wrote: STINNER Victor victor.stin...@haypocalc.com added the comment: I'm reposting my patch from #12751. I think that it's simpler than

2011-08-16 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: My patch version 2: don't test for a specific major version of an OS, test only its name. My patch now also changes tests for FreeBSD, NetBSD, OpenBSD, (...), and the _expectations list in regrtest.py. -- Added file:

2011-08-16 Thread STINNER Victor
Changes by STINNER Victor victor.stin...@haypocalc.com: Removed file: http://bugs.python.org/file22916/linux3-v2.patch

2011-08-16 Thread STINNER Victor
Changes by STINNER Victor victor.stin...@haypocalc.com: -- Removed message: http://bugs.python.org/msg142225

2011-08-16 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: (oops, msg142225 was for issue #12326)

2011-08-16 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: The code review links point to something weird. Victor, can you upload your patch for review? My first impression is that your patch does not accomplish much beyond replacing some literal expressions with macros. What

2011-08-16 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: The code review links point to something weird. That's because I posted a patch for another issue. It's the patch set 5, not the patch set 6 :-) Direct link: http://bugs.python.org/review/10542/patch/3174/9874 My first

2011-08-16 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: As I said in msg142175 I think the Py_UNICODE_IS{HIGH|LOW|}SURROGATE and Py_UNICODE_JOIN_SURROGATES can be committed without leading _ in 3.3 and with leading _ in 2.7/3.2. They should go in unicodeobject.h and be public in 3.3+.

2011-08-15 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: See also #12751. -- nosy: +tchrist

2011-08-15 Thread Martin v . Löwis
Martin v. Löwis mar...@v.loewis.de added the comment: A PEP 393 draft implementation is available at https://bitbucket.org/t0rsten/pep-393/ (branch pep-393); if this gets into 3.3, this issue will be outdated: there won't be narrow builds of Python anymore (nor will there be wide builds).

2011-08-15 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: That's really good news. Some Unicode issues can still be fixed on 2.7 and 3.2 though. FWIW I was planning to look at this and #9200 in the coming days and see if I can fix them.

2010-12-30 Thread Martin v . Löwis
Martin v. Löwis mar...@v.loewis.de added the comment: Actually, it looks like PEP 3131 and the Language Reference [1] still disagree. The latter says: identifier ::= id_start id_continue* which should probably be identifier ::= xid_start xid_continue* instead. Interesting.

2010-12-30 Thread Georg Brandl
Georg Brandl ge...@python.org added the comment: I think the proposal is that fixing this minefield can wait until Python 3.3 (or even 3.4, or later). That is what I was thinking. (Alex: You might not know that Martin was the main proponent of non-ASCII identifiers, so this assessment should

2010-12-29 Thread Marc-Andre Lemburg
Marc-Andre Lemburg m...@egenix.com added the comment: Alexander Belopolsky wrote: Alexander Belopolsky belopol...@users.sourceforge.net added the comment: I am attaching a patch for commit review. I added an underscore prefix to all new macros. This way I am not introducing new

2010-12-29 Thread Georg Brandl
Georg Brandl ge...@python.org added the comment: Let's wait for 3.3 with the change. Definitely. -- nosy: +georg.brandl versions: +Python 3.3 -Python 3.2

2010-12-29 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: On Wed, Dec 29, 2010 at 10:00 AM, Georg Brandl rep...@bugs.python.org wrote: .. Let's wait for 3.3 with the change. Definitely. Does this also mean that the numerous surrogates related bugs should wait until 3.3 as

2010-12-29 Thread Georg Brandl
Georg Brandl ge...@python.org added the comment: That bug already strikes me as quite exotic. You need to at least address Marc-Andre's remarks, and to give an overview of what else you'd like to change as well, and how this could affect semantics. Remember that the next release is already a

2010-12-29 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: On Wed, Dec 29, 2010 at 7:19 AM, Marc-Andre Lemburg rep...@bugs.python.org wrote: .. * The macros still need some more attention to enhance their performance. Although I made your suggested change from '-' to '', I

2010-12-29 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: On Sat, Nov 27, 2010 at 5:24 PM, Marc-Andre Lemburg rep...@bugs.python.org wrote: .. Perhaps we should allow ord() to work on surrogates in UCS4 builds as well. That would reduce the number of surprises. This is an

2010-12-29 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: The example in my previous message should have been: '\U00010000' == '\uD800\uDC00' evaluates to True (on a narrow build).
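
On a narrow build the two literals compare equal because U+10000 is stored as exactly that pair of 16-bit units. The following is a small sketch of the encoding arithmetic (the inverse of joining); `sketch_split_surrogates()` is a hypothetical helper, not a CPython API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode a supplementary code point
   (U+10000..U+10FFFF) as the UTF-16 surrogate pair a narrow build stores. */
static void sketch_split_surrogates(uint32_t cp, uint16_t *high, uint16_t *low)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    cp -= 0x10000;                               /* 20 bits remain */
    *high = (uint16_t)(0xD800 | (cp >> 10));     /* top 10 bits */
    *low  = (uint16_t)(0xDC00 | (cp & 0x3FF));   /* bottom 10 bits */
}
```

Splitting U+10000 yields D800/DC00 and splitting U+10FFFF yields DBFF/DFFF, which is why no BMP code point can be confused with half of a pair except the surrogates themselves.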

2010-12-29 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: On Wed, Dec 29, 2010 at 11:36 AM, Georg Brandl rep...@bugs.python.org wrote: .. That bug already strikes me as quite exotic. Would it look as exotic if presented like this? File "<stdin>", line 1 ̀ = 5 ^

2010-12-29 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: I should stop using e-mail to reply to bug reports! The mangled example was: ̀ = 5 File "<stdin>", line 1 ̀ = 5 ^ SyntaxError: invalid character in identifier

2010-12-29 Thread Alexander Belopolsky
Changes by Alexander Belopolsky belopol...@users.sourceforge.net: Added file: http://bugs.python.org/file20190/issue10542a.diff

2010-12-29 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: On Wednesday 29 December 2010 at 19:26, Alexander Belopolsky wrote: Would it look as exotic if presented like this? File "<stdin>", line 1 ̀ = 5 ^ SyntaxError: invalid character in identifier (works on a wide

2010-12-29 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: On Wed, Dec 29, 2010 at 3:36 PM, STINNER Victor rep...@bugs.python.org wrote: .. Using non-ASCII identifiers is exotic. Using non-BMP identifiers is crazy :-) Hmm, we clearly disagree on what crosses the boundary of the

2010-12-29 Thread Martin v . Löwis
Martin v. Löwis mar...@v.loewis.de added the comment: Seriously, it can wait until 3.3. What exactly can wait until 3.3? The presented patch introduces no user-visible changes. It is only a stepping stone to restoring some sanity in the way supplementary characters are treated by narrow builds.

2010-12-29 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: On Wed, Dec 29, 2010 at 8:02 PM, Martin v. Löwis rep...@bugs.python.org wrote: .. I plan to propose a complete redesign of the representation of Unicode strings, which may well make this entire set of changes obsolete.

2010-12-29 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: On Wed, Dec 29, 2010 at 9:38 PM, Alexander Belopolsky rep...@bugs.python.org wrote: .. Given that until recently (r87433) the PEP and the reference manual disagreed on the definition, Actually, it looks like PEP 3131 and

2010-12-29 Thread Martin v . Löwis
Martin v. Löwis mar...@v.loewis.de added the comment: Are you serious? This sounds like a py4k idea. Can you give us a hint on what the new representation will be? I'm thinking about an approach of a variable representation: one, two, or four bytes, depending on the widest character that
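
The variable-representation idea can be illustrated with a short sketch: scan the code points once and pick the narrowest fixed unit size (1, 2, or 4 bytes) that holds the widest character. This is the scheme PEP 393 later specified; `sketch_choose_kind()` is a hypothetical name, and real string construction involves much more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: return the bytes per character needed to store every
   code point in cps[0..n) with one fixed-width unit per character. */
static int sketch_choose_kind(const uint32_t *cps, size_t n)
{
    assert(cps != NULL || n == 0);
    uint32_t maxchar = 0;
    for (size_t i = 0; i < n; i++)
        if (cps[i] > maxchar)
            maxchar = cps[i];
    if (maxchar < 0x100)
        return 1;   /* Latin-1 range: one byte per character */
    if (maxchar < 0x10000)
        return 2;   /* BMP: two bytes, no surrogate pairs needed */
    return 4;       /* supplementary characters: four bytes each */
}
```

Because every character then occupies exactly one unit, indexing stays O(1) and no surrogate-aware iteration is needed, which is why such a redesign would make the NEXT/PUT_NEXT macros unnecessary.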

2010-12-28 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: On Sat, Nov 27, 2010 at 5:03 PM, Marc-Andre Lemburg rep...@bugs.python.org wrote: .. * this version should be slightly faster and is also easier to read: #define Py_UCS4_READ_CODE_POINT(ptr, end) \ ..

2010-12-28 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: I am attaching a patch for commit review. I added an underscore prefix to all new macros. This way I am not introducing new features and we will have a full release cycle to come up with better names. I would just note

2010-12-16 Thread Alexander Belopolsky
Changes by Alexander Belopolsky belopol...@users.sourceforge.net: -- nosy: +doerwalter

2010-12-16 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: On Fri, Dec 10, 2010 at 6:09 PM, Daniel Stutzbach rep...@bugs.python.org wrote: .. The second check for surrogates in Py_UNICODE_PUT_NEXT is necessary, unless you can prove that Py_UNICODE_SOME_TRANSFORMATION will never
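
The reason a PUT_NEXT-style writer needs its own surrogate check can be shown with a function-form sketch for a narrow (16-bit) buffer: it decides per code point whether to emit one unit or a pair, so it stays correct even if a transformation maps a BMP character to a supplementary one. `sketch_put_next()` is hypothetical and not the macro from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PUT_NEXT-style writer: stores cp at out and returns the
   advanced write pointer. The caller must leave room for two units. */
static uint16_t *sketch_put_next(uint16_t *out, uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp >= 0x10000) {
        /* Supplementary: emit a high/low surrogate pair. */
        cp -= 0x10000;
        *out++ = (uint16_t)(0xD800 | (cp >> 10));
        *out++ = (uint16_t)(0xDC00 | (cp & 0x3FF));
    } else {
        /* BMP (including lone surrogates): a single unit. */
        *out++ = (uint16_t)cp;
    }
    return out;
}
```

Writing 'A' followed by U+1F600 therefore consumes three units: 'A', D83D, DE00.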

2010-12-10 Thread Daniel Stutzbach
Daniel Stutzbach stutzb...@google.com added the comment: In bltinmodule.c, it looks like some of the indentation doesn't line up? Bikeshedding aside, it looks good to me. I agree with Eric Smith that the first part of a macro name usually refers to the type of the first argument (or the type the

2010-12-07 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: Daniel, While these macros should not affect ABI, I would appreciate your feedback in light of your work on issue 8654. -- nosy: +stutzbach

2010-12-07 Thread Daniel Stutzbach
Daniel Stutzbach stutzb...@google.com added the comment: +1 on the general idea of abstracting out repeated code. I will take a closer look at the details within the next few days.

2010-12-03 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: On Sat, Nov 27, 2010 at 6:38 PM, Raymond Hettinger rep...@bugs.python.org wrote: .. I suggest Py_UNICODE_ADVANCE() to avoid false suggestion that the iterator protocol is being used. As a data point, ICU defines

2010-12-03 Thread Marc-Andre Lemburg
Marc-Andre Lemburg m...@egenix.com added the comment: Alexander Belopolsky wrote: Alexander Belopolsky belopol...@users.sourceforge.net added the comment: On Sat, Nov 27, 2010 at 6:38 PM, Raymond Hettinger rep...@bugs.python.org wrote: .. I suggest Py_UNICODE_ADVANCE() to avoid false

2010-11-27 Thread Marc-Andre Lemburg
Marc-Andre Lemburg m...@egenix.com added the comment: Raymond Hettinger wrote: Raymond Hettinger rhettin...@users.sourceforge.net added the comment: Mark, can you opine on this? Yes, I'll have a look later today.

2010-11-27 Thread Marc-Andre Lemburg
Marc-Andre Lemburg m...@egenix.com added the comment: I like the idea and thanks for putting work into this. Some comments: * when using macro variables, always put the variables in parens in the expansion; this avoids precedence issues, weird syntax errors, etc. - even if it may not be
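
The point about parenthesizing macro variables can be seen in a small, hypothetical example: both macros are meant to test for a high surrogate, but only the parenthesized one survives an argument containing a lower-precedence operator such as `&`. The macro and function names are illustrative only.

```c
#include <assert.h>

/* Hypothetical macros, identical except for parentheses around `ch`. */
#define BAD_IS_HIGH_SURROGATE(ch)  (0xD800 <= ch && ch <= 0xDBFF)
#define GOOD_IS_HIGH_SURROGATE(ch) (0xD800 <= (ch) && (ch) <= 0xDBFF)

/* BAD expands to 0xD800 <= x & 0xFFFF && x & 0xFFFF <= 0xDBFF.
   Since <= binds tighter than &, the right operand of && becomes
   x & (0xFFFF <= 0xDBFF), i.e. x & 0, so the test is always false. */
static int low_half_is_high_surrogate_bad(unsigned x)
{
    return BAD_IS_HIGH_SURROGATE(x & 0xFFFF);
}

/* GOOD expands to 0xD800 <= (x & 0xFFFF) && (x & 0xFFFF) <= 0xDBFF. */
static int low_half_is_high_surrogate_good(unsigned x)
{
    return GOOD_IS_HIGH_SURROGATE(x & 0xFFFF);
}
```

With x = 0x1D800 the parenthesized version returns 1 and the other returns 0. GCC and Clang typically warn about the unparenthesized expansion under -Wparentheses, but it still compiles.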

2010-11-27 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: On Sat, Nov 27, 2010 at 5:03 PM, Marc-Andre Lemburg rep...@bugs.python.org wrote: .. [I'll respond to skipped when I update the patch] In any case, we should clearly document where these macros are used and warn about

2010-11-27 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: On Sat, Nov 27, 2010 at 5:03 PM, Marc-Andre Lemburg rep...@bugs.python.org wrote: .. * same for the Py_UNICODE_NEXT() macro, i.e. Py_UCS4_NEXT() * in order to make the macro easier to understand, please rename it to

2010-11-27 Thread Marc-Andre Lemburg
Marc-Andre Lemburg m...@egenix.com added the comment: Alexander Belopolsky wrote: Alexander Belopolsky belopol...@users.sourceforge.net added the comment: On Sat, Nov 27, 2010 at 5:03 PM, Marc-Andre Lemburg rep...@bugs.python.org wrote: .. * same for the Py_UNICODE_NEXT() macro, i.e.

2010-11-27 Thread Eric Smith
Eric Smith e...@trueblade.com added the comment: The idea is that the first part refers to what the macro returns (Py_UCS4) and the read part of the name refers to moving a pointer across an array (any array of integers). I thought the first part generally meant the type of the first

2010-11-27 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: * the Py_UNICODE_JOIN_SURROGATES() macro should use Py_UCS4 as prefix since it returns Py_UCS4 values, i.e. Py_UCS4_JOIN_SURROGATES() * same for the Py_UNICODE_NEXT() macro, i.e. Py_UCS4_NEXT() I'm not so familiar with the prefix

2010-11-27 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: On Sat, Nov 27, 2010 at 5:41 PM, Ezio Melotti rep...@bugs.python.org wrote: Ezio Melotti ezio.melo...@gmail.com added the comment: * the Py_UNICODE_JOIN_SURROGATES() macro should use Py_UCS4 as prefix since it returns

2010-11-27 Thread Raymond Hettinger
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: I suggest Py_UNICODE_ADVANCE() to avoid false suggestion that the iterator protocol is being used.

2010-11-27 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: I suggest Py_UNICODE_ADVANCE() to avoid false suggestion that the iterator protocol is being used. You can't use the iterator protocol on a non-PyObject, and Py_UNICODE_* (as opposed to PyUnicode_*) suggests the macro operates on a raw array of

2010-11-27 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: AFAIU the macro returns lone surrogates as they are, this means that: 1) if the string contains only surrogate pairs, Py_UNICODE_NEXT will iterate on scalar values[0]; 2) if the string contains only lone surrogates, it will iterate on
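
The behaviour described here (valid pairs are joined, lone surrogates come back unchanged) can be pinned down with a function-form sketch of a NEXT-style reader over a narrow buffer, taking `(ptr, end)` like the macros discussed in this issue. `sketch_next()` is a hypothetical name, not the macro from any of the attached patches.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical NEXT-style reader: returns the code point at *pp and
   advances *pp past it. A valid high+low pair is joined; a lone surrogate
   (or a high surrogate at the very end) is returned as-is. */
static uint32_t sketch_next(const uint16_t **pp, const uint16_t *end)
{
    const uint16_t *p = *pp;
    assert(p < end);
    uint32_t ch = *p++;
    if (0xD800 <= ch && ch <= 0xDBFF && p < end
        && 0xDC00 <= *p && *p <= 0xDFFF) {
        ch = (((ch & 0x3FF) << 10) | (*p++ & 0x3FF)) + 0x10000;
    }
    *pp = p;
    return ch;
}
```

Iterating over the units 'a', D83D, DE00, DC00, D800 with this sketch yields 'a', U+1F600, U+DC00 and U+D800: scalar values where the data is well formed, raw surrogates where it is not.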

2010-11-27 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: I am attaching a patch that defines Py_UNICODE_PUT_NEXT() macro (tentative name) and uses it to fix str.upper method. The implementation of surrogate-aware str.upper shows that NEXT/PUT_NEXT abstractions may lead to
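
The shape of a surrogate-aware upper() built from NEXT/PUT_NEXT-style helpers can be sketched as a read-transform-write loop over narrow buffers. All names are hypothetical, and `toy_upper()` is a deliberately tiny mapping (ASCII plus the Deseret block, whose small letters U+10428..U+1044F map to U+10400..U+10427), not the Unicode database.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy case mapping, NOT the Unicode database. */
static uint32_t toy_upper(uint32_t ch)
{
    if (ch >= 'a' && ch <= 'z')
        return ch - 0x20;
    if (ch >= 0x10428 && ch <= 0x1044F)
        return ch - 0x28;
    return ch;
}

/* Read one code point, joining a valid surrogate pair. */
static uint32_t read_cp(const uint16_t **pp, const uint16_t *end)
{
    assert(*pp < end);
    uint32_t ch = *(*pp)++;
    if (ch >= 0xD800 && ch <= 0xDBFF && *pp < end
        && **pp >= 0xDC00 && **pp <= 0xDFFF)
        ch = (((ch & 0x3FF) << 10) | (*(*pp)++ & 0x3FF)) + 0x10000;
    return ch;
}

/* Write one code point, splitting it into a pair when needed. */
static uint16_t *write_cp(uint16_t *out, uint32_t ch)
{
    if (ch >= 0x10000) {
        ch -= 0x10000;
        *out++ = (uint16_t)(0xD800 | (ch >> 10));
        *out++ = (uint16_t)(0xDC00 | (ch & 0x3FF));
    } else {
        *out++ = (uint16_t)ch;
    }
    return out;
}

/* Surrogate-aware upper(): `out` needs room for 2*n units in general,
   since a mapping could turn one BMP unit into a pair. Returns the number
   of units written. */
static size_t sketch_upper(const uint16_t *in, size_t n, uint16_t *out)
{
    const uint16_t *end = in + n;
    uint16_t *start = out;
    while (in < end)
        out = write_cp(out, toy_upper(read_cp(&in, end)));
    return (size_t)(out - start);
}
```

Processing unit by unit would leave the pair D801 DC28 (U+10428) untouched; working on whole code points maps it to D801 DC00 (U+10400). The caller's buffer-sizing obligation is the kind of complication the comment alludes to.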

2010-11-26 Thread Alexander Belopolsky
New submission from Alexander Belopolsky belopol...@users.sourceforge.net: As discussed in issue 10521 and the sprawling len(chr(i)) = 2? thread [1] on python-dev, many functions in the Python library behave differently on narrow and wide builds. While there are unavoidable differences such as
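
The most visible narrow/wide difference is that len() on a narrow build counts 16-bit units, so a single supplementary character has length 2. The following hypothetical helper sketches what a code-point count over a narrow buffer looks like, treating a valid pair as one character and a lone surrogate as one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: number of code points in a narrow (UTF-16) buffer, i.e.
   what a wide build's len() would report for the same text. */
static size_t sketch_count_code_points(const uint16_t *s, size_t n)
{
    assert(s != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++, count++) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF && i + 1 < n
            && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            i++;   /* skip the low half of a valid pair */
    }
    return count;
}
```

For the units D800 DC00 this returns 1, while a narrow build's len() reports 2; for the reversed (invalid) order DC00 D800 both counts are 2.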

2010-11-26 Thread Alexander Belopolsky
Changes by Alexander Belopolsky belopol...@users.sourceforge.net: -- nosy: +haypo, loewis

2010-11-26 Thread Eric Smith
Eric Smith e...@trueblade.com added the comment: In addition to the proposed Py_UNICODE_NEXT and Py_UNICODE_PUT_NEXT, str.__format__ would also need a function that tells it how many Py_UNICODEs are needed to store a given Py_UCS4.

2010-11-26 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: On Fri, Nov 26, 2010 at 7:27 PM, Eric Smith rep...@bugs.python.org wrote: .. In addition to the proposed Py_UNICODE_NEXT and Py_UNICODE_PUT_NEXT, str.__format__ would also need a function that tells it how many

2010-11-26 Thread Eric Smith
Eric Smith e...@trueblade.com added the comment: I'd need access to this without having to build a PyUnicodeObject, for efficiency. But it sounds like it does have the basic functionality I need. For my use I'd really need it to take the result of Py_UNICODE_NEXT. Something like: Py_ssize_t

2010-11-26 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: On Fri, Nov 26, 2010 at 7:45 PM, Eric Smith rep...@bugs.python.org wrote: .. For my use I'd really need it to take the result of Py_UNICODE_NEXT. Something like: Py_ssize_t Py_UNICODE_NUM_NEEDED(Py_UCS4 c) and it
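
The helper Eric asks for is small on a narrow build. This sketch uses `size_t` where the proposal has `Py_ssize_t`, and `sketch_num_needed()` is a hypothetical name; on a wide build the answer would always be 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function form of the proposed Py_UNICODE_NUM_NEEDED for a
   narrow build: 16-bit units needed to store one code point. */
static size_t sketch_num_needed(uint32_t c)
{
    assert(c <= 0x10FFFF);
    return c >= 0x10000 ? 2 : 1;
}
```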

2010-11-26 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: I don't like macros that return a result while chaining multiple expressions with the evil magic trick (the comma operator). It's harder to maintain the code and harder to debug than a classical function. Don't you think that modern compilers are able
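
To make the objection concrete, here is a hypothetical side-by-side: a comma-operator macro that both advances a pointer and yields a code point, and an equivalent `static inline` function. Neither is the code from the attached patches; both use `uint16_t` for narrow-build units.

```c
#include <assert.h>
#include <stdint.h>

/* Macro form: evaluates `ptr` several times, so an argument with side
   effects (e.g. p++) breaks it, and it is awkward to step through in a
   debugger. `ptr` must be a modifiable pointer lvalue. */
#define SKETCH_NEXT_MACRO(ptr, end)                                   \
    ((0xD800 <= *(ptr) && *(ptr) <= 0xDBFF && (ptr) + 1 < (end)     \
      && 0xDC00 <= (ptr)[1] && (ptr)[1] <= 0xDFFF)                   \
     ? ((ptr) += 2, ((((uint32_t)(ptr)[-2] & 0x3FF) << 10)           \
                     | ((uint32_t)(ptr)[-1] & 0x3FF)) + 0x10000)     \
     : (uint32_t)*(ptr)++)

/* Function form: each argument is evaluated once, and an optimizing
   compiler is free to inline it. */
static inline uint32_t sketch_next_inline(const uint16_t **pp,
                                          const uint16_t *end)
{
    const uint16_t *p = *pp;
    assert(p < end);
    uint32_t ch = *p++;
    if (0xD800 <= ch && ch <= 0xDBFF && p < end
        && 0xDC00 <= *p && *p <= 0xDFFF)
        ch = (((ch & 0x3FF) << 10) | (*p++ & 0x3FF)) + 0x10000;
    *pp = p;
    return ch;
}
```

Both forms produce the same sequence of code points; the difference lies in readability and argument-evaluation safety, which is the trade-off debated in the replies.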

2010-11-26 Thread Eric Smith
Eric Smith e...@trueblade.com added the comment: The code will basically be: Py_UCS4 fill; parse_format_string(fmt, ..., fill, ...); /* lots more code */ if (fill_needed) { /* compute how many characters to reserve */ space_needed = Py_UNICODE_NUM_NEEDED(fill) *
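
The padding step can be sketched as two hypothetical helpers: one computes the units to reserve (units per fill character times the count, mirroring the space_needed line), and one writes the fill character that many times, splitting it into a surrogate pair when it lies outside the BMP. These are not the names or code from the format implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: units to reserve for `count` copies of `fill`. */
static size_t sketch_fill_space(uint32_t fill, size_t count)
{
    return (fill >= 0x10000 ? 2u : 1u) * count;
}

/* Hypothetical: append `count` copies of `fill` to a narrow buffer that
   already has sketch_fill_space(fill, count) units reserved. Returns the
   new write position. */
static uint16_t *sketch_write_fill(uint16_t *out, uint32_t fill, size_t count)
{
    assert(fill <= 0x10FFFF);
    for (size_t i = 0; i < count; i++) {
        if (fill >= 0x10000) {
            uint32_t v = fill - 0x10000;
            *out++ = (uint16_t)(0xD800 | (v >> 10));
            *out++ = (uint16_t)(0xDC00 | (v & 0x3FF));
        } else {
            *out++ = (uint16_t)fill;
        }
    }
    return out;
}
```

Three copies of U+1F600 need six units; three copies of '*' need three.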

2010-11-26 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: On Fri, Nov 26, 2010 at 8:41 PM, STINNER Victor rep...@bugs.python.org wrote: .. I don't like macro having a result and using multiple instructions using the evil magic trick (the ,). It's harder to maintain the code

2010-11-26 Thread Eric Smith
Eric Smith e...@trueblade.com added the comment: The compiler's decision to inline something should not be related to its ability to put variables in a register. But I definitely agree that we should get the abstraction right first and worry about the implementation later.

2010-11-26 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: On Fri, Nov 26, 2010 at 9:22 PM, Eric Smith rep...@bugs.python.org wrote: .. But I definitely agree that we should get the abstraction right first and worry about the implementation later. I am fairly happy with

2010-11-26 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: Raymond, I wonder if you would like to comment on the iterator analogy and/or on adding public names to C API. -- nosy: +rhettinger

2010-11-26 Thread Raymond Hettinger
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: Mark, can you opine on this? -- assignee: belopolsky -> lemburg