Re: [Python-Dev] ints not overflowing into longs?
On Wed, 2 Nov 2011 19:41:30 -0700 Guido van Rossum gu...@python.org wrote:
> Apparently Macports is still using a buggy compiler.

If I understand things correctly, this is technically not a buggy compiler but Python making optimistic assumptions about the C standard.

(from issue11149: "clang (as with gcc 4.x) assumes signed integer overflow is undefined. But Python depends on the fact that signed integer overflow wraps")

I'd happily call that a buggy C standard, though :-)

Regards

Antoine.

> I reported a similar issue before and got this reply from Ned Deily:
>
> Thanks for the pointer. That looks like a duplicate of Issue11149 (and Issue12701). Another manifestation of this was reported in Issue13061 which also originated from MacPorts. I'll remind them that the configure change is likely needed for all Pythons.
>
> It's still safest to stick with good old gcc-4.2 on OS X at the moment.
>
> (Those issues are on bugs.python.org.)
>
> --Guido
>
> On Wed, Nov 2, 2011 at 7:32 PM, Derek Shockey derek.shoc...@gmail.com wrote:
> > I just found an unexpected behavior and I'm wondering if it is a bug. In my 2.7.2 interpreter on OS X, built and installed via MacPorts, it appears that integers are not correctly overflowing into longs and instead are yielding bizarre results. I can only reproduce this when using the exponent operator with two ints (declaring either operand explicitly as long prevents the behavior).
> >
> > >>> 2**100
> > 0
> > >>> 2**100L
> > 1267650600228229401496703205376L
> > >>> 20**20
> > -2101438300051996672
> > >>> 20L**20
> > 104857600000000000000000000L
> > >>> 10**20
> > 7766279631452241920
> > >>> 10L**20L
> > 100000000000000000000L
> >
> > To confirm I'm not crazy, I tried in the 2.7.1 and 2.6.7 installations included in OS X 10.7, and also a 2.7.2+ (not sure what the + is) on an Ubuntu machine and didn't see this behavior. This looks like some kind of truncation error, but I don't know much about the internals of Python and have no idea what's going on.
> >
> > I assume since it's only in my MacPorts installation, it must be a build configuration issue that is specific to OS X, perhaps only 10.7, or MacPorts. Am I doing something wrong, and is there a way to fix it before I compile? I couldn't find any references to this problem as a known issue.
> >
> > Thanks,
> > Derek
>
> --
> --Guido van Rossum (python.org/~guido)

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
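The three wrong values in Derek's session are consistent with the exact results being reduced modulo 2**64, which is what a wrapped C long would hold on a 64-bit build. A minimal Python sketch of that arithmetic follows; `wrap_int64` is a hypothetical helper written for illustration, not part of CPython, and the sketch does not claim to reproduce what the compiler actually emitted.

```python
def wrap_int64(value):
    """Reduce an exact integer to a signed 64-bit two's-complement value."""
    value &= (1 << 64) - 1      # keep only the low 64 bits
    if value >= 1 << 63:        # top bit set means the value is negative
        value -= 1 << 64
    return value


# Compare the wrapped results with the values reported in the thread.
for base, exp in [(2, 100), (20, 20), (10, 20)]:
    print("%d**%d -> %d" % (base, exp, wrap_int64(base ** exp)))
```

A correctly built interpreter detects the overflow and promotes the result to a long instead of returning the wrapped value.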
Re: [Python-Dev] ints not overflowing into longs?
On Wednesday, 2 November 2011 19:32:38, Derek Shockey wrote:
> I just found an unexpected behavior and I'm wondering if it is a bug. In my 2.7.2 interpreter on OS X, built and installed via MacPorts, it appears that integers are not correctly overflowing into longs and instead are yielding bizarre results. I can only reproduce this when using the exponent operator with two ints (declaring either operand explicitly as long prevents the behavior).
>
> >>> 2**100
> 0

This issue has already been fixed twice in the Python 2.7 branch: int_pow() has been fixed and -fwrapv is now used for Clang.

http://bugs.python.org/issue11149
http://bugs.python.org/issue12973

Maybe it is time for a new release? :-)

Victor
Re: [Python-Dev] draft PEP: virtual environments
On 11/1/2011 11:43 AM, Paul Moore wrote:
On 1 November 2011 16:40, Paul Moore p.f.mo...@gmail.com wrote:
On 1 November 2011 16:29, Paul Moore p.f.mo...@gmail.com wrote:
On 31 October 2011 20:10, Carl Meyer c...@oddbird.net wrote:
> For Windows, can you point me at the nt scripts? If they aren't too complex, I'd be willing to port to PowerShell.

For what it's worth, there have been a number of efforts in this direction:

https://bitbucket.org/guillermooo/virtualenvwrapper-powershell
https://bitbucket.org/vanl/virtualenvwrapper-powershell

(These are two different implementations.)
Re: [Python-Dev] Buildbot failures
On Sat, Oct 22, 2011 at 14:30, Andrea Crotti andrea.crott...@gmail.com wrote:
> On 10/21/2011 10:08 PM, Antoine Pitrou wrote:
> > Hello,
> >
> > There are currently a bunch of various buildbot failures on all 3 branches. I would remind committers to regularly take a look at the buildbots, so that these failures get solved reasonably fast.
> >
> > Regards
> >
> > Antoine.
>
> In my previous workplace if someone broke a build committing something wrong he/she had to bring cake for everyone next meeting. The cake is not really feasible I guess, but isn't it possible to notify the developer that broke the build?

You just have to keep track and bring all of the cakes that you owe to PyCon.
Re: [Python-Dev] ints not overflowing into longs?
Hi Derek,

> I tried in the 2.7.1 and 2.6.7 installations included in OS X 10.7, and also a 2.7.2+ (not sure what the + is)

The + means that it's 2.7.2 plus some commits, in other words the in-development version that will become 2.7.3. This bit of info seems to be missing from the docs.

Regards
[Python-Dev] Unicode exception indexing
There is a backwards compatibility issue with PEP 393 and Unicode exceptions: are the start and end indices Py_UNICODE indices, or code point indices?

On the one hand, these indices are used in formatting error messages such as "codec can't encode character \u%04x in position %d", suggesting they are regular indices into the string (counting code points).

On the other hand, they are used by error handlers to look up the character, and existing error handlers (including the ones we have now) use PyUnicode_AsUnicode to find the character. This suggests that the indices should be Py_UNICODE indices, for compatibility (and they currently do work in this way).

The indices can only be different if the string is a UCS-4 string, and Py_UNICODE is a two-byte type (i.e. on Windows).

So what should it be?

As a compromise, it would be possible to convert between these indices, by counting the non-BMP characters that precede the index if the indices might differ. That would be expensive to compute, but provide backwards compatibility to the C API. It's less clear what backwards compatibility to Python code would require - most likely, people would use the indices for slicing operations (rather than performing a UTF-16 conversion and performing indexing on that).

Regards,
Martin
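The compromise described above (counting the non-BMP characters that precede an index) can be sketched in a few lines of Python. This is only an illustration of the idea, assuming Python 3.3 or later where str is indexed by code point; `codepoint_to_utf16_index` is a hypothetical name, not an existing API.

```python
def codepoint_to_utf16_index(text, index):
    """Convert a code point index into a UTF-16 code unit index."""
    # Each non-BMP character before `index` occupies two 16-bit code
    # units (a surrogate pair), so it shifts the index by one extra unit.
    non_bmp = sum(1 for ch in text[:index] if ord(ch) > 0xFFFF)
    return index + non_bmp


sample = "a\U0001F600b"   # 'a', one non-BMP character, 'b'
print(codepoint_to_utf16_index(sample, 2))
```

The scan over `text[:index]` is what makes the conversion O(n), which is the cost mentioned in the message.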
Re: [Python-Dev] ints not overflowing into longs?
I believe you're right. The 2.7.2 MacPorts portfile definitely passes the -fwrapv flag to clang, but the bad behavior still occurs with exponents. I verified the current head of the 2.7 branch does not have this problem when built with clang, so I'm assuming that issue12973 resolved this with a patch to int_pow() and that it will be out in the next release.

-Derek

On Thu, Nov 3, 2011 at 4:30 AM, Antoine Pitrou solip...@pitrou.net wrote:
> On Wed, 2 Nov 2011 19:41:30 -0700 Guido van Rossum gu...@python.org wrote:
> > Apparently Macports is still using a buggy compiler.
>
> If I understand things correctly, this is technically not a buggy compiler but Python making optimistic assumptions about the C standard.
>
> (from issue11149: "clang (as with gcc 4.x) assumes signed integer overflow is undefined. But Python depends on the fact that signed integer overflow wraps")
>
> I'd happily call that a buggy C standard, though :-)
>
> Regards
>
> Antoine.
Re: [Python-Dev] ints not overflowing into longs?
Derek Shockey derek.shoc...@gmail.com wrote:
> I believe you're right. The 2.7.2 MacPorts portfile definitely passes the -fwrapv flag to clang, but the bad behavior still occurs with exponents.

Really? Even without the fix for issue12973 the -fwrapv flag should be sufficient, as reported in issue13061 and Issue11149.

For clang version 3.0 (trunk 139691) on FreeBSD this is the case.

Stefan Krah
Re: [Python-Dev] Unicode exception indexing
On Thursday, 3 November 2011 18:14:42, mar...@v.loewis.de wrote:
> There is a backwards compatibility issue with PEP 393 and Unicode exceptions: the start and end indices: are they Py_UNICODE indices, or code point indices?

Oh oh. That's exactly why I didn't want to start to work on this issue.

http://bugs.python.org/issue13064

In a Python error handler, exc.object[exc.start:exc.end] should be used to get the unencodable/undecodable substring. In a C error handler, it depends on whether you use a Py_UNICODE* pointer or PyUnicode_Substring() / PyUnicode_READ.

Using google.fr/codesearch, I found some user error handlers implemented in Python:

* straw: html_replace
* Nuxeo: latin9_fallback
* peerscape: htmlentityescape
* pymt: cssescape

I found no error handler implemented in C (no call to PyCodec_RegisterError).

> So what should it be?

I suggest using code point indices. Code point indices are also more natural now with PEP 393.

Because it is an incompatible change, it should be documented in the PEP and in the What's new in Python 3.3 document.

> As a compromise, it would be possible to convert between these indices, by counting the non-BMP characters that precede the index if the indices might differ.

I started such a hack for the UTF-8 codec... It is really tricky, we should not do that!

> That would be expensive to compute

Yeah, O(n) should be avoided when it is possible.

--

FYI I implemented a proof of concept in Python of the surrogateescape error handler for Python 2 (for Mercurial):
https://bitbucket.org/haypo/misc/src/tip/python/surrogateescape.py

Victor
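To make the exc.object[exc.start:exc.end] pattern concrete, here is a minimal sketch of an encode error handler written in Python 3. The handler name "demo_charref" and the function `charref_replace` are made up for this illustration; they are not taken from any of the projects listed above. Only `codecs.register_error` and the exception attributes are standard.

```python
import codecs


def charref_replace(exc):
    """Replace unencodable characters with numeric character references."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    # start and end delimit the unencodable run inside exc.object.
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("&#%d;" % ord(ch) for ch in bad)
    # Return the replacement text and the position where encoding resumes.
    return replacement, exc.end


codecs.register_error("demo_charref", charref_replace)
print("caf\xe9 \u20ac".encode("ascii", "demo_charref"))
```

Because the handler only slices exc.object, it works unchanged as long as start and end are indices that Python-level slicing understands, which is the argument for code point indices.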
Re: [Python-Dev] Buildbot failures
Brian Curtin, 03.11.2011 15:59:
> On Sat, Oct 22, 2011 at 14:30, Andrea Crotti wrote:
> > On 10/21/2011 10:08 PM, Antoine Pitrou wrote:
> > > Hello,
> > >
> > > There are currently a bunch of various buildbot failures on all 3 branches. I would remind committers to regularly take a look at the buildbots, so that these failures get solved reasonably fast.
> > >
> > > Regards
> > >
> > > Antoine.
> >
> > In my previous workplace if someone broke a build committing something wrong he/she had to bring cake for everyone next meeting. The cake is not really feasible I guess, but isn't it possible to notify the developer that broke the build?
>
> You just have to keep track and bring all of the cakes that you owe to PyCon.

Did you mean PieCon?

Stefan
Re: [Python-Dev] Unicode exception indexing
On Thu, 03 Nov 2011 18:14:42 +0100 mar...@v.loewis.de wrote:
> There is a backwards compatibility issue with PEP 393 and Unicode exceptions: the start and end indices: are they Py_UNICODE indices, or code point indices?
>
> On the one hand, these indices are used in formatting error messages such as "codec can't encode character \u%04x in position %d", suggesting they are regular indices into the string (counting code points).
>
> On the other hand, they are used by error handlers to look up the character, and existing error handlers (including the ones we have now) use PyUnicode_AsUnicode to find the character. This suggests that the indices should be Py_UNICODE indices, for compatibility (and they currently do work in this way).

But what about error handlers written in Python?

> The indices can only be different if the string is a UCS-4 string, and Py_UNICODE is a two-byte type (i.e. on Windows).
>
> So what should it be?

I'd say let's do the Right Thing and accept the small compatibility breach (surrogates on UCS-2 builds).

Regards

Antoine.
Re: [Python-Dev] ints not overflowing into longs?
You're right; among my many tests I think I muddled the situation with a stray CFLAGS variable in my environment. Apologies for the misinformation. The current MacPorts portfile does not add -fwrapv. Adding -fwrapv to OPT in the Makefile solves the problem. I confirmed by manually building the v2.7.2 tag with clang and -fwrapv, and the overflow behavior is correct. I've notified the MacPorts package maintainer.

-Derek

On Thu, Nov 3, 2011 at 11:07 AM, Stefan Krah ste...@bytereef.org wrote:
> Derek Shockey derek.shoc...@gmail.com wrote:
> > I believe you're right. The 2.7.2 MacPorts portfile definitely passes the -fwrapv flag to clang, but the bad behavior still occurs with exponents.
>
> Really? Even without the fix for issue12973 the -fwrapv flag should be sufficient, as reported in issue13061 and Issue11149.
>
> For clang version 3.0 (trunk 139691) on FreeBSD this is the case.
>
> Stefan Krah
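One way to check an already-built interpreter for the flag is to look at the build variables it recorded. The sketch below is an assumption-laden illustration: `mentions_fwrapv` is a hypothetical helper, and which variables (OPT, CFLAGS, PY_CFLAGS) are populated depends on the platform and build, so a missing value proves nothing. The final assertion is a behavioral check of the operation discussed in this thread.

```python
import sysconfig


def mentions_fwrapv(flags):
    """Return True if a compiler flag string contains -fwrapv."""
    # get_config_var() returns None for unknown variables, hence the 'or'.
    return "-fwrapv" in (flags or "").split()


# Assumed variable names; they may be absent on some builds.
for name in ("OPT", "CFLAGS", "PY_CFLAGS"):
    print(name, mentions_fwrapv(sysconfig.get_config_var(name)))

# Behavioral check: int ** int must give the exact result.
assert 2 ** 100 == 1267650600228229401496703205376
```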
Re: [Python-Dev] Unicode exception indexing
On Thu, Nov 3, 2011 at 12:29 PM, Antoine Pitrou solip...@pitrou.net wrote:
> On Thu, 03 Nov 2011 18:14:42 +0100 mar...@v.loewis.de wrote:
> > There is a backwards compatibility issue with PEP 393 and Unicode exceptions: the start and end indices: are they Py_UNICODE indices, or code point indices?
> >
> > On the one hand, these indices are used in formatting error messages such as "codec can't encode character \u%04x in position %d", suggesting they are regular indices into the string (counting code points).
> >
> > On the other hand, they are used by error handlers to look up the character, and existing error handlers (including the ones we have now) use PyUnicode_AsUnicode to find the character. This suggests that the indices should be Py_UNICODE indices, for compatibility (and they currently do work in this way).
>
> But what about error handlers written in Python?
>
> > The indices can only be different if the string is a UCS-4 string, and Py_UNICODE is a two-byte type (i.e. on Windows).
> >
> > So what should it be?
>
> I'd say let's do the Right Thing and accept the small compatibility breach (surrogates on UCS-2 builds).

+1

--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Unicode exception indexing
On 11/3/2011 3:16 PM, Victor Stinner wrote:
> On Thursday, 3 November 2011 18:14:42, mar...@v.loewis.de wrote:
> > There is a backwards compatibility issue with PEP 393 and Unicode exceptions: the start and end indices: are they Py_UNICODE indices, or code point indices?

I had the impression that we were abolishing the wide versus narrow build difference and that this issue would disappear. I must have missed something.

> > So what should it be?
>
> I suggest using code point indices. Code point indices are also more natural now with PEP 393.

I think we should look forward, not backwards. Error messages are defined as undefined ;-). So I think we should do what is right for the new implementation. I suspect that means that I am agreeing with both Victor and Antoine.

> Because it is an incompatible change, it should be documented in the PEP and in the What's new in Python 3.3 document.
> ...
> Yeah, O(n) should be avoided when it is possible.

Definitely to both.

--
Terry Jan Reedy
Re: [Python-Dev] Unicode exception indexing
On 03.11.2011 22:19, Terry Reedy wrote:
> On 11/3/2011 3:16 PM, Victor Stinner wrote:
> > On Thursday, 3 November 2011 18:14:42, mar...@v.loewis.de wrote:
> > > There is a backwards compatibility issue with PEP 393 and Unicode exceptions: the start and end indices: are they Py_UNICODE indices, or code point indices?
>
> I had the impression that we were abolishing the wide versus narrow build difference and that this issue would disappear. I must have missed something.

Most certainly. The Py_UNICODE type continues to exist for backwards compatibility. It is now always a typedef for wchar_t, which makes it a 16-bit type on Windows.

Regards,
Martin
Re: [Python-Dev] Unicode exception indexing
> > On the one hand, these indices are used in formatting error messages such as "codec can't encode character \u%04x in position %d", suggesting they are regular indices into the string (counting code points).
> >
> > On the other hand, they are used by error handlers to look up the character, and existing error handlers (including the ones we have now) use PyUnicode_AsUnicode to find the character. This suggests that the indices should be Py_UNICODE indices, for compatibility (and they currently do work in this way).
>
> But what about error handlers written in Python?

I'm working on a patch where a C error handler using PyUnicodeEncodeError_GetStart gets a different value than a Python error handler accessing .start. The _GetStart/_GetEnd functions would take the value from the exception object, and adjust it before returning it.

The implementation is fairly straightforward, just a little expensive (in the case of non-BMP strings on Windows).

Regards,
Martin
Re: [Python-Dev] Unicode exception indexing
> I started such a hack for the UTF-8 codec... It is really tricky, we should not do that!

With the proper encapsulation, it's not that tricky. I have written functions PyUnicode_IndexToWCharIndex and PyUnicode_WCharIndexToIndex, and PyUnicodeEncodeError_GetStart and friends would use those functions. I'd also need new functions such as PyUnicodeEncodeError_GetStartIndex to access the true start field.

> > That would be expensive to compute
>
> Yeah, O(n) should be avoided when it is possible.

Ok. I'll wait half a day or so for people to reconsider (now knowing that it's actually feasible to be fully backwards compatible); if nobody speaks up, I go ahead and accept the breakage.

Regards,
Martin
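The reverse direction (a 16-bit code unit index back to a code point index) needs the same kind of linear scan. The following Python sketch only illustrates the idea behind such a conversion; it is not Martin's C code, and `utf16_to_codepoint_index` is a hypothetical name. It assumes Python 3.3 or later.

```python
def utf16_to_codepoint_index(text, unit_index):
    """Convert a UTF-16 code unit index into a code point index."""
    units = 0
    for position, ch in enumerate(text):
        if units >= unit_index:
            return position
        # Non-BMP characters take two code units, all others take one.
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(text)


sample = "a\U0001F600b"   # 4 UTF-16 code units, 3 code points
print(utf16_to_codepoint_index(sample, 3))
```

In this sketch an index that falls in the middle of a surrogate pair is rounded up to the following character; a real implementation would have to decide how to treat that case.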
Re: [Python-Dev] Unicode exception indexing
On 11/3/2011 5:43 PM, Martin v. Löwis wrote:
> > I had the impression that we were abolishing the wide versus narrow build difference and that this issue would disappear. I must have missed something.
>
> Most certainly. The Py_UNICODE type continues to exist for backwards compatibility. It is now always a typedef for wchar_t, which makes it a 16-bit type on Windows.

Thank you for answering.

My revised impression now is that any string I create with Python code in Python 3.3+ (as distributed, without extensions or ctypes calls) will use the new implementation and will index and slice correctly, even with extended chars.

So indexing is only an issue for those writing or using C-coded extensions with the old unicode C-API on systems with a 16-bit wchar_t. Correct?

---
Terry Jan Reedy
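The Python-level behavior Terry describes can be checked with a short session. This is a small sketch assuming Python 3.3 or later (with PEP 393 in place); the sample string is arbitrary.

```python
s = "a\U0001F600b"   # the middle character lies outside the BMP

# Length, indexing and slicing all count code points.
print(len(s))
print(s[1] == "\U0001F600")
print(s[2:])

# The UTF-16 form of the same text needs one extra 16-bit unit.
print(len(s.encode("utf-16-le")) // 2)
```

The difference between the two lengths is exactly the gap between code point indices and Py_UNICODE indices on a platform with a 16-bit wchar_t.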
Re: [Python-Dev] Unicode exception indexing
Your approach (doing the right thing for both Python and C, new API to avoid the C performance problem) sounds good to me.

--
Nick Coghlan (via Gmail on Android, so likely to be more terse than usual)

On Nov 4, 2011 7:58 AM, Martin v. Löwis mar...@v.loewis.de wrote:
> > I started such a hack for the UTF-8 codec... It is really tricky, we should not do that!
>
> With the proper encapsulation, it's not that tricky. I have written functions PyUnicode_IndexToWCharIndex and PyUnicode_WCharIndexToIndex, and PyUnicodeEncodeError_GetStart and friends would use those functions. I'd also need new functions such as PyUnicodeEncodeError_GetStartIndex to access the true start field.
>
> > > That would be expensive to compute
> >
> > Yeah, O(n) should be avoided when it is possible.
>
> Ok. I'll wait half a day or so for people to reconsider (now knowing that it's actually feasible to be fully backwards compatible); if nobody speaks up, I go ahead and accept the breakage.
>
> Regards,
> Martin
Re: [Python-Dev] Unicode exception indexing
On Thu, 03 Nov 2011 22:47:00 +0100 Martin v. Löwis mar...@v.loewis.de wrote:
> > > On the one hand, these indices are used in formatting error messages such as "codec can't encode character \u%04x in position %d", suggesting they are regular indices into the string (counting code points).
> > >
> > > On the other hand, they are used by error handlers to look up the character, and existing error handlers (including the ones we have now) use PyUnicode_AsUnicode to find the character. This suggests that the indices should be Py_UNICODE indices, for compatibility (and they currently do work in this way).
> >
> > But what about error handlers written in Python?
>
> I'm working on a patch where a C error handler using PyUnicodeEncodeError_GetStart gets a different value than a Python error handler accessing .start. The _GetStart/_GetEnd functions would take the value from the exception object, and adjust it before returning it.

Is it worth the hassle? We can just port our existing error handlers, and I guess the few third-party error handlers written in C (if any) can bear the transition.

Regards

Antoine.