Re: [Python-Dev] ints not overflowing into longs?

2011-11-03 Thread Antoine Pitrou
On Wed, 2 Nov 2011 19:41:30 -0700
Guido van Rossum gu...@python.org wrote:
 Apparently Macports is still using a buggy compiler.

If I understand things correctly, this is technically not a buggy
compiler but Python making optimistic assumptions about the C standard.
(from issue11149: clang (as with gcc 4.x) assumes signed integer
overflow is undefined. But Python depends on the fact that signed
integer overflow wraps)

I'd happily call that a buggy C standard, though :-)
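
For illustration, the wrong results quoted in the report (0,
-2101438300051996672 and 7766279631452241920) are exactly what you get when
the mathematically correct powers are truncated to a wrapping signed 64-bit
value. A minimal sketch in Python 3, assuming a 64-bit C long on the affected
build; wrap_int64 is a hypothetical helper, not anything in CPython:

```python
def wrap_int64(value):
    """Reduce an exact integer to what a wrapping signed 64-bit C long would hold."""
    value &= (1 << 64) - 1      # keep only the low 64 bits
    if value >= 1 << 63:        # reinterpret the top bit as the sign bit
        value -= 1 << 64
    return value


# The three expressions from the report; Python 3 ints are arbitrary precision,
# so base ** exp is exact and the truncation is applied explicitly.
for base, exp in [(2, 100), (20, 20), (10, 20)]:
    exact = base ** exp
    print("%d**%d: exact %d, wrapped %d" % (base, exp, exact, wrap_int64(exact)))
```

This only shows that the reported numbers are consistent with silent 64-bit
truncation; it says nothing about which compiler optimization removed the
overflow check.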

Regards

Antoine.


 I reported a
 similar issue before and got this reply from Ned Deily:
 
 
 Thanks for the pointer.  That looks like a duplicate of Issue11149 (and
 Issue12701).  Another manifestation of this was reported in Issue13061
 which also originated from MacPorts.  I'll remind them that the
 configure change is likely needed for all Pythons.  It's still safest to
 stick with good old gcc-4.2 on OS X at the moment.
 
 
 (Those issues are on bugs.python.org.)
 
 --Guido
 
 On Wed, Nov 2, 2011 at 7:32 PM, Derek Shockey derek.shoc...@gmail.com wrote:
  I just found an unexpected behavior and I'm wondering if it is a bug.
  In my 2.7.2 interpreter on OS X, built and installed via MacPorts, it
  appears that integers are not correctly overflowing into longs and
  instead are yielding bizarre results. I can only reproduce this when
  using the exponent operator with two ints (declaring either operand
  explicitly as long prevents the behavior).
 
  >>> 2**100
  0
  >>> 2**100L
  1267650600228229401496703205376L
 
  >>> 20**20
  -2101438300051996672
  >>> 20L**20
  104857600000000000000000000L
 
  >>> 10**20
  7766279631452241920
  >>> 10L**20L
  100000000000000000000L
 
  To confirm I'm not crazy, I tried in the 2.7.1 and 2.6.7 installations
  included in OS X 10.7, and also a 2.7.2+ (not sure what the + is) on
  an Ubuntu machine and didn't see this behavior. This looks like some
  kind of truncation error, but I don't know much about the internals of
  Python and have no idea what's going on. I assume since it's only in
  my MacPorts installation, it must be a build configuration issue that is
  specific to OS X, perhaps only 10.7, or MacPorts.
 
  Am I doing something wrong, and is there a way to fix it before I
  compile? I couldn't find any references to this problem as a known issue.
 
  Thanks,
  Derek
  ___
  Python-Dev mailing list
  Python-Dev@python.org
  http://mail.python.org/mailman/listinfo/python-dev
  Unsubscribe: 
  http://mail.python.org/mailman/options/python-dev/guido%40python.org
 
 
 
 
 -- 
 --Guido van Rossum (python.org/~guido)




Re: [Python-Dev] ints not overflowing into longs?

2011-11-03 Thread Victor Stinner
On Wednesday 2 November 2011 19:32:38, Derek Shockey wrote:
 I just found an unexpected behavior and I'm wondering if it is a bug.
 In my 2.7.2 interpreter on OS X, built and installed via MacPorts, it
 appears that integers are not correctly overflowing into longs and
 instead are yielding bizarre results. I can only reproduce this when
 using the exponent operator with two ints (declaring either operand
 explicitly as long prevents the behavior).
 
 >>> 2**100
 0

This issue has already been fixed twice in the Python 2.7 branch: int_pow() has
been fixed and -fwrapv is now used for Clang.

http://bugs.python.org/issue11149
http://bugs.python.org/issue12973
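
As a rough sketch of what "overflowing into longs" means for the power
operator: the fixed-width path has to detect that a multiplication no longer
fits and hand over to arbitrary-precision arithmetic, without relying on
signed wraparound. The following is an emulation in Python 3 under stated
assumptions (64-bit long, non-negative exponent). It is not the actual
int_pow() code, and checked_mul and pow_with_promotion are hypothetical names:

```python
INT_MAX = 2 ** 63 - 1   # assumed range of a 64-bit C long
INT_MIN = -2 ** 63


def checked_mul(a, b):
    """Return a * b if it fits in the assumed 64-bit range, else None."""
    product = a * b     # exact, because Python 3 ints never overflow
    if INT_MIN <= product <= INT_MAX:
        return product
    return None


def pow_with_promotion(base, exp):
    """Square-and-multiply that reports whether the result stayed an 'int'.

    Returns ("int", value) if every intermediate product fit in 64 bits,
    otherwise ("long", value) computed with arbitrary precision.
    Assumes exp >= 0.
    """
    exponent = exp
    result, factor = 1, base
    while exp > 0:
        if exp & 1:
            result = checked_mul(result, factor)
            if result is None:
                return "long", base ** exponent
        exp >>= 1
        if exp == 0:
            break       # do not square again once no bits are left
        factor = checked_mul(factor, factor)
        if factor is None:
            return "long", base ** exponent
    return "int", result


print(pow_with_promotion(2, 10))
print(pow_with_promotion(2, 100))
```

The point of the design is that the overflow test is an explicit range check
rather than a comparison that only works if the overflowed value has wrapped.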

Maybe it is time for a new release? :-)

Victor


Re: [Python-Dev] draft PEP: virtual environments

2011-11-03 Thread VanL

On 11/1/2011 11:43 AM, Paul Moore wrote:

On 1 November 2011 16:40, Paul Moore p.f.mo...@gmail.com wrote:

On 1 November 2011 16:29, Paul Moore p.f.mo...@gmail.com wrote:

On 31 October 2011 20:10, Carl Meyer c...@oddbird.net wrote:

For Windows, can you point me at the nt scripts? If they aren't too
complex, I'd be willing to port to Powershell.


For what it's worth, there have been a number of efforts in this direction:

https://bitbucket.org/guillermooo/virtualenvwrapper-powershell
https://bitbucket.org/vanl/virtualenvwrapper-powershell

(These are two different implementations.)



Re: [Python-Dev] Buildbot failures

2011-11-03 Thread Brian Curtin
On Sat, Oct 22, 2011 at 14:30, Andrea Crotti andrea.crott...@gmail.com wrote:
 On 10/21/2011 10:08 PM, Antoine Pitrou wrote:

 Hello,

 There are currently a bunch of various buildbot failures on all 3
 branches. I would remind committers to regularly take a look at the
 buildbots, so that these failures get solved reasonably fast.

 Regards

 Antoine.

 In my previous workplace, if someone broke a build by committing something
 wrong, he/she had to bring cake for everyone at the next meeting.

 The cake is not really feasible I guess, but isn't it possible to notify the
 developer that broke the build?

You just have to keep track and bring all of the cakes that you owe to PyCon.


Re: [Python-Dev] ints not overflowing into longs?

2011-11-03 Thread Éric Araujo
Hi Derek,

 I tried in the 2.7.1 and 2.6.7 installations included in OS X 10.7,
 and also a 2.7.2+ (not sure what the + is)

The + means that it’s 2.7.2 + some commits, in other words the
in-development version that will become 2.7.3.  This bit of info seems
to be missing from the doc.

Regards


[Python-Dev] Unicode exception indexing

2011-11-03 Thread martin

There is a backwards compatibility issue with PEP 393 and Unicode exceptions:
the start and end indices: are they Py_UNICODE indices, or code point indices?

On the one hand, these indices are used in formatting error messages such as
"codec can't encode character \u%04x in position %d", suggesting they are
regular indices into the string (counting code points).

On the other hand, they are used by error handlers to look up the character,
and existing error handlers (including the ones we have now) use
PyUnicode_AsUnicode to find the character. This suggests that the indices
should be Py_UNICODE indices, for compatibility (and they currently do
work in this way).

The indices can only be different if the string is a UCS-4 string, and
Py_UNICODE is a two-byte type (i.e. on Windows).

So what should it be?

As a compromise, it would be possible to convert between these indices,
by counting the non-BMP characters that precede the index if the indices
might differ. That would be expensive to compute, but provide backwards
compatibility to the C API. It's less clear what backwards compatibility
to Python code would require - most likely, people would use the indices
for slicing operations (rather than performing a UTF-16 conversion and
performing indexing on that).
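
To make the compromise concrete, here is a minimal sketch of the conversion
between the two kinds of index, written in Python 3.3+ (where a str is indexed
by code point). It counts the non-BMP characters before the index, which is
the O(n) cost mentioned above. The function names are hypothetical and this is
not the C implementation under discussion:

```python
def to_utf16_index(text, cp_index):
    """Convert a code point index into a UTF-16 code unit index."""
    # Each non-BMP character before the index occupies two UTF-16 code units.
    return cp_index + sum(1 for ch in text[:cp_index] if ord(ch) > 0xFFFF)


def to_code_point_index(text, unit_index):
    """Convert a UTF-16 code unit index into a code point index.

    An index pointing into the middle of a surrogate pair is mapped to the
    following code point.
    """
    units = 0
    for cp_index, ch in enumerate(text):
        if units >= unit_index:
            return cp_index
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(text)


sample = "a\U0001F600b"                  # one non-BMP character in the middle
print(to_utf16_index(sample, 2))         # index of "b" counted in UTF-16 units
print(to_code_point_index(sample, 3))    # and back to a code point index
```

For strings containing only BMP characters both functions return their
argument unchanged, so the cost only matters when the indices can actually
differ.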

Regards,
Martin





Re: [Python-Dev] ints not overflowing into longs?

2011-11-03 Thread Derek Shockey
I believe you're right. The 2.7.2 MacPorts portfile definitely passes
the -fwrapv flag to clang, but the bad behavior still occurs with
exponents. I verified the current head of the 2.7 branch does not have
this problem when built with clang, so I'm assuming that issue12973
resolved this with a patch to int_pow() and that it will be out in the
next release.

-Derek

On Thu, Nov 3, 2011 at 4:30 AM, Antoine Pitrou solip...@pitrou.net wrote:
 On Wed, 2 Nov 2011 19:41:30 -0700
 Guido van Rossum gu...@python.org wrote:
 Apparently Macports is still using a buggy compiler.

 If I understand things correctly, this is technically not a buggy
 compiler but Python making optimistic assumptions about the C standard.
 (from issue11149: clang (as with gcc 4.x) assumes signed integer
 overflow is undefined. But Python depends on the fact that signed
 integer overflow wraps)

 I'd happily call that a buggy C standard, though :-)

 Regards

 Antoine.




Re: [Python-Dev] ints not overflowing into longs?

2011-11-03 Thread Stefan Krah
Derek Shockey derek.shoc...@gmail.com wrote:
 I believe you're right. The 2.7.2 MacPorts portfile definitely passes
 the -fwrapv flag to clang, but the bad behavior still occurs with
 exponents.

Really? Even without the fix for issue12973 the -fwrapv flag
should be sufficient, as reported in issue13061 and Issue11149.

For clang version 3.0 (trunk 139691) on FreeBSD this is the case.


Stefan Krah




Re: [Python-Dev] Unicode exception indexing

2011-11-03 Thread Victor Stinner
On Thursday 3 November 2011 18:14:42, mar...@v.loewis.de wrote:
 There is a backwards compatibility issue with PEP 393 and Unicode
 exceptions: the start and end indices: are they Py_UNICODE indices, or
 code point indices?

Oh oh. That's exactly why I didn't want to start to work on this issue.
http://bugs.python.org/issue13064

In a Python error handler, exc.object[exc.start:exc.end] should be used to get 
the unencodable/undecodable substring.
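
A minimal sketch of such a Python-level handler, for illustration only: it
uses the standard codecs.register_error() hook and slices exc.object with
exc.start and exc.end. The handler name "sketch_html_replace" and the function
itself are hypothetical (the built-in "xmlcharrefreplace" handler already does
something similar), and Python 3.3+ is assumed so that slicing is by code
point:

```python
import codecs


def html_replace(exc):
    """Replace unencodable characters with numeric character references."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    # The slice is the unencodable substring, whatever its length.
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("&#%d;" % ord(ch) for ch in bad)
    # Return the replacement text and the position where encoding resumes.
    return replacement, exc.end


codecs.register_error("sketch_html_replace", html_replace)

encoded = "caf\u00e9 \U0001F600".encode("ascii", "sketch_html_replace")
print(encoded)
```

Because the handler only ever slices exc.object, it keeps working as long as
start and end are indices into that same string object.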

In a C error handler, it depends if you use a Py_UNICODE* pointer or 
PyUnicode_Substring() / PyUnicode_READ.

Using google.fr/codesearch, I found some user error handlers implemented in 
Python:
 * straw: html_replace
 * Nuxeo: latin9_fallback
 * peerscape: htmlentityescape
 * pymt: cssescape
 * 

I found no error handler implemented in C (no calls to PyCodec_RegisterError).

 So what should it be?

I suggest using code point indices. Code point indices are also more
natural now with PEP 393.

Because it is an incompatible change, it should be documented in the PEP and 
in the What's new in Python 3.3 document.

 As a compromise, it would be possible to convert between these indices,
 by counting the non-BMP characters that precede the index if the indices
 might differ.

I started such a hack for the UTF-8 codec... It is really tricky; we should not
do that!

 That would be expensive to compute

Yeah, O(n) should be avoided when it is possible.

--

FYI I implemented a proof-of-concept in Python of the surrogateescape error 
handler for Python 2 (for Mercurial):
https://bitbucket.org/haypo/misc/src/tip/python/surrogateescape.py

Victor


Re: [Python-Dev] Buildbot failures

2011-11-03 Thread Stefan Behnel

Brian Curtin, 03.11.2011 15:59:

On Sat, Oct 22, 2011 at 14:30, Andrea Crotti wrote:

On 10/21/2011 10:08 PM, Antoine Pitrou wrote:


Hello,

There are currently a bunch of various buildbot failures on all 3
branches. I would remind committers to regularly take a look at the
buildbots, so that these failures get solved reasonably fast.

Regards

Antoine.


In my previous workplace, if someone broke a build by committing something
wrong, he/she had to bring cake for everyone at the next meeting.

The cake is not really feasible I guess, but isn't it possible to notify the
developer that broke the build?


You just have to keep track and bring all of the cakes that you owe to PyCon.


Did you mean PieCon?

Stefan



Re: [Python-Dev] Unicode exception indexing

2011-11-03 Thread Antoine Pitrou
On Thu, 03 Nov 2011 18:14:42 +0100
mar...@v.loewis.de wrote:
 There is a backwards compatibility issue with PEP 393 and Unicode exceptions:
 the start and end indices: are they Py_UNICODE indices, or code point indices?
 
 On the one hand, these indices are used in formatting error messages such as
 "codec can't encode character \u%04x in position %d", suggesting they are
 regular indices into the string (counting code points).
 
 On the other hand, they are used by error handlers to look up the character,
 and existing error handlers (including the ones we have now) use
 PyUnicode_AsUnicode to find the character. This suggests that the indices
 should be Py_UNICODE indices, for compatibility (and they currently do
 work in this way).

But what about error handlers written in Python?

 The indices can only be different if the string is a UCS-4 string, and
 Py_UNICODE is a two-byte type (i.e. on Windows).
 
 So what should it be?

I'd say let's do the Right Thing and accept the small compatibility
breach (surrogates on UCS-2 builds).

Regards

Antoine.




Re: [Python-Dev] ints not overflowing into longs?

2011-11-03 Thread Derek Shockey
You're right; among my many tests I think I muddled the situation with
a stray CFLAGS variable in my environment. Apologies for the
misinformation. The current MacPorts portfile does not add -fwrapv.
Adding -fwrapv to OPT in the Makefile solves the problem. I confirmed
by manually building the v2.7.2 tag with clang and -fwrapv, and the
overflow behavior is correct. I've notified the MacPorts package
maintainer.


-Derek

On Thu, Nov 3, 2011 at 11:07 AM, Stefan Krah ste...@bytereef.org wrote:
 Derek Shockey derek.shoc...@gmail.com wrote:
 I believe you're right. The 2.7.2 MacPorts portfile definitely passes
 the -fwrapv flag to clang, but the bad behavior still occurs with
 exponents.

 Really? Even without the fix for issue12973 the -fwrapv flag
 should be sufficient, as reported in issue13061 and Issue11149.

 For clang version 3.0 (trunk 139691) on FreeBSD this is the case.


 Stefan Krah




Re: [Python-Dev] Unicode exception indexing

2011-11-03 Thread Guido van Rossum
On Thu, Nov 3, 2011 at 12:29 PM, Antoine Pitrou solip...@pitrou.net wrote:
 On Thu, 03 Nov 2011 18:14:42 +0100
 mar...@v.loewis.de wrote:
 There is a backwards compatibility issue with PEP 393 and Unicode exceptions:
 the start and end indices: are they Py_UNICODE indices, or code point indices?

 On the one hand, these indices are used in formatting error messages such as
 "codec can't encode character \u%04x in position %d", suggesting they are
 regular indices into the string (counting code points).

 On the other hand, they are used by error handlers to look up the character,
 and existing error handlers (including the ones we have now) use
 PyUnicode_AsUnicode to find the character. This suggests that the indices
 should be Py_UNICODE indices, for compatibility (and they currently do
 work in this way).

 But what about error handlers written in Python?

 The indices can only be different if the string is a UCS-4 string, and
 Py_UNICODE is a two-byte type (i.e. on Windows).

 So what should it be?

 I'd say let's do the Right Thing and accept the small compatibility
 breach (surrogates on UCS-2 builds).

+1

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Unicode exception indexing

2011-11-03 Thread Terry Reedy

On 11/3/2011 3:16 PM, Victor Stinner wrote:

On Thursday 3 November 2011 18:14:42, mar...@v.loewis.de wrote:

There is a backwards compatibility issue with PEP 393 and Unicode
exceptions: the start and end indices: are they Py_UNICODE indices, or
code point indices?


I had the impression that we were abolishing the wide versus narrow 
build difference and that this issue would disappear. I must have missed 
something.



So what should it be?


I suggest using code point indices. Code point indices are also more
natural now with PEP 393.


I think we should look forward, not backwards. Error messages are 
defined as undefined ;-). So I think we should do what is right for the 
new implementation. I suspect that means that I am agreeing with both 
Victor and Antoine.



Because it is an incompatible change, it should be documented in the PEP and
in the What's new in Python 3.3 document.

...

Yeah, O(n) should be avoided when it is possible.


Definitely to both.

--
Terry Jan Reedy




Re: [Python-Dev] Unicode exception indexing

2011-11-03 Thread Martin v. Löwis
On 03.11.2011 22:19, Terry Reedy wrote:
 On 11/3/2011 3:16 PM, Victor Stinner wrote:
 On Thursday 3 November 2011 18:14:42, mar...@v.loewis.de wrote:
 There is a backwards compatibility issue with PEP 393 and Unicode
 exceptions: the start and end indices: are they Py_UNICODE indices, or
 code point indices?
 
 I had the impression that we were abolishing the wide versus narrow
 build difference and that this issue would disappear. I must have missed
 something.

Most certainly. The Py_UNICODE type continues to exist for backwards
compatibility. It is now always a typedef for wchar_t, which makes it
a 16-bit type on Windows.

Regards,
Martin


Re: [Python-Dev] Unicode exception indexing

2011-11-03 Thread Martin v. Löwis
 On the one hand, these indices are used in formatting error messages such as
 "codec can't encode character \u%04x in position %d", suggesting they are
 regular indices into the string (counting code points).

 On the other hand, they are used by error handlers to look up the character,
 and existing error handlers (including the ones we have now) use
 PyUnicode_AsUnicode to find the character. This suggests that the indices
 should be Py_UNICODE indices, for compatibility (and they currently do
 work in this way).
 
 But what about error handlers written in Python?

I'm working on a patch where a C error handler using
PyUnicodeEncodeError_GetStart gets a different value than a Python
error handler accessing .start. The _GetStart/_GetEnd functions would
take the value from the exception object, and adjust it before returning
it.

The implementation is fairly straightforward, just a little expensive
(in the case of non-BMP strings on Windows).

Regards,
Martin


Re: [Python-Dev] Unicode exception indexing

2011-11-03 Thread Martin v. Löwis
 I started such a hack for the UTF-8 codec... It is really tricky; we should not
 do that!

With the proper encapsulation, it's not that tricky. I have written
functions PyUnicode_IndexToWCharIndex and PyUnicode_WCharIndexToIndex,
and PyUnicodeEncodeError_GetStart and friends would use those functions.
I'd also need new functions such as PyUnicodeEncodeError_GetStartIndex to
access the true start field.

 That would be expensive to compute
 
 Yeah, O(n) should be avoided when it is possible.

Ok. I'll wait half a day or so for people to reconsider (now knowing
that it's actually feasible to be fully backwards compatible); if nobody
speaks up, I'll go ahead and accept the breakage.

Regards,
Martin


Re: [Python-Dev] Unicode exception indexing

2011-11-03 Thread Terry Reedy



On 11/3/2011 5:43 PM, Martin v. Löwis wrote:


I had the impression that we were abolishing the wide versus narrow
build difference and that this issue would disappear. I must have missed
something.


Most certainly. The Py_UNICODE type continues to exist for backwards
compatibility. It is now always a typedef for wchar_t, which makes it
a 16-bit type on Windows.


Thank you for answering: My revised impression now is that any string I 
create with Python code in Python 3.3+ (as distributed, without 
extensions or ctypes calls) will use the new implementation and will 
index and slice correctly, even with extended chars. So indexing is 
only an issue for those writing or using C-coded extensions with the old 
unicode C-API on systems with a 16-bit wchar_t. Correct?
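
A small check of that impression, as a sketch assuming Python 3.3 or later:
a string with one non-BMP character is three code points long at the Python
level, even though its UTF-16 form (what a 16-bit wchar_t buffer would hold)
needs four code units.

```python
text = "x\U0001F600y"   # "x", one non-BMP character, "y"

# Length and indexing at the Python level count code points.
print(len(text))
print(text[1] == "\U0001F600")
print(text[2:])

# The UTF-16 representation needs a surrogate pair for the middle character.
utf16_units = len(text.encode("utf-16-le")) // 2
print(utf16_units)
```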


---
Terry Jan Reedy



Re: [Python-Dev] Unicode exception indexing

2011-11-03 Thread Nick Coghlan
Your approach (doing the right thing for both Python and C, new API to
avoid the C performance problem) sounds good to me.

--
Nick Coghlan (via Gmail on Android, so likely to be more terse than usual)
On Nov 4, 2011 7:58 AM, Martin v. Löwis mar...@v.loewis.de wrote:

  I started such a hack for the UTF-8 codec... It is really tricky; we
  should not do that!

 With the proper encapsulation, it's not that tricky. I have written
 functions PyUnicode_IndexToWCharIndex and PyUnicode_WCharIndexToIndex,
 and PyUnicodeEncodeError_GetStart and friends would use that function.
 I'd also need new functions PyUnicodeEncodeError_GetStartIndex to access
 the true start field.

  That would be expensive to compute
 
  Yeah, O(n) should be avoided when it is possible.

 Ok. I'll wait half a day or so for people to reconsider (now knowing
 that it's actually feasible to be fully backwards compatible); if nobody
 speaks up, I go ahead and accept the breakage.

 Regards,
 Martin


Re: [Python-Dev] Unicode exception indexing

2011-11-03 Thread Antoine Pitrou
On Thu, 03 Nov 2011 22:47:00 +0100
Martin v. Löwis mar...@v.loewis.de wrote:

  On the one hand, these indices are used in formatting error messages such as
  "codec can't encode character \u%04x in position %d", suggesting they are
  regular indices into the string (counting code points).
 
  On the other hand, they are used by error handlers to look up the character,
  and existing error handlers (including the ones we have now) use
  PyUnicode_AsUnicode to find the character. This suggests that the indices
  should be Py_UNICODE indices, for compatibility (and they currently do
  work in this way).
  
  But what about error handlers written in Python?
 
 I'm working on a patch where a C error handler using
 PyUnicodeEncodeError_GetStart gets a different value than a Python
 error handler accessing .start. The _GetStart/_GetEnd functions would
 take the value from the exception object, and adjust it before returning
 it.

Is it worth the hassle? We can just port our existing error handlers,
and I guess the few third-party error handlers written in C (if any)
can bear the transition.

Regards

Antoine.