Re: [Python-Dev] Unicode exception indexing

Antoine Pitrou Thu, 03 Nov 2011 19:24:40 -0700

On Thu, 03 Nov 2011 22:47:00 +0100
"Martin v. Löwis" <[email protected]> wrote:


> >> On the one hand, these indices are used in formatting error messages
> >> such as "codec can't encode character \u%04x in position %d",
> >> suggesting they are regular indices into the string (counting code
> >> points).
> >>
> >> On the other hand, they are used by error handlers to look up the character,
> >> and existing error handlers (including the ones we have now) use
> >> PyUnicode_AsUnicode to find the character. This suggests that the indices
> >> should be Py_UNICODE indices, for compatibility (and they currently do
> >> work in this way).
> > 
> > But what about error handlers written in Python?
> 
> I'm working on a patch where a C error handler using
> PyUnicodeEncodeError_GetStart gets a different value than a Python
> error handler accessing .start. The _GetStart/_GetEnd functions would
> take the value from the exception object, and adjust it before returning
> it.

Is it worth the hassle? We can just port our existing error handlers,
and I guess the few third-party error handlers written in C (if any)
can bear the transition.
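
For illustration, here is a minimal sketch of a Python-level error handler
that reads .start and .end from the exception and slices .object with them.
The handler name "hex_replace_sketch" and the replacement format are
hypothetical, and the sketch assumes an interpreter where string indices
count code points.

```python
import codecs

# Records the (start, end) pairs the handler is called with.
seen_positions = []

def hex_replace(exc):
    # This sketch only deals with encoding errors; re-raise anything else.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    seen_positions.append((exc.start, exc.end))
    # .start and .end are used as ordinary indices into .object.
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("<U+%04X>" % ord(ch) for ch in bad)
    # Return the replacement text and the position at which to resume.
    return replacement, exc.end

codecs.register_error("hex_replace_sketch", hex_replace)

text = "caf\u00e9 \U0001F600"
encoded = text.encode("ascii", errors="hex_replace_sketch")
print(encoded)
print(seen_positions)
```

A handler like this only does the right thing if the indices it receives
agree with how the string it is given is indexed, which is the crux of the
question for handlers written in C.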

Regards

Antoine.