Re: [Python-Dev] Unicode exception indexing

Martin v. Löwis Thu, 03 Nov 2011 14:48:57 -0700

>> On the one hand, these indices are used in formatting error messages such as
>> "codec can't encode character \u%04x in position %d", suggesting they are
>> regular indices into the string (counting code points).
>>
>> On the other hand, they are used by error handlers to look up the character,
>> and existing error handlers (including the ones we have now) use
>> PyUnicode_AsUnicode to find the character. This suggests that the indices
>> should be Py_UNICODE indices, for compatibility (and they currently do
>> work in this way).
> 
> But what about error handlers written in Python?


I'm working on a patch where a C error handler using
PyUnicodeEncodeError_GetStart gets a different value than a Python
error handler accessing .start. The _GetStart/_GetEnd functions would
take the value from the exception object and adjust it before returning
it.

The implementation is fairly straightforward, just a little expensive
(in the case of non-BMP strings on Windows).
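
To illustrate the kind of adjustment meant here, below is a rough Python
sketch, not the patch itself. It assumes Python 3.3 or later (where str
indices count code points) and a 16-bit Py_UNICODE, as on Windows. The
helper name code_point_to_utf16_index is made up for this example. The
linear scan over the prefix is where the extra cost would come from.

```python
def code_point_to_utf16_index(s, index):
    """Map a code point index in s to a UTF-16 code unit index.

    Hypothetical helper for illustration only; it is not part of CPython.
    """
    # Each non-BMP character (above U+FFFF) before `index` occupies two
    # UTF-16 code units (a surrogate pair), so it shifts the index by one.
    extra = sum(1 for ch in s[:index] if ord(ch) > 0xFFFF)
    return index + extra


# Build an exception by hand: the offending character U+00E9 sits at
# code point index 1, right after one non-BMP character.
text = "\U0001F600\xe9"
exc = UnicodeEncodeError("ascii", text, 1, 2, "ordinal not in range(128)")

# What a Python error handler sees via .start / .end (code point indices).
python_start, python_end = exc.start, exc.end

# What a handler indexing a 16-bit Py_UNICODE buffer would need instead.
utf16_start = code_point_to_utf16_index(exc.object, exc.start)
utf16_end = code_point_to_utf16_index(exc.object, exc.end)
```

For a string with only BMP characters the two kinds of index coincide, so
the adjustment only matters for non-BMP strings.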

Regards,
Martin