Re: [Python-Dev] [GSoC] Porting on RPM3

Nick Coghlan Tue, 22 Mar 2011 13:48:39 -0700

On Tue, Mar 22, 2011 at 7:29 PM, Panu Matilainen
<[email protected]> wrote:
> The bindings cannot go changing header contents to their liking, so any
> canonicalization would have to go into rpm proper, the build-side of things
> to be exact so the runtime doesn't have to care. Requiring rpm to fiddle
> with encodings + canonicalization for every single string it processes at
> runtime would require enormous changes throughout rpm, and presumably at a
> massive performance cost too.


Just a hint from our experience with APIs like os/email/urllib.parse:
you pretty much end up *needing* to have parallel bytes and str APIs
(including higher level data structures that know how to encode and
decode themselves) to get this to work properly. The str APIs will
work 90% of the time, but you still need access to the raw bytes to
recover when the simple approach fails.

One key choice to be made is whether to go with the brittle option
(i.e. ASCII) for the implicit decoding, or the permissive one (i.e.
UTF-8 with surrogateescape). The former punts on the complicated
encoding issues (e.g. urllib.parse does this, since correctly formed
URLs are meant to be encoded into pure ASCII), while the latter works
by default in more situations, but can allow malformed data to escape
the IO layer and cause problems in other parts of the program (e.g.
many of the os APIs do this, since real world applications often care
more about round tripping correctly between different OS interfaces).
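
To make the trade-off concrete, here is a rough sketch in Python 3. The
HeaderValue class and the sample byte string are purely hypothetical
(nothing here is rpm's actual API); the only real calls are
bytes.decode, str.encode, and the standard "surrogateescape" error
handler.

```python
class HeaderValue:
    """Hypothetical wrapper offering parallel bytes and str access."""

    def __init__(self, raw: bytes):
        # bytes API: the original data is always kept, never altered
        self.raw = raw

    def text(self, encoding="utf-8", errors="surrogateescape") -> str:
        # str API: convenient view; undecodable bytes become lone surrogates
        return self.raw.decode(encoding, errors)

    @classmethod
    def from_text(cls, text: str, encoding="utf-8", errors="surrogateescape"):
        # Inverse of text(): escaped surrogates turn back into the same bytes
        return cls(text.encode(encoding, errors))


# Assumed sample: Latin-1 encoded data, which is not valid UTF-8
raw = b"caf\xe9"
value = HeaderValue(raw)

# Brittle option: strict ASCII decoding fails right at the IO boundary
try:
    raw.decode("ascii")
    strict_ascii_ok = True
except UnicodeDecodeError:
    strict_ascii_ok = False

# Permissive option: decoding succeeds and round trips losslessly
round_tripped = HeaderValue.from_text(value.text()).raw

# ...but the malformed data has escaped: a strict encode elsewhere fails
try:
    value.text().encode("utf-8")
    leaked = False
except UnicodeEncodeError:
    leaked = True
```

With the permissive default the failure shows up far from where the
data entered the program, which is why keeping the raw bytes reachable
(value.raw above) matters for recovery.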

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia