On 10/03/2012 06:40 AM, Jason Connor wrote:
> Hi All,
> 
> Lately we've been struggling with a rash of bugs related to i18n input in 
> Pulp. Python 2's unicode support is only so-so and whenever we get non-ascii 
> or non-utf-8 encoded strings, we tend to run into trouble (the most common
> problematic encoding seems to be latin-1). Given that Python's str type is
> really just a byte array with some built-in smarts, it isn't really possible
> to guess what the encoding might actually be.
> 
> To address this issue, I propose that we make utf-8 string encoding a hard
> requirement on the server. To enforce this, we'll try to decode all strings 
> from utf-8 and any failures will get a 400 server response with some sort of 
> standardized message: utf-8 encoded strings only (dummy), or something 
> similar.

+1

Boundary validation is the only way to ensure Unicode sanity in Python 2
(same goes for Python 3; it's just a lot harder to omit it
accidentally). You'll still need to figure out what to do with repos
that already contain non-ASCII entries with an unknown encoding though.
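For illustration only, boundary validation of the kind proposed above might look roughly like the sketch below. It assumes request fields arrive as raw bytes in a flat dict. The names `validate_request_fields`, `decode_utf8_strict`, `InvalidEncodingError` and the message text are hypothetical, not Pulp's actual API, and the code uses Python 3 syntax.

```python
from http import HTTPStatus

# Hypothetical standardized error message returned to clients.
UTF8_ONLY_MESSAGE = "utf-8 encoded strings only"


class InvalidEncodingError(ValueError):
    """Raised when an incoming value is not valid UTF-8 (hypothetical name)."""


def decode_utf8_strict(raw):
    """Decode bytes as UTF-8, refusing to guess at any other encoding."""
    try:
        # errors="strict" is the default: any invalid byte sequence raises.
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise InvalidEncodingError(UTF8_ONLY_MESSAGE) from exc


def validate_request_fields(fields):
    """Decode every bytes key/value at the server boundary.

    Returns (status, payload): 200 with the decoded dict when everything is
    valid UTF-8, otherwise 400 with the standardized error message.
    """
    decoded = {}
    for key, value in fields.items():
        try:
            decoded[decode_utf8_strict(key)] = decode_utf8_strict(value)
        except InvalidEncodingError as exc:
            return HTTPStatus.BAD_REQUEST, {"error": str(exc)}
    return HTTPStatus.OK, decoded
```

With this sketch, `{b"name": "café".encode("utf-8")}` decodes cleanly, while the latin-1 bytes `b"caf\xe9"` are rejected with a 400 instead of being silently guessed at. Everything past the boundary then only ever handles text.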

Cheers,
Nick.

-- 
Nick Coghlan
Red Hat Infrastructure Engineering & Development, Brisbane

_______________________________________________
Pulp-list mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/pulp-list
