On 10/03/2012 06:40 AM, Jason Connor wrote:
> Hi All,
>
> Lately we've been struggling with a rash of bugs related to i18n input in
> Pulp. Python 2's unicode support is only so-so and whenever we get non-ascii
> or non-utf-8 encoded strings, we tend to run into trouble (the most common
> problematic encoding seems to be latin-1). Given that Python's str type is
> really just a byte array with some built-in smarts, it isn't really possible
> to guess what the encoding might actually be.
>
> To address this issue, I propose that we make string encoding as utf-8 a hard
> requirement on the server. To enforce this, we'll try to decode all strings
> from utf-8 and any failures will get a 400 server response with some sort of
> standardized message: utf-8 encoded strings only (dummy), or something
> similar.

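For illustration only, the boundary check proposed above could look roughly like the sketch below. The names `InvalidEncoding` and `require_utf8` are hypothetical and not taken from Pulp; the sketch assumes incoming values arrive as raw byte strings and that some outer request handler turns the exception into the 400 response.

```python
# Minimal sketch with hypothetical names; not Pulp's actual API.

class InvalidEncoding(Exception):
    """Signals that the request should be rejected with HTTP 400."""
    status = 400


def require_utf8(raw):
    """Decode a byte string received at the server boundary as UTF-8.

    Returns the decoded text, or raises InvalidEncoding if the bytes
    are not valid UTF-8.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        raise InvalidEncoding("utf-8 encoded strings only")


good = u"caf\xe9".encode("utf-8")    # valid UTF-8 bytes
bad = u"caf\xe9".encode("latin-1")   # same text, latin-1 bytes

text = require_utf8(good)            # decodes cleanly

try:
    require_utf8(bad)
    rejected = False
except InvalidEncoding as exc:
    rejected = (exc.status == 400)   # latin-1 input is refused
```

Decoding once at the edge means everything past that point can assume unicode text, so a stray latin-1 string fails at the boundary instead of somewhere deep in the server.
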
+1

Boundary validation is the only way to ensure Unicode sanity in Python 2 (same goes for Python 3, it's just a lot harder to omit it accidentally). You'll still need to figure out what to do with repos that already contain non-ASCII entries with an unknown encoding though.

Cheers,
Nick.

-- 
Nick Coghlan
Red Hat Infrastructure Engineering & Development, Brisbane

_______________________________________________
Pulp-list mailing list
https://www.redhat.com/mailman/listinfo/pulp-list
