On Wed, Feb 24, 2010 at 4:49 PM, Christopher Hiller <[email protected]> wrote:
> List,
>
> I'm having a difficult time with this particular problem. I have a codebase
> where I would like to find all occurrences of implicit decodes. It's
> difficult to do this with grep, and I was wondering if there was another way
> by means of decorators or monkeypatching or compiler/parse tree analysis or
> something. An example:
>
>     foo = u'bar' + 'baz'
>
> This implicitly decodes "baz" using the system default encoding. In my case
> this encoding is ASCII.
>
> However -- and this is where problems can arise -- what if you had this:
>
>     foo = u'bar' + 'büz'
>
> ...which results in a SyntaxError if your default encoding is ASCII.
>
> Any ideas? I'm having problems googling for solutions because I'm not
> entirely sure what to google for.
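[One rough sketch of the "compiler/parse tree analysis" idea mentioned above: scan the token stream with the stdlib tokenize module and flag every string literal that lacks a u'' prefix, since under Python 2 each bare literal mixed into unicode is a candidate implicit-decode site. The function name is illustrative, not from the thread.]

```python
import io
import tokenize

def find_bare_strings(source):
    """Return (line, text) for each string literal without a u prefix.

    A crude approximation of parse-tree analysis: it flags candidate
    implicit-decode sites rather than proving a decode happens.
    """
    hits = []
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        if tok.type == tokenize.STRING and not tok.string.lower().startswith(("u'", 'u"')):
            hits.append((tok.start[0], tok.string))
    return hits

# find_bare_strings("foo = u'bar' + 'baz'\n") flags 'baz' on line 1.
```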
I went through this process myself recently. The path I took was to switch
out the default unicode codec with one that explodes, run the unit tests,
and incrementally fix the problems. The code is open source and you can
snag it here:

  http://bitbucket.org/jek/flatland/src/75d8155a30a2/tests/__init__.py
  http://bitbucket.org/jek/flatland/src/75d8155a30a2/tests/_util.py

The short version looks like:

    class NoCoercionCodec(codecs.Codec):
        def encode(self, input, errors='strict'):
            raise UnicodeError("encoding coercion blocked")
        def decode(self, input, errors='strict'):
            raise UnicodeError("encoding coercion blocked")

The real version is a little longer. The stdlib does some implicit
conversions, and in my case I didn't want those to explode.

_______________________________________________
Portland mailing list
[email protected]
http://mail.python.org/mailman/listinfo/portland
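[A self-contained sketch of the exploding-codec technique from the reply above: register the codec under a made-up name ('nocoerce' is illustrative) via codecs.register. Under Python 2 you would then install it as the default encoding with sys.setdefaultencoding (reachable only after reload(sys)) so implicit coercions blow up; the registration itself runs on any Python.]

```python
import codecs

class NoCoercionCodec(codecs.Codec):
    """A codec that refuses every conversion, so any encode/decode
    routed through it raises instead of silently coercing."""
    def encode(self, input, errors='strict'):
        raise UnicodeError("encoding coercion blocked")
    def decode(self, input, errors='strict'):
        raise UnicodeError("encoding coercion blocked")

def _search(name):
    # Codec search function: answer only for our hypothetical name.
    if name == 'nocoerce':
        codec = NoCoercionCodec()
        return codecs.CodecInfo(
            name='nocoerce',
            encode=codec.encode,
            decode=codec.decode,
        )
    return None

codecs.register(_search)

# Any explicit use of the codec now explodes; under Python 2,
# setting it as the default encoding makes implicit coercions
# explode the same way during a test run.
```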
