On Sat, Sep 27, 2008 at 23:24, Bob Ippolito <[EMAIL PROTECTED]> wrote:
> On Sat, Sep 27, 2008 at 2:10 PM, Arnar Birgisson <[EMAIL PROTECTED]> wrote:
>> On Sat, Sep 27, 2008 at 22:13, Bob Ippolito <[EMAIL PROTECTED]> wrote:
>>> If you give it unicode input, it will decode to unicode. Basically it
>>> scans through the str until it finds non-ASCII, escape, or end quote.
>>> If it finds the end quote first it will just allocate a new string
>>> with exactly that data, which is super fast since it's just an alloc
>>> and copy.
>>>
>>> It will of course always decode everything containing non-ASCII
>>> characters or any escape sequences to unicode. It is not currently
>>> configurable. It was done for performance, but also does produce nicer
>>> looking repr output because you don't have so many 'u' characters to
>>> look at :) Given the way str works in Python 2.x it should not be an
>>> incompatible change except for doctests... and I guess code that
>>> explicitly checks for unicode and doesn't know what to do with str,
>>> but that would be weird.
>>
>> The reason I asked was because I've had problems even with pure-ASCII
>> strs when mixed with unicode objects in some DB-API drivers, working
>> with filesystems on the OS-X and others. A "solution" was to have
>> everything in unicode.
>
> I've never seen a pure ASCII str cause problems. I've seen pure ASCII
> unicode cause problems in stupid ways though, because not all Python C
> code that handles text can handle unicode.
> Dumb stuff like this bites
> me all the time in Genshi templates (where all string literals are
> unicode):
>
> >>> datetime.datetime.now().strftime(u'%Y')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: strftime() argument 1 must be str, not unicode
>
> Any operation involving a str and unicode should up-convert to
> unicode, and regardless of the defaultencoding a pure ASCII str will
> properly get handled (at least in Python 2.5, I don't remember what
> 2.4 did)... e.g. ''.join(['', u'foo']) returns u'foo'
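
For readers following along, the fast path Bob describes in the quoted text (scan the string body, and copy it unchanged if the end quote comes before any non-ASCII byte or escape) might be sketched roughly as below. This is only an illustration under stated assumptions, not simplejson's actual C code: it is written for Python 3, where bytes plays the role of Python 2's str and str plays the role of unicode, and the name decode_json_string is hypothetical. The slow path simply hands the token to the standard json module.

```python
import json

QUOTE = 0x22      # the '"' byte
BACKSLASH = 0x5C  # the '\' byte


def decode_json_string(token):
    """Decode one JSON string token given as bytes, e.g. b'"abc"'.

    Hypothetical helper for illustration only. Returns bytes for a
    pure-ASCII, escape-free body (the fast path) and str otherwise.
    """
    if len(token) < 2 or token[0] != QUOTE:
        raise ValueError("expected a JSON string token")

    for i in range(1, len(token)):
        c = token[i]  # indexing bytes yields an int in Python 3
        if c == QUOTE:
            if i != len(token) - 1:
                raise ValueError("trailing data after end quote")
            # Fast path: nothing needed decoding, so just copy the body.
            return token[1:i]
        if c == BACKSLASH or c >= 0x80:
            # An escape or a non-ASCII byte: give up on the fast path.
            break

    # Slow path: decode the whole token to text and let json handle
    # escapes (an unterminated token raises ValueError here).
    return json.loads(token.decode("utf-8"))
```

Calling decode_json_string(b'"hello"') takes the fast path and returns bytes, while a token containing an escape sequence or UTF-8 encoded non-ASCII text comes back as str, mirroring the str/unicode split discussed in this thread.
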
Right, I can't remember the exact details now - this was in my last job :)

>> Since that means the string given to simplejson to decode will be a
>> unicode string anyways, so in that case there's no problem :)
>
> If you can prove that there is an actual problem I'm sure I can come
> up with a flag that would ensure unicode, but the implementation would
> probably be just translation of the input document to unicode before
> decoding ;)

Well, don't worry about it :)

cheers,
Arnar
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com