On Sat, Sep 27, 2008 at 11:35 AM, Arnar Birgisson <[EMAIL PROTECTED]> wrote:
> Hi Bob,
>
> On Sat, Sep 27, 2008 at 20:16, Bob Ippolito <[EMAIL PROTECTED]> wrote:
>> Even without the C speedups, it's several times faster. With the C
>> speedups, it's WAY faster. I highly recommend that everyone update
>> their frameworks to use the latest code.
>
> Excellent stuff!
>
>> There aren't really any API breaking changes, but when decoding a str
>> input it will return str objects instead of unicode if the str is all
>> ASCII with no escaped characters. I'm not aware of any scenario other
>> than doctests where this could be a problem.
>
> Is it configurable to make it always decode to unicode? Did you make
> this change for performance?

If you give it unicode input, it will decode to unicode. Basically it
scans through the str until it finds a non-ASCII character, an escape,
or the end quote. If it finds the end quote first it will just allocate
a new string with exactly that data, which is super fast since it's
just an alloc and copy. It will of course always decode everything
containing non-ASCII characters or any escape sequences to unicode.

It is not currently configurable. It was done for performance, but it
also produces nicer looking repr output because you don't have so many
'u' characters to look at :)

Given the way str works in Python 2.x it should not be an incompatible
change except for doctests... and I guess code that explicitly checks
for unicode and doesn't know what to do with str, but that would be
weird.

-bob
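
For anyone who wants to see the shape of the scan described above, here
is a rough sketch. It is not the actual decoder code, only an
illustration under some assumptions: it is written for Python 3, so
bytes stands in for the Python 2.x str and str stands in for unicode,
and the name scan_string is made up for this example. The slow path
simply hands the literal to the standard json module.

```python
import json


def scan_string(buf, start):
    """Decode the JSON string whose opening quote is at buf[start - 1].

    Hypothetical helper for illustration. Returns (value, end), where end
    is the index just past the closing quote. value is bytes on the fast
    path (plain ASCII, no escapes) and str otherwise.
    """
    n = len(buf)
    i = start
    while i < n:
        c = buf[i]
        if c == 0x22:
            # End quote found first: just copy the raw slice.
            return buf[start:i], i + 1
        if c == 0x5C or c >= 0x80:
            # Backslash escape or non-ASCII byte: needs real decoding.
            break
        i += 1
    else:
        raise ValueError("unterminated string")

    # Slow path: find the real closing quote, skipping escaped characters.
    j = i
    while j < n and buf[j] != 0x22:
        j += 2 if buf[j] == 0x5C else 1
    if j >= n:
        raise ValueError("unterminated string")
    # Let the standard decoder handle escapes and UTF-8 sequences.
    text = json.loads(buf[start - 1:j + 1].decode("utf-8"))
    return text, j + 1
```

Calling scan_string(b'"hello" rest', 1) takes the fast path and returns
the bytes slice untouched, while input containing a backslash escape or
a UTF-8 sequence falls through to the full decode and comes back as
text.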