[issue1734234] Fast path for unicodedata.normalize()
Antoine Pitrou pit...@free.fr added the comment:

Committed in r72054, r72055. Thanks for the patch!

----------
resolution: accepted -> fixed
status: open -> closed

_______________________________________
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1734234
_______________________________________
_______________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
Daniel Diniz aja...@gmail.com added the comment:

Should this be considered for 3.1?

----------
nosy: +ajaksu2
Martin v. Löwis mar...@v.loewis.de added the comment:

The patch looks fine to me, please apply. One change is necessary: the quick check should only be performed for the newest Unicode version (i.e. when self is NULL); otherwise, we would need to add a delta list for changed quickcheck values as well.

I think it would be possible to fold the NO and MAYBE answers into NO in the database already, reducing the number of necessary bits to 4 and allowing the check to be done with a simple bit test (i.e. no shift). OTOH, the shift can be avoided already by changing quickcheck_shift into a bitmask. OTTH, perhaps the compiler does that already anyway.

With the reduced number of bits, it would be possible to reclaim a byte by merging the bits into one of the other fields. Whether that's worth it, I don't know.

----------
assignee:  -> pitrou
resolution:  -> accepted
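The packing trade-off described above can be sketched in a toy model. The layout, shift values, and masks below are illustrative assumptions, not CPython's actual unicodedata database encoding; the point is how folding MAYBE into NO halves the storage and turns a shift-and-mask lookup into a single bit test:

```python
# Toy model of quick-check packing; values and layout are hypothetical.
YES, MAYBE, NO = 0, 1, 2          # three possible quick-check answers

# Two bits per normalization form: NFD, NFKD, NFC, NFKC.
SHIFTS = {"NFD": 0, "NFKD": 2, "NFC": 4, "NFKC": 6}

def pack(answers):
    """Pack per-form answers into one byte (two bits per form)."""
    record = 0
    for form, answer in answers.items():
        record |= answer << SHIFTS[form]
    return record

def quickcheck(record, form):
    """Extract one form's answer: needs a shift and a mask."""
    return (record >> SHIFTS[form]) & 0b11

rec = pack({"NFD": YES, "NFKD": YES, "NFC": MAYBE, "NFKC": NO})
assert quickcheck(rec, "NFC") == MAYBE

# Folding MAYBE into NO leaves one bit per form (4 bits total), and the
# lookup becomes a single bit test against a per-form mask (no shift).
MASKS = {"NFD": 1, "NFKD": 2, "NFC": 4, "NFKC": 8}

def pack_folded(answers):
    record = 0
    for form, answer in answers.items():
        if answer != YES:          # MAYBE is treated as NO
            record |= MASKS[form]
    return record

folded = pack_folded({"NFD": YES, "NFKD": YES, "NFC": MAYBE, "NFKC": NO})
assert not folded & MASKS["NFD"]   # NFD: definitely normalized, fast path OK
assert folded & MASKS["NFC"]       # NFC: must fall back to full normalization
```

The folding is safe for a fast path because both NO and MAYBE mean "cannot skip normalization"; the distinction only matters to a full quick-check algorithm.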
Antoine Pitrou pit...@free.fr added the comment:

Here is a new patch against trunk, including the modified data files. All tests pass, and I can confirm a very healthy speed-up (~5x) when normalizing an already-normalized string. The slowdown for non-normalized strings is so small that it cannot be distinguished from measurement noise. Martin, do you think this can be committed?

----------
nosy: +loewis
stage:  -> patch review
type:  -> performance
versions: +Python 2.7, Python 3.1 -Python 2.6
Added file: http://bugs.python.org/file12350/uninorm.patch
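The fast path changes only speed, never results: normalizing an already-normalized string must return it unchanged. A minimal sketch of that invariant (in Python 3 syntax, not Antoine's actual benchmark):

```python
import unicodedata

decomposed = "cafe\u0301"          # 'e' followed by U+0301 COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)
assert composed == "caf\u00e9"     # precomposed LATIN SMALL LETTER E WITH ACUTE

# Normalizing again is a no-op; this is exactly the case the patch speeds up,
# since the quick check can answer "already NFC" without doing any work.
assert unicodedata.normalize("NFC", composed) == composed
```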
Virgil Dupras added the comment:

It's a very interesting patch; I wonder why it fell into oblivion. Calls like unicodedata.normalize('NFC', u'\xe9') were more than twice as fast for me. Making sure that all unicode is normalized can be a bottleneck in a lot of applications (it somewhat is in mine). The downside is that the patch makes test_codecs and test_unicode_file fail.

----------
nosy: +vdupras
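A measurement like the one above can be reproduced with the standard timeit module; this is a hedged sketch in Python 3 syntax (the input size, repeat count, and observed ratio will vary by build and machine):

```python
import timeit
import unicodedata

# u'\xe9' (LATIN SMALL LETTER E WITH ACUTE) is already in NFC form,
# so every call below can take the already-normalized fast path.
already_nfc = unicodedata.normalize("NFC", "\u00e9" * 1000)

elapsed = timeit.timeit(
    lambda: unicodedata.normalize("NFC", already_nfc), number=10000
)
print(f"normalize() on already-NFC input: {elapsed:.3f}s for 10000 calls")

# Sanity check: the fast path must still return the identical result.
assert unicodedata.normalize("NFC", already_nfc) == already_nfc
```

Comparing this figure against the same loop run on a decomposed input (e.g. `"e\u0301" * 1000`) shows the fast-path gain, since the decomposed string forces full normalization on every call.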