New submission from Serhiy Storchaka: The proposed patch optimizes unicode-escape and raw-unicode-escape codecs. Coders still slower than in 3.2, but much faster than in 3.3. Further speedup is possible with the use of stringlib, but I think that this is enough. The code unified and simplified (251 insertions, 345 deletions).
Benchmark results (on AMD Athlon 64 X2 4600+): Py2.7 Py3.2 Py3.3 Py3.4+patch 193 (+11%) 325 (-34%) 66 (+224%) 214 decode unicode-escape 'A'*10000 138 (+72%) 241 (-1%) 154 (+55%) 238 decode unicode-escape '\x80'*10000 193 (+10%) 323 (-34%) 72 (+194%) 212 decode unicode-escape '\x80'+'A'*9999 160 (+59%) 273 (-7%) 169 (+51%) 255 decode unicode-escape '\u0100'*10000 193 (-7%) 324 (-44%) 61 (+195%) 180 decode unicode-escape '\u0100'+'A'*9999 138 (+67%) 242 (-5%) 135 (+71%) 231 decode unicode-escape '\u0100'+'\x80'*9999 160 (+59%) 274 (-7%) 169 (+51%) 255 decode unicode-escape '\u8000'*10000 193 (-7%) 323 (-44%) 61 (+195%) 180 decode unicode-escape '\u8000'+'A'*9999 138 (+67%) 242 (-5%) 135 (+71%) 231 decode unicode-escape '\u8000'+'\x80'*9999 160 (+60%) 276 (-7%) 169 (+51%) 256 decode unicode-escape '\u8000'+'\u0100'*9999 178 (+42%) 275 (-8%) 177 (+43%) 253 decode unicode-escape '\U00010000'*10000 192 (+30%) 323 (-23%) 61 (+310%) 250 decode unicode-escape '\U00010000'+'A'*9999 139 (+35%) 243 (-23%) 119 (+57%) 187 decode unicode-escape '\U00010000'+'\x80'*9999 161 (+38%) 273 (-19%) 150 (+48%) 222 decode unicode-escape '\U00010000'+'\u0100'*9999 161 (+38%) 273 (-19%) 150 (+48%) 222 decode unicode-escape '\U00010000'+'\u8000'*9999 558 (-62%) 427 (-50%) 82 (+161%) 214 decode raw-unicode-escape 'A'*10000 560 (-62%) 425 (-50%) 75 (+183%) 212 decode raw-unicode-escape '\x80'*10000 558 (-62%) 425 (-50%) 75 (+183%) 212 decode raw-unicode-escape '\x80'+'A'*9999 178 (+75%) 235 (+32%) 108 (+188%) 311 decode raw-unicode-escape '\u0100'*10000 555 (-62%) 424 (-50%) 61 (+248%) 212 decode raw-unicode-escape '\u0100'+'A'*9999 559 (-62%) 424 (-50%) 61 (+248%) 212 decode raw-unicode-escape '\u0100'+'\x80'*9999 179 (+74%) 235 (+32%) 108 (+188%) 311 decode raw-unicode-escape '\u8000'*10000 555 (-62%) 424 (-50%) 61 (+248%) 212 decode raw-unicode-escape '\u8000'+'A'*9999 558 (-62%) 425 (-50%) 61 (+248%) 212 decode raw-unicode-escape '\u8000'+'\x80'*9999 178 (+75%) 235 (+32%) 108 (+188%) 311 decode raw-unicode-escape '\u8000'+'\u0100'*9999 200 (+18%) 249 (-5%) 132 (+79%) 236 decode raw-unicode-escape '\U00010000'*10000 554 (-58%) 423 (-46%) 61 (+277%) 230 decode raw-unicode-escape '\U00010000'+'A'*9999 558 (-59%) 424 (-46%) 61 (+277%) 230 decode raw-unicode-escape '\U00010000'+'\x80'*9999 178 (+46%) 235 (+11%) 100 (+160%) 260 decode raw-unicode-escape '\U00010000'+'\u0100'*9999 178 (+44%) 235 (+9%) 100 (+157%) 257 decode raw-unicode-escape '\U00010000'+'\u8000'*9999 182 (+137%) 215 (+101%) 148 (+192%) 432 encode unicode-escape 'A'*10000 582 (-10%) 617 (-16%) 470 (+11%) 521 encode unicode-escape '\x80'*10000 182 (+131%) 215 (+96%) 148 (+184%) 421 encode unicode-escape '\x80'+'A'*9999 624 (-7%) 967 (-40%) 558 (+4%) 579 encode unicode-escape '\u0100'*10000 183 (-19%) 215 (-31%) 132 (+12%) 148 encode unicode-escape '\u0100'+'A'*9999 584 (-23%) 617 (-27%) 464 (-3%) 451 encode unicode-escape '\u0100'+'\x80'*9999 627 (-8%) 968 (-40%) 557 (+4%) 579 encode unicode-escape '\u8000'*10000 183 (-19%) 215 (-31%) 148 (+0%) 148 encode unicode-escape '\u8000'+'A'*9999 584 (-23%) 617 (-27%) 490 (-8%) 451 encode unicode-escape '\u8000'+'\x80'*9999 629 (-8%) 969 (-40%) 555 (+4%) 578 encode unicode-escape '\u8000'+'\u0100'*9999 931 (-39%) 939 (-39%) 602 (-5%) 572 encode unicode-escape '\U00010000'*10000 183 (+7%) 215 (-9%) 180 (+9%) 196 encode unicode-escape '\U00010000'+'A'*9999 584 (-9%) 617 (-13%) 482 (+11%) 534 encode unicode-escape '\U00010000'+'\x80'*9999 630 (-14%) 962 (-43%) 565 (-4%) 544 encode unicode-escape '\U00010000'+'\u0100'*9999 630 (-14%) 964 (-44%) 565 (-4%) 544 encode unicode-escape '\U00010000'+'\u8000'*9999 332 (+1459%) 330 (+1468%) 333 (+1454%) 5175 encode raw-unicode-escape 'A'*10000 332 (+1589%) 329 (+1604%) 333 (+1584%) 5607 encode raw-unicode-escape '\x80'*10000 336 (+1569%) 334 (+1579%) 333 (+1584%) 5607 encode raw-unicode-escape '\x80'+'A'*9999 904 (-38%) 911 (-39%) 557 (+0%) 558 encode raw-unicode-escape '\u0100'*10000 336 (+15%) 335 (+16%) 197 (+97%) 388 encode raw-unicode-escape '\u0100'+'A'*9999 335 (+16%) 335 (+16%) 197 (+97%) 388 encode raw-unicode-escape '\u0100'+'\x80'*9999 904 (-38%) 913 (-39%) 557 (+0%) 558 encode raw-unicode-escape '\u8000'*10000 335 (+16%) 335 (+16%) 197 (+96%) 387 encode raw-unicode-escape '\u8000'+'A'*9999 335 (+16%) 335 (+16%) 196 (+98%) 388 encode raw-unicode-escape '\u8000'+'\x80'*9999 912 (-39%) 909 (-39%) 554 (+1%) 558 encode raw-unicode-escape '\u8000'+'\u0100'*9999 966 (-40%) 997 (-42%) 584 (-0%) 583 encode raw-unicode-escape '\U00010000'*10000 336 (-42%) 335 (-41%) 213 (-8%) 196 encode raw-unicode-escape '\U00010000'+'A'*9999 336 (-42%) 335 (-41%) 213 (-8%) 196 encode raw-unicode-escape '\U00010000'+'\x80'*9999 911 (-43%) 911 (-43%) 570 (-8%) 522 encode raw-unicode-escape '\U00010000'+'\u0100'*9999 911 (-43%) 913 (-43%) 570 (-8%) 522 encode raw-unicode-escape '\U00010000'+'\u8000'*9999 ---------- components: Interpreter Core, Unicode files: faster_unicode_escape.patch keywords: 3.3regression, patch messages: 173901 nosy: benjamin.peterson, ezio.melotti, haypo, lemburg, pitrou, serhiy.storchaka priority: normal severity: normal stage: patch review status: open title: Faster unicode-escape and raw-unicode-escape codecs type: performance versions: Python 3.4 Added file: http://bugs.python.org/file27740/faster_unicode_escape.patch _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue16334> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com