I made a mistake when describing the example: "binary operator" and "unary operator" should be swapped.
Cesare

2010/1/29 Cesare Di Mauro <cesare.di.ma...@gmail.com>

> 2010/1/29 Nick Coghlan <ncogh...@gmail.com>
>
>> I wouldn't consider changing from bytecode to wordcode uncontroversial -
>> the potential to have an effect on cache hit ratios means it needs to be
>> benchmarked (the U-S performance tests should be helpful there).
>>
>
> It's quite strange, but from the tests it seems that wpython performs
> better on old architectures (such as my Athlon64 socket 754), which have
> fewer resources, such as cache.
>
> It'll be interesting to check how it works on more limited ISAs. I'm
> especially curious about ARMs.
>
>
>> It's the same basic problem where any changes to the ceval loop can have
>> surprising performance effects due to the way they affect the compiled
>> switch statements ability to fit into the cache and other low level
>> processor weirdness.
>>
>> Cheers,
>> Nick.
>>
>
> Sure, but consider that with wpython, wordcodes require less space on
> average. Also, fewer instructions are executed inside the ceval loop, thanks
> to some natural instruction grouping.
>
> For example, I recently introduced a new opcode in wpython 1.1 to handle
> generator expressions more efficiently. It's mapped as a unary operator, so
> it exposes interesting properties, which I'll show you with an example.
>
> def f(a):
>     return sum(x for x in a)
>
> With CPython 2.6.4 it generates:
>
>   0 LOAD_GLOBAL         0 (sum)
>   3 LOAD_CONST          1 (<code object <genexpr> at 00512EC8, file "<stdin>", line 1>)
>   6 MAKE_FUNCTION       0
>   9 LOAD_FAST           0 (a)
>  12 GET_ITER
>  13 CALL_FUNCTION       1
>  16 CALL_FUNCTION       1
>  19 RETURN_VALUE
>
> With wpython 1.1:
>
>   0 LOAD_GLOBAL         0 (sum)
>   1 LOAD_CONST          1 (<code object <genexpr> at 01F13208, file "<stdin>", line 1>)
>   2 MAKE_FUNCTION       0
>   3 FAST_BINOP          get_generator a
>   5 QUICK_CALL_FUNCTION 1
>   6 RETURN_VALUE
>
> The new opcode is GET_GENERATOR, which is equivalent to (but more efficient
> than, because it uses a faster internal function call):
>
> GET_ITER
> CALL_FUNCTION 1
>
> The compiler initially generated the following opcodes:
>
> LOAD_FAST 0 (a)
> GET_GENERATOR
>
> Then the peepholer recognized the pattern UNARY(FAST) and produced the
> single opcode:
>
> FAST_BINOP get_generator a
>
> In the end, the ceval loop executes a single instruction instead of three.
> The wordcode requires 14 bytes of storage instead of 20, so it uses 1
> data cache line instead of 2 on CPUs with 16-byte data cache lines.
>
> The same grouping behavior happens with binary operators as well. Opcode
> aggregation is a natural and useful concept with the new wordcode structure.
>
> Cheers,
> Cesare
>
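The instruction fusion and the size arithmetic in the quoted example can be reproduced with a small toy model. This is a sketch under stated assumptions, not wpython's actual peepholer or encoder. Instructions are hypothetical (name, argument) tuples. CPython 2.x is assumed to spend 1 byte per opcode plus 2 bytes when the opcode takes an argument. The wordcode is assumed to use one 2-byte word per instruction, except FAST_BINOP, which spans two words (offsets 3 to 5 in the listing). The code is assumed to start on a 16-byte cache-line boundary. The helper names fuse_fast_operands, cpython2_size, wordcode_size and cache_lines are made up for illustration.

```python
import math

# Hypothetical toy model: each instruction is a (name, arg) tuple, with
# arg None for opcodes that take no argument. This is NOT the real internal
# representation used by CPython or wpython.

# Assumption: only GET_GENERATOR may absorb a preceding LOAD_FAST,
# matching the quoted example.
FUSABLE = {"GET_GENERATOR"}


def fuse_fast_operands(code):
    """Peephole pass: rewrite 'LOAD_FAST x; OP' as one 'FAST_BINOP (op, x)'."""
    out = []
    i = 0
    while i < len(code):
        name, arg = code[i]
        nxt = code[i + 1][0] if i + 1 < len(code) else None
        if name == "LOAD_FAST" and nxt in FUSABLE:
            out.append(("FAST_BINOP", (nxt.lower(), arg)))
            i += 2  # both original instructions are consumed
        else:
            out.append((name, arg))
            i += 1
    return out


def cpython2_size(code):
    """Bytes in CPython 2.x bytecode: 1 for the opcode, +2 if it has an argument."""
    return sum(1 if arg is None else 3 for _, arg in code)


def wordcode_size(code, word_bytes=2):
    """Bytes in the assumed wordcode: 1 word per instruction, 2 for FAST_BINOP."""
    words = sum(2 if name == "FAST_BINOP" else 1 for name, _ in code)
    return words * word_bytes


def cache_lines(n_bytes, line_size=16):
    """Cache lines touched, assuming the code starts on a line boundary."""
    return math.ceil(n_bytes / line_size)


# The CPython 2.6.4 listing for: def f(a): return sum(x for x in a)
cpython = [
    ("LOAD_GLOBAL", 0), ("LOAD_CONST", 1), ("MAKE_FUNCTION", 0),
    ("LOAD_FAST", 0), ("GET_ITER", None), ("CALL_FUNCTION", 1),
    ("CALL_FUNCTION", 1), ("RETURN_VALUE", None),
]

# wpython before peepholing. QUICK_CALL_FUNCTION is copied from the final
# listing because the post does not say how that opcode is selected.
wpython = [
    ("LOAD_GLOBAL", 0), ("LOAD_CONST", 1), ("MAKE_FUNCTION", 0),
    ("LOAD_FAST", 0), ("GET_GENERATOR", None),
    ("QUICK_CALL_FUNCTION", 1), ("RETURN_VALUE", None),
]
fused = fuse_fast_operands(wpython)

print(cpython2_size(cpython), cache_lines(cpython2_size(cpython)))  # 20 2
print(wordcode_size(fused), cache_lines(wordcode_size(fused)))      # 14 1
```

Under these assumptions, the sketch reproduces the post's figures: 20 bytes (2 cache lines) versus 14 bytes (1 cache line). Real cache behavior also depends on where the code object lands in memory.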
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com