I ran across an interesting paper about some VM optimizations yesterday: http://www.object-arts.com/Papers/TheInterpreterIsDead.PDF

One thing it mentioned was that saving even one cycle in their 'PUSH_SELF' opcode improved interpreter performance by 5%. I thought that was pretty cool, and then I realized CPython doesn't even *have* a PUSH_SELF opcode. So, today, I took a stab at implementing one, by converting "LOAD_FAST 0" instructions into a "LOAD_SELF" opcode. Pystone and Parrotbench improved by about 2%. That wasn't great, so I added a "SELF_ATTR" opcode that combines a LOAD_SELF and a LOAD_ATTR in a single opcode while avoiding the extra stack and refcount manipulation. That raised the total improvement on pystone to about 5%, but didn't improve parrotbench any further.

I guess parrotbench doesn't do much self.attr work in the places that really count; looking at the code, most of its self.* access indeed happens at the higher levels of the parsing benchmark, not in the innermost loops. Even pystone doesn't do much attribute access on the first argument of most of its functions, especially not in inner loops: only Proc1() and the Record.copy() method do anything that SELF_ATTR would help. That seems unusual for object-oriented code, though, and more typical Python should be helped a lot more by this. Do we have any benchmarks that don't use 'foo = self.foo' style shortcuts in their inner loops?

Anyway, my main question is: do these sound like worthwhile optimizations? The code isn't that complex; the only tricky thing I did was having the opcodes' error case (an unbound local) fall through to the LOAD_FAST opcode, so as not to duplicate its error-handling code, in the hope of keeping the eval loop small.

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
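P.S. For anyone who wants to see the instruction pair in question, here's a quick sketch using the dis module on a toy class (the class and method names are made up for illustration; this is stock CPython, not the patched interpreter):

```python
import dis

class Record:
    """Toy class standing in for the benchmark code discussed above."""
    def __init__(self):
        self.x = 1

    def get(self):
        # Accessing self.x emits a LOAD_FAST for local slot 0 (self)
        # followed by a LOAD_ATTR -- the two instructions a fused
        # SELF_ATTR opcode would replace with one.
        return self.x

ops = [ins.opname for ins in dis.get_instructions(Record.get)]
print(ops)

# The exact surrounding opcodes vary by CPython version, but the
# LOAD_FAST-then-LOAD_ATTR pair is always there.
fast = next(i for i, op in enumerate(ops) if op.startswith("LOAD_FAST"))
attr = ops.index("LOAD_ATTR")
print(fast < attr)
```

Running `dis.dis(Record.get)` directly gives the same picture in the usual disassembly layout.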
One thing mentioned was that saving even one cycle in their 'PUSH_SELF' opcode improved interpreter performance by 5%. I thought that was pretty cool, and then I realized CPython doesn't even *have* a PUSH_SELF opcode. So, today, I took a stab at implementing one, by converting "LOAD_FAST 0" calls to a "LOAD_SELF" opcode. Pystone and Parrotbench improved by about 2% or so. That wasn't great, so I added a "SELF_ATTR" opcode that combines a LOAD_SELF and a LOAD_ATTR in the same opcode while avoiding extra stack and refcount manipulation. This raised the total improvement for pystone to about 5%, but didn't seem to improve parrotbench any further. I guess parrotbench doesn't do much self.attr stuff in places that really count, and looking at the code it indeed seems that most self.* stuff is done at higher levels of the parsing benchmark, not the innermost loops. Indeed, even pystone doesn't do much attribute access on the first argument of most of its functions, especially not those in inner loops. Only Proc1() and the Record.copy() method do anything that would be helped by SELF_ATTR. But it seems to me that this is very unusual for object-oriented code, and that more common uses of Python should be helped a lot more by this. Do we have any benchmarks that don't use 'foo = self.foo' type shortcuts in their inner loops? Anyway, my main question is, do these sound like worthwhile optimizations? The code isn't that complex; the only tricky thing I did was having the opcodes' error case (unbound local) fall through to the LOAD_FAST opcode so as not to duplicate the error handling code, in the hopes of keeping the eval loop size down. _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com