I made a mistake when describing the example below: "binary operator" and
"unary operator" should be swapped.

Cesare

2010/1/29 Cesare Di Mauro <cesare.di.ma...@gmail.com>

> 2010/1/29 Nick Coghlan <ncogh...@gmail.com>
>
>> I wouldn't consider changing from bytecode to wordcode uncontroversial -
>> the potential to have an effect on cache hit ratios means it needs to be
>> benchmarked (the U-S performance tests should be helpful there).
>>
>
> It's quite strange, but from the tests made so far it seems that wpython
> performs better on older architectures (such as my Athlon64 socket 754),
> which have fewer resources such as caches.
>
> It'll be interesting to check how it works on more limited ISAs. I'm
> especially curious about ARMs.
>
>
>> It's the same basic problem where any changes to the ceval loop can have
>> surprising performance effects due to the way they affect the compiled
>> switch statement's ability to fit into the cache and other low-level
>> processor weirdness.
>>
>> Cheers,
>> Nick.
>>
>
> Sure, but consider that with wpython wordcodes require less space on
> average. Also, fewer instructions are executed inside the ceval loop, thanks
> to some natural instruction grouping.
>
> For example, I recently introduced in wpython 1.1 a new opcode to handle
> generator expressions more efficiently. It's mapped as a unary operator, so
> it exposes interesting properties which I'll show you with an example.
>
> def f(a):
>     return sum(x for x in a)
>
> With CPython 2.6.4 it generates:
>
>   0 LOAD_GLOBAL 0 (sum)
>   3 LOAD_CONST 1 (<code object <genexpr> at 00512EC8, file "<stdin>", line 1>)
>   6 MAKE_FUNCTION 0
>   9 LOAD_FAST 0 (a)
>  12 GET_ITER
>  13 CALL_FUNCTION 1
>  16 CALL_FUNCTION 1
>  19 RETURN_VALUE
>
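
To reproduce a listing like the CPython one on your own interpreter, a minimal sketch with the standard dis module should be enough. The opcode names and offsets depend on the interpreter version, so a recent CPython will not print the 2.6.4 output quoted here.

```python
import dis

def f(a):
    return sum(x for x in a)

# Print the bytecode of f; the exact opcodes vary between CPython versions.
dis.dis(f)
```
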
> With wpython 1.1:
>
>   0 LOAD_GLOBAL 0 (sum)
>   1 LOAD_CONST 1 (<code object <genexpr> at 01F13208, file "<stdin>", line 1>)
>   2 MAKE_FUNCTION 0
>   3 FAST_BINOP get_generator a
>   5 QUICK_CALL_FUNCTION 1
>   6 RETURN_VALUE
>
> The new opcode is GET_GENERATOR, which is equivalent (but more efficient,
> using a faster internal function call) to:
>
> GET_ITER
> CALL_FUNCTION 1
>
> The compiler initially generated the following opcodes:
>
> LOAD_FAST 0 (a)
> GET_GENERATOR
>
> then the peepholer recognized the pattern UNARY(FAST), and produced the
> single opcode:
>
> FAST_BINOP get_generator a
>
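
The fusion step can be sketched as a toy peephole pass in Python. This is only an illustration, not wpython's actual code: the (opcode, argument) tuples, the BINARY_OPS set and the fuse_fast_binop name are all made up for the example. Following the correction at the top of this message, GET_GENERATOR is treated as a binary operator here.

```python
# Toy peephole pass: fuse LOAD_FAST followed by a binary operator into one
# FAST_BINOP instruction. Opcode names mirror the listings in this thread;
# the tuple representation and BINARY_OPS are hypothetical.
BINARY_OPS = {"GET_GENERATOR", "BINARY_ADD"}

def fuse_fast_binop(code):
    out = []
    i = 0
    while i < len(code):
        op, arg = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        if op == "LOAD_FAST" and nxt is not None and nxt[0] in BINARY_OPS:
            # The local variable becomes an operand of the fused instruction.
            out.append(("FAST_BINOP", (nxt[0].lower(), arg)))
            i += 2
        else:
            out.append((op, arg))
            i += 1
    return out

before = [
    ("LOAD_GLOBAL", "sum"),
    ("LOAD_CONST", "<genexpr>"),
    ("MAKE_FUNCTION", 0),
    ("LOAD_FAST", "a"),
    ("GET_GENERATOR", None),
    ("QUICK_CALL_FUNCTION", 1),
    ("RETURN_VALUE", None),
]
after = fuse_fast_binop(before)
```

A real peepholer also has to leave alone any instruction that is a jump target; the sketch ignores that.
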
> In the end, the ceval loop executes a single instruction instead of three.
> The wordcode takes 14 bytes instead of 20, so it fits in one data cache
> line instead of two on CPUs with 16-byte data cache lines.
>
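
The cache line count can be checked with a few lines of Python. The sizes come from the two listings (last offset 19 plus a 1-byte RETURN_VALUE for CPython, 7 words of 2 bytes for wpython). The cache_lines helper is hypothetical and assumes the code starts on a cache line boundary, which is not guaranteed in practice.

```python
def cache_lines(code_size, line_size=16):
    # Ceiling division; assumes the code starts on a line boundary.
    return -(-code_size // line_size)

bytecode_size = 19 + 1   # CPython 2.6.4 listing: offsets 0..19
wordcode_size = 7 * 2    # wpython 1.1 listing: 7 words of 2 bytes each

print(cache_lines(bytecode_size), cache_lines(wordcode_size))
```
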
> The same grouping behavior happens with binary operators as well. Opcode
> aggregation is a natural and useful concept with the new wordcode structure.
>
> Cheers,
> Cesare
>
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev