I tried to find prior research on this subject but couldn't find any,
so I'll be daring and vulnerable and just put the idea out here to see
what your thoughts are.
I single-stepped a simple loop in Python to see where the efficiency
bottlenecks are. I was impressed by the optimizations already in there, but
I still dare to suggest one more that, by my estimates, might shave off a
few cycles and speed up Python by about 5%. The idea is simple: change the
bytecode argument from two bytes to one.
Implications are:
- code changes are relatively simple, see below
- fewer memory reads, which are becoming more and more expensive
- saves three instructions for every opcode with an argument (i.e. most of them); see the sketch below
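
To make the layout concrete, here is a tiny self-contained sketch I wrote
(an illustration only, not CPython source; 100 is LOAD_CONST) comparing how
an argument is fetched in the current three-byte encoding and in the
proposed two-byte one:

#include <stdio.h>

/* Illustration only.  Current encoding: opcode byte plus a 16-bit
   little-endian argument (3 bytes per instruction with argument).
   Proposed encoding: opcode byte plus one argument byte (2 bytes). */

static int arg_current(const unsigned char *p)
{
    return p[1] + (p[2] << 8);   /* two memory reads, a shift and an add */
}

static int arg_proposed(const unsigned char *p)
{
    return p[1];                 /* one memory read */
}

int main(void)
{
    const unsigned char cur[]  = { 100, 1, 0 };  /* LOAD_CONST 1, 3 bytes */
    const unsigned char prop[] = { 100, 1 };     /* LOAD_CONST 1, 2 bytes */
    printf("current:  arg=%d\n", arg_current(cur));
    printf("proposed: arg=%d\n", arg_proposed(prop));
    return 0;
}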


The required code changes are, as far as I could find:
compile.c:
assemble_emit must emit EXTENDED_ARG for every argument
    wider than 8 bits instead of 16
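
To show what I mean, here is a rough sketch of the emit step (simplified
and untested, not the real assemble_emit: buffer growth and the lnotab are
left out, and EXTENDED_ARG is assumed to become the four-byte instruction
mentioned under ceval.c below):

#include <stdio.h>

#define EXTENDED_ARG 143   /* value from opcode.h */

/* Sketch only: append one instruction at offset i and return the new
   offset.  An argument wider than 8 bits gets an EXTENDED_ARG prefix
   carrying bits 8..31, so the following instruction only needs one
   argument byte. */
static int emit_instr(unsigned char *codestr, int i, int opcode, int arg)
{
    if (arg > 0xff) {
        codestr[i++] = EXTENDED_ARG;
        codestr[i++] = (arg >>  8) & 0xff;
        codestr[i++] = (arg >> 16) & 0xff;
        codestr[i++] = (arg >> 24) & 0xff;
    }
    codestr[i++] = (unsigned char)opcode;
    codestr[i++] = arg & 0xff;
    return i;
}

int main(void)
{
    unsigned char codestr[16];
    int i, n;
    n = emit_instr(codestr, 0, 100, 300);   /* LOAD_CONST 300 */
    for (i = 0; i < n; i++)
        printf("%d ", codestr[i]);
    printf("\n");                           /* prints: 143 1 0 0 100 44 */
    return 0;
}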

ceval.c:
NEXTARG and PEEKARG need adjustment
EXTENDED_ARG needs adjustment
    (it will become a four-byte instruction, which is ugly, I agree)
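
A sketch of how the macros could end up looking, wrapped in a toy main so
it compiles on its own (next_instr is just a local pointer here, and the
current 16-bit definitions are quoted from memory in the comment):

#include <stdio.h>

/* Current 16-bit versions, for comparison:
   #define NEXTARG()  (next_instr += 2, (next_instr[-1]<<8) + next_instr[-2])
   #define PEEKARG()  ((next_instr[2]<<8) + next_instr[1])
   Proposed one-byte versions -- one memory read, no shift, no add: */
#define NEXTOP()   (*next_instr++)
#define NEXTARG()  (next_instr += 1, next_instr[-1])
#define PEEKARG()  (next_instr[1])

int main(void)
{
    /* LOAD_FAST 5 followed by LOAD_CONST 1, in the proposed 2-byte layout */
    unsigned char code[] = { 124, 5, 100, 1 };
    unsigned char *next_instr = code;

    int opcode = NEXTOP();
    int oparg  = NEXTARG();
    printf("opcode=%d oparg=%d next arg=%d\n", opcode, oparg, PEEKARG());
    return 0;
}

The EXTENDED_ARG case would then read three extra bytes for the high bits
of oparg before dispatching the next opcode, which is where the four-byte
instruction comes from.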

peephole.c:
GETARG and SETARG need adjustment,
as do GETJUMPTGT and CODESIZE
the routines tuple_of_constants, fold_binops_on_constants and PyCode_Optimize
    depend on the instruction length, which will be 2 instead of 3
(searching for the digit 3 finds all the cases, as far as I checked)
you will probably have to write a macro for codestr[i+3]
there is a check for code length >32700; I think that one can stay,
maybe with a few extra checks added.
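
Roughly, the macros at the top of peephole.c would change like this (a
sketch only; the current definitions in the comment are quoted from memory,
and ABSOLUTE_JUMP and HAS_ARG are the helpers already used there):

/* Current (3-byte instructions), roughly:
   #define GETARG(arr, i)       ((int)((arr[i+2]<<8) + arr[i+1]))
   #define SETARG(arr, i, val)  arr[i+2] = val>>8; arr[i+1] = val & 255
   #define GETJUMPTGT(arr, i)   (GETARG(arr,i) + (ABSOLUTE_JUMP(arr[i]) ? 0 : i+3))
   #define CODESIZE(op)         (HAS_ARG(op) ? 3 : 1)

   Proposed (2-byte instructions): */
#define GETARG(arr, i)       ((int)(arr[i+1]))
#define SETARG(arr, i, val)  (arr[i+1] = (val) & 255)
#define GETJUMPTGT(arr, i)   (GETARG(arr,i) + (ABSOLUTE_JUMP(arr[i]) ? 0 : i+2))
#define CODESIZE(op)         (HAS_ARG(op) ? 2 : 1)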

dis:
minor adjustments


Estimate of the speed impact:
about 80% of the instructions seem to have an argument, and I never saw an
argument >255 while looking at bytecode, so those are probably not frequent.

The NEXTARG macro expands on my MacBook to:

mov    -408(%ebp),%edx        (next_instr)
movzbl 2(%edx),%eax           (*second byte)
shl    $0x8,%eax              (*shift)
movzbl 1(%edx),%edx           (first byte)
add    %edx,%eax              (*combine)

and the starred instructions will vanish.
The main loop is approximately 40 instructions, so a saving of three instructions is significant. I don't dare to claim 3/40 = 7.5% savings,
but I think 5% may be realistic.

Has anyone tried this already? If not, I might take up the gauntlet
and try it myself, but I have never done this before...


- Jurjen

PS I also noticed that some scratch variables, mainly v and x, are carefully
stored back to memory by the compiler at the end of the big interpreter loop,
even though their values are no longer used at that point, of course. A few
carefully placed braces might tell the compiler how useless this is and
save another few percent.
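
To illustrate, one way the braces could be placed is to give the temporaries
block scope inside a case, so the compiler can see their values are dead at
the closing brace (a heavily simplified sketch, not the actual BINARY_ADD
code; POP/TOP/SET_TOP and Py_DECREF are the existing ceval.c macros):

case BINARY_ADD:
{
    /* With the braces, w, v and x die at the closing brace, so the
       compiler no longer has to store their final values back into
       memory slots that it keeps live across the whole loop. */
    PyObject *w = POP();
    PyObject *v = TOP();
    PyObject *x = PyNumber_Add(v, w);
    Py_DECREF(v);
    Py_DECREF(w);
    SET_TOP(x);
    if (x != NULL) continue;
    break;
}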
