The answer to this question varies from platform to platform, and I've only got Windows to test on...
If I do 32 "save"s in a row, this will certainly be slower than doing a single "push".
If I do 1 "save", this will (hopefully) be faster than 1 "push".
Yep, slightly.
How many "save"s does it take to be slower than one "push"?
This really depends on the architecture, the running core and so on. Dan estimated a cutoff value of 3, but this test program indicates a cutoff of 2:
        set I0, 1000000
        time N0
    lp:
        pushp           # or save P0, ...
        popp            # or restore P0, ...
        dec I0
        if I0, lp
        time N1
        sub N1, N0
        print N1
        print " s\n"
        end
    Loop only                   0.02s (0.002 -j)
    1 save_p + 1 restore_p      0.2s
    2 save_p + 2 restore_p      0.4s
    3 save_p + 3 restore_p      0.6s
    1 pushp + 1 popp            0.38s
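
To make the cutoff explicit, here is a small Python sketch that reads it off the timings above. The function name `cutoff` and the variable names are only illustrative and not part of Parrot; the numbers are copied from the measurements as reported.

```python
# Timings copied from the measurements above
# (seconds for 1,000,000 loop iterations, loop overhead included).
save_restore_times = {1: 0.2, 2: 0.4, 3: 0.6}  # n save_p + n restore_p
pushp_popp_time = 0.38                         # 1 pushp + 1 popp


def cutoff(save_times, push_time):
    """Return the smallest number of save/restore pairs that is slower
    than one pushp/popp pair, or None if no measured count is slower."""
    for n in sorted(save_times):
        if save_times[n] > push_time:
            return n
    return None


print(cutoff(save_restore_times, pushp_popp_time))
```

With these figures, two save/restore pairs (0.4s) already cost more than one pushp/popp (0.38s), while a single pair (0.2s) is cheaper.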
All run with the CGP core (-P switch), which is fastest here because pushX/save/restore are not JITed.
Athlon 800, i386/linux, non-optimized compile.
leo