Armin Rigo added the comment:

Yes, it's a known "issue": specifically when CFFI callbacks are used, PyPy needs to make sure that threads are set up. You can get the same slow-down by calling anything else that sets up threads (for example, any code that starts another thread). Once threads are set up, every program loop, whether interpreted or JIT-compiled, must contain two extra assembler instructions that decrement and test a counter, so that the GIL is released from time to time. This overhead is usually very small, but it shows up in your extreme example: there the JIT would normally compile this trivial loop to just a few assembler instructions, so two extra instructions are a big relative overhead.
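A minimal Python sketch of the counter mechanism described above, assuming a plain `threading.Lock` as a stand-in for the GIL. The names `Ticker`, `CHECK_INTERVAL`, `busy_loop_plain`, `busy_loop_ticked` and `run` are hypothetical; this is not PyPy's implementation (which emits the decrement and test directly in generated assembler). It only shows why, once threads are set up, every iteration pays for a decrement and a test on top of the loop body.

```python
import threading

# Hypothetical stand-in for the GIL: a lock that only one thread holds at a time.
gil = threading.Lock()

# Hypothetical tuning constant: iterations to run before offering the GIL to others.
CHECK_INTERVAL = 1000


class Ticker:
    """Per-loop counter: decrement, then test (the 'two extra instructions')."""

    def __init__(self, interval: int) -> None:
        self.interval = interval
        self.remaining = interval
        self.releases = 0  # number of times the GIL was released

    def tick(self) -> None:
        self.remaining -= 1           # decrement
        if self.remaining <= 0:       # test
            self.remaining = self.interval
            self.releases += 1
            gil.release()             # give another thread a chance to run...
            gil.acquire()             # ...then take the GIL back


def busy_loop_plain(n: int) -> int:
    """The loop as it runs before any threads exist: just the body."""
    total = 0
    for i in range(n):
        total += i
    return total


def busy_loop_ticked(n: int, ticker: Ticker) -> int:
    """The same loop once threads are set up: body plus a tick per iteration."""
    total = 0
    for i in range(n):
        total += i
        ticker.tick()
    return total


def run(n: int) -> tuple[int, int]:
    """Run the ticked loop while holding the GIL; return (result, releases)."""
    ticker = Ticker(CHECK_INTERVAL)
    with gil:
        result = busy_loop_ticked(n, ticker)
    return result, ticker.releases
```

For example, `run(10_000)` computes the same sum as `busy_loop_plain(10_000)` but releases the stand-in GIL ten times along the way. In a loop whose body is as cheap as `total += i`, the per-iteration bookkeeping is comparable in size to the body itself, which is the situation the comment describes for tiny JITted loops.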
It could be fixed with some trickery, such as arranging to receive a signal after some milliseconds have elapsed and having the signal handler overwrite the JITted loops' instructions in place. This would remove three or four instructions from the JITted loops (the same mechanism could also be used, e.g., to handle KeyboardInterrupts). I'm unsure what this overhead really costs in actual programs. Did you find this slow-down only on trivial benchmarks, or also in real programs?

----------
nosy: +arigo

________________________________________
PyPy bug tracker <https://bugs.pypy.org/issue1690>
________________________________________
_______________________________________________
pypy-issue mailing list
https://mail.python.org/mailman/listinfo/pypy-issue
