Original RFC here: https://lists.nongnu.org/archive/html/qemu-devel/2017-06/msg06874.html

I included Richard's feedback (Thanks!) from the original RFC, and added quite a few things. This is now a proper PATCHset since it is a lot more mature.

Highlights:

- It works! I tested single/multi-threaded arm, aarch64 and alpha softmmu with various -smp's (up to 120 on aarch64) and -tb-size's. Also tested x86_64-linux-user with multi-threaded code. valgrind's drd shows no obvious issues (it doesn't swallow C11 atomics, so it spits out a lot of false positives though). Have not tested on a non-x86 host, but given the audit I did of global non-const variables (see commit message in patch 21), it should be OK.

- Region-based allocation to maximize code_gen_buffer utilization. See patch 20.

- Patches 1-8 are unrelated fixes, but I'm keeping them as part of this series to avoid merge headaches later on.

- Performance-wise we get a 20% improvement when booting+shutting down debian-arm with MTTCG and -smp 8 (see patch 22). Not bad! This is due to not holding tb_lock during code translation, although the fact that we still have to take it after every translation remains a scalability issue. But before focusing on that, I'd like to get this reviewed.

I broke down features as much as possible, so that we do not end up with a "per-thread TCG" megapatch.

The series applies on top of the current master (b11365867568).

Thanks,

Emilio