I went ahead and created a short repro case, which can be found at https://github.com/gtoubassi/qemu-spinrepro. I would appreciate any thoughts, or guidance on how to debug this.
On Tue, Oct 19, 2021 at 3:05 PM Garrick Toubassi <gtouba...@gmail.com> wrote:
> Hello
>
> I have a mystery I haven't been able to run down and would appreciate any
> explanation or advice.
>
> On a Mac (Intel) I am running qemu-system-x86_64 on a simple image which
> bootstraps into 64-bit long mode and then runs a simple spin loop
> (literally for (int i = 0; i < 10000000; i++) {}). This completes in ~5
> seconds of wall time. After completion it enters user mode (CPL=3)
> via a fabricated interrupt stack frame and an iretq, returning to the same
> spin loop. In this case it runs about 100x faster.
>
> At first I thought the TCG JIT somehow wasn't kicking in and there was
> some pure interpretation going on, but I've run with "-trace exec_tb
> -trace translate_block -d out_asm,guest_errors,nochain,int,plugin" and it
> seems to be running translation blocks, just a lot more of them when
> running the slow loop (or, more precisely, running one TB many more times
> according to the exec_tb logging). Upon inspection, the relevant generated
> assembly is morally equivalent between the two as best I can tell, which
> implies to me that it's something outside of the TB. I was thinking
> perhaps it's regenerating the code every time, but the logging doesn't
> show that.
>
> I was also wondering whether something about the MMU implementation might
> slow things down when in user mode. In this case both loops are running
> under the same GDT/page table, which just happens to mark all pages as
> "user" pages so that code still runs after jumping to CPL=3.
>
> I can package up a reproducible case if it's helpful, but wanted to see if
> there is something obvious I am missing in terms of expected behavior
> before doing that.
>
> Thanks!
>
> gt
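For readers unfamiliar with the ring-3 entry described above, here is a minimal C sketch of the two pieces involved: the fabricated frame that iretq pops, and a page-table entry with the user/supervisor bit set. The frame order and the PTE bit positions follow the x86-64 architecture. The struct, the selector values (GDT_USER_CODE, GDT_USER_DATA), and the helper names are hypothetical and are not taken from the qemu-spinrepro repository. Leaving IF clear in RFLAGS is also an assumption.

```c
#include <stdint.h>

/* Frame popped by a 64-bit iretq, lowest address first.
   RSP must point at `rip` when iretq executes. */
struct iretq_frame {
    uint64_t rip;    /* where execution resumes */
    uint64_t cs;     /* code selector; its RPL becomes the new CPL */
    uint64_t rflags;
    uint64_t rsp;    /* stack pointer after the return */
    uint64_t ss;     /* stack selector; RPL must match the new CPL */
};

_Static_assert(sizeof(struct iretq_frame) == 5 * 8, "iretq pops five quadwords");

/* Hypothetical GDT layout: entry 3 is a 64-bit ring-3 code descriptor,
   entry 4 a ring-3 data descriptor. Adjust to match the actual GDT. */
#define GDT_USER_CODE 0x18
#define GDT_USER_DATA 0x20
#define RPL_USER 3

/* x86-64 page-table entry bits. U/S (bit 2) must be set at every level
   of the page walk for CPL=3 code to access the page. */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_WRITABLE (1ULL << 1)
#define PTE_USER     (1ULL << 2)

/* RFLAGS bit 1 is reserved and always 1. IF (bit 9) is left clear here,
   which is an assumption about the repro, not something it states. */
#define RFLAGS_RESERVED1 (1ULL << 1)

/* Fill a frame that drops to CPL=3 at `entry` using `user_stack_top`. */
static void build_user_frame(struct iretq_frame *f, uint64_t entry,
                             uint64_t user_stack_top) {
    f->rip = entry;
    f->cs = GDT_USER_CODE | RPL_USER;
    f->rflags = RFLAGS_RESERVED1;
    f->rsp = user_stack_top;
    f->ss = GDT_USER_DATA | RPL_USER;
}

/* Build a present, writable, user-accessible 4 KiB PTE for `phys_addr`. */
static uint64_t make_user_pte(uint64_t phys_addr) {
    /* Keep only the page-frame address bits (12..51). */
    return (phys_addr & 0x000FFFFFFFFFF000ULL)
           | PTE_PRESENT | PTE_WRITABLE | PTE_USER;
}
```

In the guest, the kernel-mode code would load RSP with the frame's address and execute iretq. Because the RPL of CS is 3, the CPU switches to CPL=3 and loads RSP and SS from the frame.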