On Wed, Oct 10, 2012 at 4:28 PM, Aurelien Jarno <aurel...@aurel32.net> wrote:
> On Wed, Oct 10, 2012 at 03:21:48PM +0200, Laurent Desnogues wrote:
>> On Tue, Oct 9, 2012 at 10:30 PM, Aurelien Jarno <aurel...@aurel32.net> wrote:
>> > Use ldr pc, [pc, #-4] kind of branch for direct jump. This removes the
>> > need to flush the icache on TB linking, and allows removing the limit
>> > on the code generation buffer.
>>
>> I'm not sure I like it. In general having data in the middle
>> of code will increase I/D cache and I/D TLB pressure.
>
> Agreed. On the other hand, this patch removes the synchronization of
> the instruction cache for TB linking/unlinking.
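To make the trade-off concrete, here is a minimal C sketch of the two ways of patching a direct jump between TBs. It is not QEMU's tcg/arm code: `encode_b`, `struct jump_slot` and `link_slot` are hypothetical names. Only the two A32 encodings are real: `0xea000000` is an unconditional `b` with a signed 24-bit word offset, and `0xe51ff004` is `ldr pc, [pc, #-4]`. Since PC reads as the instruction address + 8, that load fetches the literal word stored immediately after the instruction. A `b` can only reach about +/-32MB and relinking rewrites an instruction, which requires an icache sync. A jump slot reaches any 32-bit address and relinking only stores data, at the cost of a data word sitting in the code stream. The 16MB limit discussed in the thread is a cap QEMU places on its code buffer; the sketch only models the encodings.

```c
#include <assert.h>
#include <stdint.h>

/* A32 encodings (ARM state, condition "always"). */
#define ARM_B_AL          0xea000000u /* b <imm24>         */
#define ARM_LDR_PC_PC_M4  0xe51ff004u /* ldr pc, [pc, #-4] */

/* Hypothetical helper: encode "b target" for an instruction at insn_addr.
 * PC reads as insn_addr + 8 and B holds a signed 24-bit word offset, so
 * only targets within roughly +/-32MB can be encoded.
 * Returns 1 and stores the instruction in *out, or 0 if not encodable. */
static int encode_b(uint32_t insn_addr, uint32_t target, uint32_t *out)
{
    int64_t delta = (int64_t)target - ((int64_t)insn_addr + 8);
    if (delta % 4 != 0)
        return 0;                          /* target must be word aligned */
    int64_t words = delta / 4;
    if (words < -(INT64_C(1) << 23) || words >= (INT64_C(1) << 23))
        return 0;                          /* does not fit in imm24 */
    *out = ARM_B_AL | ((uint32_t)words & 0x00ffffffu);
    return 1;
}

/* Hypothetical "ldr pc" jump slot: the load reads PC - 4, i.e.
 * insn_addr + 4, which is the literal right after the instruction. */
struct jump_slot {
    uint32_t insn;   /* always ARM_LDR_PC_PC_M4, never rewritten */
    uint32_t target; /* absolute 32-bit destination address */
};

/* Relinking a B-style jump rewrites an instruction, so the host icache
 * must be synchronized afterwards (e.g. GCC/Clang __builtin___clear_cache).
 * Relinking a jump slot only stores a data word: the instruction bytes
 * are unchanged, so no icache sync is needed. */
static void link_slot(struct jump_slot *slot, uint32_t target)
{
    assert(slot->insn == ARM_LDR_PC_PC_M4);
    slot->target = target;
}
```

For example, `encode_b(0x1000, 0x1008, &insn)` yields `0xea000000` (offset 0, because of the +8 PC bias), while a target 64MB away is rejected. `link_slot` accepts any address, which is what lets the code buffer grow beyond the branch range.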

TB linking/unlinking should happen less often than code execution.

>> > This improves the boot-up speed of a MIPS guest by 11%.
>>
>> Boot speed is very specific. Did you test some other code?
>> Also what was your host?
>
> I tested it on a Cortex-A8 machine. I have only tested MIPS, but I can
> do more tests, like running the openssl testsuite in the emulated guest.

Yes, please.

[...]

> This doesn't really surprise me. The goal of the patch is to remove the
> limit of 16MB for the generated code. I really doubt you reach such a
> limit in user mode unless you use some complex code.
>
> On the other hand in system mode, this can already be reached once the
> whole guest kernel is translated, so cached code is dropped and has to
> be re-translated regularly. Re-translating guest code is clearly more
> expensive than the increase of I/D cache and I/D TLB pressure.

Ha yes, that's a real problem. What about having some define and/or
runtime flag to keep both the cache-sync approach and your ldr PC
change in QEMU?

> The other way to allow more than 16MB of generated code would be to
> disable direct jump on ARM. It adds one 32-bit constant load plus one
> memory load, but then you don't have the I/D cache and TLB issue.

The performance hit would be even worse :-)

Laurent