[Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time

Launchpad Bug Tracker Mon, 09 Apr 2018 11:22:13 -0700

This bug was fixed in the package qemu - 1:2.11+dfsg-1ubuntu6

---------------
qemu (1:2.11+dfsg-1ubuntu6) bionic; urgency=medium


  * Remove LP: 1752026 changes to d/p/ubuntu/define-ubuntu-machine-types.patch.
    The Kernel fixes are preferred and already committed to the kernel.
    Therefore remove the default disabling of the HTM feature (LP: #1761175)
  * d/p/ubuntu/lp1739665-SSE-AVX-AVX512-cpu-features.patch: Enable new
    SSE/AVX/AVX512 cpu features (LP: #1739665)
  * d/p/ubuntu/lp1740219-continuous-space-commpage.patch: make Arm
    space+commpage continuous which avoids long startup times on
    qemu-user-static (LP: #1740219)
  * d/p/ubuntu/lp-1761372-*: provide pseries-bionic-2.11-sxxm type as
    convenience with all meltdown/spectre workarounds enabled by default.
    This is not the default type following upstream and x86 on that.
    (LP: #1761372).
  * d/p/ubuntu/lp-1704312-1-* provide means to manually handle filesystem-dax
    with pmem by backporting align and unarmed options (LP: #1704312).
  * d/p/ubuntu/lp-1762315-slirp-Add-domainname.patch: slirp: Add domainname
    option to slirp's DHCP server (LP: #1762315)

 -- Christian Ehrhardt <christian.ehrha...@canonical.com>  Wed, 04 Apr
2018 15:16:07 +0200

** Changed in: qemu (Ubuntu)
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219

Title:
  static linux-user ARM emulation has several-second startup time

Status in QEMU:
  Fix Committed
Status in qemu package in Ubuntu:
  Fix Released

Bug description:
  static linux-user emulation has several-second startup time

  My problem: I'm a Parabola packager, and I'm updating our
  qemu-user-static package from 2.8 to 2.11.  With my new
  statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
  went from taking 0.006s to 3s!  This does not happen with the normal
  dynamically linked 2.11, or the old static 2.8.

  What happens is it gets stuck in
  `linux-user/elfload.c:init_guest_space()`.  What `init_guest_space`
  does is map 2 parts of the address space: `[base, base+guest_size]`
  and `[base+0xffff0000, base+0xffff0000+page_size]`; where it must find
  an acceptable `base`.  Its strategy is to `mmap(NULL, guest_size,
  ...)` decide where the first range is, and then check if that
  +0xffff0000 is also available.  If it isn't, then it starts trying
  `mmap(base, ...)` for the entire address space from low-address to
  high-address.

  "Normally," it finds an accaptable `base` within the first 2 tries.
  With a static 2.11, it's taking thousands of tries.

  ----

  Now, from my understanding, there are 2 factors working together to
  cause that in static 2.11 but not the other builds:

   - 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
   - PIE (and thus ASLR) is disabled for static builds

  For some reason that I don't understand, with the smaller
  `guest_size` the initial `mmap(NULL, guest_size, ...)` usually
  returns an acceptable address range; but larger `guest_size` makes it
  consistently return a block of memory that butts right up against
  another already mapped chunk of memory.  This isn't just true on the
  older builds, it's true with the 2.11 builds if I use the `-R` flag to
  shrink the `guest_size` back down to 0xf7000000.  That is with
  linux-hardened 4.13.13 on x86-64.

  So then, it it falls back to crawling the entire address space; so it
  tries base=0x00001000.  With ASLR, that probably succeeds.  But with
  ASLR being disabled on static builds, the text segment is at
  0x60000000; which is does not leave room for the needed
  0xffff1000-size block before it.  So then it tries base=0x00002000.
  And so on, more than 6000 times until it finally gets to and passes
  the text segment; calling mmap more than 12000 times.

  ----

  I'm not sure what the fix is.  Perhaps try to mmap a continuous chunk
  of size 0xffff1000, then munmap it and then mmap the 2 chunks that we
  actually need.  The disadvantage to that is that it does not support
  the sparse address space that the current algorithm supports for
  `guest_size < 0xffff0000`.  If `guest_size < 0xffff0000` *and* the big
  mmap fails, then it could fall back to a sparse search; though I'm not
  sure the current algorithm is a good choice for it, as we see in this
  bug.  Perhaps it should inspect /proc/self/maps to try to find a
  suitable range before ever calling mmap?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions

[Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time

Reply via email to