On Fri, Jun 12, 2015 at 01:56:37PM +0200, Christian Borntraeger wrote:
> On 10.06.2015 at 15:13, Michael S. Tsirkin wrote:
> > On Wed, Jun 10, 2015 at 03:02:21PM +0300, Denis V. Lunev wrote:
> >> On 09/06/15 13:37, Christian Borntraeger wrote:
> >>> On 09.06.2015 at 12:19, Denis V. Lunev wrote:
> >>>> Excessive virtio_balloon inflation can cause invocation of OOM-killer,
> >>>> when Linux is under severe memory pressure. Various mechanisms are
> >>>> responsible for correct virtio_balloon memory management. Nevertheless it
> >>>> is often the case that these control tools do not have enough time to
> >>>> react to fast changing memory load. As a result OS runs out of memory and
> >>>> invokes OOM-killer. The balancing of memory by use of the virtio balloon
> >>>> should not cause the termination of processes while there are pages in
> >>>> the balloon. Now there is no way for virtio balloon driver to free memory
> >>>> at the last moment before some process gets killed by OOM-killer.
> >>>>
> >>>> This does not provide a security breach as balloon itself is running
> >>>> inside Guest OS and is working in the cooperation with the host. Thus
> >>>> some improvements from Guest side should be considered as normal.
> >>>>
> >>>> To solve the problem, introduce a virtio_balloon callback which is
> >>>> expected to be called from the oom notifier call chain in out_of_memory()
> >>>> function. If virtio balloon could release some memory, it will make the
> >>>> system return and retry the allocation that forced the out of memory
> >>>> killer to run.
> >>>>
> >>>> This behavior should be enabled if and only if appropriate feature bit
> >>>> is set on the device. It is off by default.
> >>> The balloon frees pages in this way
> >>>
> >>> static void balloon_page(void *addr, int deflate)
> >>> {
> >>> #if defined(__linux__)
> >>>     if (!kvm_enabled() || kvm_has_sync_mmu())
> >>>         qemu_madvise(addr, TARGET_PAGE_SIZE,
> >>>                 deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
> >>> #endif
> >>> }
> >>>
> >>> The guest can re-touch that page and get an empty zero page or the old
> >>> page back without tampering with the host integrity. This should work for
> >>> all cases I am aware of (without sync_mmu it's a nop anyway) so why not
> >>> enable that by default?
> >>> Anything that I missed?
> >>>
> >>> Christian
> >>
> >> I'd like to do that :) Actually original version of kernel patch
> >> has enabled this unconditionally. But Michael asked to make
> >> it configurable and off by default.
> >>
> >> Den
> >
> > That's not the question here. The question is why is it limited by
> > kvm_has_sync_mmu.
>
> Well we have two interesting options here:
>
> VIRTIO_BALLOON_F_MUST_TELL_HOST and VIRTIO_BALLOON_F_DEFLATE_ON_OOM
>
> For any sane host with ondemand paging just re-accessing the page
> should simply work. So the common case could be
> VIRTIO_BALLOON_F_MUST_TELL_HOST == off

Disabling this breaks useful optimizations such as ability not to migrate
memory in the balloon.

> VIRTIO_BALLOON_F_DEFLATE_ON_OOM == on

AFAIK management tools depend on balloon not deflating below
host-specified threshold to avoid OOM on the host.
So I don't think we can make this a default, management needs to
enable this explicitly.

> Only for the rare case of hypervisors without paging or other memory
> related restrictions we have to enable MUST_TELL_HOST.
> Now: QEMU knows exactly which case we have, so why not let QEMU tell
> the guest what the capabilities are. (e.g. sync_mmu ---> no need to
> tell the host).
>
> I can at least imagine that some admin wants to make the oom case
> configurable, but a sane default seems to be to not kill random
> guest processes.
>
> Christian

--
MST
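
To make the two positions above concrete, here is a rough sketch of how a host could pick the balloon feature bits it offers to the guest. Only the two VIRTIO_BALLOON_F_* names and their bit numbers come from the virtio balloon header. The struct, its fields, and the helper functions are hypothetical and are not QEMU's actual code. It assumes MUST_TELL_HOST may be dropped only when the host has on-demand paging and does not rely on knowing what is in the balloon, and that DEFLATE_ON_OOM needs an explicit opt-in from management.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit numbers as defined in the virtio balloon header. */
#define VIRTIO_BALLOON_F_MUST_TELL_HOST  0
#define VIRTIO_BALLOON_F_DEFLATE_ON_OOM  2

/* Hypothetical host-side policy inputs (illustration only). */
struct balloon_policy {
    bool host_has_demand_paging;   /* e.g. what kvm_has_sync_mmu() reports */
    bool host_tracks_balloon;      /* e.g. skips ballooned pages on migration */
    bool mgmt_allows_oom_deflate;  /* explicit opt-in from management */
};

/* Hypothetical helper: compute the feature bits the host would offer. */
static uint64_t balloon_offered_features(const struct balloon_policy *p)
{
    uint64_t features = 0;

    /*
     * The guest may reuse ballooned pages without telling the host only if
     * re-touching them is safe and the host does not depend on knowing
     * which pages are currently in the balloon.
     */
    if (!p->host_has_demand_paging || p->host_tracks_balloon)
        features |= 1ULL << VIRTIO_BALLOON_F_MUST_TELL_HOST;

    /* Off by default; management has to enable it explicitly. */
    if (p->mgmt_allows_oom_deflate)
        features |= 1ULL << VIRTIO_BALLOON_F_DEFLATE_ON_OOM;

    return features;
}

/* Hypothetical helper: test a single feature bit. */
static bool balloon_has_feature(uint64_t features, unsigned int bit)
{
    return (features >> bit) & 1;
}

/* Small self-check showing the intended defaults. */
static void balloon_policy_selfcheck(void)
{
    struct balloon_policy p = {
        .host_has_demand_paging = true,
        .host_tracks_balloon = true,
        .mgmt_allows_oom_deflate = false,
    };
    uint64_t f = balloon_offered_features(&p);

    assert(balloon_has_feature(f, VIRTIO_BALLOON_F_MUST_TELL_HOST));
    assert(!balloon_has_feature(f, VIRTIO_BALLOON_F_DEFLATE_ON_OOM));
}
```

With these assumptions, a host that wants to keep the migration optimization always offers MUST_TELL_HOST, and DEFLATE_ON_OOM appears only when management sets the opt-in.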