Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
On Wed, Jan 05, 2011 at 03:27:37PM -0600, Michael Roth wrote: On 01/05/2011 02:35 PM, Andrea Arcangeli wrote: On Wed, Jan 05, 2011 at 02:26:19PM -0600, Michael Roth wrote: Yah you're right, but I've seen several discussions about using mempath for tmpfs/ram-backed files for things like numa/zram/etc so tend to think of it as something potentially more than just a hook for hugetlbfs, which is becoming less and less useful. But the MADV_DONTFORK stuff should still be immediately applicable. Yes, MADV_DONTFORK should be set all on all guest physical memory without options so I hope the new patch I just posted is fine to stop the spurious -ENOMEM failures in fork. The patch in this thread? A couple paths still aren't covered when using -mem-path. Something like this should get them all: Well the reason of MADV_DONTFORK is to avoid accounting issues with anonymous memory, mem-path don't have that issue as hugetlbfs skips the accounting (it has to because hugetlbfs are not always taken from the regular page allocator). It could be however considered a minor performance optimization. Now you mention that you want to use -mem-path for other things too, so maybe that's why you need it there too. BTW, if you ever need it for more than hugetlbfs, I'm afraid this MAP_PRIVATE I see when mem_prealloc isn't set, is going to screw any other potential useful usage besides hugetlbfs, not exactly sure why it makes any sense to use MAP_PRIVATE there and not only MAP_SHARED.
Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
On 01/06/2011 11:49 AM, Andrea Arcangeli wrote: On Wed, Jan 05, 2011 at 03:27:37PM -0600, Michael Roth wrote: On 01/05/2011 02:35 PM, Andrea Arcangeli wrote: On Wed, Jan 05, 2011 at 02:26:19PM -0600, Michael Roth wrote: Yah you're right, but I've seen several discussions about using mempath for tmpfs/ram-backed files for things like numa/zram/etc so tend to think of it as something potentially more than just a hook for hugetlbfs, which is becoming less and less useful. But the MADV_DONTFORK stuff should still be immediately applicable. Yes, MADV_DONTFORK should be set all on all guest physical memory without options so I hope the new patch I just posted is fine to stop the spurious -ENOMEM failures in fork. The patch in this thread? A couple paths still aren't covered when using -mem-path. Something like this should get them all: Well the reason of MADV_DONTFORK is to avoid accounting issues with anonymous memory, mem-path don't have that issue as hugetlbfs skips the accounting (it has to because hugetlbfs are not always taken from the regular page allocator). It could be however considered a minor performance optimization. That's one case, but there's also a wonky fallback in that path that defaults to the normal qemu_vmalloc(): if (mem_path) { #if defined (__linux__) !defined(TARGET_S390X) new_block-host = file_ram_alloc(new_block, size, mem_path); if (!new_block-host) { new_block-host = qemu_vmalloc(size); qemu_madvise(new_block-host, size, QEMU_MADV_MERGEABLE); } May make sense to only add coverage for that specific case though. If file_ram_alloc() is generalized we could deal with it then. Now you mention that you want to use -mem-path for other things too, so maybe that's why you need it there too. BTW, if you ever need it for more than hugetlbfs, I'm afraid this MAP_PRIVATE I see when mem_prealloc isn't set, is going to screw any other potential useful usage besides hugetlbfs, not exactly sure why it makes any sense to use MAP_PRIVATE there and not only MAP_SHARED. Not sure either...but if it's another hugetlbfs-ism it shouldn't matter since using mem-path for something other than hugetlbfs would ideally be configurable with a -mem path=/dev/shm/vm0.mem,hugetlbfs=off or something along that line, which wouldn't necessarily set MAP_PRIVATE.
Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
The bug is still there so I rediffed the old patch against current code. On a related topic: could somebody give me advice on how to implement a command line (command line seems enough, the other option would be monitor command) to make the MADV_MERGEABLE conditional? I got KSM on THP working fine but KSM may decrease performance by increasing the number of copy on write and by splitting hugepages, so we'd like to be able to turn off KSM on a per-VM basis (not on the whole host, which of course we already can by setting /sys/kernel/mm/ksm/run to 0) so that high perf VMs will keep running at maximum speed with KSM off but others may still benefit from KSM. For that I need to make the below MADV_MERGEABLE madvise conditional to something and the code itself will be trivial, we've just to converge on a command line option (hopefully quickly ;). == Subject: avoid allocation failures during fork/exec for migrate/hotplug From: Andrea Arcangeli aarca...@redhat.com Mark all guest physical memory as MADV_DONTFORK to avoid false positive allocation failures during fork in migrate/netdev hotplug. Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- diff --git a/exec.c b/exec.c index db9ff55..9e2aa12 100644 --- a/exec.c +++ b/exec.c @@ -2851,6 +2851,9 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name, new_block-host = qemu_vmalloc(size); #endif qemu_madvise(new_block-host, size, QEMU_MADV_MERGEABLE); + +/* no allocation failures during fork/exec for migrate/hotplug */ +qemu_madvise(new_block-host, size, QEMU_MADV_DONTFORK); } }
Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
On 01/05/2011 09:10 AM, Andrea Arcangeli wrote: The bug is still there so I rediffed the old patch against current code. On a related topic: could somebody give me advice on how to implement a command line (command line seems enough, the other option would be monitor command) to make the MADV_MERGEABLE conditional? I got KSM on THP working fine but KSM may decrease performance by increasing the number of copy on write and by splitting hugepages, so we'd like to be able to turn off KSM on a per-VM basis (not on the whole host, which of course we already can by setting /sys/kernel/mm/ksm/run to 0) so that high perf VMs will keep running at maximum speed with KSM off but others may still benefit from KSM. For that I need to make the below MADV_MERGEABLE madvise conditional to something and the code itself will be trivial, we've just to converge on a command line option (hopefully quickly ;). There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE. And for consistency you should probably make both your proposed changes for -mem-path'd memory as well.
Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
On 05.01.2011, at 19:02, Michael Roth wrote: On 01/05/2011 09:10 AM, Andrea Arcangeli wrote: The bug is still there so I rediffed the old patch against current code. On a related topic: could somebody give me advice on how to implement a command line (command line seems enough, the other option would be monitor command) to make the MADV_MERGEABLE conditional? I got KSM on THP working fine but KSM may decrease performance by increasing the number of copy on write and by splitting hugepages, so we'd like to be able to turn off KSM on a per-VM basis (not on the whole host, which of course we already can by setting /sys/kernel/mm/ksm/run to 0) so that high perf VMs will keep running at maximum speed with KSM off but others may still benefit from KSM. For that I need to make the below MADV_MERGEABLE madvise conditional to something and the code itself will be trivial, we've just to converge on a command line option (hopefully quickly ;). There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE. And for consistency you should probably make both your proposed changes for -mem-path'd memory as well. Why not clean up all that mess and introduce a new -mem option that would just take all of the several options as parameters? -mem size=512,populate=on,ksm=off and default -m to something reasonable with the new syntax. Alex
Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
Hello everyone, On Wed, Jan 05, 2011 at 08:44:38PM +0100, Alexander Graf wrote: On 05.01.2011, at 19:02, Michael Roth wrote: On 01/05/2011 09:10 AM, Andrea Arcangeli wrote: The bug is still there so I rediffed the old patch against current code. On a related topic: could somebody give me advice on how to implement a command line (command line seems enough, the other option would be monitor command) to make the MADV_MERGEABLE conditional? I got KSM on THP working fine but KSM may decrease performance by increasing the number of copy on write and by splitting hugepages, so we'd like to be able to turn off KSM on a per-VM basis (not on the whole host, which of course we already can by setting /sys/kernel/mm/ksm/run to 0) so that high perf VMs will keep running at maximum speed with KSM off but others may still benefit from KSM. For that I need to make the below MADV_MERGEABLE madvise conditional to something and the code itself will be trivial, we've just to converge on a command line option (hopefully quickly ;). There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE. And for consistency you should probably make both your proposed changes for -mem-path'd memory as well. Why not clean up all that mess and introduce a new -mem option that would just take all of the several options as parameters? -mem size=512,populate=on,ksm=off and default -m to something reasonable with the new syntax. I'm neutral... so feel free to decide what I should implement ;). One comment on combining -m ksm=off (or -mem_nomerge) with -mem-path. It seems unnecessary because ksm can't be turned on on VM_HUGETLB vmas (MADV_MERGEABLE will return -EINVAL) and mem-path only makes sense if used in combination with hugetlbfs (which sets VM_HUGETLB of course).
Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
On 05.01.2011, at 20:54, Andrea Arcangeli wrote: Hello everyone, On Wed, Jan 05, 2011 at 08:44:38PM +0100, Alexander Graf wrote: On 05.01.2011, at 19:02, Michael Roth wrote: On 01/05/2011 09:10 AM, Andrea Arcangeli wrote: The bug is still there so I rediffed the old patch against current code. On a related topic: could somebody give me advice on how to implement a command line (command line seems enough, the other option would be monitor command) to make the MADV_MERGEABLE conditional? I got KSM on THP working fine but KSM may decrease performance by increasing the number of copy on write and by splitting hugepages, so we'd like to be able to turn off KSM on a per-VM basis (not on the whole host, which of course we already can by setting /sys/kernel/mm/ksm/run to 0) so that high perf VMs will keep running at maximum speed with KSM off but others may still benefit from KSM. For that I need to make the below MADV_MERGEABLE madvise conditional to something and the code itself will be trivial, we've just to converge on a command line option (hopefully quickly ;). There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE. And for consistency you should probably make both your proposed changes for -mem-path'd memory as well. Why not clean up all that mess and introduce a new -mem option that would just take all of the several options as parameters? -mem size=512,populate=on,ksm=off and default -m to something reasonable with the new syntax. I'm neutral... so feel free to decide what I should implement ;). One comment on combining -m ksm=off (or -mem_nomerge) with -mem-path. It seems unnecessary because ksm can't be turned on on VM_HUGETLB vmas (MADV_MERGEABLE will return -EINVAL) and mem-path only makes sense if used in combination with hugetlbfs (which sets VM_HUGETLB of course). Sure, not all combinations make sense. But -mem size=1G,path=/dev/shm/vm1.ram,populate=on would make sense, no? TPH should go along the same lines here too. It'd just be a flag tph that defaults to on if available. That way we could also do all the sanity checks in a single place. I really like the idea of combining memory management command line parameters into a single option :). In the end I'd assume it's Anthony's call though. Alex
Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
On 01/05/2011 01:44 PM, Alexander Graf wrote: On 05.01.2011, at 19:02, Michael Roth wrote: On 01/05/2011 09:10 AM, Andrea Arcangeli wrote: The bug is still there so I rediffed the old patch against current code. On a related topic: could somebody give me advice on how to implement a command line (command line seems enough, the other option would be monitor command) to make the MADV_MERGEABLE conditional? I got KSM on THP working fine but KSM may decrease performance by increasing the number of copy on write and by splitting hugepages, so we'd like to be able to turn off KSM on a per-VM basis (not on the whole host, which of course we already can by setting /sys/kernel/mm/ksm/run to 0) so that high perf VMs will keep running at maximum speed with KSM off but others may still benefit from KSM. For that I need to make the below MADV_MERGEABLE madvise conditional to something and the code itself will be trivial, we've just to converge on a command line option (hopefully quickly ;). There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE. And for consistency you should probably make both your proposed changes for -mem-path'd memory as well. Why not clean up all that mess and introduce a new -mem option that would just take all of the several options as parameters? -mem size=512,populate=on,ksm=off and default -m to something reasonable with the new syntax. Yeah, that does make sense. Maybe we should consider folding it in with the -numa option too? Regards, Anthony Liguori Alex
Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
On 01/05/2011 02:00 PM, Alexander Graf wrote: On 05.01.2011, at 20:54, Andrea Arcangeli wrote: Hello everyone, On Wed, Jan 05, 2011 at 08:44:38PM +0100, Alexander Graf wrote: On 05.01.2011, at 19:02, Michael Roth wrote: On 01/05/2011 09:10 AM, Andrea Arcangeli wrote: The bug is still there so I rediffed the old patch against current code. On a related topic: could somebody give me advice on how to implement a command line (command line seems enough, the other option would be monitor command) to make the MADV_MERGEABLE conditional? I got KSM on THP working fine but KSM may decrease performance by increasing the number of copy on write and by splitting hugepages, so we'd like to be able to turn off KSM on a per-VM basis (not on the whole host, which of course we already can by setting /sys/kernel/mm/ksm/run to 0) so that high perf VMs will keep running at maximum speed with KSM off but others may still benefit from KSM. For that I need to make the below MADV_MERGEABLE madvise conditional to something and the code itself will be trivial, we've just to converge on a command line option (hopefully quickly ;). There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE. And for consistency you should probably make both your proposed changes for -mem-path'd memory as well. Why not clean up all that mess and introduce a new -mem option that would just take all of the several options as parameters? -mem size=512,populate=on,ksm=off and default -m to something reasonable with the new syntax. I'm neutral... so feel free to decide what I should implement ;). One comment on combining -m ksm=off (or -mem_nomerge) with -mem-path. It seems unnecessary because ksm can't be turned on on VM_HUGETLB vmas (MADV_MERGEABLE will return -EINVAL) and mem-path only makes sense if used in combination with hugetlbfs (which sets VM_HUGETLB of course). Sure, not all combinations make sense. But -mem size=1G,path=/dev/shm/vm1.ram,populate=on would make sense, no? TPH should go along the same lines here too. It'd just be a flag tph that defaults to on if available. That way we could also do all the sanity checks in a single place. I really like the idea of combining memory management command line parameters into a single option :). In the end I'd assume it's Anthony's call though. Where it's really helpful is in the ever elusive configuration file format. -mem becomes: [mem] size=1G path=/dev/shm/vm1.ram populate=on Which is nice from a grouping perspective. Regards, Anthony Liguori Alex
Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
On Wed, Jan 05, 2011 at 09:00:45PM +0100, Alexander Graf wrote: Sure, not all combinations make sense. But -mem size=1G,path=/dev/shm/vm1.ram,populate=on would make sense, no? TPH I was referring to Michael's suggestion, sure many options make sense for mem-path too, in fact populate makes more sense for mem-path (considering if hugetlbfs allocation fails guest may die if libhugetlbfs isn't trapping the -ENOMEM and doing munmap creating hole and replacing hole with regular anoymous memory). should go along the same lines here too. It'd just be a flag tph that defaults to on if available. THP is already default on (thp only requires us to 2m align the virtual address of guest physical memory which we should do by default even for qemu no-kvm). We can add a thp=off switch on the same lines of ksm=off. That way we could also do all the sanity checks in a single place. I really like the idea of combining memory management command line parameters into a single option :). In the end I'd assume it's Anthony's call though. Ok, let me know ;). Sounds like a lot of trouble compared to -mem_nomerge, but I agree it may be worth it for the long term. Thanks, Andrea
Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
On 01/05/2011 01:54 PM, Andrea Arcangeli wrote: Hello everyone, On Wed, Jan 05, 2011 at 08:44:38PM +0100, Alexander Graf wrote: On 05.01.2011, at 19:02, Michael Roth wrote: On 01/05/2011 09:10 AM, Andrea Arcangeli wrote: The bug is still there so I rediffed the old patch against current code. On a related topic: could somebody give me advice on how to implement a command line (command line seems enough, the other option would be monitor command) to make the MADV_MERGEABLE conditional? I got KSM on THP working fine but KSM may decrease performance by increasing the number of copy on write and by splitting hugepages, so we'd like to be able to turn off KSM on a per-VM basis (not on the whole host, which of course we already can by setting /sys/kernel/mm/ksm/run to 0) so that high perf VMs will keep running at maximum speed with KSM off but others may still benefit from KSM. For that I need to make the below MADV_MERGEABLE madvise conditional to something and the code itself will be trivial, we've just to converge on a command line option (hopefully quickly ;). There was a -mem_prealloc option added a while back to set MAP_POPULATE on memory mapped in via the -mem-path option. So an analogous -mem_nomerge option or something along that line seems reasonable for conditionally unsetting QEMU_MADV_MERGEABLE. And for consistency you should probably make both your proposed changes for -mem-path'd memory as well. Why not clean up all that mess and introduce a new -mem option that would just take all of the several options as parameters? -mem size=512,populate=on,ksm=off and default -m to something reasonable with the new syntax. I'm neutral... so feel free to decide what I should implement ;). Have to agree the consolidated approach definitely seems nicer. One comment on combining -m ksm=off (or -mem_nomerge) with -mem-path. It seems unnecessary because ksm can't be turned on on VM_HUGETLB vmas (MADV_MERGEABLE will return -EINVAL) and mem-path only makes sense if used in combination with hugetlbfs (which sets VM_HUGETLB of course). Yah you're right, but I've seen several discussions about using mempath for tmpfs/ram-backed files for things like numa/zram/etc so tend to think of it as something potentially more than just a hook for hugetlbfs, which is becoming less and less useful. But the MADV_DONTFORK stuff should still be immediately applicable.
Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
On Wed, Jan 05, 2011 at 02:26:19PM -0600, Michael Roth wrote: Yah you're right, but I've seen several discussions about using mempath for tmpfs/ram-backed files for things like numa/zram/etc so tend to think of it as something potentially more than just a hook for hugetlbfs, which is becoming less and less useful. But the MADV_DONTFORK stuff should still be immediately applicable. Yes, MADV_DONTFORK should be set all on all guest physical memory without options so I hope the new patch I just posted is fine to stop the spurious -ENOMEM failures in fork. The ksm part should go incremental but 99% of it will be about changing the command line, making the madvise conditional will be trivial as well.
Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
On 01/05/2011 02:35 PM, Andrea Arcangeli wrote: On Wed, Jan 05, 2011 at 02:26:19PM -0600, Michael Roth wrote: Yah you're right, but I've seen several discussions about using mempath for tmpfs/ram-backed files for things like numa/zram/etc so tend to think of it as something potentially more than just a hook for hugetlbfs, which is becoming less and less useful. But the MADV_DONTFORK stuff should still be immediately applicable. Yes, MADV_DONTFORK should be set all on all guest physical memory without options so I hope the new patch I just posted is fine to stop the spurious -ENOMEM failures in fork. The patch in this thread? A couple paths still aren't covered when using -mem-path. Something like this should get them all: diff --git a/exec.c b/exec.c index 49c28b1..cbdcb16 100644 --- a/exec.c +++ b/exec.c @@ -2853,6 +2853,9 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name, #endif qemu_madvise(new_block-host, size, QEMU_MADV_MERGEABLE); } + +/* no allocation failures during fork/exec for migrate/hotplug */ +qemu_madvise(new_block-host, size, QEMU_MADV_DONTFORK); }
Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
On 15.09.2010, at 19:08, Andrea Arcangeli wrote: From: Andrea Arcangeli aarca...@redhat.com All allocated guest physical memory shall be marked MADV_DONTFORK, otherwise fork will fail because of accounting issues preventing migration or netdev_add when the guest allocated more than half of host physical memory. Signed-off-by: Andrea Arcangeli aarca...@redhat.com The madvise logic is just about to get changed now. Please look at Andreas' patches and coordinate with him there. Alex
Re: [Qemu-devel] [PATCH] add MADV_DONTFORK to guest physical memory
Am 15.09.2010 um 19:08 schrieb Andrea Arcangeli: From: Andrea Arcangeli aarca...@redhat.com All allocated guest physical memory shall be marked MADV_DONTFORK, otherwise fork will fail because of accounting issues preventing migration or netdev_add when the guest allocated more than half of host physical memory. Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- diff --git a/exec.c b/exec.c index 380dab5..e2bdf19 100644 --- a/exec.c +++ b/exec.c @@ -2861,6 +2861,9 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name, #ifdef MADV_MERGEABLE madvise(new_block-host, size, MADV_MERGEABLE); #endif +#ifdef MADV_DONTFORK +madvise(new_block-host, size, MADV_DONTFORK); +#endif There is a pending patch which introduces qemu_madvise() and will eliminate the need for these #ifdefs, which was demanded by Blue. I'll cc you on v6. Regards, Andreas