Re: [PATCH 5/4] update ksm userspace interfaces
Gerd Hoffmann wrote: Izik Eidus wrote: The main problem that ksm will face when removing the fd interface is: right now when you register memory into ksm, you open fd, and then ksm do get_task_mm(), we will do mmput when the file will be closed Did you test whenever it really cleans up in case you kill -9 qemu? I recently did something simliar with the result that the extra reference hold on mm_struct prevented the process memory from being zapped ... cheers, Gerd Did you use mmput() after you called get_task_mm() ??? get_task_mm() do nothing beside atomic_inc(mm-mm_users); and mmput() do nothing beside dec this counter and check if no reference are available to this Am i missing anything? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/4] update ksm userspace interfaces
Izik Eidus wrote: Gerd Hoffmann wrote: Did you test whenever it really cleans up in case you kill -9 qemu? I recently did something simliar with the result that the extra reference hold on mm_struct prevented the process memory from being zapped ... cheers, Gerd Did you use mmput() after you called get_task_mm() ??? get_task_mm() do nothing beside atomic_inc(mm-mm_users); mmput() call was in -release() callback, -release() in turn never was called because the kernel didn't zap the mappings because of the reference ... The driver *also* created mappings which ksmctl doesn't, so it could be you don't run into this issue. cheers, Gerd -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/4] update ksm userspace interfaces
* Gerd Hoffmann (kra...@redhat.com) wrote: mmput() call was in -release() callback, -release() in turn never was called because the kernel didn't zap the mappings because of the reference ... Don't have this issue. That mmput() is not tied to zapping mappings, rather zapping files. IOW, I think you're saying exit_mmap() wasn't running due to your get_task_mm() (quite understandable, you still hold a ref), whereas this ref is tied to exit_files(). So do_exit would do: exit_mm mmput -- not dropped yet exit_files -release mmput -- dropped here thanks, -chris -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/4] update ksm userspace interfaces
Chris Wright wrote: * Anthony Liguori (anth...@codemonkey.ws) wrote: Using an interface like madvise() would force the issue to be dealt with properly from the start :-) Yeah, I'm not at all opposed to it. This updates to madvise for register and sysfs for control. madvise issues: - MADV_SHAREABLE - register only ATM, can add MADV_UNSHAREABLE to allow an app to proactively unregister, but need a cleanup when -mm goes away via exit/exec - will register a region per vma, should probably push the whole thing into vma rather than keep [mm,addr,len] tuple in ksm The main problem that ksm will face when removing the fd interface is: right now when you register memory into ksm, you open fd, and then ksm do get_task_mm(), we will do mmput when the file will be closed (note that this doesnt mean that if you fork and not close the fd the memory wont go away, get_task_mm() doesnt protect the vmas inside the mm strcture and therefore they will be able to get removed) So if we move into madvice and we remove the get_task_mm() usage, we will have to add notification to exit_mm() so ksm will know it should stop using this mm strcture, and drop it from all the trees data... Is this what we want? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/4] update ksm userspace interfaces
On Wed, Apr 01, 2009 at 10:31:14PM -0700, Chris Wright wrote: - register only ATM, can add MADV_UNSHAREABLE to allow an app to proactively unregister, but need a cleanup when -mm goes away via exit/exec The unregister cleanup must happen at the vma level (with unregister when vma goes away or is overwritten) for this to provide sane madvise semantics (not just in exit/exec, but in unmap/mmap too). Otherwise this is all but madvise. Basically we need a chunk of code in core VM when KSM=y/m, even if we keep returning -EINVAL when KSM=n (for backwards compatibility, -ENOSYS not). Example, vma must be split in two if you MAP_SHARABLE only part of it etc... -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/4] update ksm userspace interfaces
* Andrea Arcangeli (aarca...@redhat.com) wrote: On Wed, Apr 01, 2009 at 10:31:14PM -0700, Chris Wright wrote: - register only ATM, can add MADV_UNSHAREABLE to allow an app to proactively unregister, but need a cleanup when -mm goes away via exit/exec The unregister cleanup must happen at the vma level (with unregister when vma goes away or is overwritten) for this to provide sane madvise semantics (not just in exit/exec, but in unmap/mmap too). Otherwise this is all but madvise. Basically we need a chunk of code in core VM when KSM=y/m, even if we keep returning -EINVAL when KSM=n (for backwards compatibility, -ENOSYS not). Example, vma must be split in two if you MAP_SHARABLE only part of it etc... Yes, of course. I mentioned that (push whole thing into vma). Current api is really at -mm level, it's vma agnostic. Simply put: watch for pages in this -mm between start and start+len and (more or less regardless of the vma). To do it purely at the vma level would mean a vma unmap would cause the watch to go away. So, question is...do we need something in -mm as well (like mlockall)? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/4] update ksm userspace interfaces
* Izik Eidus (iei...@redhat.com) wrote: Is this what we want? How about baby steps... admit that ioctl to control plane is better done via sysfs? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/4] update ksm userspace interfaces
Chris Wright wrote: * Izik Eidus (iei...@redhat.com) wrote: Is this what we want? How about baby steps... admit that ioctl to control plane is better done via sysfs? Yes -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/4] update ksm userspace interfaces
* Anthony Liguori (anth...@codemonkey.ws) wrote: Using an interface like madvise() would force the issue to be dealt with properly from the start :-) Yeah, I'm not at all opposed to it. This updates to madvise for register and sysfs for control. madvise issues: - MADV_SHAREABLE - register only ATM, can add MADV_UNSHAREABLE to allow an app to proactively unregister, but need a cleanup when -mm goes away via exit/exec - will register a region per vma, should probably push the whole thing into vma rather than keep [mm,addr,len] tuple in ksm sysfs issues: - none really, i added a reporting mechanism for number of pages shared, doesn't decrement on COW - could use some extra sanity checks It compiles! Diff output is hard to read, I can send a 4/4 w/ this patch rolled in for easier review. Signed-off-by: Chris Wright chr...@redhat.com --- include/asm-generic/mman.h |1 + include/linux/ksm.h| 63 + mm/ksm.c | 352 mm/madvise.c | 18 +++ 4 files changed, 149 insertions(+), 285 deletions(-) diff --git a/include/asm-generic/mman.h b/include/asm-generic/mman.h index 5e3dde2..a1c1d5c 100644 --- a/include/asm-generic/mman.h +++ b/include/asm-generic/mman.h @@ -34,6 +34,7 @@ #define MADV_REMOVE9 /* remove these pages resources */ #define MADV_DONTFORK 10 /* don't inherit across fork */ #define MADV_DOFORK11 /* do inherit across fork */ +#define MADV_SHAREABLE 12 /* can share identical pages */ /* compatibility flags */ #define MAP_FILE 0 diff --git a/include/linux/ksm.h b/include/linux/ksm.h index 5776dce..e032f6f 100644 --- a/include/linux/ksm.h +++ b/include/linux/ksm.h @@ -1,69 +1,8 @@ #ifndef __LINUX_KSM_H #define __LINUX_KSM_H -/* - * Userspace interface for /dev/ksm - kvm shared memory - */ - -#include linux/types.h -#include linux/ioctl.h - -#include asm/types.h - -#define KSM_API_VERSION 1 - #define ksm_control_flags_run 1 -/* for KSM_REGISTER_MEMORY_REGION */ -struct ksm_memory_region { - __u32 npages; /* number of pages to share */ - __u32 pad; - __u64 addr; /* the begining of the virtual address */ -__u64 reserved_bits; -}; - -struct ksm_kthread_info { - __u32 sleep; /* number of microsecoends to sleep */ - __u32 pages_to_scan; /* number of pages to scan */ - __u32 flags; /* control flags */ -__u32 pad; -__u64 reserved_bits; -}; - -#define KSMIO 0xAB - -/* ioctls for /dev/ksm */ - -#define KSM_GET_API_VERSION _IO(KSMIO, 0x00) -/* - * KSM_CREATE_SHARED_MEMORY_AREA - create the shared memory reagion fd - */ -#define KSM_CREATE_SHARED_MEMORY_AREA_IO(KSMIO, 0x01) /* return SMA fd */ -/* - * KSM_START_STOP_KTHREAD - control the kernel thread scanning speed - * (can stop the kernel thread from working by setting running = 0) - */ -#define KSM_START_STOP_KTHREAD _IOW(KSMIO, 0x02,\ - struct ksm_kthread_info) -/* - * KSM_GET_INFO_KTHREAD - return information about the kernel thread - * scanning speed. - */ -#define KSM_GET_INFO_KTHREAD_IOW(KSMIO, 0x03,\ - struct ksm_kthread_info) - - -/* ioctls for SMA fds */ - -/* - * KSM_REGISTER_MEMORY_REGION - register virtual address memory area to be - * scanned by kvm. - */ -#define KSM_REGISTER_MEMORY_REGION _IOW(KSMIO, 0x20,\ - struct ksm_memory_region) -/* - * KSM_REMOVE_MEMORY_REGION - remove virtual address memory area from ksm. - */ -#define KSM_REMOVE_MEMORY_REGION _IO(KSMIO, 0x21) +long ksm_register_memory(struct vm_area_struct *, unsigned long, unsigned long); #endif diff --git a/mm/ksm.c b/mm/ksm.c index eba4c09..fcbf76e 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -17,7 +17,6 @@ #include linux/errno.h #include linux/mm.h #include linux/fs.h -#include linux/miscdevice.h #include linux/vmalloc.h #include linux/file.h #include linux/mman.h @@ -38,6 +37,7 @@ #include linux/rbtree.h #include linux/anon_inodes.h #include linux/ksm.h +#include linux/kobject.h #include asm/tlbflush.h @@ -55,20 +55,11 @@ MODULE_PARM_DESC(rmap_hash_size, Hash table size for the reverse mapping); */ struct ksm_mem_slot { struct list_head link; - struct list_head sma_link; struct mm_struct *mm; unsigned long addr; /* the begining of the virtual address */ unsigned npages;/* number of pages to share */ }; -/* - * ksm_sma - shared memory area, each process have its own sma that contain the - * information about the slots that it own - */ -struct ksm_sma { - struct list_head sma_slots; -}; - /** * struct ksm_scan - cursor for scanning * @slot_index: the current slot we are scanning @@ -190,6 +181,7 @@ static struct kmem_cache *rmap_item_cache; static int