Re: [PATCH 5/4] update ksm userspace interfaces

2009-04-03 Thread Izik Eidus

Gerd Hoffmann wrote:

Izik Eidus wrote:
  

The main problem that ksm will face when removing the fd interface is:
right now when you register memory into ksm, you open fd, and then ksm
do get_task_mm(), we will do mmput when the file will be closed



Did you test whenever it really cleans up in case you kill -9 qemu?

I recently did something simliar with the result that the extra
reference hold on mm_struct prevented the process memory from being
zapped ...

cheers,
  Gerd
  

Did you use mmput() after you called get_task_mm() ???
get_task_mm() do nothing beside atomic_inc(mm-mm_users);

and mmput() do nothing beside dec this counter and check if no reference 
are available to this


Am i missing anything?
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/4] update ksm userspace interfaces

2009-04-03 Thread Gerd Hoffmann
Izik Eidus wrote:
 Gerd Hoffmann wrote:
 Did you test whenever it really cleans up in case you kill -9 qemu?

 I recently did something simliar with the result that the extra
 reference hold on mm_struct prevented the process memory from being
 zapped ...

 cheers,
   Gerd
   
 Did you use mmput() after you called get_task_mm() ???
 get_task_mm() do nothing beside atomic_inc(mm-mm_users);

mmput() call was in -release() callback, -release() in turn never was
called because the kernel didn't zap the mappings because of the
reference ...

The driver *also* created mappings which ksmctl doesn't, so it could be
you don't run into this issue.

cheers,
  Gerd
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/4] update ksm userspace interfaces

2009-04-03 Thread Chris Wright
* Gerd Hoffmann (kra...@redhat.com) wrote:
 mmput() call was in -release() callback, -release() in turn never was
 called because the kernel didn't zap the mappings because of the
 reference ...

Don't have this issue.  That mmput() is not tied to zapping mappings,
rather zapping files.  IOW, I think you're saying exit_mmap() wasn't
running due to your get_task_mm() (quite understandable, you still hold
a ref), whereas this ref is tied to exit_files().

So do_exit would do:

exit_mm
  mmput -- not dropped yet
exit_files
  -release
mmput -- dropped here

thanks,
-chris
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/4] update ksm userspace interfaces

2009-04-02 Thread Izik Eidus

Chris Wright wrote:

* Anthony Liguori (anth...@codemonkey.ws) wrote:
  
Using an interface like madvise() would force the issue to be dealt with  
properly from the start :-)



Yeah, I'm not at all opposed to it.

This updates to madvise for register and sysfs for control.

madvise issues:
- MADV_SHAREABLE
  - register only ATM, can add MADV_UNSHAREABLE to allow an app to proactively
unregister, but need a cleanup when -mm goes away via exit/exec
  - will register a region per vma, should probably push the whole thing
into vma rather than keep [mm,addr,len] tuple in ksm

  

The main problem that ksm will face when removing the fd interface is:
right now when you register memory into ksm, you open fd, and then ksm 
do get_task_mm(), we will do mmput when the file will be closed
(note that this doesnt mean that if you fork and not close the fd the 
memory wont go away, get_task_mm() doesnt protect the vmas inside 
the mm strcture and therefore they will be able to get removed)


So if we move into madvice and we remove the get_task_mm() usage, we 
will have to add notification to exit_mm() so ksm will know it should 
stop using this mm strcture, and drop it from all the trees data...


Is this what we want?
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/4] update ksm userspace interfaces

2009-04-02 Thread Andrea Arcangeli
On Wed, Apr 01, 2009 at 10:31:14PM -0700, Chris Wright wrote:
   - register only ATM, can add MADV_UNSHAREABLE to allow an app to proactively
 unregister, but need a cleanup when -mm goes away via exit/exec

The unregister cleanup must happen at the vma level (with unregister
when vma goes away or is overwritten) for this to provide sane madvise
semantics (not just in exit/exec, but in unmap/mmap too). Otherwise
this is all but madvise. Basically we need a chunk of code in core VM
when KSM=y/m, even if we keep returning -EINVAL when KSM=n (for
backwards compatibility, -ENOSYS not). Example, vma must be split in
two if you MAP_SHARABLE only part of it etc...
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/4] update ksm userspace interfaces

2009-04-02 Thread Chris Wright
* Andrea Arcangeli (aarca...@redhat.com) wrote:
 On Wed, Apr 01, 2009 at 10:31:14PM -0700, Chris Wright wrote:
- register only ATM, can add MADV_UNSHAREABLE to allow an app to 
  proactively
  unregister, but need a cleanup when -mm goes away via exit/exec
 
 The unregister cleanup must happen at the vma level (with unregister
 when vma goes away or is overwritten) for this to provide sane madvise
 semantics (not just in exit/exec, but in unmap/mmap too). Otherwise
 this is all but madvise. Basically we need a chunk of code in core VM
 when KSM=y/m, even if we keep returning -EINVAL when KSM=n (for
 backwards compatibility, -ENOSYS not). Example, vma must be split in
 two if you MAP_SHARABLE only part of it etc...

Yes, of course.  I mentioned that (push whole thing into vma).
Current api is really at -mm level, it's vma agnostic.  Simply put:
watch for pages in this -mm between start and start+len and (more or
less regardless of the vma).  To do it purely at the vma level would
mean a vma unmap would cause the watch to go away.  So, question is...do
we need something in -mm as well (like mlockall)?
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/4] update ksm userspace interfaces

2009-04-02 Thread Chris Wright
* Izik Eidus (iei...@redhat.com) wrote:
 Is this what we want?

How about baby steps...

admit that ioctl to control plane is better done via sysfs?
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/4] update ksm userspace interfaces

2009-04-02 Thread Izik Eidus

Chris Wright wrote:

* Izik Eidus (iei...@redhat.com) wrote:
  

Is this what we want?



How about baby steps...

admit that ioctl to control plane is better done via sysfs?
  

Yes
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/4] update ksm userspace interfaces

2009-04-01 Thread Chris Wright
* Anthony Liguori (anth...@codemonkey.ws) wrote:
 Using an interface like madvise() would force the issue to be dealt with  
 properly from the start :-)

Yeah, I'm not at all opposed to it.

This updates to madvise for register and sysfs for control.

madvise issues:
- MADV_SHAREABLE
  - register only ATM, can add MADV_UNSHAREABLE to allow an app to proactively
unregister, but need a cleanup when -mm goes away via exit/exec
  - will register a region per vma, should probably push the whole thing
into vma rather than keep [mm,addr,len] tuple in ksm
sysfs issues:
- none really, i added a reporting mechanism for number of pages shared,
  doesn't decrement on COW
- could use some extra sanity checks

It compiles!  Diff output is hard to read, I can send a 4/4 w/ this
patch rolled in for easier review.

Signed-off-by: Chris Wright chr...@redhat.com
---
 include/asm-generic/mman.h |1 +
 include/linux/ksm.h|   63 +
 mm/ksm.c   |  352 
 mm/madvise.c   |   18 +++
 4 files changed, 149 insertions(+), 285 deletions(-)

diff --git a/include/asm-generic/mman.h b/include/asm-generic/mman.h
index 5e3dde2..a1c1d5c 100644
--- a/include/asm-generic/mman.h
+++ b/include/asm-generic/mman.h
@@ -34,6 +34,7 @@
 #define MADV_REMOVE9   /* remove these pages  resources */
 #define MADV_DONTFORK  10  /* don't inherit across fork */
 #define MADV_DOFORK11  /* do inherit across fork */
+#define MADV_SHAREABLE 12  /* can share identical pages */
 
 /* compatibility flags */
 #define MAP_FILE   0
diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 5776dce..e032f6f 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -1,69 +1,8 @@
 #ifndef __LINUX_KSM_H
 #define __LINUX_KSM_H
 
-/*
- * Userspace interface for /dev/ksm - kvm shared memory
- */
-
-#include linux/types.h
-#include linux/ioctl.h
-
-#include asm/types.h
-
-#define KSM_API_VERSION 1
-
 #define ksm_control_flags_run 1
 
-/* for KSM_REGISTER_MEMORY_REGION */
-struct ksm_memory_region {
-   __u32 npages; /* number of pages to share */
-   __u32 pad;
-   __u64 addr; /* the begining of the virtual address */
-__u64 reserved_bits;
-};
-
-struct ksm_kthread_info {
-   __u32 sleep; /* number of microsecoends to sleep */
-   __u32 pages_to_scan; /* number of pages to scan */
-   __u32 flags; /* control flags */
-__u32 pad;
-__u64 reserved_bits;
-};
-
-#define KSMIO 0xAB
-
-/* ioctls for /dev/ksm */
-
-#define KSM_GET_API_VERSION  _IO(KSMIO,   0x00)
-/*
- * KSM_CREATE_SHARED_MEMORY_AREA - create the shared memory reagion fd
- */
-#define KSM_CREATE_SHARED_MEMORY_AREA_IO(KSMIO,   0x01) /* return SMA fd */
-/*
- * KSM_START_STOP_KTHREAD - control the kernel thread scanning speed
- * (can stop the kernel thread from working by setting running = 0)
- */
-#define KSM_START_STOP_KTHREAD  _IOW(KSMIO,  0x02,\
- struct ksm_kthread_info)
-/*
- * KSM_GET_INFO_KTHREAD - return information about the kernel thread
- * scanning speed.
- */
-#define KSM_GET_INFO_KTHREAD_IOW(KSMIO,  0x03,\
- struct ksm_kthread_info)
-
-
-/* ioctls for SMA fds */
-
-/*
- * KSM_REGISTER_MEMORY_REGION - register virtual address memory area to be
- * scanned by kvm.
- */
-#define KSM_REGISTER_MEMORY_REGION   _IOW(KSMIO,  0x20,\
- struct ksm_memory_region)
-/*
- * KSM_REMOVE_MEMORY_REGION - remove virtual address memory area from ksm.
- */
-#define KSM_REMOVE_MEMORY_REGION _IO(KSMIO,   0x21)
+long ksm_register_memory(struct vm_area_struct *, unsigned long, unsigned 
long);
 
 #endif
diff --git a/mm/ksm.c b/mm/ksm.c
index eba4c09..fcbf76e 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -17,7 +17,6 @@
 #include linux/errno.h
 #include linux/mm.h
 #include linux/fs.h
-#include linux/miscdevice.h
 #include linux/vmalloc.h
 #include linux/file.h
 #include linux/mman.h
@@ -38,6 +37,7 @@
 #include linux/rbtree.h
 #include linux/anon_inodes.h
 #include linux/ksm.h
+#include linux/kobject.h
 
 #include asm/tlbflush.h
 
@@ -55,20 +55,11 @@ MODULE_PARM_DESC(rmap_hash_size, Hash table size for the 
reverse mapping);
  */
 struct ksm_mem_slot {
struct list_head link;
-   struct list_head sma_link;
struct mm_struct *mm;
unsigned long addr; /* the begining of the virtual address */
unsigned npages;/* number of pages to share */
 };
 
-/*
- * ksm_sma - shared memory area, each process have its own sma that contain the
- * information about the slots that it own
- */
-struct ksm_sma {
-   struct list_head sma_slots;
-};
-
 /**
  * struct ksm_scan - cursor for scanning
  * @slot_index: the current slot we are scanning
@@ -190,6 +181,7 @@ static struct kmem_cache *rmap_item_cache;
 
 static int