Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-04-18 Thread Andrea Arcangeli
On Tue, Apr 14, 2009 at 03:09:29PM -0700, Andrew Morton wrote:
 We need a comment here explaining why we can't use the much preferable
 lock_page().
 
 Why can't we use the much preferable lock_page()?

We could, but then we would risk wasting time waiting. It's not worth
waiting: we want kksmd to be allowed to keep one CPU (in future more than
one, as we scale it for SMP/NUMA) busy at all times running memcmp, and
not to schedule (other than for need_resched()), so that it frees memory
at the fastest pace possible.
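
The trade-off described above (skip a busy page rather than sleep on its
lock) can be sketched in userspace as follows. This is only an
illustration, not code from the patch: struct fake_page,
fake_trylock_page(), fake_unlock_page() and scan_without_blocking() are
hypothetical names, and a C11 atomic_flag stands in for the kernel's page
lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a page that carries its own lock bit. */
struct fake_page {
    atomic_flag locked; /* set while somebody holds the page */
    int scanned;        /* set once the scanner has visited the page */
};

/* Try to take the lock without waiting; returns 1 on success, 0 if busy. */
static int fake_trylock_page(struct fake_page *p)
{
    /* test_and_set returns the previous value: false means we got it. */
    return !atomic_flag_test_and_set(&p->locked);
}

static void fake_unlock_page(struct fake_page *p)
{
    atomic_flag_clear(&p->locked);
}

/*
 * Visit every page once. A page whose lock is held by someone else is
 * skipped instead of waited for, so the scanner never sleeps on a lock.
 * Returns the number of pages that were actually scanned.
 */
static size_t scan_without_blocking(struct fake_page *pages, size_t n)
{
    size_t done = 0;

    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!fake_trylock_page(&pages[i]))
            continue; /* busy: move on to the next page */
        pages[i].scanned = 1;
        done++;
        fake_unlock_page(&pages[i]);
    }
    return done;
}
```

A skipped page is simply picked up again on a later pass, which is the
price paid for never sleeping on the lock.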
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-04-16 Thread Andrea Arcangeli
On Wed, Apr 15, 2009 at 05:43:03PM -0700, Jeremy Fitzhardinge wrote:
 Shouldn't that be kmap_atomic's job anyway?  Otherwise it would be hard to 

No, because those are full no-ops in no-highmem kernels. I commented in
another email on why I think it's safe, thanks to the wrprotect + smp tlb
flush of the userland PTE.


Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-04-16 Thread Jeremy Fitzhardinge

Andrea Arcangeli wrote:

On Wed, Apr 15, 2009 at 05:43:03PM -0700, Jeremy Fitzhardinge wrote:
  
Shouldn't that be kmap_atomic's job anyway?  Otherwise it would be hard to 



No, because those are full no-ops in no-highmem kernels. I commented in
another email on why I think it's safe, thanks to the wrprotect + smp tlb
flush of the userland PTE.
  


I think Andrew's query was about data cache synchronization in 
architectures with virtually indexed d-cache.  On x86 it's a non-issue, 
but on architectures for which it is an issue, I assume kmap_atomic does 
any necessary cache flushes, as it does tlb flushes on x86 (which may be 
none at all, if no mapping actually happens).


   J



Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-04-15 Thread Izik Eidus

Andrew Morton wrote:

On Thu,  9 Apr 2009 06:58:41 +0300
Izik Eidus iei...@redhat.com wrote:

  


Confused.  In the covering email you indicated that v2 of the patchset
had abandoned ioctls and had moved the interface to sysfs.
  
We have abandoned the ioctls that control ksm's behavior (how much CPU
it takes, how many kernel pages it may allocate, and so on).
But we still use ioctls to register the application memory to be used
with ksm.



It would be good to completely (and briefly) describe KSM's proposed
userspace interfaces in the changelog or somewhere.  I'm a bit confused.
  


I will post a new, clean description of the ksm API with V4.




  


+static pte_t *get_pte(struct mm_struct *mm, unsigned long addr)
+{
+   pgd_t *pgd;
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *ptep = NULL;
+
+   pgd = pgd_offset(mm, addr);
+   if (!pgd_present(*pgd))
+   goto out;
+
+   pud = pud_offset(pgd, addr);
+   if (!pud_present(*pud))
+   goto out;
+
+   pmd = pmd_offset(pud, addr);
+   if (!pmd_present(*pmd))
+   goto out;
+
+   ptep = pte_offset_map(pmd, addr);
+out:
+   return ptep;
+}



hm, this looks very generic.  Does it duplicate anything which core
kernel already provides? 


I don't think so.


 If not, perhaps core kernel should provide
this (perhaps after some reorganisation).
  


A quick grep of the code shows me at least two places that could use this
function. One is:
remove_migration_pte() inside migrate.c
and the other is:
page_check_address() inside rmap.c

With V4 I will post an inline get_ptep() function; worst case I will get
nacked.


  

...

+static int rmap_hash_init(void)
+{
+   if (!rmap_hash_size) {
+   struct sysinfo sinfo;
+
+   si_meminfo(&sinfo);
+   rmap_hash_size = sinfo.totalram / 10;



One slot per ten pages of physical memory?  Is this too large, too
small or just right?
  


It depends heavily on the number of processes / memory regions that will
be registered with ksm.

It is a module parameter, so the user can change it to whatever they want.
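
To put a number on the default, here is a small userspace sketch of the
sizing rule. It assumes that sinfo.totalram counts pages and that a
struct hlist_head is a single pointer; fake_hlist_head, rmap_hash_slots()
and rmap_hash_bytes() are hypothetical names used only for this
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct hlist_head: one pointer. */
struct fake_hlist_head {
    void *first;
};

/* Default number of hash slots: one per ten pages of RAM. */
static uint64_t rmap_hash_slots(uint64_t totalram_pages)
{
    return totalram_pages / 10;
}

/* Memory needed for the table itself (what the vmalloc() call asks for). */
static uint64_t rmap_hash_bytes(uint64_t totalram_pages)
{
    uint64_t slots = rmap_hash_slots(totalram_pages);

    /* Guard the multiplication against overflow on absurd inputs. */
    assert(slots <= UINT64_MAX / sizeof(struct fake_hlist_head));
    return slots * sizeof(struct fake_hlist_head);
}
```

For example, 4 GiB of RAM in 4 KiB pages is 1048576 pages, which gives
104857 slots; with 8-byte pointers the table takes 838856 bytes, a bit
under 1 MiB.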

  

+   }
+   nrmaps_hash = rmap_hash_size;
+   rmap_hash = vmalloc(nrmaps_hash * sizeof(struct hlist_head));
+   if (!rmap_hash)
+   return -ENOMEM;
+   memset(rmap_hash, 0, nrmaps_hash * sizeof(struct hlist_head));
+   return 0;
+}
+

...

+static void break_cow(struct mm_struct *mm, unsigned long addr)
+{
+   struct page *page[1];
+
+   down_read(&mm->mmap_sem);
+   if (get_user_pages(current, mm, addr, 1, 1, 0, page, NULL)) {
+   put_page(page[0]);
+   }
+   up_read(&mm->mmap_sem);
+}



- unneeded braces around single statement

- that single statement is over-indented.

- and it seems wrong.  If get_user_pages() returned, say, -ENOMEM, we
  end up doing put_page(random-uninitialised-address-from-stack-go-oops)?
  


Good catch.
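
A minimal userspace model of the check Andrew is asking for, assuming the
intended behaviour is to drop the reference only when exactly one page was
pinned. fake_get_user_pages(), fake_put_page() and break_cow_checked() are
hypothetical stand-ins; they are not kernel APIs and not the code that was
later posted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical page with a bare reference count. */
struct fake_page {
    int refcount;
};

/*
 * Stand-in for get_user_pages(): returns the number of pages pinned, or a
 * negative errno. On failure pages[] is left untouched, which is exactly
 * why the caller must not look at it.
 */
static long fake_get_user_pages(struct fake_page *backing, int fail,
                                struct fake_page **pages)
{
    if (fail)
        return -ENOMEM;
    backing->refcount++;
    pages[0] = backing;
    return 1;
}

static void fake_put_page(struct fake_page *page)
{
    assert(page != NULL);
    page->refcount--;
}

/* Returns 0 on success, or a negative errno if nothing was pinned. */
static long break_cow_checked(struct fake_page *backing, int fail)
{
    struct fake_page *page[1] = { NULL };
    long ret = fake_get_user_pages(backing, fail, page);

    if (ret == 1) {
        fake_put_page(page[0]); /* only touch page[0] when it is valid */
        return 0;
    }
    return ret < 0 ? ret : -EFAULT;
}
```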

  

...

+static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
+   struct ksm_memory_region *mem)
+{
+   struct ksm_mem_slot *slot;
+   int ret = -EPERM;
+
+   slot = kzalloc(sizeof(struct ksm_mem_slot), GFP_KERNEL);
+   if (!slot) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   slot->mm = get_task_mm(current);
+   if (!slot->mm)
+   goto out_free;
+   slot->addr = mem->addr;
+   slot->npages = mem->npages;
+
+   down_write(&slots_lock);
+
+   list_add_tail(&slot->link, &slots);
+   list_add_tail(&slot->sma_link, &ksm_sma->sma_slots);
+
+   up_write(&slots_lock);
+   return 0;
+
+out_free:
+   kfree(slot);
+out:
+   return ret;
+}



So this function pins the mm_struct.  I wonder what the implications of
this are. 


The mm struct won't go away until the file is closed (the application
closes the file descriptor, or the application dies).



 Not much, I guess.  Some comments in the code which explain
the object lifecycles would be nice.

  


...

+static int memcmp_pages(struct page *page1, struct page *page2)
+{
+   char *addr1, *addr2;
+   int r;
+
+   addr1 = kmap_atomic(page1, KM_USER0);
+   addr2 = kmap_atomic(page2, KM_USER1);
+   r = memcmp(addr1, addr2, PAGE_SIZE);
+   kunmap_atomic(addr1, KM_USER0);
+   kunmap_atomic(addr2, KM_USER1);
+   return r;
+}



I wonder if this code all does enough cpu cache flushing to be able to
guarantee that it's looking at valid data.  Not my area, and presumably
not an issue on x86.
  


Andrea pointed out in a previous reply that, because we run
page_wrprotect() on these pages, memcmp_pages() should see stable data.


  

...

+static int try_to_merge_one_page(struct mm_struct *mm,
+struct vm_area_struct *vma,
+struct page *oldpage,
+struct page *newpage,
+

Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-04-15 Thread Andrew Morton
On Thu, 16 Apr 2009 01:37:25 +0300
Izik Eidus iei...@redhat.com wrote:

 Andrew Morton wrote:
  On Thu,  9 Apr 2009 06:58:41 +0300
  Izik Eidus iei...@redhat.com wrote:
 

 
  Confused.  In the covering email you indicated that v2 of the patchset
  had abandoned ioctls and had moved the interface to sysfs.

 We have abandoned the ioctls that control ksm's behavior (how much CPU
 it takes, how many kernel pages it may allocate, and so on).
 But we still use ioctls to register the application memory to be used
 with ksm.

hm. ioctls make kernel people weep and gnash teeth.

An appropriate interface would be to add new syscalls.  But as ksm is
an optional thing and can even be modprobed, that doesn't work.  And
having a driver in mm/ which can be modprobed is kinda neat.

I can't immediately think of a nicer interface.  You could always poke
numbers into some pseudo-file but to me that seems as ugly, or uglier
than an ioctl (others seem to disagree).

Ho hum.  Please design the ioctl interface so that it doesn't need any
compat handling if poss.
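
One common way to meet the "no compat handling" request is to use only
fixed-width fields with explicit padding, so that 32-bit and 64-bit
userspace see the same layout. The sketch below mirrors the
ksm_memory_region structure from the patch in userspace, with <stdint.h>
types standing in for __u32/__u64; ksm_memory_region_sketch and
layout_has_no_hidden_padding() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the structure passed to KSM_REGISTER_MEMORY_REGION. */
struct ksm_memory_region_sketch {
    uint32_t npages;        /* number of pages to share */
    uint32_t pad;           /* explicit padding keeps addr at offset 8 */
    uint64_t addr;          /* start of the virtual address range */
    uint64_t reserved_bits; /* reserved for future use */
};

/* Compile-time checks: the layout must not depend on the ABI. */
_Static_assert(sizeof(struct ksm_memory_region_sketch) == 24,
               "size must be identical for 32-bit and 64-bit callers");
_Static_assert(offsetof(struct ksm_memory_region_sketch, addr) == 8,
               "addr must sit at offset 8");
_Static_assert(offsetof(struct ksm_memory_region_sketch, reserved_bits) == 16,
               "reserved_bits must sit at offset 16");

/* Returns 1 when the struct is exactly the sum of its fields. */
static int layout_has_no_hidden_padding(void)
{
    return sizeof(struct ksm_memory_region_sketch) ==
           2 * sizeof(uint32_t) + 2 * sizeof(uint64_t);
}
```

Because there are no pointers, no long fields and no implicit padding, a
32-bit caller on a 64-bit kernel would pass the same bytes as a native
caller.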



Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-04-15 Thread Andrea Arcangeli
On Wed, Apr 15, 2009 at 03:50:58PM -0700, Andrew Morton wrote:
 an optional thing and can even be modprobed, that doesn't work.  And
 having a driver in mm/ which can be modprobed is kinda neat.

Agreed. I think madvise, with all its vma split requirements and
ksm-unregistering invoked at vma destruction time (under CONFIG_KSM ||
CONFIG_KSM_MODULE), is a clean approach only if ksm is considered a piece
of the core kernel VM. As long as only certain users out there use ksm
(i.e. only virtualization servers and LHC computations), the pseudo-char
ioctl interface keeps it out of the kernel, so the core kernel MM API
remains almost unaffected by ksm.

It's kinda neat that it's external as a self-contained module, but the
whole point is that to be self-contained it has to use ioctl.

Another thing is that madvise usually doesn't require mangling sysfs
to be effective. madvise without enabling ksm with sysfs would be
entirely useless. So doing it as madvise that returns success and has
no effect unless 'root' does something, is kind of weird.

Thinking about the absolute worst case: if this really turns out to be
the wrong decision, /dev/ksm simply won't exist anymore and no app could
ever break, as they will gracefully handle the missing pseudo-char device.
They won't run the ioctl and will just continue as if ksm.ko wasn't
loaded. As there are only a few (but critically important) apps using KSM,
converting them to fall back on madvise is a trivial change of a few lines
(kvm-userland will have 10 more lines to keep opening /dev/ksm before
calling madvise if we ever later decide KSM has to become a core kernel
VM functionality with madvise or its own per-arch syscall).
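
The graceful handling described above amounts to treating a failed open()
as "run without sharing". A minimal sketch, assuming the application only
needs to know whether the device node can be opened; ksm_available() is a
hypothetical helper, and the path is a parameter so the fallback can be
exercised without the module.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Returns 1 if the KSM device node at `path` can be opened, 0 otherwise.
 * A caller that gets 0 skips the registration ioctls and carries on as if
 * ksm.ko had never been loaded.
 */
static int ksm_available(const char *path)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDWR);
    if (fd < 0)
        return 0; /* node missing or not accessible: no page sharing */

    /* A real caller would keep fd and issue the registration ioctls. */
    close(fd);
    return 1;
}
```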


Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-04-15 Thread Jeremy Fitzhardinge

Andrew Morton wrote:

+static pte_t *get_pte(struct mm_struct *mm, unsigned long addr)
+{
+   pgd_t *pgd;
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *ptep = NULL;
+
+   pgd = pgd_offset(mm, addr);
+   if (!pgd_present(*pgd))
+   goto out;
+
+   pud = pud_offset(pgd, addr);
+   if (!pud_present(*pud))
+   goto out;
+
+   pmd = pmd_offset(pud, addr);
+   if (!pmd_present(*pmd))
+   goto out;
+
+   ptep = pte_offset_map(pmd, addr);
+out:
+   return ptep;
+}



hm, this looks very generic.  Does it duplicate anything which core
kernel already provides?  If not, perhaps core kernel should provide
this (perhaps after some reorganisation).
  


It is lookup_address() which works on user addresses, and as such is 
very useful.  But it would need to deal with returning a level so it can 
deal with large pages in usermode, and have some well-defined semantics 
on whether the caller is responsible for unmapping the returned thing 
(i.e. only if it's a pte).


I implemented this myself a couple of months ago, but I can't find it 
anywhere...



+static int memcmp_pages(struct page *page1, struct page *page2)
+{
+   char *addr1, *addr2;
+   int r;
+
+   addr1 = kmap_atomic(page1, KM_USER0);
+   addr2 = kmap_atomic(page2, KM_USER1);
+   r = memcmp(addr1, addr2, PAGE_SIZE);
+   kunmap_atomic(addr1, KM_USER0);
+   kunmap_atomic(addr2, KM_USER1);
+   return r;
+}



I wonder if this code all does enough cpu cache flushing to be able to
guarantee that it's looking at valid data.  Not my area, and presumably
not an issue on x86.
  


Shouldn't that be kmap_atomic's job anyway?  Otherwise it would be hard 
to use on any virtual-tag/indexed cache machine.


   J


Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-04-15 Thread Izik Eidus

Jeremy Fitzhardinge wrote:

Andrew Morton wrote:

+static pte_t *get_pte(struct mm_struct *mm, unsigned long addr)
+{
+pgd_t *pgd;
+pud_t *pud;
+pmd_t *pmd;
+pte_t *ptep = NULL;
+
+pgd = pgd_offset(mm, addr);
+if (!pgd_present(*pgd))
+goto out;
+
+pud = pud_offset(pgd, addr);
+if (!pud_present(*pud))
+goto out;
+
+pmd = pmd_offset(pud, addr);
+if (!pmd_present(*pmd))
+goto out;
+
+ptep = pte_offset_map(pmd, addr);
+out:
+return ptep;
+}



hm, this looks very generic.  Does it duplicate anything which core
kernel already provides?  If not, perhaps core kernel should provide
this (perhaps after some reorganisation).
  


It is lookup_address() which works on user addresses, and as such is 
very useful.  


But ksm needs the pgd offset of an mm struct, not the kernel pgd, so
maybe changing it to take the pgd offset would be nice.


Another thing: it is x86-only right now, so it would probably need to
move out to the common code.


But it would need to deal with returning a level so it can deal with 
large pages in usermode, and have some well-defined semantics on 
whether the caller is responsible for unmapping the returned thing 
(i.e. only if it's a pte).


I implemented this myself a couple of months ago, but I can't find it 
anywhere...



+static int memcmp_pages(struct page *page1, struct page *page2)
+{
+char *addr1, *addr2;
+int r;
+
+addr1 = kmap_atomic(page1, KM_USER0);
+addr2 = kmap_atomic(page2, KM_USER1);
+r = memcmp(addr1, addr2, PAGE_SIZE);
+kunmap_atomic(addr1, KM_USER0);
+kunmap_atomic(addr2, KM_USER1);
+return r;
+}



I wonder if this code all does enough cpu cache flushing to be able to
guarantee that it's looking at valid data.  Not my area, and presumably
not an issue on x86.
  


Shouldn't that be kmap_atomic's job anyway?  Otherwise it would be 
hard to use on any virtual-tag/indexed cache machine.


   J




Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-04-14 Thread Andrew Morton
On Thu,  9 Apr 2009 06:58:41 +0300
Izik Eidus iei...@redhat.com wrote:

 Ksm is a driver that allows merging identical pages between one or more
 applications in a way invisible to the applications that use it.
 Pages that are merged are marked read-only and are COWed when any
 application tries to change them.
 
 Ksm is used for cases where using fork() is not suitable;
 one such case is where the pages of the application keep changing
 dynamically and the application cannot know in advance which pages are
 going to be identical.
 
 Ksm works by walking over the memory pages of the applications it
 scans in order to find identical pages.
 It uses two sorted data structures, called the stable and unstable trees,
 to find the identical pages efficiently.
 
 When ksm finds two identical pages, it marks them read-only and merges
 them into a single page.
 After the pages are marked read-only and merged into one page, Linux
 treats these pages as normal copy-on-write pages and will copy them
 when a write access happens to them.
 
 Ksm scans only memory areas that were registered to be scanned by it.
 
 Ksm api:
 
 KSM_GET_API_VERSION:
 Give userspace the api version of the module.
 
 KSM_CREATE_SHARED_MEMORY_AREA:
 Create a shared memory region fd, which later allows the user to register
 memory regions to scan by using:
 KSM_REGISTER_MEMORY_REGION and KSM_REMOVE_MEMORY_REGION
 
 KSM_REGISTER_MEMORY_REGION:
 Register a userspace virtual address range to be scanned by ksm.
 This ioctl uses the ksm_memory_region structure:
 ksm_memory_region:
 __u32 npages;
  number of pages to share inside this memory region.
 __u32 pad;
 __u64 addr;
 the beginning of the virtual address of this region.
 __u64 reserved_bits;
 reserved bits for future usage.
 
 KSM_REMOVE_MEMORY_REGION:
 Remove a memory region from ksm.
 

 ...

 +/* ioctls for /dev/ksm */

Confused.  In the covering email you indicated that v2 of the patchset
had abandoned ioctls and had moved the interface to sysfs.

It would be good to completely (and briefly) describe KSM's proposed
userspace interfaces in the changelog or somewhere.  I'm a bit confused.


 ...

 +/*
 + * slots_lock protect against removing and adding memory regions while a 
 scanner
 + * is in the middle of scanning.
 + */

protects

 +static DECLARE_RWSEM(slots_lock);
 +
 +/* The stable and unstable trees heads. */
 +struct rb_root root_stable_tree = RB_ROOT;
 +struct rb_root root_unstable_tree = RB_ROOT;
 +
 +
 +/* The number of linked list members inside the hash table */
 +static int nrmaps_hash;

A signed type doesn't seem appropriate.

 +/* rmap_hash hash table */
 +static struct hlist_head *rmap_hash;
 +
 +static struct kmem_cache *tree_item_cache;
 +static struct kmem_cache *rmap_item_cache;
 +
 +/* the number of nodes inside the stable tree */
 +static unsigned long nnodes_stable_tree;
 +
 +/* the number of kernel allocated pages outside the stable tree */
 +static unsigned long nkpage_out_tree;
 +
 +static int kthread_sleep; /* sleep time of the kernel thread */
 +static int kthread_pages_to_scan; /* npages to scan for the kernel thread */
 +static int kthread_max_kernel_pages; /* number of unswappable pages allowed 
 */

The kthread_max_kernel_pages isn't very illuminating.

The use of kthread in the identifier makes it look like part of the
kthread subsystem.

 +static unsigned long ksm_pages_shared;
 +static struct ksm_scan kthread_ksm_scan;
 +static int ksmd_flags;
 +static struct task_struct *kthread;
 +static DECLARE_WAIT_QUEUE_HEAD(kthread_wait);
 +static DECLARE_RWSEM(kthread_lock);
 +
 +

 ...

 +static pte_t *get_pte(struct mm_struct *mm, unsigned long addr)
 +{
 + pgd_t *pgd;
 + pud_t *pud;
 + pmd_t *pmd;
 + pte_t *ptep = NULL;
 +
 + pgd = pgd_offset(mm, addr);
 + if (!pgd_present(*pgd))
 + goto out;
 +
 + pud = pud_offset(pgd, addr);
 + if (!pud_present(*pud))
 + goto out;
 +
 + pmd = pmd_offset(pud, addr);
 + if (!pmd_present(*pmd))
 + goto out;
 +
 + ptep = pte_offset_map(pmd, addr);
 +out:
 + return ptep;
 +}

hm, this looks very generic.  Does it duplicate anything which core
kernel already provides?  If not, perhaps core kernel should provide
this (perhaps after some reorganisation).


 ...

 +static int rmap_hash_init(void)
 +{
 + if (!rmap_hash_size) {
 + struct sysinfo sinfo;
 +
 + si_meminfo(&sinfo);
 + rmap_hash_size = sinfo.totalram / 10;

One slot per ten pages of physical memory?  Is this too large, too
small or just right?

 + }
 + nrmaps_hash = rmap_hash_size;
 + rmap_hash = vmalloc(nrmaps_hash * sizeof(struct hlist_head));
 + if (!rmap_hash)
 + return -ENOMEM;
 + memset(rmap_hash, 0, nrmaps_hash * sizeof(struct hlist_head));
 + return 0;
 +}
 +

 ...

 +static void break_cow(struct mm_struct *mm, unsigned long addr)
 +{
 + struct page *page[1];
 +
 + 

[PATCH 4/4] add ksm kernel shared memory driver.

2009-04-08 Thread Izik Eidus
Ksm is a driver that allows merging identical pages between one or more
applications in a way invisible to the applications that use it.
Pages that are merged are marked read-only and are COWed when any
application tries to change them.

Ksm is used for cases where using fork() is not suitable;
one such case is where the pages of the application keep changing
dynamically and the application cannot know in advance which pages are
going to be identical.

Ksm works by walking over the memory pages of the applications it
scans in order to find identical pages.
It uses two sorted data structures, called the stable and unstable trees,
to find the identical pages efficiently.

When ksm finds two identical pages, it marks them read-only and merges
them into a single page.
After the pages are marked read-only and merged into one page, Linux
treats these pages as normal copy-on-write pages and will copy them
when a write access happens to them.
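
The idea of keeping pages in a structure sorted by their contents can be
sketched in userspace as below. This is an illustration under stated
assumptions, not the patch's implementation: it uses a plain unbalanced
binary search tree keyed by memcmp() where the patch uses rb-trees, it
does not distinguish the stable tree from the unstable one, and
SKETCH_PAGE_SIZE, struct page_node and find_or_insert() are hypothetical
names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* One tree node per distinct page content seen so far. */
struct page_node {
    const unsigned char *data; /* points at SKETCH_PAGE_SIZE bytes */
    struct page_node *left;
    struct page_node *right;
};

/*
 * Look for a page with the same contents as `data`. If one exists, return
 * its node (a merge candidate). Otherwise insert `data` and return the new
 * node. Returns NULL only if allocation fails.
 */
static struct page_node *find_or_insert(struct page_node **root,
                                        const unsigned char *data)
{
    struct page_node **link = root;
    struct page_node *node;

    assert(root != NULL && data != NULL);
    while (*link) {
        int cmp = memcmp(data, (*link)->data, SKETCH_PAGE_SIZE);

        if (cmp == 0)
            return *link; /* identical contents already in the tree */
        link = cmp < 0 ? &(*link)->left : &(*link)->right;
    }

    node = calloc(1, sizeof(*node));
    if (!node)
        return NULL;
    node->data = data;
    *link = node;
    return node;
}
```

A caller can tell a hit from an insert by comparing the returned node's
data pointer with the page it passed in: a different pointer means an
identical page was already known.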

Ksm scans only memory areas that were registered to be scanned by it.

Ksm api:

KSM_GET_API_VERSION:
Give userspace the api version of the module.

KSM_CREATE_SHARED_MEMORY_AREA:
Create a shared memory region fd, which later allows the user to register
memory regions to scan by using:
KSM_REGISTER_MEMORY_REGION and KSM_REMOVE_MEMORY_REGION

KSM_REGISTER_MEMORY_REGION:
Register a userspace virtual address range to be scanned by ksm.
This ioctl uses the ksm_memory_region structure:
ksm_memory_region:
__u32 npages;
 number of pages to share inside this memory region.
__u32 pad;
__u64 addr;
the beginning of the virtual address of this region.
__u64 reserved_bits;
reserved bits for future usage.

KSM_REMOVE_MEMORY_REGION:
Remove a memory region from ksm.

Signed-off-by: Izik Eidus iei...@redhat.com
Signed-off-by: Chris Wright chr...@redhat.com
Signed-off-by: Andrea Arcangeli aarca...@redhat.com
---
 include/linux/ksm.h        |   48 ++
 include/linux/miscdevice.h |    1 +
 mm/Kconfig                 |    6 +
 mm/Makefile                |    1 +
 mm/ksm.c                   | 1674 
 5 files changed, 1730 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/ksm.h
 create mode 100644 mm/ksm.c

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
new file mode 100644
index 000..2c11e9a
--- /dev/null
+++ b/include/linux/ksm.h
@@ -0,0 +1,48 @@
+#ifndef __LINUX_KSM_H
+#define __LINUX_KSM_H
+
+/*
+ * Userspace interface for /dev/ksm - kvm shared memory
+ */
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+#include <asm/types.h>
+
+#define KSM_API_VERSION 1
+
+#define ksm_control_flags_run 1
+
+/* for KSM_REGISTER_MEMORY_REGION */
+struct ksm_memory_region {
+   __u32 npages; /* number of pages to share */
+   __u32 pad;
+   __u64 addr; /* the begining of the virtual address */
+__u64 reserved_bits;
+};
+
+#define KSMIO 0xAB
+
+/* ioctls for /dev/ksm */
+
+#define KSM_GET_API_VERSION  _IO(KSMIO,   0x00)
+/*
+ * KSM_CREATE_SHARED_MEMORY_AREA - create the shared memory reagion fd
+ */
+#define KSM_CREATE_SHARED_MEMORY_AREA _IO(KSMIO,   0x01) /* return SMA fd */
+
+/* ioctls for SMA fds */
+
+/*
+ * KSM_REGISTER_MEMORY_REGION - register virtual address memory area to be
+ * scanned by kvm.
+ */
+#define KSM_REGISTER_MEMORY_REGION   _IOW(KSMIO,  0x20,\
+ struct ksm_memory_region)
+/*
+ * KSM_REMOVE_MEMORY_REGION - remove virtual address memory area from ksm.
+ */
+#define KSM_REMOVE_MEMORY_REGION _IO(KSMIO,   0x21)
+
+#endif
diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index beb6ec9..297c0bb 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -30,6 +30,7 @@
 #define HPET_MINOR 228
 #define FUSE_MINOR 229
 #define KVM_MINOR  232
+#define KSM_MINOR  233
 #define MISC_DYNAMIC_MINOR 255
 
 struct device;
diff --git a/mm/Kconfig b/mm/Kconfig
index b53427a..3f3fd04 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -223,3 +223,9 @@ config HAVE_MLOCKED_PAGE_BIT
 
 config MMU_NOTIFIER
bool
+
+config KSM
+   tristate "Enable KSM for page sharing"
+   help
+ Enable the KSM kernel module to allow page sharing of equal pages
+ among different tasks.
diff --git a/mm/Makefile b/mm/Makefile
index ec73c68..b885513 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -24,6 +24,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_TMPFS_POSIX_ACL) += shmem_acl.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += debug-pagealloc.o
 obj-$(CONFIG_SLAB) += slab.o
 obj-$(CONFIG_SLUB) += slub.o
diff --git a/mm/ksm.c b/mm/ksm.c
new file mode 100644
index 000..a15a92d
--- /dev/null
+++ b/mm/ksm.c
@@ -0,0 +1,1674 @@
+/*
+ * Memory merging driver for Linux
+ *
+ * This module enables dynamic sharing of identical pages 

Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-04-06 Thread Izik Eidus

Andrey Panin wrote:

On 094, 04 04, 2009 at 05:35:22PM +0300, Izik Eidus wrote:

SNIP

  

+static inline u32 calc_checksum(struct page *page)
+{
+   u32 checksum;
+   void *addr = kmap_atomic(page, KM_USER0);
+   checksum = jhash(addr, PAGE_SIZE, 17);



Why is jhash2() not used here? It's faster and leads to smaller code size.
  


Because I didn't know; I will check that and change it.

Thanks.

(We should really use the in-CPU crc on Intel Nehalem, and the dirty bit
for the rest of the architectures...)


  

+   kunmap_atomic(addr, KM_USER0);
+   return checksum;
+}



  




[PATCH 4/4] add ksm kernel shared memory driver.

2009-04-04 Thread Izik Eidus
Ksm is a driver that allows merging identical pages between one or more
applications in a way invisible to the applications that use it.
Pages that are merged are marked read-only and are COWed when any
application tries to change them.

Ksm is used for cases where using fork() is not suitable;
one such case is where the pages of the application keep changing
dynamically and the application cannot know in advance which pages are
going to be identical.

Ksm works by walking over the memory pages of the applications it
scans in order to find identical pages.
It uses two sorted data structures, called the stable and unstable trees,
to find the identical pages efficiently.

When ksm finds two identical pages, it marks them read-only and merges
them into a single page.
After the pages are marked read-only and merged into one page, Linux
treats these pages as normal copy-on-write pages and will copy them
when a write access happens to them.

Ksm scans only memory areas that were registered to be scanned by it.

Ksm api:

KSM_GET_API_VERSION:
Give userspace the api version of the module.

KSM_CREATE_SHARED_MEMORY_AREA:
Create a shared memory region fd, which later allows the user to register
memory regions to scan by using:
KSM_REGISTER_MEMORY_REGION and KSM_REMOVE_MEMORY_REGION

KSM_REGISTER_MEMORY_REGION:
Register a userspace virtual address range to be scanned by ksm.
This ioctl uses the ksm_memory_region structure:
ksm_memory_region:
__u32 npages;
 number of pages to share inside this memory region.
__u32 pad;
__u64 addr;
the beginning of the virtual address of this region.
__u64 reserved_bits;
reserved bits for future usage.

KSM_REMOVE_MEMORY_REGION:
Remove a memory region from ksm.

Signed-off-by: Izik Eidus iei...@redhat.com
---
 include/linux/ksm.h        |   48 ++
 include/linux/miscdevice.h |    1 +
 mm/Kconfig                 |    6 +
 mm/Makefile                |    1 +
 mm/ksm.c                   | 1668 
 5 files changed, 1724 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/ksm.h
 create mode 100644 mm/ksm.c

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
new file mode 100644
index 000..2c11e9a
--- /dev/null
+++ b/include/linux/ksm.h
@@ -0,0 +1,48 @@
+#ifndef __LINUX_KSM_H
+#define __LINUX_KSM_H
+
+/*
+ * Userspace interface for /dev/ksm - kvm shared memory
+ */
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+#include <asm/types.h>
+
+#define KSM_API_VERSION 1
+
+#define ksm_control_flags_run 1
+
+/* for KSM_REGISTER_MEMORY_REGION */
+struct ksm_memory_region {
+   __u32 npages; /* number of pages to share */
+   __u32 pad;
+   __u64 addr; /* the begining of the virtual address */
+__u64 reserved_bits;
+};
+
+#define KSMIO 0xAB
+
+/* ioctls for /dev/ksm */
+
+#define KSM_GET_API_VERSION  _IO(KSMIO,   0x00)
+/*
+ * KSM_CREATE_SHARED_MEMORY_AREA - create the shared memory reagion fd
+ */
+#define KSM_CREATE_SHARED_MEMORY_AREA _IO(KSMIO,   0x01) /* return SMA fd */
+
+/* ioctls for SMA fds */
+
+/*
+ * KSM_REGISTER_MEMORY_REGION - register virtual address memory area to be
+ * scanned by kvm.
+ */
+#define KSM_REGISTER_MEMORY_REGION   _IOW(KSMIO,  0x20,\
+ struct ksm_memory_region)
+/*
+ * KSM_REMOVE_MEMORY_REGION - remove virtual address memory area from ksm.
+ */
+#define KSM_REMOVE_MEMORY_REGION _IO(KSMIO,   0x21)
+
+#endif
diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index beb6ec9..297c0bb 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -30,6 +30,7 @@
 #define HPET_MINOR 228
 #define FUSE_MINOR 229
 #define KVM_MINOR  232
+#define KSM_MINOR  233
 #define MISC_DYNAMIC_MINOR 255
 
 struct device;
diff --git a/mm/Kconfig b/mm/Kconfig
index b53427a..3f3fd04 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -223,3 +223,9 @@ config HAVE_MLOCKED_PAGE_BIT
 
 config MMU_NOTIFIER
bool
+
+config KSM
+   tristate "Enable KSM for page sharing"
+   help
+ Enable the KSM kernel module to allow page sharing of equal pages
+ among different tasks.
diff --git a/mm/Makefile b/mm/Makefile
index ec73c68..b885513 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -24,6 +24,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_TMPFS_POSIX_ACL) += shmem_acl.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += debug-pagealloc.o
 obj-$(CONFIG_SLAB) += slab.o
 obj-$(CONFIG_SLUB) += slub.o
diff --git a/mm/ksm.c b/mm/ksm.c
new file mode 100644
index 000..fb59a08
--- /dev/null
+++ b/mm/ksm.c
@@ -0,0 +1,1668 @@
+/*
+ * Memory merging driver for Linux
+ *
+ * This module enables dynamic sharing of identical pages found in different
+ * memory areas, even if they are not shared by fork()
+ *
+ * Copyright (C) 

Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-04-02 Thread Avi Kivity

Anthony Liguori wrote:


I'm often afraid of what sort of bugs we'd uncover in kvm if we passed 
the fds around via SCM_RIGHTS and started poking around :-/


kvm checks the mm doesn't change underneath.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-04-02 Thread Andrea Arcangeli
On Wed, Apr 01, 2009 at 09:36:31PM -0500, Anthony Liguori wrote:
 on this behavior to unregister memory regions, you could potentially have 
 badness happen in the kernel if ksm attempted to access an invalid memory 
 region.

How could you possibly come to this conclusion? If badness could ever
happen then the original task with access to /dev/ksm could make the
same badness happen in the first place without needing to exec or pass
the fd to anybody else with IPC.


Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-04-02 Thread Izik Eidus

Anthony Liguori wrote:

Chris Wright wrote:

* Anthony Liguori (anth...@codemonkey.ws) wrote:
 
The ioctl() interface is quite bad for what you're doing.  You're  
telling the kernel extra information about a VA range in 
userspace.   That's what madvise is for.  You're tweaking simple 
read/write values of  kernel infrastructure.  That's what sysfs is for.



I agree re: sysfs (brought it up myself before).  As far as madvise vs.
ioctl, the one thing that comes from the ioctl is fops->release to
automagically unregister memory on exit.


This is precisely why ioctl() is a bad interface.  fops->release isn't 
tied to the process but rather tied to the open file.  The file can 
stay open long after the process exits either by a fork()'d child 
inheriting the file descriptor or through something more sinister like 
SCM_RIGHTS.
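
The distinction between a descriptor and the open file behind it can be
shown in userspace with dup(), which adds a second reference to the same
open file much as fork() or SCM_RIGHTS would. This is a sketch only;
open_file_survives_close() is a hypothetical name, and a pipe stands in
for the device file.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Returns 1 if the open file stays usable through a duplicate descriptor
 * after the original descriptor has been closed, 0 if it does not, and
 * -1 if the pipe or the duplicate could not be set up.
 */
static int open_file_survives_close(void)
{
    int fds[2];
    int copy;
    int ok;
    char byte = 0;

    if (pipe(fds) != 0)
        return -1;

    copy = dup(fds[1]); /* second reference to the same open file */
    if (copy < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    close(fds[1]); /* the original descriptor is gone... */

    /* ...but the write end is still open through the duplicate. */
    ok = write(copy, "x", 1) == 1 &&
         read(fds[0], &byte, 1) == 1 &&
         byte == 'x';

    close(copy);
    close(fds[0]);
    return ok;
}
```

The release hook of a file runs only when the last such reference goes
away, which is why it cannot be equated with the exit of the process that
opened the file.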


In fact, a common mistake is to leak file descriptors by not closing 
them when exec()'ing a process.  Instead of just delaying a close, if 
you rely on this behavior to unregister memory regions, you could 
potentially have badness happen in the kernel if ksm attempted to 
access an invalid memory region. 

How could such badness ever happen in the kernel?
Ksm works on virtual addresses! It fetches the pages by using 
get_user_pages(), the mm struct is protected by get_task_mm(), and in 
addition we take down_read(mmap_sem).


So how could ksm ever access an invalid memory region, unless the host 
page table or get_task_mm() stopped working?


When someone registers memory for scanning, we do get_task_mm(). The 
memory is unregistered when the file is closed, or when the caller says 
it no longer wants it registered by calling the unregister ioctl.



You can argue about the API, but saying that Ksm is insecure is a 
mathematical claim; please show me a scenario!



Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-04-01 Thread Izik Eidus

KAMEZAWA Hiroyuki wrote:

On Tue, 31 Mar 2009 15:21:53 +0300
Izik Eidus iei...@redhat.com wrote:
  
  
  

kpage is actually what is going to be the KsmPage - the shared page...

Right now these pages are not swappable...; after ksm is merged we 
will make these pages swappable as well...




sure.

  

If so, please
 - show the amount of kpage
 
 - allow users to set limit for usage of kpages. or preserve kpages at boot or

   by user's command.
  
  
kpage actually saves memory..., and limiting their number would 
limit the number of shared pages...





Ah, I'm working on the memory control cgroup, and *KSM* will be outside its control.
It's OK to make the default limit INFINITY, but please add knobs.
  
Sure, when I post V2 I will take care of this issue (I will do it 
after I get a little more review of ksm.c :-))




Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-04-01 Thread Izik Eidus

Anthony Liguori wrote:

Andrea Arcangeli wrote:

On Tue, Mar 31, 2009 at 10:54:57AM -0500, Anthony Liguori wrote:
 
You can still disable ksm and simply return ENOSYS for the MADV_ 
flag.  You 



Anthony, the biggest problem with madvise() is that it is a real system 
call API; at this stage of ksm I wouldn't want to commit to API changes 
in Linux...


The ioctl itself is restricting, and madvise is much more so...

Can we defer this issue until after ksm is merged, and after all the big 
new features that we want to add to ksm are merged?
(Then the API would be much more stable, and we would be able to ask 
people on the list about changing the API; but for a new driver that is 
yet to be merged, it is kind of overkill to add an API to Linux.)


What do you think?



Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-04-01 Thread Anthony Liguori

Izik Eidus wrote:
Anthony, the biggest problem with madvise() is that it is a real 
system call API; at this stage of ksm I wouldn't want to commit to API 
changes in Linux...


The ioctl itself is restricting, and madvise is much more so...

Can we defer this issue until after ksm is merged, and after all the big 
new features that we want to add to ksm are merged?
(Then the API would be much more stable, and we would be able to ask 
people on the list about changing the API; but for a new driver that is 
yet to be merged, it is kind of overkill to add an API to Linux.)


What do you think?


You can't change ABIs after something is merged or you break userspace.  
So you need to figure out the right ABI first.


Regards,

Anthony Liguori



Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-04-01 Thread Chris Wright
* Anthony Liguori (anth...@codemonkey.ws) wrote:
 You can't change ABIs after something is merged or you break userspace.   
 So you need to figure out the right ABI first.

Absolutely.


Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-04-01 Thread Chris Wright
* Anthony Liguori (anth...@codemonkey.ws) wrote:
 The ioctl() interface is quite bad for what you're doing.  You're  
 telling the kernel extra information about a VA range in userspace.   
 That's what madvise is for.  You're tweaking simple read/write values of  
 kernel infrastructure.  That's what sysfs is for.

I agree re: sysfs (brought it up myself before).  As far as madvise vs.
ioctl, the one thing that comes from the ioctl is fops->release to
automagically unregister memory on exit.  This needs to be handled
anyway if some -p pid is added to add a process after it's running,
so less weight there.

thanks,
-chris


Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-04-01 Thread Anthony Liguori

Chris Wright wrote:

* Anthony Liguori (anth...@codemonkey.ws) wrote:
  
The ioctl() interface is quite bad for what you're doing.  You're  
telling the kernel extra information about a VA range in userspace.   
That's what madvise is for.  You're tweaking simple read/write values of  
kernel infrastructure.  That's what sysfs is for.



I agree re: sysfs (brought it up myself before).  As far as madvise vs.
ioctl, the one thing that comes from the ioctl is fops->release to
automagically unregister memory on exit.


This is precisely why ioctl() is a bad interface.  fops->release isn't 
tied to the process but rather tied to the open file.  The file can stay 
open long after the process exits either by a fork()'d child inheriting 
the file descriptor or through something more sinister like SCM_RIGHTS.
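
The fork() case can be checked from userspace without /dev/ksm. Below is a minimal sketch, assuming a pipe as a stand-in for the ksm file; the helper name inherited_fd_delays_close() is made up for this illustration. The parent closes its write end, yet the reader sees no EOF until the child that inherited the descriptor is gone, in the same way that a release handler would not run until the last reference to the open file is dropped.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the pipe stayed open after the parent's close() and only
 * reached EOF once the child holding the inherited copy had exited,
 * 0 if the behaviour differed, -1 if pipe() or fork() failed.
 */
int inherited_fd_delays_close(void)
{
	int p[2];
	char c;
	pid_t pid;
	ssize_t before, after;
	int before_errno;

	if (pipe(p) < 0)
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* child: keeps its inherited copy of the write end open */
		close(p[0]);
		for (;;)
			pause();
	}

	/* parent: this close() does not close the underlying file */
	close(p[1]);
	fcntl(p[0], F_SETFL, O_NONBLOCK);

	/* a writer still exists (the child), so no EOF yet: expect EAGAIN */
	before = read(p[0], &c, 1);
	before_errno = errno;

	/* once the child is gone the last reference is dropped */
	kill(pid, SIGKILL);
	waitpid(pid, NULL, 0);
	after = read(p[0], &c, 1);	/* 0 means EOF */
	close(p[0]);

	return before == -1 && before_errno == EAGAIN && after == 0;
}
```

A return value of 1 means the parent's close() alone did not end the life of the open file, which is the behaviour described above for fops->release.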


In fact, a common mistake is to leak file descriptors by not closing 
them when exec()'ing a process.  Instead of just delaying a close, if 
you rely on this behavior to unregister memory regions, you could 
potentially have badness happen in the kernel if ksm attempted to access 
an invalid memory region.
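
For the exec() leak specifically, the usual userspace guard is the close-on-exec flag. This is a small sketch using the standard fcntl() calls; set_cloexec() is a hypothetical helper name, and whether the ksm fds should be marked this way is an assumption of the illustration, not something the patch does.

```c
#include <fcntl.h>

/*
 * Mark fd close-on-exec so that it is not leaked into a program started
 * with exec().  Returns 0 on success, -1 on error (errno set by fcntl).
 */
int set_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}
```

This only covers exec(); a fork()'d child or an fd passed with SCM_RIGHTS still keeps the file open, so it does not remove the need for the cleanup discussed here.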


So you absolutely have to automatically unregister regions in something 
other than the fops->release handler based on something that's tied to 
the pid's life cycle.


Using an interface like madvise() would force the issue to be dealt with 
properly from the start :-)


I'm often afraid of what sort of bugs we'd uncover in kvm if we passed 
the fds around via SCM_RIGHTS and started poking around :-/


Regards,

Anthony Liguori



  This needs to be handled
anyway if some -p pid is added to add a process after it's running,
so less weight there.

thanks,
-chris
  




Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-03-31 Thread Izik Eidus

KAMEZAWA Hiroyuki wrote:

On Tue, 31 Mar 2009 02:59:20 +0300
Izik Eidus iei...@redhat.com wrote:

  

Ksm is a driver that allows merging identical pages between one or more
applications in a way invisible to the applications that use it.
Pages that are merged are marked as readonly and are COWed when any
application tries to change them.

Ksm is used for cases where using fork() is not suitable;
one of these cases is where the pages of the application keep changing
dynamically and the application cannot know in advance which pages are
going to be identical.

Ksm works by walking over the memory pages of the applications it
scans in order to find identical pages.
It uses two sorted data structures, called the stable and unstable trees,
to find the identical pages efficiently.

When ksm finds two identical pages, it marks them as readonly and merges
them into a single page.
After the pages are marked as readonly and merged into one page, linux
will treat these pages as normal copy_on_write pages and will fork them
when a write access happens to them.

Ksm scans only memory areas that were registered to be scanned by it.

Ksm api:

KSM_GET_API_VERSION:
Give the userspace the api version of the module.

KSM_CREATE_SHARED_MEMORY_AREA:
Create a shared memory region fd that later allows the user to register
the memory region to scan by using:
KSM_REGISTER_MEMORY_REGION and KSM_REMOVE_MEMORY_REGION

KSM_START_STOP_KTHREAD:
Return information about the kernel thread; the information is returned
using the ksm_kthread_info structure:
ksm_kthread_info:
    __u32 sleep:
        number of microseconds to sleep between each iteration of
        scanning.

    __u32 pages_to_scan:
        number of pages to scan for each iteration of scanning.

    __u32 max_pages_to_merge:
        maximum number of pages to merge in each iteration of scanning
        (so even if there are still more pages to scan, we stop this
        iteration)

    __u32 flags:
        flags to control ksmd (right now just ksm_control_flags_run
        available)

KSM_REGISTER_MEMORY_REGION:
Register a userspace virtual address range to be scanned by ksm.
This ioctl uses the ksm_memory_region structure:
ksm_memory_region:
    __u32 npages;
        number of pages to share inside this memory region.
    __u32 pad;
    __u64 addr:
        the beginning of the virtual address of this region.

KSM_REMOVE_MEMORY_REGION:
Remove a memory region from ksm.

Signed-off-by: Izik Eidus iei...@redhat.com
---
 include/linux/ksm.h|   69 +++
 include/linux/miscdevice.h |1 +
 mm/Kconfig |6 +
 mm/Makefile|1 +
 mm/ksm.c   | 1431 
 5 files changed, 1508 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/ksm.h
 create mode 100644 mm/ksm.c

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
new file mode 100644
index 000..5776dce
--- /dev/null
+++ b/include/linux/ksm.h
@@ -0,0 +1,69 @@
+#ifndef __LINUX_KSM_H
+#define __LINUX_KSM_H
+
+/*
+ * Userspace interface for /dev/ksm - kvm shared memory
+ */
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+#include <asm/types.h>
+
+#define KSM_API_VERSION 1
+
+#define ksm_control_flags_run 1
+
+/* for KSM_REGISTER_MEMORY_REGION */
+struct ksm_memory_region {
+   __u32 npages; /* number of pages to share */
+   __u32 pad;
+   __u64 addr; /* the begining of the virtual address */
+__u64 reserved_bits;
+};
+
+struct ksm_kthread_info {
+   __u32 sleep; /* number of microsecoends to sleep */
+   __u32 pages_to_scan; /* number of pages to scan */
+   __u32 flags; /* control flags */
+__u32 pad;
+__u64 reserved_bits;
+};
+
+#define KSMIO 0xAB
+
+/* ioctls for /dev/ksm */
+
+#define KSM_GET_API_VERSION  _IO(KSMIO,   0x00)
+/*
+ * KSM_CREATE_SHARED_MEMORY_AREA - create the shared memory reagion fd
+ */
+#define KSM_CREATE_SHARED_MEMORY_AREA _IO(KSMIO,   0x01) /* return SMA fd */
+/*
+ * KSM_START_STOP_KTHREAD - control the kernel thread scanning speed
+ * (can stop the kernel thread from working by setting running = 0)
+ */
+#define KSM_START_STOP_KTHREAD  _IOW(KSMIO,  0x02,\
+ struct ksm_kthread_info)
+/*
+ * KSM_GET_INFO_KTHREAD - return information about the kernel thread
+ * scanning speed.
+ */
+#define KSM_GET_INFO_KTHREAD _IOW(KSMIO,  0x03,\
+ struct ksm_kthread_info)
+
+
+/* ioctls for SMA fds */
+
+/*
+ * KSM_REGISTER_MEMORY_REGION - register virtual address memory area to be
+ * scanned by kvm.
+ */
+#define KSM_REGISTER_MEMORY_REGION   _IOW(KSMIO,  0x20,\
+ struct ksm_memory_region)
+/*
+ * KSM_REMOVE_MEMORY_REGION - remove virtual address memory area from ksm.
+ */
+#define KSM_REMOVE_MEMORY_REGION _IO(KSMIO,   0x21)
+
+#endif
diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index 
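
For readers unfamiliar with the _IO/_IOW macros used in the header above, here is a small userspace sketch of what they encode. The struct is copied from the patch with <stdint.h> types substituted for __u32/__u64, and the ksm_cmd_*() helper names are hypothetical; the _IOC_* decode macros come from the standard Linux ioctl headers.

```c
#include <stdint.h>
#include <sys/ioctl.h>

#define KSMIO 0xAB

/* Layout taken from the patch, with fixed-width userspace types. */
struct ksm_memory_region {
	uint32_t npages;	/* number of pages to share */
	uint32_t pad;
	uint64_t addr;		/* start of the virtual address range */
	uint64_t reserved_bits;
};

#define KSM_REGISTER_MEMORY_REGION _IOW(KSMIO, 0x20, struct ksm_memory_region)
#define KSM_REMOVE_MEMORY_REGION   _IO(KSMIO, 0x21)

/* The "magic" type byte shared by all ksm commands. */
unsigned int ksm_cmd_type(unsigned long cmd)
{
	return _IOC_TYPE(cmd);
}

/* The per-command sequence number. */
unsigned int ksm_cmd_nr(unsigned long cmd)
{
	return _IOC_NR(cmd);
}

/* Size of the argument structure encoded in the command, 0 for _IO. */
unsigned int ksm_cmd_size(unsigned long cmd)
{
	return _IOC_SIZE(cmd);
}

/* Nonzero if the command copies data from userspace to the kernel. */
int ksm_cmd_is_write(unsigned long cmd)
{
	return (_IOC_DIR(cmd) & _IOC_WRITE) != 0;
}
```

Because the argument size is part of the command number, changing the layout of ksm_memory_region changes the ioctl number as well, which is one reason the ABI has to be settled before merging.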

Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-03-31 Thread Izik Eidus

Anthony Liguori wrote:

Izik Eidus wrote:

Ksm is a driver that allows merging identical pages between one or more
applications in a way invisible to the applications that use it.
Pages that are merged are marked as readonly and are COWed when any
application tries to change them.

Ksm is used for cases where using fork() is not suitable;
one of these cases is where the pages of the application keep changing
dynamically and the application cannot know in advance which pages are
going to be identical.

Ksm works by walking over the memory pages of the applications it
scans in order to find identical pages.
It uses two sorted data structures, called the stable and unstable trees,
to find the identical pages efficiently.

When ksm finds two identical pages, it marks them as readonly and merges
them into a single page.
After the pages are marked as readonly and merged into one page, linux
will treat these pages as normal copy_on_write pages and will fork them
when a write access happens to them.

Ksm scans only memory areas that were registered to be scanned by it.

Ksm api:

KSM_GET_API_VERSION:
Give the userspace the api version of the module.

KSM_CREATE_SHARED_MEMORY_AREA:
Create a shared memory region fd that later allows the user to register
the memory region to scan by using:
KSM_REGISTER_MEMORY_REGION and KSM_REMOVE_MEMORY_REGION

KSM_START_STOP_KTHREAD:
Return information about the kernel thread; the information is returned
using the ksm_kthread_info structure:
ksm_kthread_info:
    __u32 sleep:
        number of microseconds to sleep between each iteration of
        scanning.

    __u32 pages_to_scan:
        number of pages to scan for each iteration of scanning.

    __u32 max_pages_to_merge:
        maximum number of pages to merge in each iteration of scanning
        (so even if there are still more pages to scan, we stop this
        iteration)

    __u32 flags:
        flags to control ksmd (right now just ksm_control_flags_run
        available)
  


Wouldn't this make more sense as a sysfs interface?


I believe using ioctl for registering the memory of applications makes 
it easier.
Ksm doesn't have any complicated API that would benefit from sysfs 
(besides adding more complexity).


That is, the KSM_START_STOP_KTHREAD part, not necessarily the rest of 
the API.


What do you mean?


Regards,

Anthony Liguori





Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-03-31 Thread Anthony Liguori

Izik Eidus wrote:


I believe using ioctl for registering the memory of applications makes 
it easier.


Yes, I completely agree.

Ksm doesn't have any complicated API that would benefit from sysfs 
(besides adding more complexity).


That is, the KSM_START_STOP_KTHREAD part, not necessarily the rest of 
the API.


What do you mean?


The ioctl(KSM_START_STOP_KTHREAD) API is distinct from the rest of the 
API.  Whereas the rest of the API is used by applications to register 
their memory with KSM, this API is used by ksmctl to allow parameters to 
be tweaked in userspace.


These parameters are just simple values like enable, pages_to_scan, 
sleep_time.  Then there is KSM_GET_INFO_KTHREAD which provides a read 
interface to these parameters.


You could drop KSM_START_STOP_KTHREAD and KSM_GET_INFO_KTHREAD 
altogether, and introduce a sysfs hierarchy:


/sysfs/some/path/ksm/{enable,pages_to_scan,sleep_time}

That eliminates the need for ksmctl altogether, cleanly separates the 
two APIs, and provides a stronger interface.
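
To make the proposal concrete, here is a sketch of what a control tool would do against such a hierarchy. The directory is passed in by the caller because the path in this mail is only a placeholder, and tunable_write()/tunable_read() are hypothetical helpers, not part of any existing tool.

```c
#include <stdio.h>

/* Build "<dir>/<name>" into path; returns 0 on success, -1 if too long. */
static int tunable_path(char *path, size_t len, const char *dir,
			const char *name)
{
	int n = snprintf(path, len, "%s/%s", dir, name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write one decimal value to <dir>/<name>; 0 on success, -1 on error. */
int tunable_write(const char *dir, const char *name, unsigned long value)
{
	char path[512];
	FILE *f;
	int ok;

	if (tunable_path(path, sizeof(path), dir, name))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;	/* e.g. EACCES when the caller is not admin */
	ok = fprintf(f, "%lu\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read one decimal value from <dir>/<name>; 0 on success, -1 on error. */
int tunable_read(const char *dir, const char *name, unsigned long *value)
{
	char path[512];
	FILE *f;
	int ok;

	if (tunable_path(path, sizeof(path), dir, name))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fscanf(f, "%lu", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

With this shape, access control falls out of ordinary file permissions on each attribute, which is the property argued for above; no capability check is needed in the tool itself.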


The main problem with the current API is that it uses a single device to 
do both the administrative task and the userspace interface.  That means 
that any application that has access to registering its memory with KSM 
also has the ability to disable KSM.  That seems like a security concern 
to me since registering a memory region ought to be an unprivileged 
action whereas enabling/disabling KSM ought to be a privileged action.


Regards,

Anthony Liguori



Regards,

Anthony Liguori







Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-03-31 Thread Andrea Arcangeli
On Tue, Mar 31, 2009 at 08:31:31AM -0500, Anthony Liguori wrote:
 You could drop KSM_START_STOP_KTHREAD and KSM_GET_INFO_KTHREAD altogether, 
 and introduce a sysfs hierarchy:

 /sysfs/some/path/ksm/{enable,pages_to_scan,sleep_time}

Introducing a sysfs hierarchy sounds like a bit of overkill.

 the ability to disable KSM.  That seems like a security concern to me since 
 registering a memory region ought to be an unprivileged action whereas 
 enabling/disabling KSM ought to be a privileged action.

sysfs files would then only be writeable by admin, so if we want to
allow only admin to start/stop/tune ksm it'd be enough to plug an
admin capability check in the ioctl to provide equivalent permissions.

I could imagine converting the enable/pages_to_scan/sleep_time to
module params and tweaking them through /sys/module/ksm/parameters,
but for enable to work that way, we'd need to intercept the write so
we can at least wake up the kksmd daemon, which doesn't seem possible
with /sys/module/ksm/parameters, so in the end if we stick to the
ioctl for registering regions, it seems simpler to use it for
start/stop/tune too.


Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-03-31 Thread Anthony Liguori

Andrea Arcangeli wrote:

the ability to disable KSM.  That seems like a security concern to me since 
registering a memory region ought to be an unprivileged action whereas 
enabling/disabling KSM ought to be a privileged action.



sysfs files would then only be writeable by admin, so if we want to
allow only admin to start/stop/tune ksm it'd be enough to plug an
admin capability check in the ioctl to provide equivalent permissions.
  


Caps are not very granular unless you introduce a new capability.  
Furthermore, it's a bit more difficult to associate a capability with a 
user/group.


With sysfs, you use file based permissions to control the API.  It also 
fits into things like selinux a lot better.


In the very least, if you insist on not using sysfs, you should have a 
separate character device that's used for control (like /dev/ksmctl).


Regards,

Anthony Liguori


I could imagine converting the enable/pages_to_scan/sleep_time to
module params and tweaking them through /sys/module/ksm/parameters,
but for enable to work that way, we'd need to intercept the write so
we can at least wake up the kksmd daemon, which doesn't seem possible
with /sys/module/ksm/parameters, so in the end if we stick to the
ioctl for registering regions, it seems simpler to use it for
start/stop/tune too.
  




Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-03-31 Thread Andrea Arcangeli
On Tue, Mar 31, 2009 at 09:37:17AM -0500, Anthony Liguori wrote:
 In the very least, if you insist on not using sysfs, you should have a 
 separate character device that's used for control (like /dev/ksmctl).

I'm fine with using sysfs, that's not the point; if you have to add a ksmctl
device, then sysfs is surely better. Besides, ksm would normally be
enabled at boot, and tasks jailed by selinux had better not start/stop
this thing.

If people want /sys/kernel/mm/ksm instead of the start_stop ioctl we
surely can add it (provided there's a way to intercept writes to the
sysfs file). The problem is that registering memory could also be done with
'echo 0 -1 > /proc/self/ksm' and be inherited by children, it's not just
start/stop. I mean this is more a matter of taste I'm
afraid... Personally I'm more concerned about the API for registering
ram than the start/stop thing, which I couldn't care less about, so
my logic is that as long as this pseudodevice exists, we should use it
for everything. If we go away from it, then we should remove it as a
whole.


Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-03-31 Thread Anthony Liguori

Andrea Arcangeli wrote:

On Tue, Mar 31, 2009 at 09:37:17AM -0500, Anthony Liguori wrote:
  
In the very least, if you insist on not using sysfs, you should have a 
separate character device that's used for control (like /dev/ksmctl).



I'm fine with using sysfs, that's not the point; if you have to add a ksmctl
device, then sysfs is surely better. Besides, ksm would normally be
enabled at boot, and tasks jailed by selinux had better not start/stop
this thing.

If people want /sys/kernel/mm/ksm instead of the start_stop ioctl we
surely can add it (provided there's a way to intercept writes to the
sysfs file). The problem is that registering memory could also be done with
'echo 0 -1 > /proc/self/ksm' and be inherited by children, it's not just
start/stop. I mean this is more a matter of taste I'm
afraid... Personally I'm more concerned about the API for registering
ram than the start/stop thing, which I couldn't care less about,


I don't think the registering of ram should be done via sysfs.  That 
would be a pretty bad interface IMHO.  But I do think the functionality 
that ksmctl provides along with the security issues I mentioned earlier 
really suggest that there ought to be a separate API for control vs. 
registration and that control API would make a lot of sense as a sysfs API.


If you wanted to explore alternative APIs for registration, madvise() 
seems like the obvious candidate to me.


madvise(start, size, MADV_SHARABLE) seems like a pretty obvious API to me.

So combining a sysfs interface for control and an madvise() interface 
for registration seems like a really nice interface to me.
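
As a sketch of how small the userspace side of this would be: MADV_SHARABLE is only the name proposed in this thread and is not defined by any header, so the code below assumes the MADV_MERGEABLE flag that later Linux kernels and libcs expose for KSM registration, and reports "unsupported" where it is missing. mark_mergeable() is a hypothetical helper.

```c
#define _DEFAULT_SOURCE		/* expose MADV_* and MAP_ANONYMOUS in glibc */
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Ask the kernel to consider [addr, addr + len) for page merging.
 * Returns  1 if the range was registered,
 *          0 if this kernel or libc does not support the flag,
 *         -1 on any other error (errno is left set by madvise).
 * addr must be page aligned, as for any madvise() call.
 */
int mark_mergeable(void *addr, size_t len)
{
#ifdef MADV_MERGEABLE
	if (madvise(addr, len, MADV_MERGEABLE) == 0)
		return 1;
	/* EINVAL: the advice value is not known to this kernel build */
	return errno == EINVAL ? 0 : -1;
#else
	(void)addr;
	(void)len;
	return 0;
#endif
}
```

There is no fd to leak or pass around in this variant: the registration lives with the mapping and goes away when the mapping or the process does.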


Regards,

Anthony Liguori


 so
my logic is that as long as this pseudodevice exists, we should use it
for everything. If we go away from it, then we should remove it as a
whole.
  




Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-03-31 Thread Andrea Arcangeli
On Tue, Mar 31, 2009 at 10:09:24AM -0500, Anthony Liguori wrote:
 I don't think the registering of ram should be done via sysfs.  That would 
 be a pretty bad interface IMHO.  But I do think the functionality that 
 ksmctl provides along with the security issues I mentioned earlier really 
 suggest that there ought to be a separate API for control vs. registration 
 and that control API would make a lot of sense as a sysfs API.

 If you wanted to explore alternative APIs for registration, madvise() seems 
 like the obvious candidate to me.

 madvise(start, size, MADV_SHARABLE) seems like a pretty obvious API to me.

madvise would sound appropriate to me only if ksm were always built in,
which is not the case as it won't even be built if it's configured to
N.

Besides, madvise is a SUS-covered syscall, and this is a Linux-specific detail.


Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-03-31 Thread Anthony Liguori

Andrea Arcangeli wrote:

On Tue, Mar 31, 2009 at 10:09:24AM -0500, Anthony Liguori wrote:
  
I don't think the registering of ram should be done via sysfs.  That would 
be a pretty bad interface IMHO.  But I do think the functionality that 
ksmctl provides along with the security issues I mentioned earlier really 
suggest that there ought to be a separate API for control vs. registration 
and that control API would make a lot of sense as a sysfs API.


If you wanted to explore alternative APIs for registration, madvise() seems 
like the obvious candidate to me.


madvise(start, size, MADV_SHARABLE) seems like a pretty obvious API to me.



madvise would sound appropriate to me only if ksm were always built in,
which is not the case as it won't even be built if it's configured to
N.
  


You can still disable ksm and simply return ENOSYS for the MADV_ flag.  
You could even keep it as a module if you liked by separating the 
madvise bits from the ksm bits.  The madvise() bits could just provide 
the tracking infrastructure for determining which vmas were currently 
marked as sharable.


You could then have ksm as loadable module that consumed that interface 
to then perform scanning.



Besides, madvise is a SUS-covered syscall, and this is a Linux-specific detail.
  


A number of MADV_ flags are Linux specific (like MADV_DOFORK/MADV_DONTFORK).

Regards,

Anthony Liguori


Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-03-31 Thread Andrea Arcangeli
On Tue, Mar 31, 2009 at 10:54:57AM -0500, Anthony Liguori wrote:
 You can still disable ksm and simply return ENOSYS for the MADV_ flag.  You 

-EINVAL if anything; -ENOSYS would tell userland that it should stop
trying to use madvise, including the other MADV_ flags too.

 could even keep it as a module if you liked by separating the madvise bits 
 from the ksm bits.  The madvise() bits could just provide the tracking 
 infrastructure for determining which vmas were currently marked as sharable.
 You could then have ksm as loadable module that consumed that interface to 
 then perform scanning.

What's the point of making ksm a module if part of the ksm code is
loaded in the kernel and cannot be compiled out?
People who say KSM=N in their .config (like embedded running with 1M
of ram) don't want that tracking overhead compiled into the kernel.

Returning -EINVAL would be an option, but again I think madvise is a core
syscall for SuS and I don't like those core VM parts returning
-EINVAL at will depending on certain kernel modules being loaded.

 A number of MADV_ flags are Linux specific (like 
 MADV_DOFORK/MADV_DONTFORK).

But those aren't kernel module related, so they're in line with the
standard ones and could be adopted by other OSes.

KSM is not a core VM functionality, madvise is a core VM
functionality, so I don't see the fit. KSM as an ioctl, or KSM creating
/proc/pid/ksm when loaded, sounds fine to me instead. If the open of
either one fails, the application won't register. It's up to you to
choose KSM=M/N; if you want it as core functionality just build with
KSM=Y, but leave others the option to save memory.


Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-03-31 Thread Anthony Liguori

Andrea Arcangeli wrote:

On Tue, Mar 31, 2009 at 10:54:57AM -0500, Anthony Liguori wrote:
  
You can still disable ksm and simply return ENOSYS for the MADV_ flag.  You 



-EINVAL if anything; -ENOSYS would tell userland that it should stop
trying to use madvise, including the other MADV_ flags too.

  
could even keep it as a module if you liked by separating the madvise bits 
from the ksm bits.  The madvise() bits could just provide the tracking 
infrastructure for determining which vmas were currently marked as sharable.
You could then have ksm as loadable module that consumed that interface to 
then perform scanning.



What's the point of making ksm a module if part of the ksm code is
loaded in the kernel and cannot be compiled out?
People who say KSM=N in their .config (like embedded running with 1M
of ram) don't want that tracking overhead compiled into the kernel.
  


You have two things here.  CONFIG_MEM_SHARABLE and CONFIG_KSM.  
CONFIG_MEM_SHARABLE cannot be a module. If it's set to =n, then 
madvise(MADV_SHARABLE) == -ENOSYS.


If CONFIG_MEM_SHARABLE=y, then madvise(MADV_SHARABLE) will keep track of 
all sharable memory regions.  Independently of that, CONFIG_KSM can be 
set to n,m,y.  It depends on CONFIG_MEM_SHARABLE and when it's loaded, 
it consumes the list of sharable vmas.


But honestly, CONFIG_MEM_SHARABLE shouldn't be a lot of code so I don't see 
why you'd even need to make it configurable.


A number of MADV_ flags are Linux specific (like 
MADV_DOFORK/MADV_DONTFORK).



But those aren't kernel module related, so they're in line with the
standard ones and could be adopted by other OSes.

KSM is not a core VM functionality, madvise is a core VM
functionality, so I don't see the fit. KSM as an ioctl, or KSM creating
/proc/pid/ksm when loaded, sounds fine to me instead. If the open of
either one fails, the application won't register. It's up to you to
choose KSM=M/N; if you want it as core functionality just build with
KSM=Y, but leave others the option to save memory.
  


The ioctl() interface is quite bad for what you're doing.  You're 
telling the kernel extra information about a VA range in userspace.  
That's what madvise is for.  You're tweaking simple read/write values of 
kernel infrastructure.  That's what sysfs is for.


Regards,

Anthony Liguori


Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-03-31 Thread Andrea Arcangeli
On Tue, Mar 31, 2009 at 11:51:14AM -0500, Anthony Liguori wrote:
 You have two things here.  CONFIG_MEM_SHARABLE and CONFIG_KSM.  
 CONFIG_MEM_SHARABLE cannot be a module. If it's set to =n, then 
 madvise(MADV_SHARABLE) == -ENOSYS.

Which part of "-ENOSYS tells userland that the madvise syscall table entry
is empty, which is obviously not the case" wasn't clear?

 If CONFIG_MEM_SHARABLE=y, then madvise(MADV_SHARABLE) will keep track of 
 all sharable memory regions.  Independently of that, CONFIG_KSM can be set 
 to n,m,y.  It depends on CONFIG_MEM_SHARABLE and when it's loaded, it 
 consumes the list of sharable vmas.

And what do you gain by creating two config params when only one is
needed, other than more pain for the poor user doing make oldconfig and
being asked a zillion new questions that aren't necessary?

 But honestly, CONFIG_MEM_SHARABLE shouldn't be a lot of code so I don't see 
 why you'd even need to make it configurable.

Even if you were to move the registration code in madvise with a
-EINVAL retval if KSM was set to N for embedded, CONFIG_KSM would be
enough: the registration code would be surrounded by CONFIG_KSM_MODULE
|| CONFIG_KSM, just like page_wrprotect/replace_page. This
CONFIG_MEM_SHARABLE in addition to CONFIG_KSM is beyond what can make
sense to me.

 The ioctl() interface is quite bad for what you're doing.  You're telling 
 the kernel extra information about a VA range in userspace.  That's what 

The ioctl can be extended to also tell which pid to share without
having to specify VA range, and having the feature inherited by the
child. Not everyone wants to deal with VA.

But my main issue with madvise is that it's core kernel functionality
while KSM clearly is not.


Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-03-31 Thread Andrea Arcangeli
Hello,

I attach below some benchmarks of the new ksm tree algorithm, showing
ksm performance in best and worst case scenarios.

---
Here is a program, ksmpages.c, that tries to create the worst case
scenario for the ksm tree algorithm.

---
/* ksmpages.c: exercise KSM (C) Red Hat Inc. GPL'd */

#include <stdlib.h>
#include <malloc.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include "ksm.h"	/* include/linux/ksm.h from this patch */

#define SIZE (1UL*1024*1024*1024)

#define PAGE_SIZE 4096
#define PAGES (SIZE/PAGE_SIZE)

/*
 * Register the buffer at p with ksm. Returns 0 on success; the fds are
 * left open on purpose so the registration stays alive.
 */
int ksm_register_memory(char * p)
{
	int fd;
	int ksm_fd;
	int r = 1;
	struct ksm_memory_region ksm_region;

	fd = open("/dev/ksm", O_RDWR | O_TRUNC, (mode_t)0600);
	if (fd == -1)
		goto out;

	ksm_fd = ioctl(fd, KSM_CREATE_SHARED_MEMORY_AREA);
	if (ksm_fd == -1)
		goto out_free;

	ksm_region.npages = PAGES;
	ksm_region.addr = (unsigned long) p;
	r = ioctl(ksm_fd, KSM_REGISTER_MEMORY_REGION, &ksm_region);
	if (r)
		goto out_free1;

	return r;

out_free1:
	close(ksm_fd);
out_free:
	close(fd);
out:
	return r;
}

int main(void)
{
	unsigned long page;
	char *p = memalign(PAGE_SIZE, PAGES*PAGE_SIZE);
	if (!p)
		perror("memalign"), exit(1);

	if (ksm_register_memory(p))
		printf("failed to register into ksm, run inside VM\n");
	else
		printf("registered into ksm, run outside VM\n");

	/* make every page differ only in its last word */
	for (page = 0; page < PAGES; page++) {
		char *ppage;
		ppage = p + page * PAGE_SIZE +
			PAGE_SIZE - sizeof(unsigned long);
		*(unsigned long *)ppage = page;
	}

	pause();

	return 0;
}
---

ksmpages exercises the ksm tree algorithm's worst case, where pages are
all equal except for the last bytes, so each memcmp breaks only after
having accessed the worst-case amount of memory (i.e. almost 4096 bytes
for each level of the stable or unstable tree).

Top after running the first copy of ksmpages:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
16473 andrea    20   0 1027m 1.0g  328 S    0 25.9   0:01.14 ksmpages

Below is vmstat 1 while running a second copy with kksmd running at
100% CPU load:

---
 r  b   swpd    free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0   3104 2806044  60256  45532    0    0     0     0  912  338  0 25 74  0
 1  0   3104 2805700  60256  45532    0    0     0     0  676  171  0 27 73  0
 1  0   3104 2805452  60264  45524    0    0     0    36  708  172  0 23 77  0
 1  0   3104 2806428  60264  45532    0    0     0     0  787  210  0 25 75  0
 1  0   3104 2806212  60264  45524    0    0     0     0  643  132  0 25 75  0
 1  0   3104 2805864  60264  45524    0    0     0     0  685  157  0 27 73  0
 1  0   3104 2805616  60264  45524    0    0     0     0  640  128  0 23 77  0
 1  0   3104 2805368  60264  45524    0    0     0     0  637  131  0 25 75  0
 1  0   3104 2804996  60280  45508    0    0     0    76  704  165  0 25 75  0
 2  0   3104 2804748  60280  45524    0    0     0     0  636  131  0 27 73  0
 1  0   3104 2804500  60280  45524    0    0     0     0  641  133  0 23 77  0

Here the second copy of ksmpages is started.

 2  0   3104 2660544  60280  45524    0    0     0     0  711  178  0 28 72  0
 1  0   3104 1754096  60280  45524    0    0     0     0  839  172  1 47 53  0

1G of ram has been allocated and initialized by ksmpages.

 1  0   3104 1753848  60280  45524    0    0     0     0  632  122  0 27 73  0
 1  0   3104 1753328  60280  45524    0    0     0     0  661  167  0 23 77  0
 1  0   3104 1753104  60280  45524    0    0     0     0  635  129  0 25 75  0
 1  0   3104 1752856  60280  45524    0    0     0     0  635  127  0 25 75  0
 1  0   3104 1752608  60280  45524    0    0     0     0  677  158  0 27 73  0
 1  0   3104 1752360  60280  45524    0    0     0     0  636  132  0 23 77  0
 1  0   3104 1752112  60280  45524    0    0     0     0  638  133  0 25 75  0
 1  0   3104 1751864  60280  45524    0    0     0     0  665  149  0 25 75  0

It takes around 8 seconds for kksmd to complete a full scan of the 1G
indexed in the unstable tree plus the refresh of the checksum of the
whole 2G registered.

 1  0   3104 1758944  60280  45524    0    0     0     0  649  122  0 27 73  0
 1  0   3104 1772316  60280  45524    0    0     0     0  660  128  0 23 77  0
 1  0   3104 1784668  60280  45524    0    0     0     0  711  159  0 25 75  0
 1  0   3104 1796252  60280  45524    0    0     0     0  669  138  0 25 75  0
 1  0   3104 1807908  60280  45524    0    0     0     0  653  124  0 27 73  0
 1  0   3104 1819044  60280  45524    0    0     0     0  677  148  0 23 77  0
 1  0   3104 1829684  60280  45524    0    0     0     0  649

Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-03-31 Thread KAMEZAWA Hiroyuki
On Tue, 31 Mar 2009 15:21:53 +0300
Izik Eidus iei...@redhat.com wrote:

 kpage is actually what is going to be the KsmPage - the shared page...
 
 Right now these pages are not swappable...; after ksm is merged we 
 will make these pages swappable as well...
 
sure.

  If so, please
   - show the amount of kpage
   - allow users to set a limit on usage of kpages, or preserve kpages at
     boot or by user's command.
 
 kpage actually saves memory..., and limiting the number of them would 
 limit the number of shared pages...
 

Ah, I'm working on the memory control cgroup. And *KSM* will be out of its control.
It's ok to make the default limit value INFINITY, but please add knobs.

Thanks,
-Kame



[PATCH 4/4] add ksm kernel shared memory driver.

2009-03-30 Thread Izik Eidus
Ksm is a driver that allows merging identical pages between one or more
applications in a way invisible to the applications that use it.
Pages that are merged are marked as readonly and are COWed when any
application tries to change them.

Ksm is used for cases where using fork() is not suitable;
one of these cases is where the pages of the application keep changing
dynamically and the application cannot know in advance which pages are
going to be identical.

Ksm works by walking over the memory pages of the applications it
scans in order to find identical pages.
It uses two sorted data structures, called the stable and unstable trees,
to find the identical pages efficiently.

When ksm finds two identical pages, it marks them as readonly and merges
them into a single page.
After the pages are marked as readonly and merged into one page, linux
will treat these pages as normal copy_on_write pages and will copy them
when a write access happens to them.

Ksm scans only memory areas that were registered to be scanned by it.

Ksm api:

KSM_GET_API_VERSION:
Give the userspace the api version of the module.

KSM_CREATE_SHARED_MEMORY_AREA:
Create a shared memory region fd, which later allows the user to register
the memory regions to scan by using:
KSM_REGISTER_MEMORY_REGION and KSM_REMOVE_MEMORY_REGION

KSM_START_STOP_KTHREAD:
Control the kernel thread (start or stop it and set its scanning speed);
the information is passed using the ksm_kthread_info structure:
ksm_kthread_info:
__u32 sleep:
	number of microseconds to sleep between each iteration of
	scanning.

__u32 pages_to_scan:
	number of pages to scan for each iteration of scanning.

__u32 max_pages_to_merge:
	maximum number of pages to merge in each iteration of scanning
	(so even if there are still more pages to scan, we stop this
	iteration)

__u32 flags:
	flags to control ksmd (right now just ksm_control_flags_run
	available)

KSM_REGISTER_MEMORY_REGION:
Register a userspace virtual address range to be scanned by ksm.
This ioctl uses the ksm_memory_region structure:
ksm_memory_region:
__u32 npages;
	number of pages to share inside this memory region.
__u32 pad;
__u64 addr:
	the beginning of the virtual address of this region.

KSM_REMOVE_MEMORY_REGION:
Remove a memory region from ksm.

Signed-off-by: Izik Eidus iei...@redhat.com
---
 include/linux/ksm.h|   69 +++
 include/linux/miscdevice.h |1 +
 mm/Kconfig |6 +
 mm/Makefile|1 +
 mm/ksm.c   | 1431 
 5 files changed, 1508 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/ksm.h
 create mode 100644 mm/ksm.c

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
new file mode 100644
index 000..5776dce
--- /dev/null
+++ b/include/linux/ksm.h
@@ -0,0 +1,69 @@
+#ifndef __LINUX_KSM_H
+#define __LINUX_KSM_H
+
+/*
+ * Userspace interface for /dev/ksm - kvm shared memory
+ */
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+#include <asm/types.h>
+
+#define KSM_API_VERSION 1
+
+#define ksm_control_flags_run 1
+
+/* for KSM_REGISTER_MEMORY_REGION */
+struct ksm_memory_region {
+	__u32 npages; /* number of pages to share */
+	__u32 pad;
+	__u64 addr; /* the beginning of the virtual address */
+	__u64 reserved_bits;
+};
+
+struct ksm_kthread_info {
+	__u32 sleep; /* number of microseconds to sleep */
+	__u32 pages_to_scan; /* number of pages to scan */
+	__u32 flags; /* control flags */
+	__u32 pad;
+	__u64 reserved_bits;
+};
+
+#define KSMIO 0xAB
+
+/* ioctls for /dev/ksm */
+
+#define KSM_GET_API_VERSION		_IO(KSMIO,   0x00)
+/*
+ * KSM_CREATE_SHARED_MEMORY_AREA - create the shared memory region fd
+ */
+#define KSM_CREATE_SHARED_MEMORY_AREA	_IO(KSMIO,   0x01) /* return SMA fd */
+/*
+ * KSM_START_STOP_KTHREAD - control the kernel thread scanning speed
+ * (can stop the kernel thread from working by setting running = 0)
+ */
+#define KSM_START_STOP_KTHREAD		_IOW(KSMIO,  0x02,\
+					     struct ksm_kthread_info)
+/*
+ * KSM_GET_INFO_KTHREAD - return information about the kernel thread
+ * scanning speed.
+ */
+#define KSM_GET_INFO_KTHREAD		_IOW(KSMIO,  0x03,\
+					     struct ksm_kthread_info)
+
+
+/* ioctls for SMA fds */
+
+/*
+ * KSM_REGISTER_MEMORY_REGION - register virtual address memory area to be
+ * scanned by ksm.
+ */
+#define KSM_REGISTER_MEMORY_REGION	_IOW(KSMIO,  0x20,\
+					     struct ksm_memory_region)
+/*
+ * KSM_REMOVE_MEMORY_REGION - remove virtual address memory area from ksm.
+ */
+#define KSM_REMOVE_MEMORY_REGION	_IO(KSMIO,   0x21)
+
+#endif
diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index a820f81..6d4f8df 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -29,6 +29,7 @@
 

Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-03-30 Thread Anthony Liguori

Izik Eidus wrote:

KSM_START_STOP_KTHREAD:
Return information about the kernel thread, the information is returned
using the ksm_kthread_info structure:
ksm_kthread_info:
__u32 sleep:
	number of microseconds to sleep between each iteration of
	scanning.

__u32 pages_to_scan:
	number of pages to scan for each iteration of scanning.

__u32 max_pages_to_merge:
	maximum number of pages to merge in each iteration of scanning
	(so even if there are still more pages to scan, we stop this
	iteration)

__u32 flags:
	flags to control ksmd (right now just ksm_control_flags_run
	available)



Wouldn't this make more sense as a sysfs interface?  That is, the 
KSM_START_STOP_KTHREAD part, not necessarily the rest of the API.


Regards,

Anthony Liguori



Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-03-30 Thread KAMEZAWA Hiroyuki
On Tue, 31 Mar 2009 02:59:20 +0300
Izik Eidus iei...@redhat.com wrote:

 diff --git a/include/linux/ksm.h b/include/linux/ksm.h
 new file mode 100644
 index 000..5776dce
 --- /dev/null
 +++ b/include/linux/ksm.h
 @@ -0,0 +1,69 @@
 +#ifndef __LINUX_KSM_H
 +#define __LINUX_KSM_H
 +
 +/*
 + * Userspace interface for /dev/ksm - kvm shared memory
 + */
 +
 +#include <linux/types.h>
 +#include <linux/ioctl.h>
 +
 +#include <asm/types.h>
 +
 +#define KSM_API_VERSION 1
 +
 +#define ksm_control_flags_run 1
 +
 +/* for KSM_REGISTER_MEMORY_REGION */
 +struct ksm_memory_region {
 +	__u32 npages; /* number of pages to share */
 +	__u32 pad;
 +	__u64 addr; /* the beginning of the virtual address */
 +	__u64 reserved_bits;
 +};
 +
 +struct ksm_kthread_info {
 +	__u32 sleep; /* number of microseconds to sleep */
 +	__u32 pages_to_scan; /* number of pages to scan */
 +	__u32 flags; /* control flags */
 +	__u32 pad;
 +	__u64 reserved_bits;
 +};
 +
 +#define KSMIO 0xAB
 +
 +/* ioctls for /dev/ksm */
 +
 +#define KSM_GET_API_VERSION  _IO(KSMIO,   0x00)
 +/*
 + * KSM_CREATE_SHARED_MEMORY_AREA - create the shared memory region fd
 + */
 +#define KSM_CREATE_SHARED_MEMORY_AREA	_IO(KSMIO,   0x01) /* return SMA fd */
 +/*
 + * KSM_START_STOP_KTHREAD - control the kernel thread scanning speed
 + * (can stop the kernel thread from working by setting running = 0)
 + */
 +#define KSM_START_STOP_KTHREAD		_IOW(KSMIO,  0x02,\
 +					     struct ksm_kthread_info)
 +/*
 + * KSM_GET_INFO_KTHREAD - return information about the kernel thread
 + * scanning speed.
 + */
 +#define KSM_GET_INFO_KTHREAD  _IOW(KSMIO,  0x03,\
 +   struct ksm_kthread_info)
 +
 +
 +/* ioctls for SMA fds */
 +
 +/*
 + * KSM_REGISTER_MEMORY_REGION - register virtual address memory area to be
 + * scanned by ksm.
 + */
 +#define KSM_REGISTER_MEMORY_REGION   _IOW(KSMIO,  0x20,\
 +   struct ksm_memory_region)
 +/*
 + * KSM_REMOVE_MEMORY_REGION - remove virtual address memory area from ksm.
 + */
 +#define KSM_REMOVE_MEMORY_REGION