Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2009-04-07 Thread Andrea Arcangeli
On Sat, Apr 04, 2009 at 05:35:18PM +0300, Izik Eidus wrote:
 From v1 to v2:
 
 1) Fixed a security issue found by Chris Wright:
 Ksm was checking whether a page is a shared page by testing !PageAnon.
 Because ksm scans only anonymous memory, every !PageAnon page
 inside the ksm data structures is a shared page. However, there might
 be a case in do_wp_page() when VM_SHARED is used, where
 do_wp_page() would reuse the page instead of copying it into a new
 anonymous page. This was fixed by adding a check for the
 dirty bit of the virtual addresses pointing into the shared page.
 I could not find any VM code that would clear the dirty bit from
 this virtual address (because we allocate the page
 using page_alloc() - kernel allocated pages), ~but I still want
 confirmation about this from the vm guys - thanks.~

As far as I can tell this wasn't a bug and this change is
unnecessary. I already checked this bit but I may have missed
something, so I ask here to be sure.

As far as I can tell, when VM_SHARED is set, no anonymous page can ever
be allocated in that vma range, hence no KSM page can ever be
generated in that vma either. MAP_SHARED|MAP_ANONYMOUS is only a
different API for /dev/shm, IPCSHM backing, no anonymous pages can
live there. It surely worked like that in older 2.6, reading latest
code it seems to still work like that, but if something has changed
Hugh will surely correct me in a jiffy ;).

I still see this in the file=null path.

	} else if (vm_flags & VM_SHARED) {
		error = shmem_zero_setup(vma);
		if (error)
			goto free_vma;
	}


So you can revert your change for now.
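
The userspace-visible side of this can be checked with a small program. The following is only a sketch, assuming Linux with glibc; child_write_visible() is a hypothetical helper name, not something from the patches. It shows that a MAP_SHARED|MAP_ANONYMOUS mapping behaves like shared memory across fork(), while MAP_PRIVATE|MAP_ANONYMOUS gets copy-on-write. It does not inspect PageAnon itself.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one anonymous page with the given sharing
 * flag (MAP_SHARED or MAP_PRIVATE), fork, let the child write to it,
 * and report whether the parent sees the write.
 * Returns 1 if visible, 0 if not, -1 on error.
 */
static int child_write_visible(int map_flags)
{
	long page_size = sysconf(_SC_PAGESIZE);
	volatile int *p;
	pid_t pid;
	int status, seen;

	assert(page_size > 0);
	p = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
		 map_flags | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	p[0] = 0;

	pid = fork();
	if (pid == -1) {
		munmap((void *)p, (size_t)page_size);
		return -1;
	}
	if (pid == 0) {
		p[0] = 42;	/* the child writes through its mapping */
		_exit(0);
	}
	if (waitpid(pid, &status, 0) == -1) {
		munmap((void *)p, (size_t)page_size);
		return -1;
	}

	seen = (p[0] == 42);	/* did the parent observe the write? */
	munmap((void *)p, (size_t)page_size);
	return seen;
}
```

Calling child_write_visible(MAP_SHARED) is expected to return 1 and child_write_visible(MAP_PRIVATE) to return 0, which matches the point that only the private case goes through the anonymous copy-on-write path.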
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2009-04-06 Thread Nikola Ciprich
Hi Izik,
Is there any user documentation available (apart from RTFS? :))
I've compiled a kernel with v2 of your patches, loaded the ksm module, and
did echo 1 > /proc/sys/kernel/mm/ksm/run, but I think it didn't do
anything; at least no pages were collected.
Could you advise me a bit?
Thanks a lot in advance...
I can't wait to try it on our hosts running 50-60 KVMs :)
BR
nik


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2009-04-06 Thread Andrea Arcangeli
On Mon, Apr 06, 2009 at 05:04:49PM +1000, Nick Piggin wrote:
 They should use a shared memory segment, or MAP_ANONYMOUS|MAP_SHARED etc.
 Presumably they will probably want to control it to interleave it over
 all numa nodes and use hugepages for it. It would be very little work.

I thought it's the intermediate result of the computations that leads
to lots of equal data too, in which case ksm is the only way to share
it all.


[PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2009-04-04 Thread Izik Eidus
From v1 to v2:

1) Fixed a security issue found by Chris Wright:
Ksm was checking whether a page is a shared page by testing !PageAnon.
Because ksm scans only anonymous memory, every !PageAnon page
inside the ksm data structures is a shared page. However, there might
be a case in do_wp_page() when VM_SHARED is used, where
do_wp_page() would reuse the page instead of copying it into a new
anonymous page. This was fixed by adding a check for the
dirty bit of the virtual addresses pointing into the shared page.
I could not find any VM code that would clear the dirty bit from
this virtual address (because we allocate the page
using page_alloc() - kernel allocated pages), ~but I still want
confirmation about this from the vm guys - thanks.~

2) Moved to sysfs to control ksm:
It was requested as a better way to control the ksm scanning
thread than ioctls.
The sysfs API:
dir: /sys/kernel/mm/ksm/

kernel_pages_allocated - how many kernel pages ksm has allocated.
These pages are not swappable, and each such page
is used by ksm to share pages with identical content.

pages_shared - how many pages were shared by ksm.

run - set to 1 when you want ksm to run, 0 when not.

max_kernel_pages - the maximum number of kernel pages
that ksm may allocate; set to 0 for unlimited.

pages_to_scan - how many pages to scan before ksm sleeps.

sleep - how many usecs ksm sleeps.

3) Added a sysfs parameter to control the maximum number of kernel pages
allocated by ksm.

4) Added statistics about how many pages are really shared.
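
The sysfs files listed above are plain text values, so they can be driven from C as well as from a shell. The following is a minimal sketch, not part of the patch set: ksm_write_knob() and ksm_read_knob() are hypothetical helper names, and the directory is passed in as a parameter so the helpers can be tried against any directory. On a kernel carrying these patches it would be /sys/kernel/mm/ksm.

```c
#include <assert.h>
#include <stdio.h>

/* Build "<dir>/<name>" into buf. Returns 0 on success, -1 if it does not fit. */
static int knob_path(char *buf, size_t len, const char *dir, const char *name)
{
	int n = snprintf(buf, len, "%s/%s", dir, name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a decimal value to <dir>/<name>. Returns 0 on success, -1 on error. */
static int ksm_write_knob(const char *dir, const char *name, unsigned long val)
{
	char path[256];
	FILE *f;
	int ok;

	assert(dir && name);
	if (knob_path(path, sizeof(path), dir, name))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%lu\n", val) > 0;
	if (fclose(f) != 0)	/* write errors may only show up at close */
		ok = 0;
	return ok ? 0 : -1;
}

/* Read a decimal value from <dir>/<name>. Returns 0 on success, -1 on error. */
static int ksm_read_knob(const char *dir, const char *name, unsigned long *val)
{
	char path[256];
	FILE *f;
	int ok;

	assert(dir && name && val);
	if (knob_path(path, sizeof(path), dir, name))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fscanf(f, "%lu", val) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

With the patches applied, ksm_write_knob("/sys/kernel/mm/ksm", "run", 1) would correspond to starting the scanner, and ksm_read_knob("/sys/kernel/mm/ksm", "pages_shared", &n) to reading the sharing statistic. Both need sufficient privileges.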


One issue still to be discussed:
There was a suggestion to use madvise(SHAREABLE) instead of
ioctls to register memory that needs to be scanned by ksm.
Such a change is outside the scope of ksm.c and would require adding
a new madvise API and changing some parts of the VM and the kernel
code, so the first thing to do is to figure out whether we really want this.
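
For illustration, this is roughly what a madvise-based registration looks like from the caller's side. It is a sketch under assumptions: SHAREABLE is only the name floated in this discussion, and the example instead uses MADV_MERGEABLE, the advice value that mainline Linux later adopted for KSM (2.6.32 and newer). alloc_mergeable() is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map len bytes of private anonymous memory and ask
 * the kernel to consider the range for page merging.
 * Returns the mapping, or NULL if mmap() failed. *advice_err is set to 0
 * if the advice was accepted, otherwise to the errno from madvise()
 * (for example EINVAL on a kernel built without KSM).
 */
static void *alloc_mergeable(size_t len, int *advice_err)
{
	void *p;

	assert(len > 0 && advice_err);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

#ifdef MADV_MERGEABLE
	*advice_err = madvise(p, len, MADV_MERGEABLE) == 0 ? 0 : errno;
#else
	*advice_err = ENOSYS;	/* headers predate the mainline flag */
#endif
	return p;
}
```

The memory stays usable whether or not the advice is accepted, which is one argument for this style of interface: the application does not need to open a device node or keep a file descriptor around.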

I don't know of any other open issues.

Thanks.

This is from the first post:
(The kvm part, together with the kvm-userspace part, was posted with V1
about a week ago; whoever wants to test ksm may download the
patch from the lkml archive)

KSM is a Linux driver that allows dynamically sharing identical memory
pages between one or more processes.

Unlike traditional page sharing, which is set up when the memory is
allocated, ksm does it dynamically after the memory was created.
Memory is periodically scanned; identical pages are identified and
merged.
The sharing is unnoticeable by the processes that use this memory.
(The shared pages are marked read-only, and in case of a write
do_wp_page() takes care of creating a new copy of the page.)

To find identical pages, ksm uses an algorithm that is split into three
primary levels:

1) Ksm starts scanning the memory and calculates a checksum for each
   page that is registered to be scanned.
   (In the first round of scanning, ksm only calculates
   this checksum for all the pages.)

2) Ksm goes over the whole memory again and recalculates the
   checksum of the pages. Pages whose checksum is the same as in the
   previous round are considered pages that most likely
   won't change.
   Ksm inserts these pages into an RB-tree sorted by page content, which
   is called the unstable tree. The tree is called
   unstable because the page contents might change
   while the pages are still inside the tree, and the tree would then
   become corrupted.
   Because of this problem ksm takes two more steps in addition to the
   checksum calculation:
   a) Ksm throws away and recreates the entire unstable tree on each round
      of memory scanning, so if there is corruption, it is fixed
      when the tree is rebuilt.
   b) Ksm uses an RB-tree, whose balancing is done by node color
      and not by content, so even if a page's content changes, a
      search still takes the same amount of time.

3) In addition to the unstable tree, ksm holds another tree, called the
   stable tree. This is an RB-tree sorted by page
   content, and all its pages are write-protected, so it cannot get
   corrupted.
   Each time ksm finds two identical pages using the unstable tree,
   it creates a new write-protected shared page and inserts it
   into the stable tree, where it stays. The
   stable tree, unlike the unstable tree, is never thrown away, so each
   page that we find is kept inside it.
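
The interplay of the two trees can be modelled in userspace. The sketch below is a toy under stated assumptions, not the ksm.c implementation: it uses plain unbalanced binary search trees instead of RB-trees, a 64-byte "page", no checksum gating, and no real merging or write protection. All names (scan_page, end_of_round, struct ksm_model) are hypothetical. It only shows the lookup order and the per-round reset of the unstable tree.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_BYTES 64	/* toy page size, chosen only for the model */

struct node {
	const unsigned char *page;
	struct node *left, *right;
};

/*
 * Look up a page by content (memcmp order). Returns the matching node,
 * or NULL if there is none; with insert != 0 a new node is added in
 * that case.
 */
static struct node *tree_search(struct node **root,
				const unsigned char *page, int insert)
{
	while (*root) {
		int c = memcmp(page, (*root)->page, PAGE_BYTES);

		if (c == 0)
			return *root;
		root = c < 0 ? &(*root)->left : &(*root)->right;
	}
	if (insert) {
		struct node *n = calloc(1, sizeof(*n));

		assert(n);
		n->page = page;
		*root = n;
	}
	return NULL;
}

static void tree_free(struct node *n)
{
	if (!n)
		return;
	tree_free(n->left);
	tree_free(n->right);
	free(n);
}

struct ksm_model {
	struct node *stable;	/* kept across rounds */
	struct node *unstable;	/* rebuilt every round */
};

enum scan_result { MERGED_STABLE, MERGED_UNSTABLE, INSERTED_UNSTABLE };

/* One scan step: stable tree first, then the unstable tree. */
static enum scan_result scan_page(struct ksm_model *m,
				  const unsigned char *page)
{
	if (tree_search(&m->stable, page, 0))
		return MERGED_STABLE;
	if (tree_search(&m->unstable, page, 1)) {
		/* Match in the unstable tree: promote to the stable tree.
		 * The model leaves the old unstable node in place, since
		 * that tree is dropped at the end of the round anyway. */
		tree_search(&m->stable, page, 1);
		return MERGED_UNSTABLE;
	}
	return INSERTED_UNSTABLE;	/* no match; page is now in unstable */
}

/* End of a scanning round: throw the unstable tree away. */
static void end_of_round(struct ksm_model *m)
{
	tree_free(m->unstable);
	m->unstable = NULL;
}
```

In this model the first of two identical pages lands in the unstable tree, the second one finds it there and promotes the content to the stable tree, and any later identical page matches the stable tree directly, including in later rounds.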

Taking into account the three levels described above, the algorithm
works like this:

search primary tree (sorted by entire page contents, pages write protected)
- if match found, merge
- if no match found...
  - search secondary tree (sorted by entire page contents, pages not write
    protected)
    - if match found, merge
      - remove from secondary tree and insert merged page into primary tree
    - if no match found...
      - 

Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2008-11-28 Thread Dmitri Monakhov
Izik Eidus [EMAIL PROTECTED] writes:

 (From v1 to v2 the main change is much more documentation)

 KSM is a Linux driver that allows dynamically sharing identical memory
 pages between one or more processes.

 Unlike traditional page sharing, which is set up when the memory is
 allocated, ksm does it dynamically after the memory was created.
 Memory is periodically scanned; identical pages are identified and
 merged.
 The sharing is unnoticeable by the processes that use this memory.
 (The shared pages are marked read-only, and in case of a write
 do_wp_page() takes care of creating a new copy of the page.)

 This driver is very useful for KVM, for example when running multiple
 guest operating systems of the same type.
Hi Izik, the approach used in the driver is commonly known as
content-based search. There are several variants of it;
the most common are:
1: with guest VM support
2: w/o guest VM support.
You have implemented the second one, but it seems it was already patented:
http://www.google.com/patents?vid=USPAT6789156
I'm not a lawyer, but IMHO we have a direct conflict here.
From another point of view they have patented the wheel, but at least we
have to know about this.

Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2008-11-28 Thread Alan Cox
 You have implemented the second one, but it seems it was already patented:
 http://www.google.com/patents?vid=USPAT6789156
 I'm not a lawyer, but IMHO we have a direct conflict here.
 From another point of view they have patented the wheel, but at least we
 have to know about this.

It's an old idea and appeared for Linux in March 1998: a little project from
Philipp Reisner called mergemem.

http://groups.google.com/group/muc.lists.linux-kernel/browse_thread/thread/387af278089c7066?ie=utf-8&oe=utf-8&q=share+identical+pages#b3d4f68fb5dd4f88

So if there is a patent which is relevant (and that's a question for
lawyers and legal patent search people), perhaps the Linux Foundation and
some of the patent busters could take a look at mergemem and
re-examination.

Alan


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2008-11-20 Thread Izik Eidus

Quoting Ryota OZAKI:

Hi Izik,

I've tried your patch set, but ksm doesn't work on my machine.

I compiled Linux patched with the four patches and configured with KSM
and KVM enabled. After booting that kernel, I ran two VMs running Linux
using QEMU with the patch from your mail and started the KSM scanner with
your script; the host then panicked with the following oops.
  


Yes, you are right, we are missing pte_unmap(pte); in get_pte()!
That affects only 32-bit with highmem, which is why you see it.
Thanks for reporting; I will fix it for v3.

The patch below should fix it (I can't test it now, will test it for v3).

Can you report whether it fixes your problem? Thanks.


== BEGINNING of OOPS
kernel BUG at arch/x86/mm/highmem_32.c:87!
invalid opcode:  [#1] SMP
last sysfs file: /sys/class/net/vnet-ssh2/address
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in: netconsole autofs4 nf_conntrack_ipv4 nf_defrag_ipv4
xt_state nf_conntrack xt_tcpudp ipt_REJECT iptable_filter ip_tables
x_tables loop kvm_intel kvm iTCO_wdt iTCO_vendor_support igb
netxen_nic button ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd usbcore
[last unloaded: microcode]

Pid: 343, comm: kksmd Not tainted
(2.6.28-rc5-linus-head-20081119-sparsemem #1) X7DWA
EIP: 0060:[c041eff9] EFLAGS: 00010206 CPU: 6
EIP is at kmap_atomic_prot+0x7d/0xeb
EAX: c0008d94 EBX: c1ff6240 ECX: 0163 EDX: 7e00
ESI: 0154 EDI: 0055 EBP: f5cdbf10 ESP: f5cdbef8
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
Process kksmd (pid: 343, ti=f5cda000 task=f617b140 task.ti=f5cda000)
Stack:
 7fa12163 f000 c204efbc f50479e8 9eb7e000 c08a34d0 f5cdbf18 c041f07a
 f5cdbf28 c048339c  f5c271e0 f5cdbf30 c04833bc f5cdbfb0 c0483b0d
 f5cdbf50 c0425845  0064 0009 c08a34d0 f5cdbfb0 c06384c1
Call Trace:
 [c041f07a] ? kmap_atomic+0x13/0x15
 [c048339c] ? get_pte+0x50/0x63
 [c04833bc] ? is_present_pte+0xd/0x1f
 [c0483b0d] ? ksm_scan_start+0x9a/0x7ac
 [c0425845] ? finish_task_switch+0x29/0xa4
 [c06384c1] ? schedule+0x6bf/0x719
 [c041b3fc] ? default_spin_lock_flags+0x8/0xc
 [c043bffa] ? finish_wait+0x49/0x4e
 [c04845f4] ? kthread_ksm_scan_thread+0x0/0xdc
 [c048462e] ? kthread_ksm_scan_thread+0x3a/0xdc
 [c043bf31] ? autoremove_wake_function+0x0/0x38
 [c043be3e] ? kthread+0x40/0x66
 [c043bdfe] ? kthread+0x0/0x66
 [c0404997] ? kernel_thread_helper+0x7/0x10
Code: 86 00 00 00 64 a1 04 a0 82 c0 6b c0 0d 8d 3c 30 a1 78 b0 77 c0
8d 34 bd 00 00 00 00 89 45 ec a1 0c d0 84 c0 29 f0 83 38 00 74 04 0f
0b eb fe c1 ea 1a 8b 04 d5 80 32 8a c0 83 e0 fc 29 c3 c1 fb
EIP: [c041eff9] kmap_atomic_prot+0x7d/0xeb SS:ESP 0068:f5cdbef8
Kernel panic - not syncing: Fatal exception
== END of OOPS
  


diff --git a/mm/ksm.c b/mm/ksm.c
index 707be52..e14448a 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -562,6 +562,7 @@ static pte_t *get_pte(struct mm_struct *mm, unsigned long addr)
 		goto out;
 
 	ptep = pte_offset_map(pmd, addr);
+	pte_unmap(ptep);
 out:
 	return ptep;
 }


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2008-11-20 Thread Izik Eidus

Quoting Izik Eidus:

Quoting Ryota OZAKI:

Hi Izik,

I've tried your patch set, but ksm doesn't work on my machine.

I compiled Linux patched with the four patches and configured with KSM
and KVM enabled. After booting that kernel, I ran two VMs running Linux
using QEMU with the patch from your mail and started the KSM scanner with
your script; the host then panicked with the following oops.
  


Yes, you are right, we are missing pte_unmap(pte); in get_pte()!
That affects only 32-bit with highmem, which is why you see it.
Thanks for reporting; I will fix it for v3.

The patch below should fix it (I can't test it now, will test it for v3).

Can you report whether it fixes your problem? Thanks.


Thinking about what I just did, it is wrong.
This patch is the right one (still untested), so if you are going
to apply something, use this one.


thanks
diff --git a/mm/ksm.c b/mm/ksm.c
index 707be52..c842c29 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -569,14 +569,16 @@ out:
 static int is_present_pte(struct mm_struct *mm, unsigned long addr)
 {
 	pte_t *ptep;
+	int r;
 
 	ptep = get_pte(mm, addr);
 	if (!ptep)
 		return 0;
 
-	if (pte_present(*ptep))
-		return 1;
-	return 0;
+	r = pte_present(*ptep);
+	pte_unmap(ptep);
+
+	return r;
 }
 
 #define PAGEHASH_LEN 128
@@ -669,6 +671,7 @@ static int try_to_merge_one_page(struct mm_struct *mm,
 	if (!orig_ptep)
 		goto out_unlock;
 	orig_pte = *orig_ptep;
+	pte_unmap(orig_ptep);
 	if (!pte_present(orig_pte))
 		goto out_unlock;
 	if (page_to_pfn(oldpage) != pte_pfn(orig_pte))
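
The pattern in this second patch is "map, copy the value out, unmap, then use the copy". A userspace model of why that matters, under loud assumptions: struct slot, slot_map() and slot_unmap() are hypothetical stand-ins for a single atomic-kmap slot and for pte_offset_map()/pte_unmap(); they are not kernel APIs. Where the model returns NULL for an occupied slot, the 32-bit highmem kernel hits the BUG seen in the oops. The first patch presumably failed in the opposite direction, by unmapping before the caller dereferenced the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Models one fixed mapping slot that can hold a single mapping at a time. */
struct slot {
	const int *mapped;
};

/* Map an entry into the slot. Returns NULL if the slot is still in use. */
static const int *slot_map(struct slot *s, const int *entry)
{
	assert(s && entry);
	if (s->mapped)
		return NULL;
	s->mapped = entry;
	return entry;
}

static void slot_unmap(struct slot *s)
{
	assert(s);
	s->mapped = NULL;
}

/* Correct pairing: copy the value while mapped, then release the slot. */
static int read_entry(struct slot *s, const int *entry, int *out)
{
	const int *p = slot_map(s, entry);

	if (!p)
		return -1;
	*out = *p;
	slot_unmap(s);
	return 0;
}

/* Buggy variant: the mapping is never released. */
static int read_entry_leaky(struct slot *s, const int *entry, int *out)
{
	const int *p = slot_map(s, entry);

	if (!p)
		return -1;
	*out = *p;
	return 0;
}
```

In the model, calling read_entry() repeatedly keeps working, while the second call to read_entry_leaky() on the same slot fails because the first mapping was never dropped.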


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2008-11-20 Thread Ryota OZAKI
2008/11/20 Izik Eidus [EMAIL PROTECTED]:
 Quoting Izik Eidus:

 Quoting Ryota OZAKI:

 Hi Izik,

 I've tried your patch set, but ksm doesn't work on my machine.

 I compiled Linux patched with the four patches and configured with KSM
 and KVM enabled. After booting that kernel, I ran two VMs running Linux
 using QEMU with the patch from your mail and started the KSM scanner with
 your script; the host then panicked with the following oops.


 Yes, you are right, we are missing pte_unmap(pte); in get_pte()!
 That affects only 32-bit with highmem, which is why you see it.
 Thanks for reporting; I will fix it for v3.

 The patch below should fix it (I can't test it now, will test it for v3).

 Can you report whether it fixes your problem? Thanks.

 Thinking about what I just did, it is wrong.
 This patch is the right one (still untested), so if you are going to
 apply something, use this one.

Great! Applied the 2nd patch, ksm works with both HIGHMEM enabled and disabled.

Thanks for your quick response,
  ozaki-r






[PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2008-11-16 Thread Izik Eidus
(From v1 to v2 the main change is much more documentation)

KSM is a Linux driver that allows dynamically sharing identical memory
pages between one or more processes.

Unlike traditional page sharing, which is set up when the memory is
allocated, ksm does it dynamically after the memory was created.
Memory is periodically scanned; identical pages are identified and
merged.
The sharing is unnoticeable by the processes that use this memory.
(The shared pages are marked read-only, and in case of a write
do_wp_page() takes care of creating a new copy of the page.)

This driver is very useful for KVM, for example when running multiple
guest operating systems of the same type.
(For desktop workloads we have achieved more than x2 memory overcommit
(more like x3))

This driver has found users other than KVM, for example CERN.
Fons Rademakers:
on many-core machines we run one large detector simulation program per core.
These simulation programs are identical but run each in their own process and
need about 2 - 2.5 GB RAM.
We typically buy machines with 2GB RAM per core and so have a problem to run
one of these programs per core.
Of the 2 - 2.5 GB about 700MB is identical data in the form of magnetic field
maps, detector geometry, etc.
Currently people have been trying to start one program, initialize the geometry
and field maps and then fork it N times, to have the data shared.
With KSM this would be done automatically by the system so it sounded extremely
attractive when Andrea presented it.

(We have already started to test KSM on their systems...)

KSM can run as a kernel thread, as a userspace application, or both.

Example of how to control the kernel thread:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include "ksm.h"

int main(int argc, char *argv[])
{
	int fd;
	int used = 0;
	int fd_start;
	struct ksm_kthread_info info;

	if (argc < 2) {
		fprintf(stderr,
			"usage: %s {start npages_to_scan max_pages_to_merge sleep | stop | info}\n",
			argv[0]);
		exit(1);
	}

	fd = open("/dev/ksm", O_RDWR | O_TRUNC, (mode_t)0600);
	if (fd == -1) {
		fprintf(stderr, "could not open /dev/ksm\n");
		exit(1);
	}

	/* Commands may be abbreviated: only strlen(argv[1]) chars are compared. */
	if (!strncmp(argv[1], "start", strlen(argv[1]))) {
		used = 1;
		/* start takes three arguments, argv[2]..argv[4] */
		if (argc < 5) {
			fprintf(stderr,
				"usage: %s start npages_to_scan max_pages_to_merge sleep\n",
				argv[0]);
			exit(1);
		}
		info.pages_to_scan = atoi(argv[2]);
		info.max_pages_to_merge = atoi(argv[3]);
		info.sleep = atoi(argv[4]);
		info.flags = ksm_control_flags_run;

		fd_start = ioctl(fd, KSM_START_STOP_KTHREAD, &info);
		if (fd_start == -1) {
			fprintf(stderr, "KSM_START_KTHREAD failed\n");
			exit(1);
		}
		printf("created scanner\n");
	}

	if (!strncmp(argv[1], "stop", strlen(argv[1]))) {
		used = 1;
		info.flags = 0;	/* clearing the run flag stops the thread */
		fd_start = ioctl(fd, KSM_START_STOP_KTHREAD, &info);
		printf("stopped scanner\n");
	}

	if (!strncmp(argv[1], "info", strlen(argv[1]))) {
		used = 1;
		ioctl(fd, KSM_GET_INFO_KTHREAD, &info);
		printf("flags %d, pages_to_scan %d npages_merge %d, sleep_time %d\n",
		       info.flags, info.pages_to_scan, info.max_pages_to_merge,
		       info.sleep);
	}

	if (!used)
		fprintf(stderr, "unknown command %s\n", argv[1]);

	return 0;
}

example of how to register qemu to ksm (or any userspace application)

diff --git a/qemu/vl.c b/qemu/vl.c
index 4721fdd..7785bf9 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -21,6 +21,7 @@
  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
  * THE SOFTWARE.
  */
+#include "ksm.h"
 #include "hw/hw.h"
 #include "hw/boards.h"
 #include "hw/usb.h"
@@ -5799,6 +5800,37 @@ static void termsig_setup(void)
 
 #endif
 
+int ksm_register_memory(void)
+{
+    int fd;
+    int ksm_fd;
+    int r = 1;
+    struct ksm_memory_region ksm_region;
+
+    fd = open("/dev/ksm", O_RDWR | O_TRUNC, (mode_t)0600);
+    if (fd == -1)
+        goto out;
+
+    ksm_fd = ioctl(fd, KSM_CREATE_SHARED_MEMORY_AREA);
+    if (ksm_fd == -1)
+        goto out_free;
+
+    ksm_region.npages = phys_ram_size / TARGET_PAGE_SIZE;
+    ksm_region.addr = phys_ram_base;
+    r = ioctl(ksm_fd, KSM_REGISTER_MEMORY_REGION, &ksm_region);
+    if (r)
+        goto out_free1;
+
+    return r;
+
+out_free1:
+    close(ksm_fd);
+out_free:
+    close(fd);
+out:
+    return r;
+}
+
 int main(int argc, char **argv)
 {
 #ifdef CONFIG_GDBSTUB
@@ -6735,6 +6767,8 @@ int main(int argc, char **argv)