Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v3

2009-04-20 Thread Nick Piggin
On Friday 17 April 2009 17:08:07 Jared Hulbert wrote:
>> As everyone knows, my favourite thing is to say nasty things about any
>> new feature that adds complexity to common code. I feel like crying to
>> hear about how many more instances of MS Office we can all run, if only
>> we apply this patch. And the poorly written HPC app just sounds like
>> scrapings from the bottom of the justification barrel.
>>
>> I'm sorry, maybe I'm way off with my understanding of how important
>> this is. There isn't too much help in the changelog. A discussion of
>> where the memory savings come from, and how far things like
>> sharing of fs images or ballooning go, and how much extra savings we
>> get from this... with people from other hypervisors involved as well.
>> Have I missed this kind of discussion?
>
> Nick,
>
> I don't know about other hypervisors, fs and ballooning, but I have
> tried this out.  It works.  It works on apps I don't consider poorly
> written.  I'm very excited about this.  I got a 10% saving in a
> roughly off-the-shelf embedded system.  No user-noticeable performance
> impact.

OK well that's what I want to hear. Thanks, that means a lot to me.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v3

2009-04-17 Thread Jared Hulbert
> As everyone knows, my favourite thing is to say nasty things about any
> new feature that adds complexity to common code. I feel like crying to
> hear about how many more instances of MS Office we can all run, if only
> we apply this patch. And the poorly written HPC app just sounds like
> scrapings from the bottom of the justification barrel.
>
> I'm sorry, maybe I'm way off with my understanding of how important
> this is. There isn't too much help in the changelog. A discussion of
> where the memory savings come from, and how far things like
> sharing of fs images or ballooning go, and how much extra savings we
> get from this... with people from other hypervisors involved as well.
> Have I missed this kind of discussion?

Nick,

I don't know about other hypervisors, fs and ballooning, but I have
tried this out.  It works.  It works on apps I don't consider poorly
written.  I'm very excited about this.  I got a 10% saving in a
roughly off-the-shelf embedded system.  No user-noticeable performance
impact.


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v3

2009-04-16 Thread Nick Piggin
On Wednesday 15 April 2009 08:09:03 Andrew Morton wrote:
> On Thu,  9 Apr 2009 06:58:37 +0300
> Izik Eidus <iei...@redhat.com> wrote:
>
>> KSM is a linux driver that allows dynamically sharing identical memory
>> pages between one or more processes.
>
> Generally looks OK to me.  But that doesn't mean much.  We should rub
> bottles with words like hugh and nick on them to be sure.

I haven't looked too closely at it yet, sorry. Hugh has a great eye for
these details, though, hint hint :)

As everyone knows, my favourite thing is to say nasty things about any
new feature that adds complexity to common code. I feel like crying to
hear about how many more instances of MS Office we can all run, if only
we apply this patch. And the poorly written HPC app just sounds like
scrapings from the bottom of the justification barrel.

I'm sorry, maybe I'm way off with my understanding of how important
this is. There isn't too much help in the changelog. A discussion of
where the memory savings come from, and how far things like
sharing of fs images or ballooning go, and how much extra savings we
get from this... with people from other hypervisors involved as well.
Have I missed this kind of discussion?

Careful what you wish for, ay? :)


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v3

2009-04-16 Thread Izik Eidus

Nick Piggin wrote:
> On Wednesday 15 April 2009 08:09:03 Andrew Morton wrote:
>> On Thu,  9 Apr 2009 06:58:37 +0300
>> Izik Eidus <iei...@redhat.com> wrote:
>>
>>> KSM is a linux driver that allows dynamically sharing identical memory
>>> pages between one or more processes.
>>
>> Generally looks OK to me.  But that doesn't mean much.  We should rub
>> bottles with words like hugh and nick on them to be sure.
>
> I haven't looked too closely at it yet, sorry. Hugh has a great eye for
> these details, though, hint hint :)
>
> As everyone knows, my favourite thing is to say nasty things about any
> new feature that adds complexity to common code.

The whole idea, and the way I wrote it, is that it won't touch common code;
I didn't change the Linux mm logic anywhere.

The worst thing that we have added is helper functions.

> I feel like crying to
> hear about how many more instances of MS Office we can all run, if only
> we apply this patch.

And more instances of Linux guests...

> And the poorly written HPC app just sounds like
> scrapings from the bottom of the justification barrel.

So if you have a big rendering application that loads gigabytes of
geometrical data that is handled by many threads, and each thread
sometimes changes this geometrical data and you don't want the other
threads to notice it, how would you share it in the traditional way?
After the shared data gets COWed once, how will you recollect it
when it becomes identical again?

KSM does it for applications transparently.

The motivation for writing KSM was indeed KVM, where it is highly needed.
You may check what VMware says about the fact that they have much better
overcommit than Hyper-V / XEN:

http://blogs.vmware.com/virtualreality/2008/03/cheap-hyperviso.html

It is important to understand that in virtualization environments there
are cases where memory is much more critical than any other resource for
higher density.

Together with KSM, KVM will have the same memory overcommit abilities
that VMware has.

> I'm sorry, maybe I'm way off with my understanding of how important
> this is. There isn't too much help in the changelog. A discussion of
> where the memory savings come from,

The memory savings come from identical libraries, identical kernels and
zeroed pages - that is for virtualization.
The library code will always be identical among similar guests, so why
have this code in multiple places in host memory?

> and how far things like
> sharing of fs images or ballooning go, and how much extra savings we
> get from this...

Ballooning is much worse when it comes to performance, because what it
does is shrink the guest memory. With KSM we find identical pages and
merge them into one page, so we don't lose guest performance.

> with people from other hypervisors involved as well.
> Have I missed this kind of discussion?
>
> Careful what you wish for, ay? :)




Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v3

2009-04-14 Thread Andrew Morton
On Thu,  9 Apr 2009 06:58:37 +0300
Izik Eidus <iei...@redhat.com> wrote:

> KSM is a linux driver that allows dynamically sharing identical memory
> pages between one or more processes.

Generally looks OK to me.  But that doesn't mean much.  We should rub
bottles with words like hugh and nick on them to be sure.



> ...
>
>  include/linux/ksm.h          |   48 ++
>  include/linux/miscdevice.h   |    1 +
>  include/linux/mm.h           |    5 +
>  include/linux/mmu_notifier.h |   34 +
>  include/linux/rmap.h         |   11 +
>  mm/Kconfig                   |    6 +
>  mm/Makefile                  |    1 +
>  mm/ksm.c                     | 1674 ++
>  mm/memory.c                  |   90 +++-
>  mm/mmu_notifier.c            |   20 +
>  mm/rmap.c                    |  139

And it's pretty unobtrusive for what it is.  I expect we can get this
into 2.6.31 unless there are some pratfalls which I missed.



[PATCH 0/4] ksm - dynamic page sharing driver for linux v3

2009-04-08 Thread Izik Eidus
From v2 to v3:

1) Remove the unnecessary is_dirty_pte() check inside PageKsm()
   We had added the is_dirty_pte() check to protect against the
   reuse: case inside do_wp_page().
   Andrea pointed out to me that such a condition could never happen,
   due to the fact that if VM_SHARED is set no anonymous page can be
   on the vma; therefore it is impossible for such a page to become
   a KsmPage, and therefore KsmPages would never trigger the reuse case.
   (Check out "From v1 to v2" for more info)

2) Add a !vm_file check in addition to PageKsm() to check for a shared page
   Until now Ksm was checking whether pages are shared pages (KsmPage)
   by just running get_user_page() and then checking if Page != AnonPage.
   The problem arises because Ksm keeps virtual addresses inside its data
   structures, and if the user frees the page and allocates a new !AnonPage
   page, Ksm might think this page is a shared page.
   To solve this problem we have added an additional check to Ksm:
   we check whether vma->vm_file is NULL. If we see a virtual address
   whose vma->vm_file is NULL and the page that it points to isn't an
   AnonPage, we can safely know that this is a shared page (KsmPage).

3) Replace jhash() with jhash2()
   Andrey Panin pointed out that we should use jhash2() as it is faster
   than jhash().
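
The shared-page test described in item 2 can be sketched in plain
user-space C. This is only an illustration of the logic, not code from the
patch: struct vma_sketch, struct page_sketch and looks_like_ksm_page() are
hypothetical stand-ins for the kernel's vm_area_struct, struct page and
whatever helper the patch actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's vm_area_struct. */
struct vma_sketch {
	const void *vm_file;	/* NULL when the mapping has no backing file */
};

/* Hypothetical stand-in for struct page. */
struct page_sketch {
	bool anon;		/* what PageAnon() would report */
};

/*
 * A page is treated as a KsmPage only when its vma has no backing file
 * AND the page is not an anonymous page.
 */
bool looks_like_ksm_page(const struct vma_sketch *vma,
			 const struct page_sketch *page)
{
	assert(vma != NULL && page != NULL);
	return vma->vm_file == NULL && !page->anon;
}
```

The extra vm_file test is what rules out the case from item 2 where the
address was freed and later backed by a file page, which is also !AnonPage.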

Thanks.

(Below is info from previous posts)

From v1 to v2:

1) Fixed a security issue found by Chris Wright:
Ksm was checking whether a page is a shared page by testing !PageAnon.
Because Ksm scans only anonymous memory, all !PageAnon pages
inside the ksm data structures are shared pages. However, there might
be a case in do_wp_page(), when VM_SHARED is used, where
do_wp_page() would reuse the page instead of copying it into a new
anonymous page. It was fixed by adding a check for the
dirty bit of the virtual addresses pointing into the shared page.
I could not find any VM code that would clear the dirty bit from
this virtual address (due to the fact that we allocate the page
using page_alloc() - kernel allocated pages), ~but I still want
confirmation about this from the vm guys - thanks.~

2) Moved to sysfs to control ksm:
It was requested as a better way to control the ksm scanning
thread than ioctls.
The sysfs API:
dir: /sys/kernel/mm/ksm/

kernel_pages_allocated - how many kernel pages ksm has
allocated; these pages are not swappable, and each such
page is used by ksm to share pages with identical content

pages_shared - how many pages were shared by ksm

run - set to 1 when you want ksm to run, 0 when not

max_kernel_pages - the maximum number of kernel pages
to be allocated by ksm; set to 0 for unlimited.

pages_to_scan - how many pages to scan before ksm will sleep

sleep - how many usecs ksm will sleep.

3) Added a sysfs parameter to control the maximum number of kernel pages
used by ksm.

4) Added statistics about how many pages are really shared.


One issue still to be discussed:
There was a suggestion to use madvise(SHAREABLE) instead of
ioctls to register memory that needs to be scanned by ksm.
Such a change is outside the area of ksm.c and would require adding
a new madvise API, and changing some parts of the vm and the kernel
code, so the first thing to do is to work out whether we really want this.
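
For orientation, here is a hedged sketch of what an madvise()-based
registration looks like from user space. It does not use the SHAREABLE name
floated above, which is not an existing flag; it uses MADV_MERGEABLE, the
flag under which KSM registration later appeared in mainline Linux. The
helper name request_merging() and its return convention are assumptions of
this sketch, and the patch series discussed in this thread still registers
memory through ioctls.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Ask the kernel to consider [addr, addr + len) for page merging.
 * Returns 0 on success or a negative errno value. On kernels built
 * without KSM the call is expected to fail (typically with EINVAL).
 */
int request_merging(void *addr, size_t len)
{
	assert(addr != NULL);
#ifdef MADV_MERGEABLE
	if (madvise(addr, len, MADV_MERGEABLE) == 0)
		return 0;
	return -errno;
#else
	/* Headers too old to know the flag: report "not supported". */
	(void)len;
	return -ENOSYS;
#endif
}
```

A caller would typically mmap() an anonymous private region, fill it, and
then pass it to request_merging(); a failure is not fatal, it only means the
region will not be scanned.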

I don't know of any other open issues.

Thanks.

This is from the first post:
(The kvm part, together with the kvm-userspace part, was posted with V1
about a week ago; whoever wants to test ksm may download the
patch from the lkml archive)

KSM is a linux driver that allows dynamically sharing identical memory
pages between one or more processes.

Unlike traditional page sharing, which is done when the memory is
allocated, ksm does it dynamically after the memory was created.
Memory is periodically scanned; identical pages are identified and
merged.
The sharing is unnoticeable by the processes that use this memory.
(the shared pages are marked as readonly, and in case of a write
do_wp_page() takes care of creating a new copy of the page)

To find identical pages ksm uses an algorithm that is split into three
primary levels:

1) Ksm will start scanning the memory and will calculate a checksum for
   each page that is registered to be scanned.
   (In the first round of scanning, ksm only calculates
   this checksum for all the pages)

2) Ksm will go over the whole memory again and recalculate the
   checksum of the pages. Pages whose checksum has not changed since
   the previous round are considered pages that most likely
   won't change.
   Ksm will insert these pages into an RB-tree sorted by page content,
   called the unstable tree. The reason this tree is called
   unstable is that the page contents might change
   while the pages are still inside the tree, and therefore the tree would
   become corrupted.
   Due to this problem ksm takes two more steps in addition to the
   checksum calculation:
   a) Ksm will throw 

Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2009-04-07 Thread Andrea Arcangeli
On Sat, Apr 04, 2009 at 05:35:18PM +0300, Izik Eidus wrote:
> From v1 to v2:
>
> 1) Fixed a security issue found by Chris Wright:
> Ksm was checking whether a page is a shared page by testing !PageAnon.
> Because Ksm scans only anonymous memory, all !PageAnon pages
> inside the ksm data structures are shared pages. However, there might
> be a case in do_wp_page(), when VM_SHARED is used, where
> do_wp_page() would reuse the page instead of copying it into a new
> anonymous page. It was fixed by adding a check for the
> dirty bit of the virtual addresses pointing into the shared page.
> I could not find any VM code that would clear the dirty bit from
> this virtual address (due to the fact that we allocate the page
> using page_alloc() - kernel allocated pages), ~but I still want
> confirmation about this from the vm guys - thanks.~

As far as I can tell this wasn't a bug and this change is
unnecessary. I already checked this bit but I may have missed
something, so I ask here to be sure.

As far as I can tell when VM_SHARED is set, no anonymous page can ever
be allocated in that vma range, hence no KSM page can ever be
generated in that vma either. MAP_SHARED|MAP_ANONYMOUS is only a
different API for /dev/shm, IPCSHM backing, no anonymous pages can
live there. It surely worked like that in older 2.6, reading latest
code it seems to still work like that, but if something has changed
Hugh will surely correct me in a jiffy ;).

I still see this in the file=null path.
  
	} else if (vm_flags & VM_SHARED) {
		error = shmem_zero_setup(vma);
		if (error)
			goto free_vma;
	}


So you can revert your change for now.
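
The user-visible side of this distinction can be checked with a small
program. The sketch below is an illustration only, assuming a Linux or
other POSIX system with MAP_ANONYMOUS; child_write_visible() is a
hypothetical helper name. It shows that a MAP_SHARED|MAP_ANONYMOUS mapping
behaves like shared memory across fork(), while MAP_PRIVATE|MAP_ANONYMOUS
gives each process its own copy on write. It does not by itself prove how
the kernel backs the pages.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Map one int anonymously with the given sharing flag, let a child
 * write to it, and report whether the parent sees the write.
 * Returns 1 if visible, 0 if not, -1 on error.
 */
int child_write_visible(int map_flags)
{
	int *cell;
	int seen, status;
	pid_t pid;

	assert(map_flags == MAP_SHARED || map_flags == MAP_PRIVATE);

	cell = mmap(NULL, sizeof *cell, PROT_READ | PROT_WRITE,
		    map_flags | MAP_ANONYMOUS, -1, 0);
	if (cell == MAP_FAILED)
		return -1;
	*cell = 0;

	pid = fork();
	if (pid < 0) {
		munmap(cell, sizeof *cell);
		return -1;
	}
	if (pid == 0) {
		*cell = 42;	/* child's write */
		_exit(0);
	}
	if (waitpid(pid, &status, 0) < 0) {
		munmap(cell, sizeof *cell);
		return -1;
	}

	seen = (*cell == 42);
	munmap(cell, sizeof *cell);
	return seen;
}
```

With MAP_SHARED the child's write is expected to be visible to the parent;
with MAP_PRIVATE the parent keeps seeing its own value.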


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2009-04-06 Thread Nikola Ciprich
Hi Izik,
Is there some user documentation available? (apart from RTFS? :))
I've compiled a kernel with v2 of your patches, loaded the ksm module, and
did echo 1 > /proc/sys/kernel/mm/ksm/run, but I think it didn't do
anything, at least no pages were collected..
Could you advise me a bit?
Thanks a lot in advance...
I can't wait to try it on our hosts running 50-60 KVMs :)
BR
nik



Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2009-04-06 Thread Andrea Arcangeli
On Mon, Apr 06, 2009 at 05:04:49PM +1000, Nick Piggin wrote:
> They should use a shared memory segment, or MAP_ANONYMOUS|MAP_SHARED etc.
> Presumably they will probably want to control it to interleave it over
> all numa nodes and use hugepages for it. It would be very little work.

I thought it's the intermediate result of the computations that leads
to lots of equal data too, in which case ksm is the only way to share
it all.


[PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2009-04-04 Thread Izik Eidus
From v1 to v2:

1) Fixed a security issue found by Chris Wright:
Ksm was checking whether a page is a shared page by testing !PageAnon.
Because Ksm scans only anonymous memory, all !PageAnon pages
inside the ksm data structures are shared pages. However, there might
be a case in do_wp_page(), when VM_SHARED is used, where
do_wp_page() would reuse the page instead of copying it into a new
anonymous page. It was fixed by adding a check for the
dirty bit of the virtual addresses pointing into the shared page.
I could not find any VM code that would clear the dirty bit from
this virtual address (due to the fact that we allocate the page
using page_alloc() - kernel allocated pages), ~but I still want
confirmation about this from the vm guys - thanks.~

2) Moved to sysfs to control ksm:
It was requested as a better way to control the ksm scanning
thread than ioctls.
The sysfs API:
dir: /sys/kernel/mm/ksm/

kernel_pages_allocated - how many kernel pages ksm has
allocated; these pages are not swappable, and each such
page is used by ksm to share pages with identical content

pages_shared - how many pages were shared by ksm

run - set to 1 when you want ksm to run, 0 when not

max_kernel_pages - the maximum number of kernel pages
to be allocated by ksm; set to 0 for unlimited.

pages_to_scan - how many pages to scan before ksm will sleep

sleep - how many usecs ksm will sleep.

3) Added a sysfs parameter to control the maximum number of kernel pages
used by ksm.

4) Added statistics about how many pages are really shared.
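
As a rough illustration of driving the sysfs files listed in item 2 from a
program rather than from the shell, the sketch below writes and reads one
numeric value per file. The helper names ksm_write_param() and
ksm_read_param() are hypothetical, the file names are taken from the list
above and may differ in other versions of the patch, and the directory is
passed in so that the helpers can be tried against any directory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define KSM_SYSFS_DIR "/sys/kernel/mm/ksm"

/* Build "<dir>/<name>" into buf; returns 0 on success, -1 if too long. */
static int build_path(char *buf, size_t size, const char *dir,
		      const char *name)
{
	int n = snprintf(buf, size, "%s/%s", dir, name);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Write one unsigned value to <dir>/<name>; 0 on success, -1 on error. */
int ksm_write_param(const char *dir, const char *name, unsigned long value)
{
	char path[256];
	FILE *f;
	int ok;

	assert(dir != NULL && name != NULL);
	if (build_path(path, sizeof path, dir, name) != 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%lu\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read one unsigned value from <dir>/<name>; 0 on success, -1 on error. */
int ksm_read_param(const char *dir, const char *name, unsigned long *value)
{
	char path[256];
	FILE *f;
	int ok;

	assert(dir != NULL && name != NULL && value != NULL);
	if (build_path(path, sizeof path, dir, name) != 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fscanf(f, "%lu", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

A caller with sufficient privileges might set pages_to_scan and sleep with
ksm_write_param(KSM_SYSFS_DIR, ...), write 1 to run, and then poll
pages_shared with ksm_read_param() to see whether anything was merged.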


One issue still to be discussed:
There was a suggestion to use madvise(SHAREABLE) instead of
ioctls to register memory that needs to be scanned by ksm.
Such a change is outside the area of ksm.c and would require adding
a new madvise API, and changing some parts of the vm and the kernel
code, so the first thing to do is to work out whether we really want this.

I don't know of any other open issues.

Thanks.

This is from the first post:
(The kvm part, together with the kvm-userspace part, was posted with V1
about a week ago; whoever wants to test ksm may download the
patch from the lkml archive)

KSM is a linux driver that allows dynamically sharing identical memory
pages between one or more processes.

Unlike traditional page sharing, which is done when the memory is
allocated, ksm does it dynamically after the memory was created.
Memory is periodically scanned; identical pages are identified and
merged.
The sharing is unnoticeable by the processes that use this memory.
(the shared pages are marked as readonly, and in case of a write
do_wp_page() takes care of creating a new copy of the page)

To find identical pages ksm uses an algorithm that is split into three
primary levels:

1) Ksm will start scanning the memory and will calculate a checksum for
   each page that is registered to be scanned.
   (In the first round of scanning, ksm only calculates
   this checksum for all the pages)

2) Ksm will go over the whole memory again and recalculate the
   checksum of the pages. Pages whose checksum has not changed since
   the previous round are considered pages that most likely
   won't change.
   Ksm will insert these pages into an RB-tree sorted by page content,
   called the unstable tree. The reason this tree is called
   unstable is that the page contents might change
   while the pages are still inside the tree, and therefore the tree would
   become corrupted.
   Due to this problem ksm takes two more steps in addition to the
   checksum calculation:
   a) Ksm will throw away and recreate the entire unstable tree each round
      of memory scanning - so if we have corruption, it will be fixed
      when we rebuild the tree.
   b) Ksm uses an RB-tree, whose balancing is done by node color
      and not by content, so even if a page gets corrupted, it still
      takes the same amount of time to search it.

3) In addition to the unstable tree, ksm holds another tree that is called
   the stable tree - this tree is an RB-tree that is sorted by page
   content, and all its pages are write protected, and therefore it can't
   get corrupted.
   Each time ksm finds two identical pages using the unstable tree,
   it will create a new write-protected shared page, and this page will be
   inserted into the stable tree and saved there. The
   stable tree, unlike the unstable tree, is never thrown away, so each
   page that we find is saved inside it.
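
The three levels above can be sketched in user-space C. This is a toy
model under stated assumptions, not the patch's code: it uses small
unsorted arrays searched linearly with memcmp() where the patch uses
RB-trees sorted by page content, it skips the checksum filter and the
write protection, and all names (struct ksm_sketch, scan_page(),
start_new_round()) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096
#define MAX_PAGES 64

enum scan_result { MERGED_STABLE, MERGED_UNSTABLE, ADDED_UNSTABLE };

struct page_set {
	const unsigned char *pages[MAX_PAGES];
	size_t count;
};

struct ksm_sketch {
	struct page_set stable;		/* kept across scan rounds */
	struct page_set unstable;	/* thrown away every round */
};

/* Return the index of a page with identical contents, or -1. */
static int find_page(const struct page_set *set, const unsigned char *page)
{
	size_t i;

	for (i = 0; i < set->count; i++)
		if (memcmp(set->pages[i], page, PAGE_SIZE) == 0)
			return (int)i;
	return -1;
}

enum scan_result scan_page(struct ksm_sketch *k, const unsigned char *page)
{
	int i;

	assert(k != NULL && page != NULL);

	/* Level 3: a match in the stable set means "merge with it". */
	if (find_page(&k->stable, page) >= 0)
		return MERGED_STABLE;

	/* Level 2: a match in the unstable set is promoted to stable. */
	i = find_page(&k->unstable, page);
	if (i >= 0) {
		if (k->stable.count < MAX_PAGES)
			k->stable.pages[k->stable.count++] = k->unstable.pages[i];
		k->unstable.pages[i] = k->unstable.pages[--k->unstable.count];
		return MERGED_UNSTABLE;
	}

	/* No match anywhere: remember the page for the rest of this round. */
	if (k->unstable.count < MAX_PAGES)
		k->unstable.pages[k->unstable.count++] = page;
	return ADDED_UNSTABLE;
}

/* Drop the whole unstable set, as is done at the start of each round. */
void start_new_round(struct ksm_sketch *k)
{
	assert(k != NULL);
	k->unstable.count = 0;
}
```

Scanning three identical pages in one round would give ADDED_UNSTABLE for
the first, MERGED_UNSTABLE for the second and MERGED_STABLE for the third,
and the stable entry would survive start_new_round().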

Taking into account the three levels described above, the algorithm
works like this:

search primary tree (sorted by entire page contents, pages write protected)
- if match found, merge
- if no match found...
  - search secondary tree (sorted by entire page contents, pages not write
    protected)
    - if match found, merge
      - remove from secondary tree and insert merged page into primary tree
    - if no match found...
      -

Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2009-04-02 Thread Jesper Juhl
Hi,

On Tue, 31 Mar 2009, Izik Eidus wrote:

> KSM is a linux driver that allows dynamically sharing identical memory
> pages between one or more processes.
>
> Unlike traditional page sharing, which is done when the memory is
> allocated, ksm does it dynamically after the memory was created.
> Memory is periodically scanned; identical pages are identified and
> merged.
> The sharing is unnoticeable by the processes that use this memory.
> (the shared pages are marked as readonly, and in case of a write
> do_wp_page() takes care of creating a new copy of the page)
>
> To find identical pages ksm uses an algorithm that is split into three
> primary levels:
>
> 1) Ksm will start scanning the memory and will calculate a checksum for
>    each page that is registered to be scanned.
>    (In the first round of scanning, ksm only calculates
>    this checksum for all the pages)
 

One question;

Calculating a checksum is a fine way to find pages that are likely to be 
identical, but there is no guarantee that two pages with the same 
checksum really are identical - there *will* be checksum collisions 
eventually. So, I really hope that your implementation actually checks 
that two pages that it finds to have identical checksums really are 100% 
identical by comparing them bit by bit before throwing one away.
If you rely only on a checksum then eventually a user will get bitten by a 
checksum collision and, in the best case, something will crash, and in the 
worst case, data will silently be corrupted.

Do you rely only on the checksum or do you actually compare pages to check 
they are 100% identical before sharing?

I must admit that I have not read through the patch to find the answer, I 
just read your description and became concerned.

-- 
Jesper Juhl j...@chaosbits.net http://www.chaosbits.net/
Plain text mails only, please  http://www.expita.com/nomime.html
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html



Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2009-04-02 Thread Chris Wright
* Jesper Juhl (j...@chaosbits.net) wrote:
> Do you rely only on the checksum or do you actually compare pages to check
> they are 100% identical before sharing?

Checksum has absolutely nothing to do w/ finding if two pages match.
It's only used as a heuristic to suggest whether a single page has
changed.  If that page is changing we won't bother trying to find a
match for it.  Here's an example of the life of a page w.r.t. checksum.

1. checksum = uninitialized
2. first time page is found, checksum it (checksum = A).
   if checksum has changed (uninitialized != A) don't go any further w/ that page
3. next time page is found, checksum it (checksum = B).
   if checksum has changed (A != B) don't go any further w/ that page
4. next time page is found, checksum it (checksum = B).
   if checksum has changed (B == B)...it hasn't, continue processing the
   page

later if a match is found in the tree (which is sorted by _contents_,
i.e. memcmp) we'll attempt to merge the pages which at its very core
does:

	if (pages_identical(oldpage, newpage))
		ret = replace_page(vma, oldpage, newpage, orig_pte, newprot);

pages_identical?  you guessed it...just does:

r = memcmp(addr1, addr2, PAGE_SIZE)
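
The two roles described above, the checksum as a "has this page stopped
changing?" filter and memcmp() as the real equality test, can be sketched
in user-space C as follows. This is a sketch under assumptions, not the
patch's code: struct tracked_page, page_looks_unchanged() and
same_contents() are hypothetical names, and the FNV-1a loop is only a
stand-in for the jhash-based checksum the patch uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Per-page scan state: the checksum seen on the previous visit, if any. */
struct tracked_page {
	uint32_t checksum;
	int have_checksum;	/* 0 until the page has been visited once */
};

/* Stand-in checksum (32-bit FNV-1a); any cheap hash would do here. */
static uint32_t page_checksum(const unsigned char *page)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < PAGE_SIZE; i++) {
		h ^= page[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Returns 1 only if the page has the same checksum as on the previous
 * visit, i.e. it is worth looking for a match. Always records the new
 * checksum for the next visit.
 */
int page_looks_unchanged(struct tracked_page *t, const unsigned char *page)
{
	uint32_t now;
	int unchanged;

	assert(t != NULL && page != NULL);
	now = page_checksum(page);
	unchanged = t->have_checksum && t->checksum == now;
	t->checksum = now;
	t->have_checksum = 1;
	return unchanged;
}

/* The actual equality test: a full byte-by-byte comparison. */
int same_contents(const unsigned char *a, const unsigned char *b)
{
	assert(a != NULL && b != NULL);
	return memcmp(a, b, PAGE_SIZE) == 0;
}
```

A checksum collision here can at worst cause a wasted tree search for a
page that did change; it can never cause two different pages to be merged,
because merging is gated on the full comparison.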

thanks,
-chris


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2009-04-02 Thread Izik Eidus

Jesper Juhl wrote:
> Hi,
>
> On Tue, 31 Mar 2009, Izik Eidus wrote:
>
>> KSM is a linux driver that allows dynamically sharing identical memory
>> pages between one or more processes.
>>
>> Unlike traditional page sharing, which is done when the memory is
>> allocated, ksm does it dynamically after the memory was created.
>> Memory is periodically scanned; identical pages are identified and
>> merged.
>> The sharing is unnoticeable by the processes that use this memory.
>> (the shared pages are marked as readonly, and in case of a write
>> do_wp_page() takes care of creating a new copy of the page)
>>
>> To find identical pages ksm uses an algorithm that is split into three
>> primary levels:
>>
>> 1) Ksm will start scanning the memory and will calculate a checksum for
>>    each page that is registered to be scanned.
>>    (In the first round of scanning, ksm only calculates
>>    this checksum for all the pages)
>
> One question;
>
> Calculating a checksum is a fine way to find pages that are likely to be
> identical

I don't use the checksum the way a hash table would; the checksum isn't
used to find identical pages by the fact that they have similar data...
The checksum is used to let me know that the page has not changed for a
while and that it is worth looking for pages identical to it...
In the future we will want to use the page table dirty bit for this, as
taking a checksum is somewhat expensive.

> , but there is no guarantee that two pages with the same
> checksum really are identical - there *will* be checksum collisions
> eventually. So, I really hope that your implementation actually checks
> that two pages that it finds to have identical checksums really are 100%
> identical by comparing them bit by bit before throwing one away.

We do that :-)

> If you rely only on a checksum then eventually a user will get bitten by a
> checksum collision and, in the best case, something will crash, and in the
> worst case, data will silently be corrupted.
>
> Do you rely only on the checksum or do you actually compare pages to check
> they are 100% identical before sharing?

I do a 100% comparison of the pages before I share them.

> I must admit that I have not read through the patch to find the answer, I
> just read your description and became concerned.

Don't worry, me neither :-)



Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2009-04-02 Thread Jesper Juhl
On Thu, 2 Apr 2009, Chris Wright wrote:

 * Jesper Juhl (j...@chaosbits.net) wrote:
  Do you rely only on the checksum or do you actually compare pages to check 
  they are 100% identical before sharing?
 
 Checksum has absolutely nothing to do w/ finding if two pages match.
 It's only used as a heuristic to suggest whether a single page has
 changed.  If that page is changing we won't bother trying to find a
 match for it.  Here's an example of the life of a page w.r.t checksum.
 
 1. checksum = uninitialized
 2. first time page is found, checksum it (checksum = A).
    if checksum has changed (uninitialized != A) don't go any further w/ that
    page
 3. next time page is found, checksum it (checksum = B).
    if checksum has changed (A != B) don't go any further w/ that page
 4. next time page is found, checksum it (checksum = B).
    if checksum has changed (B == B)...it hasn't, continue processing the
    page
 
 later if a match is found in the tree (which is sorted by _contents_,
 i.e. memcmp) we'll attempt to merge the pages which at its very core
 does:
 
   if (pages_identical(oldpage, newpage))
           ret = replace_page(vma, oldpage, newpage, orig_pte, newprot);
 
 pages_identical?  you guessed it...just does:
 
   r = memcmp(addr1, addr2, PAGE_SIZE)
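
The two checks described above can be sketched in user space as follows. This is only an illustration under assumptions: the 4096-byte page size, the `toy_checksum` hash, the `struct tracked_page` layout and the `page_is_stable` helper are all made up here and are not taken from the patch. Only the memcmp-based comparison mirrors what the mail says pages_identical() does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed page size for the sketch */

/* Hypothetical per-page bookkeeping; not the structure used by the patch. */
struct tracked_page {
    unsigned char data[SKETCH_PAGE_SIZE];
    uint32_t last_sum;   /* checksum seen on the previous pass */
    int has_sum;         /* 0 until the page has been checksummed once */
};

/* Toy checksum, standing in for whatever hash the driver really uses. */
static uint32_t toy_checksum(const unsigned char *p)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++)
        sum = sum * 31 + p[i];
    return sum;
}

/* Returns 1 only if the checksum matches the one stored on the previous pass. */
static int page_is_stable(struct tracked_page *pg)
{
    uint32_t sum;
    int stable;

    assert(pg != NULL);
    sum = toy_checksum(pg->data);
    stable = pg->has_sum && sum == pg->last_sum;
    pg->last_sum = sum;
    pg->has_sum = 1;
    return stable;
}

/* The merge decision itself is a full byte-for-byte comparison. */
static int pages_identical(const unsigned char *a, const unsigned char *b)
{
    return memcmp(a, b, SKETCH_PAGE_SIZE) == 0;
}
```

The checksum only answers "did this one page change since the last pass?"; whether two pages may be merged is decided by the memcmp alone.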
 

Thank you for that explanation, it set my mind at ease :-)


-- 
Jesper Juhl j...@chaosbits.net http://www.chaosbits.net/
Plain text mails only, please  http://www.expita.com/nomime.html
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html



Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2009-03-31 Thread Izik Eidus

Anthony Liguori wrote:

Izik Eidus wrote:

I am sending another series of patches for the kvm kernel and kvm-userspace
that will allow users of kvm to test ksm with it.
The kvm patches apply to Avi's git tree.
  
Any reason to not take these through upstream QEMU instead of 
kvm-userspace?  In principle, I don't see anything that would prevent 
normal QEMU from almost making use of this functionality.  That would 
make it one less thing to eventually have to merge...


The changes for kvm-userspace were provided only for testing.
Once ksm is in the kernel, we will send another patch to
qemu-devel that adds support for it.




Regards,

Anthony Liguori




[PATCH 0/4] ksm - dynamic page sharing driver for linux

2009-03-30 Thread Izik Eidus
KSM is a Linux driver that allows dynamically sharing identical memory
pages between one or more processes.

Unlike traditional page sharing, which is set up when the memory is
allocated, ksm does it dynamically after the memory has been created.
Memory is periodically scanned; identical pages are identified and
merged.
The sharing is unnoticeable to the processes that use this memory.
(The shared pages are marked read-only, and on a write
do_wp_page() takes care of creating a new copy of the page.)
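
The copy-on-write behaviour mentioned above can be observed from user space. The sketch below does not use KSM at all: it relies on fork(), where parent and child also share a read-only page until one of them writes. The function name is made up for this illustration. It assumes a POSIX system with anonymous private mappings.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns the byte the parent sees after a forked child has overwritten
 * its own view of the same private page, or -1 on error.
 */
static int parent_view_after_child_write(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    unsigned char *page;
    int status, seen;
    pid_t pid;

    assert(page_size > 0);
    page = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;
    memset(page, 'A', (size_t)page_size);

    pid = fork();
    if (pid == 0) {
        /* The write faults and the kernel gives the child its own copy. */
        page[0] = 'B';
        _exit(page[0] == 'B' ? 0 : 1);
    }
    if (pid < 0 || waitpid(pid, &status, 0) != pid ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        munmap(page, (size_t)page_size);
        return -1;
    }

    seen = page[0];   /* the parent's copy is untouched by the child's write */
    munmap(page, (size_t)page_size);
    return seen;
}
```

The parent still reads 'A' after the child wrote 'B'. A write to a page merged by ksm is described above as being handled the same way, through do_wp_page().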

To find identical pages, ksm uses an algorithm that is split into three
primary levels:

1) Ksm starts scanning the memory and calculates a checksum for each
   page that is registered to be scanned.
   (In the first round of scanning, ksm only calculates
   this checksum for all the pages.)

2) Ksm goes over the whole memory again and recalculates the
   checksum of each page. Pages whose checksum is the same as in the
   previous round are considered pages that are most likely not
   going to change.
   Ksm inserts these pages into an RB-tree sorted by page content,
   called the unstable tree. The tree is called unstable because the
   page contents might change while the pages are still inside the
   tree, and the tree would then become corrupted.
   Because of this problem, ksm takes two more steps in addition to the
   checksum calculation:
   a) Ksm throws away and recreates the entire unstable tree on each
      round of memory scanning, so any corruption is fixed
      when the tree is rebuilt.
   b) Ksm uses an RB-tree, whose balancing is driven by the node color
      and not by the content, so even if a page's content changes,
      searching the tree still takes the same amount of time.

3) In addition to the unstable tree, ksm holds another tree called the
   stable tree. This is an RB-tree sorted by page content in which all
   pages are write-protected, and therefore it cannot become
   corrupted.
   Each time ksm finds two identical pages using the unstable tree,
   it creates a new write-protected shared page and inserts it
   into the stable tree. The stable tree, unlike the unstable tree,
   is never thrown away, so every page that we find stays in it.
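
What "a tree sorted by page content" means can be sketched as follows. The whole page is the key and memcmp() is the ordering function. This is an illustration only: it uses a plain unbalanced binary search tree instead of the kernel's RB-tree, and `struct content_node`, `content_tree_find_or_insert` and the 4096-byte page size are names and values assumed for the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed page size for the sketch */

/* One node per page; the key is the page content itself. */
struct content_node {
    const unsigned char *page;
    struct content_node *left, *right;
};

/*
 * Look for a page with identical content. If one exists, return its node.
 * Otherwise link `page` into the tree and return NULL.
 */
static struct content_node *
content_tree_find_or_insert(struct content_node **root,
                            const unsigned char *page)
{
    struct content_node **link = root;

    assert(root != NULL && page != NULL);
    while (*link) {
        int cmp = memcmp(page, (*link)->page, SKETCH_PAGE_SIZE);

        if (cmp == 0)
            return *link;   /* identical content already in the tree */
        link = cmp < 0 ? &(*link)->left : &(*link)->right;
    }

    *link = calloc(1, sizeof(**link));
    if (!*link)
        abort();            /* keep the sketch simple on out-of-memory */
    (*link)->page = page;
    return NULL;
}
```

If the content of a page changes after it was inserted, its node sits in the wrong place and later lookups may miss it. That is the "corruption" the unstable tree tolerates and the stable tree avoids by write-protecting its pages.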

Taking into account the three levels described above, the algorithm
works like this (the primary tree is the stable tree, the secondary tree
is the unstable tree):

search primary tree (sorted by entire page contents, pages write protected)
- if match found, merge
- if no match found...
  - search secondary tree (sorted by entire page contents, pages not write
    protected)
    - if match found, merge
      - remove from secondary tree and insert merged page into primary tree
    - if no match found...
      - checksum
        - if checksum hasn't changed
          - insert into secondary tree
        - if it has, store updated checksum (note: first time this page
          is handled it won't have a checksum, so checksum will appear
          as changed, so it takes two passes w/ no other matches to
          get into secondary tree)
          - do not insert into any tree, will see it again on next pass
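
The decision list above can be followed step by step in a small user-space model. Everything here is an assumption made for illustration: linear arrays stand in for the two content-sorted trees, `toy_checksum` stands in for the real checksum, and the names `scan_one`, `struct page_set` and `struct scanned_page` do not come from the patch. A real implementation would also skip pages that are already shared, which this sketch does not model.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096
#define SKETCH_MAX_PAGES 64

enum scan_result { MERGED_STABLE, MERGED_UNSTABLE, INSERTED_UNSTABLE, SKIPPED };

struct scanned_page {
    unsigned char data[SKETCH_PAGE_SIZE];
    uint32_t sum;      /* checksum stored on an earlier pass */
    int has_sum;       /* 0 until the first checksum is stored */
};

/* Linear array standing in for a content-sorted tree. */
struct page_set {
    const unsigned char *pages[SKETCH_MAX_PAGES];
    int count;
};

static int set_find(const struct page_set *s, const unsigned char *data)
{
    for (int i = 0; i < s->count; i++)
        if (memcmp(s->pages[i], data, SKETCH_PAGE_SIZE) == 0)
            return i;
    return -1;
}

static void set_add(struct page_set *s, const unsigned char *data)
{
    assert(s->count < SKETCH_MAX_PAGES);
    s->pages[s->count++] = data;
}

static uint32_t toy_checksum(const unsigned char *p)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++)
        sum = sum * 31 + p[i];
    return sum;
}

/* Handle one page; the caller empties `unstable` before every full pass. */
static enum scan_result scan_one(struct page_set *stable,
                                 struct page_set *unstable,
                                 struct scanned_page *pg)
{
    uint32_t sum;
    int idx;

    if (set_find(stable, pg->data) >= 0)
        return MERGED_STABLE;

    idx = set_find(unstable, pg->data);
    if (idx >= 0) {
        /* Move the matched page from the unstable set to the stable set. */
        set_add(stable, unstable->pages[idx]);
        unstable->pages[idx] = unstable->pages[--unstable->count];
        return MERGED_UNSTABLE;
    }

    sum = toy_checksum(pg->data);
    if (pg->has_sum && sum == pg->sum) {
        set_add(unstable, pg->data);
        return INSERTED_UNSTABLE;
    }
    pg->sum = sum;     /* first sighting or changed content: remember it */
    pg->has_sum = 1;
    return SKIPPED;
}
```

With two identical pages and no other matches, the first pass only stores checksums, the second pass puts one page into the unstable set and merges the other against it, and from the third pass on the match is found directly in the stable set.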

The basic idea of this algorithm is that even if the unstable tree does not
guarantee that we find two identical pages in the first round, we will
probably find them in the second, the third, or the tenth round.
Once we have found two identical pages even once, we insert
them into the stable tree, where they stay protected from then on.
So the whole purpose of the unstable tree is to build the stable tree,
which is then used to find the identical pages.

The current implementation can be improved a lot:
we don't have to calculate an expensive checksum; we can just use the host
dirty bit.

Currently we don't support swapping of shared pages (pages that are not
shared, i.e. all the pages that we didn't find to be identical
to other pages, can still be swapped).

While walking the tree we keep calling get_user_pages(); we can optimize this
by saving the pfn and using mmu notifiers to learn when the virtual address
mapping has changed.

We currently scan only programs that registered themselves with ksm; we
would later like to add the ability to tell ksm to scan PIDs (so you can
scan closed binary applications as well).

Right now ksm scanning is done by a single thread; support for multiple
scanners might be needed.

This driver is very useful for KVM in cases where multiple guest
operating systems of the same type are running.
(For desktop workloads we have achieved more than x2 memory overcommit
(more like x3).)

This driver has found users other than KVM, for example CERN,
Fons Rademakers:
on many-core machines we run one large detector simulation program per core.
These simulation programs are identical but run each in their own process and
need about 2 - 2.5 GB RAM.
We typically buy machines with 2GB RAM per core and so have a problem to run
one of these programs per core.
Of 

Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2009-03-30 Thread Anthony Liguori

Izik Eidus wrote:

I am sending another series of patches for the kvm kernel and kvm-userspace
that will allow users of kvm to test ksm with it.
The kvm patches apply to Avi's git tree.
  
Any reason to not take these through upstream QEMU instead of 
kvm-userspace?  In principle, I don't see anything that would prevent 
normal QEMU from almost making use of this functionality.  That would 
make it one less thing to eventually have to merge...


Regards,

Anthony Liguori


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2008-11-28 Thread Dmitri Monakhov
Izik Eidus [EMAIL PROTECTED] writes:

 (From v1 to v2 the main change is much more documentation)

 KSM is a Linux driver that allows dynamically sharing identical memory
 pages between one or more processes.

 Unlike traditional page sharing, which is set up when the memory is
 allocated, ksm does it dynamically after the memory has been created.
 Memory is periodically scanned; identical pages are identified and
 merged.
 The sharing is unnoticeable to the processes that use this memory.
 (The shared pages are marked read-only, and on a write
 do_wp_page() takes care of creating a new copy of the page.)

 This driver is very useful for KVM in cases where multiple guest
 operating systems of the same type are running.
Hi Izik, the approach used in the driver is commonly known as
content-based search. There are several variants of it;
the most common are:
1: with guest VM support
2: w/o guest VM support.
You have implemented the second one, but it seems it has already been patented:
http://www.google.com/patents?vid=USPAT6789156
I'm not a lawyer, but IMHO we have a direct conflict here.
From another point of view, they have patented the wheel, but at least we
have to know about this.
 (For desktop work loads we have achived more than x2 memory overcommit
 (more like x3))

 This driver have found users other than KVM, for example CERN,
 Fons Rademakers:
 on many-core machines we run one large detector simulation program per core.
 These simulation programs are identical but run each in their own process and
 need about 2 - 2.5 GB RAM.
 We typically buy machines with 2GB RAM per core and so have a problem to run
 one of these programs per core.
 Of the 2 - 2.5 GB about 700MB is identical data in the form of magnetic field
 maps, detector geometry, etc.
 Currently people have been trying to start one program, initialize the 
 geometry
 and field maps and then fork it N times, to have the data shared.
 With KSM this would be done automatically by the system so it sounded 
 extremely
 attractive when Andrea presented it.

 (We have are already started to test KSM on their systems...)


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2008-11-28 Thread Alan Cox
 You have implemented second one, but seems it already was patented
 http://www.google.com/patents?vid=USPAT6789156
 I'm not a lawyer but IMHO we have direct conflict here.
 From other point of view they have patented the WEEL, but at least we
 have to know about this.

It's an old idea and appeared for Linux in March 1998: a little project from
Philipp Reisner called mergemem.

http://groups.google.com/group/muc.lists.linux-kernel/browse_thread/thread/387af278089c7066?ie=utf-8&oe=utf-8&q=share+identical+pages#b3d4f68fb5dd4f88

So if there is a patent which is relevant (and that's a question for
lawyers and legal patent search people) perhaps the Linux Foundation and
some of the patent busters could take a look at mergemem and
re-examination.

Alan


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2008-11-20 Thread Izik Eidus

Quoting Ryota OZAKI:

Hi Izik,

I've tried your patch set, but ksm doesn't work in my machine.

I compiled linux patched with the four patches and configured with KSM
and KVM enabled. After boot with the linux, I run two VMs running linux
using QEMU with a patch in your mail and started KSM scanner with your
script, then the host linux caused panic with the following oops.
  


Yes, you are right, we are missing pte_unmap(pte); in get_pte()!
That affects only 32-bit with highmem, which is why you see it.
Thanks for reporting, I will fix it for v3.

The patch below should fix it (I can't test it now, will test it for v3).

Can you report whether it fixes your problem? Thanks.


== BEGINNING of OOPS
kernel BUG at arch/x86/mm/highmem_32.c:87!
invalid opcode:  [#1] SMP
last sysfs file: /sys/class/net/vnet-ssh2/address
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in: netconsole autofs4 nf_conntrack_ipv4 nf_defrag_ipv4
xt_state nf_conntrack xt_tcpudp ipt_REJECT iptable_filter ip_tables
x_tables loop kvm_intel kvm iTCO_wdt iTCO_vendor_support igb
netxen_nic button ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd usbcore
[last unloaded: microcode]

Pid: 343, comm: kksmd Not tainted
(2.6.28-rc5-linus-head-20081119-sparsemem #1) X7DWA
EIP: 0060:[c041eff9] EFLAGS: 00010206 CPU: 6
EIP is at kmap_atomic_prot+0x7d/0xeb
EAX: c0008d94 EBX: c1ff6240 ECX: 0163 EDX: 7e00
ESI: 0154 EDI: 0055 EBP: f5cdbf10 ESP: f5cdbef8
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
Process kksmd (pid: 343, ti=f5cda000 task=f617b140 task.ti=f5cda000)
Stack:
 7fa12163 f000 c204efbc f50479e8 9eb7e000 c08a34d0 f5cdbf18 c041f07a
 f5cdbf28 c048339c  f5c271e0 f5cdbf30 c04833bc f5cdbfb0 c0483b0d
 f5cdbf50 c0425845  0064 0009 c08a34d0 f5cdbfb0 c06384c1
Call Trace:
 [c041f07a] ? kmap_atomic+0x13/0x15
 [c048339c] ? get_pte+0x50/0x63
 [c04833bc] ? is_present_pte+0xd/0x1f
 [c0483b0d] ? ksm_scan_start+0x9a/0x7ac
 [c0425845] ? finish_task_switch+0x29/0xa4
 [c06384c1] ? schedule+0x6bf/0x719
 [c041b3fc] ? default_spin_lock_flags+0x8/0xc
 [c043bffa] ? finish_wait+0x49/0x4e
 [c04845f4] ? kthread_ksm_scan_thread+0x0/0xdc
 [c048462e] ? kthread_ksm_scan_thread+0x3a/0xdc
 [c043bf31] ? autoremove_wake_function+0x0/0x38
 [c043be3e] ? kthread+0x40/0x66
 [c043bdfe] ? kthread+0x0/0x66
 [c0404997] ? kernel_thread_helper+0x7/0x10
Code: 86 00 00 00 64 a1 04 a0 82 c0 6b c0 0d 8d 3c 30 a1 78 b0 77 c0
8d 34 bd 00 00 00 00 89 45 ec a1 0c d0 84 c0 29 f0 83 38 00 74 04 0f
0b eb fe c1 ea 1a 8b 04 d5 80 32 8a c0 83 e0 fc 29 c3 c1 fb
EIP: [c041eff9] kmap_atomic_prot+0x7d/0xeb SS:ESP 0068:f5cdbef8
Kernel panic - not syncing: Fatal exception
== END of OOPS
  


diff --git a/mm/ksm.c b/mm/ksm.c
index 707be52..e14448a 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -562,6 +562,7 @@ static pte_t *get_pte(struct mm_struct *mm, unsigned long addr)
        goto out;
 
    ptep = pte_offset_map(pmd, addr);
+   pte_unmap(ptep);
 out:
    return ptep;
 }


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2008-11-20 Thread Izik Eidus

Quoting Izik Eidus:

Quoting Ryota OZAKI:

Hi Izik,

I've tried your patch set, but ksm doesn't work in my machine.

I compiled linux patched with the four patches and configured with KSM
and KVM enabled. After boot with the linux, I run two VMs running linux
using QEMU with a patch in your mail and started KSM scanner with your
script, then the host linux caused panic with the following oops.
  


Yes you are right, we are missing pte_unmap(pte); in get_pte()!
that will effect just 32bits with highmem so this why you see it
thanks for the reporting, i will fix it for v3

below patch should fix it (i cant test it now, will test it for v3)

can you report if it fix your problem? thanks


Thinking about what I just did, it is wrong.
This patch is the right one (still untested), so if you are going
to apply something, use this one.


thanks
diff --git a/mm/ksm.c b/mm/ksm.c
index 707be52..c842c29 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -569,14 +569,16 @@ out:
 static int is_present_pte(struct mm_struct *mm, unsigned long addr)
 {
    pte_t *ptep;
+   int r;
 
    ptep = get_pte(mm, addr);
    if (!ptep)
        return 0;
 
-   if (pte_present(*ptep))
-       return 1;
-   return 0;
+   r = pte_present(*ptep);
+   pte_unmap(ptep);
+
+   return r;
 }
 
 #define PAGEHASH_LEN 128
@@ -669,6 +671,7 @@ static int try_to_merge_one_page(struct mm_struct *mm,
    if (!orig_ptep)
        goto out_unlock;
    orig_pte = *orig_ptep;
+   pte_unmap(orig_ptep);
    if (!pte_present(orig_pte))
        goto out_unlock;
    if (page_to_pfn(oldpage) != pte_pfn(orig_pte))
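
The pattern the second patch follows (read the value while the entry is mapped, then unmap before returning) can be modelled in user space. This is a toy model under assumptions, not kernel code: `struct map_slot`, `slot_map`, `slot_unmap` and `read_entry` are invented here, and the single slot merely stands in for a temporary highmem mapping that must be released before it can be used again.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a single temporary mapping slot. */
struct map_slot {
    const int *mapped;   /* NULL while the slot is free */
};

/* Map an entry; mapping over a slot that is still busy is treated as a bug. */
static const int *slot_map(struct map_slot *s, const int *entry)
{
    assert(s->mapped == NULL);
    s->mapped = entry;
    return entry;
}

static void slot_unmap(struct map_slot *s)
{
    s->mapped = NULL;
}

/* Copy the value out while mapped, then release the slot before returning. */
static int read_entry(struct map_slot *s, const int *entry)
{
    const int *p = slot_map(s, entry);
    int value = *p;

    slot_unmap(s);
    return value;
}
```

Returning the mapped pointer itself, as the first patch effectively does after unmapping, would hand the caller a pointer whose mapping has already been released. Copying the value first avoids both that and a leaked mapping.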


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2008-11-20 Thread Ryota OZAKI
2008/11/20 Izik Eidus [EMAIL PROTECTED]:
 Quoting Izik Eidus:

 Quoting Ryota OZAKI:

 Hi Izik,

 I've tried your patch set, but ksm doesn't work in my machine.

 I compiled linux patched with the four patches and configured with KSM
 and KVM enabled. After boot with the linux, I run two VMs running linux
 using QEMU with a patch in your mail and started KSM scanner with your
 script, then the host linux caused panic with the following oops.


 Yes you are right, we are missing pte_unmap(pte); in get_pte()!
 that will effect just 32bits with highmem so this why you see it
 thanks for the reporting, i will fix it for v3

 below patch should fix it (i cant test it now, will test it for v3)

 can you report if it fix your problem? thanks

 Thinking about what i just did, it is wrong,
 this patch is the right one (still wasnt tested), but if you are going to
 apply something then use this one.

Great! Applied the 2nd patch, ksm works with both HIGHMEM enabled and disabled.

Thanks for your quick response,
  ozaki-r


 thanks

 diff --git a/mm/ksm.c b/mm/ksm.c
 index 707be52..c842c29 100644
 --- a/mm/ksm.c
 +++ b/mm/ksm.c
 @@ -569,14 +569,16 @@ out:
  static int is_present_pte(struct mm_struct *mm, unsigned long addr)
  {
pte_t *ptep;
 +   int r;

ptep = get_pte(mm, addr);
if (!ptep)
return 0;

 -   if (pte_present(*ptep))
 -   return 1;
 -   return 0;
 +   r = pte_present(*ptep);
 +   pte_unmap(ptep);
 +
 +   return r;
  }

  #define PAGEHASH_LEN 128
 @@ -669,6 +671,7 @@ static int try_to_merge_one_page(struct mm_struct *mm,
if (!orig_ptep)
goto out_unlock;
orig_pte = *orig_ptep;
 +   pte_unmap(orig_ptep);
if (!pte_present(orig_pte))
goto out_unlock;
if (page_to_pfn(oldpage) != pte_pfn(orig_pte))




[PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2008-11-16 Thread Izik Eidus
(From v1 to v2 the main change is much more documentation)

KSM is a Linux driver that allows dynamically sharing identical memory
pages between one or more processes.

Unlike traditional page sharing, which is set up when the memory is
allocated, ksm does it dynamically after the memory has been created.
Memory is periodically scanned; identical pages are identified and
merged.
The sharing is unnoticeable to the processes that use this memory.
(The shared pages are marked read-only, and on a write
do_wp_page() takes care of creating a new copy of the page.)

This driver is very useful for KVM in cases where multiple guest
operating systems of the same type are running.
(For desktop workloads we have achieved more than x2 memory overcommit
(more like x3).)

This driver has found users other than KVM, for example CERN,
Fons Rademakers:
on many-core machines we run one large detector simulation program per core.
These simulation programs are identical but run each in their own process and
need about 2 - 2.5 GB RAM.
We typically buy machines with 2GB RAM per core and so have a problem to run
one of these programs per core.
Of the 2 - 2.5 GB about 700MB is identical data in the form of magnetic field
maps, detector geometry, etc.
Currently people have been trying to start one program, initialize the geometry
and field maps and then fork it N times, to have the data shared.
With KSM this would be done automatically by the system so it sounded extremely
attractive when Andrea presented it.
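
The arithmetic behind the CERN case can be made explicit. The sketch below assumes an 8-core machine (the core count is not given in the quote), takes the upper figure of 2.5 GB per process, and counts 1 GB as 1024 MB. The helper name `footprint_mb` is made up for this example.

```c
#include <assert.h>

/*
 * Total footprint in MB of `nproc` identical processes when `shared_mb`
 * of each process is backed by a single shared copy.
 */
static unsigned long footprint_mb(unsigned nproc, unsigned long per_proc_mb,
                                  unsigned long shared_mb)
{
    assert(nproc > 0 && shared_mb <= per_proc_mb);
    return nproc * (per_proc_mb - shared_mb) + shared_mb;
}
```

Under these assumptions the machine has 8 x 2048 = 16384 MB. Without sharing, footprint_mb(8, 2560, 0) gives 20480 MB, which does not fit. With the 700 MB of identical data shared, footprint_mb(8, 2560, 700) gives 15580 MB, which does.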

(We have already started testing KSM on their systems...)

KSM can run as a kernel thread, as a userspace application, or both.

Example of how to control the kernel thread:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include "ksm.h"

int main(int argc, char *argv[])
{
    int fd;
    int used = 0;
    int fd_start;
    struct ksm_kthread_info info;

    /* Start from a known state; "stop" only sets info.flags. */
    memset(&info, 0, sizeof(info));

    if (argc < 2) {
        fprintf(stderr,
                "usage: %s {start npages sleep | stop | info}\n",
                argv[0]);
        exit(1);
    }

    fd = open("/dev/ksm", O_RDWR | O_TRUNC, (mode_t)0600);
    if (fd == -1) {
        fprintf(stderr, "could not open /dev/ksm\n");
        exit(1);
    }

    if (!strncmp(argv[1], "start", strlen(argv[1]))) {
        used = 1;
        /* "start" reads argv[2], argv[3] and argv[4], so argc must be >= 5 */
        if (argc < 5) {
            fprintf(stderr,
                    "usage: %s start npages_to_scan max_pages_to_merge sleep\n",
                    argv[0]);
            exit(1);
        }
        info.pages_to_scan = atoi(argv[2]);
        info.max_pages_to_merge = atoi(argv[3]);
        info.sleep = atoi(argv[4]);
        info.flags = ksm_control_flags_run;

        fd_start = ioctl(fd, KSM_START_STOP_KTHREAD, &info);
        if (fd_start == -1) {
            fprintf(stderr, "KSM_START_KTHREAD failed\n");
            exit(1);
        }
        printf("created scanner\n");
    }

    if (!strncmp(argv[1], "stop", strlen(argv[1]))) {
        used = 1;
        /* Clearing the run flag stops the kernel thread. */
        info.flags = 0;
        fd_start = ioctl(fd, KSM_START_STOP_KTHREAD, &info);
        printf("stopped scanner\n");
    }

    if (!strncmp(argv[1], "info", strlen(argv[1]))) {
        used = 1;
        ioctl(fd, KSM_GET_INFO_KTHREAD, &info);
        printf("flags %d, pages_to_scan %d npages_merge %d, sleep_time %d\n",
               info.flags, info.pages_to_scan, info.max_pages_to_merge,
               info.sleep);
    }

    if (!used)
        fprintf(stderr, "unknown command %s\n", argv[1]);

    close(fd);
    return 0;
}

Example of how to register qemu with ksm (or any userspace application):

diff --git a/qemu/vl.c b/qemu/vl.c
index 4721fdd..7785bf9 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -21,6 +21,7 @@
  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
  * THE SOFTWARE.
  */
+#include "ksm.h"
 #include "hw/hw.h"
 #include "hw/boards.h"
 #include "hw/usb.h"
@@ -5799,6 +5800,37 @@ static void termsig_setup(void)
 
 #endif
 
+int ksm_register_memory(void)
+{
+    int fd;
+    int ksm_fd;
+    int r = 1;
+    struct ksm_memory_region ksm_region;
+
+    fd = open("/dev/ksm", O_RDWR | O_TRUNC, (mode_t)0600);
+    if (fd == -1)
+        goto out;
+
+    ksm_fd = ioctl(fd, KSM_CREATE_SHARED_MEMORY_AREA);
+    if (ksm_fd == -1)
+        goto out_free;
+
+    ksm_region.npages = phys_ram_size / TARGET_PAGE_SIZE;
+    ksm_region.addr = phys_ram_base;
+    r = ioctl(ksm_fd, KSM_REGISTER_MEMORY_REGION, &ksm_region);
+    if (r)
+        goto out_free1;
+
+    return r;
+
+out_free1:
+    close(ksm_fd);
+out_free:
+    close(fd);
+out:
+    return r;
+}
+
 int main(int argc, char **argv)
 {
 #ifdef CONFIG_GDBSTUB
@@ -6735,6 +6767,8 @@ int main(int argc, char **argv)

Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2008-11-11 Thread Andrew Morton
On Tue, 11 Nov 2008 15:21:37 +0200 Izik Eidus [EMAIL PROTECTED] wrote:

 KSM is a Linux driver that allows dynamically sharing identical memory pages
 between one or more processes.
 
 Unlike traditional page sharing, which is set up when the memory is
 allocated, ksm does it dynamically after the memory has been created.
 Memory is periodically scanned; identical pages are identified and merged.
 The sharing is unnoticeable to the processes that use this memory.
 (The shared pages are marked read-only, and on a write
 do_wp_page() takes care of creating a new copy of the page.)
 
 This driver is very useful for KVM in cases where multiple guest
 operating systems of the same type are running, since many pages are sharable.
 This driver can be useful for OpenVZ as well.

These benefits should be quantified, please.  Also any benefits to any
other workloads should be identified and quantified.

The whole approach seems wrong to me.  The kernel lost track of these
pages and then we run around post-facto trying to fix that up again. 
Please explain (for the changelog) why the kernel cannot get this right
via the usual sharing, refcounting and COWing approaches.


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2008-11-11 Thread Avi Kivity

Andrew Morton wrote:

The whole approach seems wrong to me.  The kernel lost track of these
pages and then we run around post-facto trying to fix that up again.
Please explain (for the changelog) why the kernel cannot get this right
via the usual sharing, refcounting and COWing approaches.
  


For kvm, the kernel never knew those pages were shared.  They are loaded 
from independent (possibly compressed and encrypted) disk images.  These 
images are different; but some pages happen to be the same because they 
came from the same installation media.


For OpenVZ the situation is less clear, but if you allow users to 
independently upgrade their chroots you will eventually arrive at the 
same scenario (unless of course you apply the same merging strategy at 
the filesystem level).


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2008-11-11 Thread Izik Eidus

Andrew Morton wrote:

On Tue, 11 Nov 2008 15:21:37 +0200 Izik Eidus [EMAIL PROTECTED] wrote:

  

KSM is a Linux driver that allows dynamically sharing identical memory pages
between one or more processes.

Unlike traditional page sharing, which is set up when the memory is
allocated, ksm does it dynamically after the memory has been created.
Memory is periodically scanned; identical pages are identified and merged.
The sharing is unnoticeable to the processes that use this memory.
(The shared pages are marked read-only, and on a write
do_wp_page() takes care of creating a new copy of the page.)

This driver is very useful for KVM in cases where multiple guest
operating systems of the same type are running, since many pages are sharable.
This driver can be useful for OpenVZ as well.



These benefits should be quantified, please.  Also any benefits to any
other workloads should be identified and quantified.
  

Sure,
we have used KSM in production for about half a year, and the numbers that
came from our QA are:
using KSM for desktops (KSM was tested only with a Windows desktop workload)
you can run as many as
52 Windows XP guests with 1 GB of RAM each on a server with just 16 GB of RAM
(this is more than 300% overcommit).
The reason is that most of the kernel/DLLs of these guests are shared, and
in addition we share the Windows zero pages
(Windows keeps zeroing all its free memory, so every time Windows
releases memory we take the page back to the host).
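
As a quick check of the "more than 300%" figure, here is a small sketch. It takes "overcommit" to mean guest memory promised as a percentage of physical host memory, which is an assumption about the intended meaning; the helper name is invented for the example.

```c
#include <assert.h>

/* Guest memory promised, as a percentage of physical host memory. */
static unsigned overcommit_percent(unsigned guests, unsigned gib_per_guest,
                                   unsigned host_gib)
{
    assert(host_gib > 0);
    return guests * gib_per_guest * 100 / host_gib;
}
```

For 52 guests of 1 GB each on a 16 GB host, overcommit_percent(52, 1, 16) is 5200 / 16 = 325.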

A slide that gives these numbers can be found at:
http://kvm.qumranet.com/kvmwiki/KvmForum2008?action=AttachFile&do=get&target=kdf2008_3.pdf
(slide 27)

In addition, I gave a presentation about ksm that can be found at:
http://kvm.qumranet.com/kvmwiki/KvmForum2008?action=AttachFile&do=get&target=kdf2008_12.pdf

If more numbers are wanted for other workloads I can test them.
(The idea of ksm is to run it slowly at low priority and let it
merge pages when no one needs the CPU.)



Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2008-11-11 Thread Izik Eidus

Avi Kivity wrote:

Andrew Morton wrote:

The whole approach seems wrong to me.  The kernel lost track of these
pages and then we run around post-facto trying to fix that up again.
Please explain (for the changelog) why the kernel cannot get this right
via the usual sharing, refcounting and COWing approaches.
  


For kvm, the kernel never knew those pages were shared.  They are 
loaded from independent (possibly compressed and encrypted) disk 
images.  These images are different; but some pages happen to be the 
same because they came from the same installation media.


As Avi said, in kvm we cannot know how the guest is going to map its
pages, so we have no choice but to scan for identical pages
(identical pages can sit at completely different offsets
inside each guest).




For OpenVZ the situation is less clear, but if you allow users to 
independently upgrade their chroots you will eventually arrive at the 
same scenario (unless of course you apply the same merging strategy at 
the filesystem level).






Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2008-11-11 Thread Andrew Morton
On Tue, 11 Nov 2008 20:48:16 +0200
Avi Kivity [EMAIL PROTECTED] wrote:

 Andrew Morton wrote:
  The whole approach seems wrong to me.  The kernel lost track of these
  pages and then we run around post-facto trying to fix that up again. 
  Please explain (for the changelog) why the kernel cannot get this right
  via the usual sharing, refcounting and COWing approaches.

 
 For kvm, the kernel never knew those pages were shared.  They are loaded 
 from independent (possibly compressed and encrypted) disk images.  These 
 images are different; but some pages happen to be the same because they 
 came from the same installation media.

What userspace-only changes could fix this?  Identify the common data,
write it to a flat file and mmap it, something like that?

 For OpenVZ the situation is less clear, but if you allow users to 
 independently upgrade their chroots you will eventually arrive at the 
 same scenario (unless of course you apply the same merging strategy at 
 the filesystem level).

hm.

There has been the occasional discussion about identifying all-zeroes
pages and scavenging them, repointing them at the zero page.  Could
this infrastructure be used for that?  (And how much would we gain from
it?)

[I'm looking for reasons why this is more than a muck-up-the-vm-for-kvm
thing here ;) ]


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2008-11-11 Thread Izik Eidus

Andrew Morton wrote:
 On Tue, 11 Nov 2008 20:48:16 +0200
 Avi Kivity [EMAIL PROTECTED] wrote:

  Andrew Morton wrote:
   The whole approach seems wrong to me.  The kernel lost track of these
   pages and then we run around post-facto trying to fix that up again.
   Please explain (for the changelog) why the kernel cannot get this right
   via the usual sharing, refcounting and COWing approaches.

  For kvm, the kernel never knew those pages were shared.  They are loaded
  from independent (possibly compressed and encrypted) disk images.  These
  images are different; but some pages happen to be the same because they
  came from the same installation media.

 What userspace-only changes could fix this?  Identify the common data,
 write it to a flat file and mmap it, something like that?

  For OpenVZ the situation is less clear, but if you allow users to
  independently upgrade their chroots you will eventually arrive at the
  same scenario (unless of course you apply the same merging strategy at
  the filesystem level).

 hm.

 There has been the occasional discussion about identifying all-zeroes
 pages and scavenging them, repointing them at the zero page.  Could
 this infrastructure be used for that?  (And how much would we gain from
 it?)

 [I'm looking for reasons why this is more than a muck-up-the-vm-for-kvm
 thing here ;) ]

KSM is a separate driver; it doesn't change anything in the VM apart from
adding two helper functions.



Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2008-11-11 Thread Andrew Morton
On Tue, 11 Nov 2008 21:07:10 +0200
Izik Eidus [EMAIL PROTECTED] wrote:

 we have used KSM in production for about half a year, and the numbers that
 came from our QA are:
 using KSM for desktops (KSM was tested only with a Windows desktop workload)
 you can run as many as
 52 Windows XP guests with 1 GB of RAM each on a server with just 16 GB of
 RAM (this is more than 300% overcommit).
 the reason is that most of the kernel/DLLs of these guests are shared, and
 in addition we are sharing the Windows zero pages
 (Windows keeps zeroing all of its free memory, so every time Windows
 releases memory we take the page back to the host).
 there is a slide that gives these numbers, which you can find at:
 http://kvm.qumranet.com/kvmwiki/KvmForum2008?action=AttachFile&do=get&target=kdf2008_3.pdf
 (slide 27)
 besides that, I gave a presentation about ksm that can be found at:
 http://kvm.qumranet.com/kvmwiki/KvmForum2008?action=AttachFile&do=get&target=kdf2008_12.pdf

OK, 300% isn't chicken feed.

It is quite important that information such as this be prepared, added to
the patch changelogs and maintained.  For a start, without this basic
information, there is no reason for anyone to look at any of the code!


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2008-11-11 Thread Avi Kivity

Andrew Morton wrote:
  For kvm, the kernel never knew those pages were shared.  They are loaded
  from independent (possibly compressed and encrypted) disk images.  These
  images are different; but some pages happen to be the same because they
  came from the same installation media.

 What userspace-only changes could fix this?  Identify the common data,
 write it to a flat file and mmap it, something like that?


This was considered.  You can't scan the image, because it may be 
encrypted/compressed/offset (typical images _are_ offset because the 
first partition starts at sector 63...).  The data may come from the 
network and not a disk image.  You can't scan in userspace because the 
images belong to different users and contain sensitive data.  Pages may 
come from several images (multiple disk images per guest) so you end up 
with one vma per page.


So you have to scan memory, after the guest has retrieved it from
disk/network/manufactured it somehow, decompressed and decrypted it,
written it to the offset it wants.  You can't scan from userspace since
it's sensitive data, and of course the actual merging needs to be done
atomically, which can only be done from the holy of holies, the vm.


  For OpenVZ the situation is less clear, but if you allow users to
  independently upgrade their chroots you will eventually arrive at the
  same scenario (unless of course you apply the same merging strategy at
  the filesystem level).

 hm.

 There has been the occasional discussion about identifying all-zeroes
 pages and scavenging them, repointing them at the zero page.  Could
 this infrastructure be used for that?


Yes, trivially.  ksm may be an overkill for this, though.


 (And how much would we gain from
 it?)
  


A lot of zeros.


 [I'm looking for reasons why this is more than a muck-up-the-vm-for-kvm
 thing here ;) ]
  


I sympathize -- us too.  Consider the typical multiuser gnome 
minicomputer with all 150 users reading lwn.net at the same time instead 
of working.  You could share the firefox rendered page cache, reducing 
memory utilization drastically.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



[PATCH 0/4] ksm - dynamic page sharing driver for linux

2008-11-11 Thread Izik Eidus
KSM is a linux driver that allows dynamically sharing identical memory pages
between one or more processes.

Unlike traditional page sharing, which is set up when the memory is
allocated, ksm does it dynamically after the memory has been created.
Memory is periodically scanned; identical pages are identified and merged.
The sharing is unnoticeable to the processes that use this memory
(the shared pages are marked read-only, and in case of a write
do_wp_page() takes care of creating a new copy of the page).

This driver is very useful for KVM: when running multiple guest
operating systems of the same type, many pages are sharable.
This driver can be useful for OpenVZ as well.

Right now KSM scans only memory that was registered to be used by it; it
does not scan the whole system memory (this can be changed, but the chances
of finding identical pages in a normal linux system that doesn't run
multiple guests are low).

KSM can run as a kernel thread or as a userspace application (or both; it is
allowed to run more than one scanner at a time).

Example of how to control the kernel thread:


ksmctl.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include "ksm.h"

int main(int argc, char *argv[])
{
    int fd;
    int used = 0;
    int fd_start;
    struct ksm_kthread_info info;

    /* Do not hand uninitialized fields to the kernel. */
    memset(&info, 0, sizeof(info));

    if (argc < 2) {
        fprintf(stderr,
                "usage: %s {start npages_to_scan npages_max_merge sleep |"
                " stop | info}\n", argv[0]);
        exit(1);
    }

    fd = open("/dev/ksm", O_RDWR | O_TRUNC, (mode_t)0600);
    if (fd == -1) {
        fprintf(stderr, "could not open /dev/ksm\n");
        exit(1);
    }

    /* Commands are prefix-matched: any leading part of the name is accepted. */
    if (!strncmp(argv[1], "start", strlen(argv[1]))) {
        used = 1;
        if (argc < 5) {
            fprintf(stderr, "usage: %s start npages_to_scan", argv[0]);
            fprintf(stderr, " npages_max_merge sleep\n");
            exit(1);
        }
        info.pages_to_scan = atoi(argv[2]);
        info.max_pages_to_merge = atoi(argv[3]);
        info.sleep = atoi(argv[4]);
        info.running = 1;

        fd_start = ioctl(fd, KSM_START_STOP_KTHREAD, &info);
        if (fd_start == -1) {
            fprintf(stderr, "KSM_START_STOP_KTHREAD failed\n");
            exit(1);
        }
        printf("created scanner\n");
    }

    if (!strncmp(argv[1], "stop", strlen(argv[1]))) {
        used = 1;
        info.running = 0;
        fd_start = ioctl(fd, KSM_START_STOP_KTHREAD, &info);
        if (fd_start == -1) {
            fprintf(stderr, "KSM_START_STOP_KTHREAD failed\n");
            exit(1);
        }
        printf("stopped scanner\n");
    }

    if (!strncmp(argv[1], "info", strlen(argv[1]))) {
        used = 1;
        fd_start = ioctl(fd, KSM_GET_INFO_KTHREAD, &info);
        if (fd_start == -1) {
            fprintf(stderr, "KSM_GET_INFO_KTHREAD failed\n");
            exit(1);
        }
        printf("running %d, pages_to_scan %d pages_max_merge %d, ",
               info.running, info.pages_to_scan,
               info.max_pages_to_merge);
        printf("sleep_time %d\n", info.sleep);
    }

    if (!used)
        fprintf(stderr, "unknown command %s\n", argv[1]);

    return 0;
}


Example of how to register qemu with ksm (or any userspace application):

diff --git a/qemu/vl.c b/qemu/vl.c
index 4721fdd..7785bf9 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -21,6 +21,7 @@
  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
  * THE SOFTWARE.
  */
+#include "ksm.h"
 #include "hw/hw.h"
 #include "hw/boards.h"
 #include "hw/usb.h"
@@ -5799,6 +5800,37 @@ static void termsig_setup(void)
 
 #endif
 
+int ksm_register_memory(void)
+{
+    int fd;
+    int ksm_fd;
+    int r = 1;
+    struct ksm_memory_region ksm_region;
+
+    fd = open("/dev/ksm", O_RDWR | O_TRUNC, (mode_t)0600);
+    if (fd == -1)
+        goto out;
+
+    ksm_fd = ioctl(fd, KSM_CREATE_SHARED_MEMORY_AREA);
+    if (ksm_fd == -1)
+        goto out_free;
+
+    ksm_region.npages = phys_ram_size / TARGET_PAGE_SIZE;
+    ksm_region.addr = phys_ram_base;
+    r = ioctl(ksm_fd, KSM_REGISTER_MEMORY_REGION, &ksm_region);
+    if (r)
+        goto out_free1;
+
+    return r;
+
+out_free1:
+    close(ksm_fd);
+out_free:
+    close(fd);
+out:
+    return r;
+}
+
 int main(int argc, char **argv)
 {
 #ifdef CONFIG_GDBSTUB
@@ -6735,6 +6767,8 @@ int main(int argc, char **argv)
     /* init the dynamic translator */
     cpu_exec_init_all(tb_size * 1024 * 1024);
 
+    ksm_register_memory();
+
     bdrv_init();
 
     /* we always create the cdrom drive, even