Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v3
On Friday 17 April 2009 17:08:07 Jared Hulbert wrote:
> > As everyone knows, my favourite thing is to say nasty things about any new feature that adds complexity to common code. I feel like crying to hear about how many more instances of MS Office we can all run, if only we apply this patch. And the poorly written HPC app just sounds like scrapings from the bottom of the justification barrel.
> >
> > I'm sorry, maybe I'm way off with my understanding of how important this is. There isn't too much help in the changelog. A discussion of where the memory savings come from, and how far things like sharing of fs image or ballooning go, and how much extra savings we get from this... with people from other hypervisors involved as well. Have I missed this kind of discussion?
>
> Nick, I don't know about other hypervisors, fs and balloonings, but I have tried this out. It works. It works on apps I don't consider poorly written. I'm very excited about this. I got 10% saving in a roughly off the shelf embedded system. No user noticeable performance impact.

OK well that's what I want to hear. Thanks, that means a lot to me.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v3
> As everyone knows, my favourite thing is to say nasty things about any new feature that adds complexity to common code. I feel like crying to hear about how many more instances of MS Office we can all run, if only we apply this patch. And the poorly written HPC app just sounds like scrapings from the bottom of the justification barrel.
>
> I'm sorry, maybe I'm way off with my understanding of how important this is. There isn't too much help in the changelog. A discussion of where the memory savings come from, and how far things like sharing of fs image or ballooning go, and how much extra savings we get from this... with people from other hypervisors involved as well. Have I missed this kind of discussion?

Nick, I don't know about other hypervisors, fs and balloonings, but I have tried this out. It works. It works on apps I don't consider poorly written. I'm very excited about this. I got 10% saving in a roughly off the shelf embedded system. No user noticeable performance impact.
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v3
On Wednesday 15 April 2009 08:09:03 Andrew Morton wrote:
> On Thu, 9 Apr 2009 06:58:37 +0300 Izik Eidus iei...@redhat.com wrote:
> > KSM is a linux driver that allows dynamically sharing identical memory pages between one or more processes.
>
> Generally looks OK to me. But that doesn't mean much. We should rub bottles with words like hugh and nick on them to be sure.

I haven't looked too closely at it yet, sorry. Hugh has a great eye for these details, though, hint hint :)

As everyone knows, my favourite thing is to say nasty things about any new feature that adds complexity to common code. I feel like crying to hear about how many more instances of MS Office we can all run, if only we apply this patch. And the poorly written HPC app just sounds like scrapings from the bottom of the justification barrel.

I'm sorry, maybe I'm way off with my understanding of how important this is. There isn't too much help in the changelog. A discussion of where the memory savings come from, and how far things like sharing of fs image or ballooning go, and how much extra savings we get from this... with people from other hypervisors involved as well. Have I missed this kind of discussion?

Careful what you wish for, ay? :)
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v3
Nick Piggin wrote:
> On Wednesday 15 April 2009 08:09:03 Andrew Morton wrote:
> > On Thu, 9 Apr 2009 06:58:37 +0300 Izik Eidus iei...@redhat.com wrote:
> > > KSM is a linux driver that allows dynamically sharing identical memory pages between one or more processes.
> >
> > Generally looks OK to me. But that doesn't mean much. We should rub bottles with words like hugh and nick on them to be sure.
>
> I haven't looked too closely at it yet, sorry. Hugh has a great eye for these details, though, hint hint :)
>
> As everyone knows, my favourite thing is to say nasty things about any new feature that adds complexity to common code.

The whole idea, and the way I wrote it, is that it won't touch common code; I didn't change the linux mm logic anywhere. The worst thing that we have added is helper functions.

> I feel like crying to hear about how many more instances of MS Office we can all run, if only we apply this patch.

And more instances of linux guests...

> And the poorly written HPC app just sounds like scrapings from the bottom of the justification barrel.

So if you have a big rendering application that loads gigabytes of geometrical data that is handled by many threads, and you have a case where each thread sometimes changes this geometrical data and you don't want the other threads to notice it, how would you share it in the traditional way? After the shared data gets COWed once, how will you recollect it again when it becomes identical? KSM does it for applications transparently.

The motivation for writing KSM was indeed KVM, where it is highly needed. You may check what VMware says about the fact that they have much better overcommit than Hyper-V / XEN: http://blogs.vmware.com/virtualreality/2008/03/cheap-hyperviso.html

It is important to understand that in virtualization environments there are cases where memory is much more critical than any other resource for higher density. Together with KSM, KVM will have the same memory overcommit abilities that VMware has.

> I'm sorry, maybe I'm way off with my understanding of how important this is. There isn't too much help in the changelog. A discussion of where the memory savings come from,

Memory savings come from identical libraries, identical kernels, zeroed pages - that is for virtualization. The library code will always be identical among similar guests, so why have this code at multiple places in the host memory?

> and how far things like sharing of fs image or ballooning go, and how much extra savings we get from this...

Ballooning is much worse when it comes to performance, because what it does is shrink the guest memory; with KSM we find identical pages and merge them into one page, so we don't get a guest performance loss.

> with people from other hypervisors involved as well. Have I missed this kind of discussion?
>
> Careful what you wish for, ay? :)
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v3
On Thu, 9 Apr 2009 06:58:37 +0300 Izik Eidus iei...@redhat.com wrote:
> KSM is a linux driver that allows dynamically sharing identical memory pages between one or more processes.

Generally looks OK to me. But that doesn't mean much. We should rub bottles with words like hugh and nick on them to be sure.

> ...
>  include/linux/ksm.h          |   48 ++
>  include/linux/miscdevice.h   |    1 +
>  include/linux/mm.h           |    5 +
>  include/linux/mmu_notifier.h |   34 +
>  include/linux/rmap.h         |   11 +
>  mm/Kconfig                   |    6 +
>  mm/Makefile                  |    1 +
>  mm/ksm.c                     | 1674 ++
>  mm/memory.c                  |   90 +++-
>  mm/mmu_notifier.c            |   20 +
>  mm/rmap.c                    |  139

And it's pretty unobtrusive for what it is. I expect we can get this into 2.6.31 unless there are some pratfalls which I missed.
[PATCH 0/4] ksm - dynamic page sharing driver for linux v3
From v2 to v3:

1) Remove unnecessary check of is_dirty_pte() inside PageKsm():
   We had added the is_dirty_pte() check to protect against the reuse: case inside do_wp_page(). Andrea pointed out to me that such a condition could never happen, due to the fact that if VM_SHARED is set no anonymous page can be on the vma; therefore it is impossible that such a page would become a KsmPage, and therefore KsmPages would never trigger the reuse case. (Check out "From v1 to v2" for more info.)

2) Add !vm_file check in addition to PageKsm() to check if a page is a shared page:
   Until now Ksm was checking whether pages are shared pages (KsmPage) by just running get_user_page() and then checking if Page != AnonPage. The problem arises because Ksm keeps virtual addresses inside its data structures, and if the user frees a page and allocates a new !AnonPage page, Ksm might think this page is a shared page. To solve this problem we have added an additional check for Ksm: we check whether vma->vm_file is NULL. In case we see a virtual address whose vma->vm_file is NULL and the page that it points to isn't an AnonPage, we can safely know that this is a shared page (KsmPage).

3) Replace jhash() with jhash2():
   Andrey Panin pointed out that we should use jhash2() as it is faster than jhash(). Thanks.

(Below is info from previous posts)

From v1 to v2:

1) Fixed security issue found by Chris Wright:
   Ksm was checking if a page is a shared page by running !PageAnon. Because Ksm scans only anonymous memory, all !PageAnon pages inside ksm data structures are shared pages. However, there might be a case in do_wp_page() when VM_SHARED is used, where do_wp_page(), instead of copying the page into a new anonymous page, would reuse the page. It was fixed by adding a check for the dirty bit of the virtual addresses pointing into the shared page. I was not finding any VM code that would clear the dirty bit from this virtual address (due to the fact that we allocate the page using page_alloc() - kernel allocated pages), ~but i still want confirmation about this from the vm guys - thanks.~

2) Moved to sysfs to control ksm:
   It was requested as a better way to control the ksm scanning thread than ioctls.
   The sysfs api:
   dir: /sys/kernel/mm/ksm/
   kernel_pages_allocated - information about how many kernel pages ksm has allocated; these pages are not swappable, and each such page is used by ksm to share pages with identical content
   pages_shared - how many pages were shared by ksm
   run - set to 1 when you want ksm to run, 0 when not
   max_kernel_pages - set the maximum amount of kernel pages to be allocated by ksm, set 0 for unlimited
   pages_to_scan - how many pages to scan before ksm will sleep
   sleep - how many usecs ksm will sleep

3) Add sysfs parameter to control the maximum kernel pages to be allocated by ksm.

4) Add statistics about how many pages are really shared.

One issue still to be discussed:
There was a suggestion to use madvise(SHAREABLE) instead of using ioctls to register memory that needs to be scanned by ksm. Such a change is outside the area of ksm.c and would require adding a new madvise api and changing some parts of the vm and the kernel code, so the first thing to do is to work out if we really want this.

I don't know of any other open issues.

Thanks.

This is from the first post:
(The kvm part, together with the kvm-userspace part, was posted with V1 about a week ago; whoever wants to test ksm may download the patch from the lkml archive.)

KSM is a linux driver that allows dynamically sharing identical memory pages between one or more processes.

Unlike traditional page sharing that is made at the allocation of the memory, ksm does it dynamically after the memory was created. Memory is periodically scanned; identical pages are identified and merged. The sharing is unnoticeable by the processes that use this memory. (The shared pages are marked as readonly, and in case of a write do_wp_page() takes care to create a new copy of the page.)

To find identical pages ksm uses an algorithm that is split into three primary levels:

1) Ksm will start scanning the memory and will calculate a checksum for each page that is registered to be scanned. (In the first round of the scanning, ksm would only calculate this checksum for all the pages.)

2) Ksm will go again over the whole memory and will recalculate the checksum of the pages; pages that are found to have the same checksum value are considered pages that most likely won't change. Ksm will insert these pages into an RB-tree sorted by page content that is called the unstable tree. The reason this tree is called unstable is that the page contents might change while they are still inside the tree, and therefore the tree would become corrupted. Due to this problem ksm takes two more steps in addition to the checksum calculation:
   a) Ksm will throw
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2
On Sat, Apr 04, 2009 at 05:35:18PM +0300, Izik Eidus wrote:
> From v1 to v2:
>
> 1) Fixed security issue found by Chris Wright:
>    Ksm was checking if a page is a shared page by running !PageAnon. Because Ksm scans only anonymous memory, all !PageAnon pages inside ksm data structures are shared pages. However, there might be a case in do_wp_page() when VM_SHARED is used, where do_wp_page(), instead of copying the page into a new anonymous page, would reuse the page. It was fixed by adding a check for the dirty bit of the virtual addresses pointing into the shared page. I was not finding any VM code that would clear the dirty bit from this virtual address (due to the fact that we allocate the page using page_alloc() - kernel allocated pages), ~but i still want confirmation about this from the vm guys - thanks.~

As far as I can tell this wasn't a bug and this change is unnecessary. I already checked this bit but I may have missed something, so I ask here to be sure. As far as I can tell when VM_SHARED is set, no anonymous page can ever be allocated in that vma range, hence no KSM page can ever be generated in that vma either. MAP_SHARED|MAP_ANONYMOUS is only a different API for /dev/shm, IPCSHM backing; no anonymous pages can live there. It surely worked like that in older 2.6, and reading the latest code it seems to still work like that, but if something has changed Hugh will surely correct me in a jiffy ;).

I still see this in the file=null path.

	} else if (vm_flags & VM_SHARED) {
		error = shmem_zero_setup(vma);
		if (error)
			goto free_vma;
	}

So you can revert your change for now.
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2
Hi Izik,

Is there some user documentation available? (apart from RTFS? :)) I've compiled a kernel with v2 of your patches, loaded the ksm module, and did echo 1 > /proc/sys/kernel/mm/ksm/run, but I think it didn't do anything; at least no pages were collected. Could you advise me a bit?

Thanks a lot in advance... I can't wait to try it on our hosts running 50-60 KVMs :)

BR
nik
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2
On Mon, Apr 06, 2009 at 05:04:49PM +1000, Nick Piggin wrote:
> They should use a shared memory segment, or MAP_ANONYMOUS|MAP_SHARED etc. Presumably they will probably want to control it to interleave it over all numa nodes and use hugepages for it. It would be very little work.

I thought it's the intermediate result of the computations that leads to lots of equal data too, in which case ksm is the only way to share it all.
[PATCH 0/4] ksm - dynamic page sharing driver for linux v2
From v1 to v2:

1) Fixed security issue found by Chris Wright:
   Ksm was checking if a page is a shared page by running !PageAnon. Because Ksm scans only anonymous memory, all !PageAnon pages inside ksm data structures are shared pages. However, there might be a case in do_wp_page() when VM_SHARED is used, where do_wp_page(), instead of copying the page into a new anonymous page, would reuse the page. It was fixed by adding a check for the dirty bit of the virtual addresses pointing into the shared page. I was not finding any VM code that would clear the dirty bit from this virtual address (due to the fact that we allocate the page using page_alloc() - kernel allocated pages), ~but i still want confirmation about this from the vm guys - thanks.~

2) Moved to sysfs to control ksm:
   It was requested as a better way to control the ksm scanning thread than ioctls.
   The sysfs api:
   dir: /sys/kernel/mm/ksm/
   kernel_pages_allocated - information about how many kernel pages ksm has allocated; these pages are not swappable, and each such page is used by ksm to share pages with identical content
   pages_shared - how many pages were shared by ksm
   run - set to 1 when you want ksm to run, 0 when not
   max_kernel_pages - set the maximum amount of kernel pages to be allocated by ksm, set 0 for unlimited
   pages_to_scan - how many pages to scan before ksm will sleep
   sleep - how many usecs ksm will sleep

3) Add sysfs parameter to control the maximum kernel pages to be allocated by ksm.

4) Add statistics about how many pages are really shared.

One issue still to be discussed:
There was a suggestion to use madvise(SHAREABLE) instead of using ioctls to register memory that needs to be scanned by ksm. Such a change is outside the area of ksm.c and would require adding a new madvise api and changing some parts of the vm and the kernel code, so the first thing to do is to work out if we really want this.

I don't know of any other open issues.

Thanks.
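For anyone wanting to poke at the sysfs knobs listed above from a test program, here is a minimal userspace sketch. The helper names ksm_write_knob() and ksm_read_knob() are made up for illustration and are not part of the patch; the knob names (run, pages_to_scan, sleep, ...) and the directory /sys/kernel/mm/ksm are the ones given in this cover letter and may differ in other versions. The directory is passed in as a parameter so the helpers can be tried against an ordinary directory first.

```c
#include <assert.h>
#include <stdio.h>

/* Build "<dir>/<knob>" into path; returns 0 on success, -1 if it does not fit. */
static int ksm_knob_path(char *path, size_t len, const char *dir, const char *knob)
{
	int n = snprintf(path, len, "%s/%s", dir, knob);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: write a numeric value to one control file.
 * With the layout described in the cover letter, dir would be
 * "/sys/kernel/mm/ksm" and knob e.g. "run" or "pages_to_scan".
 */
static int ksm_write_knob(const char *dir, const char *knob, unsigned long value)
{
	char path[512];
	FILE *f;
	int ok;

	if (ksm_knob_path(path, sizeof(path), dir, knob) != 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%lu\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Hypothetical helper: read a numeric value back, e.g. "pages_shared". */
static int ksm_read_knob(const char *dir, const char *knob, unsigned long *value)
{
	char path[512];
	FILE *f;
	int ok;

	if (ksm_knob_path(path, sizeof(path), dir, knob) != 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fscanf(f, "%lu", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

Assuming the module is loaded and the files exist where the cover letter says, something like ksm_write_knob("/sys/kernel/mm/ksm", "run", 1) followed by ksm_read_knob("/sys/kernel/mm/ksm", "pages_shared", &n) would start the scanner and read the counter; both calls return -1 if the file cannot be opened.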
This is from the first post:
(The kvm part, together with the kvm-userspace part, was posted with V1 about a week ago; whoever wants to test ksm may download the patch from the lkml archive.)

KSM is a linux driver that allows dynamically sharing identical memory pages between one or more processes.

Unlike traditional page sharing that is made at the allocation of the memory, ksm does it dynamically after the memory was created. Memory is periodically scanned; identical pages are identified and merged. The sharing is unnoticeable by the processes that use this memory. (The shared pages are marked as readonly, and in case of a write do_wp_page() takes care to create a new copy of the page.)

To find identical pages ksm uses an algorithm that is split into three primary levels:

1) Ksm will start scanning the memory and will calculate a checksum for each page that is registered to be scanned. (In the first round of the scanning, ksm would only calculate this checksum for all the pages.)

2) Ksm will go again over the whole memory and will recalculate the checksum of the pages; pages that are found to have the same checksum value are considered pages that most likely won't change. Ksm will insert these pages into an RB-tree sorted by page content that is called the unstable tree. The reason this tree is called unstable is that the page contents might change while they are still inside the tree, and therefore the tree would become corrupted. Due to this problem ksm takes two more steps in addition to the checksum calculation:
   a) Ksm will throw away and recreate the entire unstable tree each round of memory scanning - so if we have corruption, it will be fixed when we rebuild the tree.
   b) Ksm is using an RB-tree, whose balancing is done by node color and not by content, so even if a page gets corrupted, it still would take the same amount of time to search in it.

3) In addition to the unstable tree, ksm holds another tree that is called the stable tree - this tree is an RB-tree that is sorted by the pages' content and all its pages are write protected, and therefore it can't get corrupted. Each time ksm finds two identical pages using the unstable tree, it will create a new write-protected shared page, and this page will be inserted into the stable tree and saved there. The stable tree, unlike the unstable tree, is never thrown away, so each page that we find is saved inside it.

Taking into account the three levels described above, the algorithm works like this:

search primary tree (sorted by entire page contents, pages write protected)
- if match found, merge
- if no match found...
  - search secondary tree (sorted by entire page contents, pages not write protected)
    - if match found, merge
      - remove from secondary tree and insert merged page into primary tree
    - if no match found...
      -
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
Hi,

On Tue, 31 Mar 2009, Izik Eidus wrote:
> KSM is a linux driver that allows dynamically sharing identical memory pages between one or more processes.
>
> Unlike traditional page sharing that is made at the allocation of the memory, ksm does it dynamically after the memory was created. Memory is periodically scanned; identical pages are identified and merged. The sharing is unnoticeable by the processes that use this memory. (The shared pages are marked as readonly, and in case of a write do_wp_page() takes care to create a new copy of the page.)
>
> To find identical pages ksm uses an algorithm that is split into three primary levels:
>
> 1) Ksm will start scanning the memory and will calculate a checksum for each page that is registered to be scanned. (In the first round of the scanning, ksm would only calculate this checksum for all the pages.)

One question: calculating a checksum is a fine way to find pages that are likely to be identical, but there is no guarantee that two pages with the same checksum really are identical - there *will* be checksum collisions eventually. So, I really hope that your implementation actually checks that two pages that it finds with identical checksums really are 100% identical by comparing them bit by bit before throwing one away.

If you rely only on a checksum then eventually a user will get bitten by a checksum collision and, in the best case, something will crash, and in the worst case, data will silently be corrupted.

Do you rely only on the checksum or do you actually compare pages to check they are 100% identical before sharing?

I must admit that I have not read through the patch to find the answer, I just read your description and became concerned.

--
Jesper Juhl j...@chaosbits.net http://www.chaosbits.net/
Plain text mails only, please http://www.expita.com/nomime.html
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
* Jesper Juhl (j...@chaosbits.net) wrote:
> Do you rely only on the checksum or do you actually compare pages to check they are 100% identical before sharing?

Checksum has absolutely nothing to do w/ finding if two pages match. It's only used as a heuristic to suggest whether a single page has changed. If that page is changing we won't bother trying to find a match for it. Here's an example of the life of a page w.r.t. checksum.

1. checksum = uninitialized
2. first time page is found, checksum it (checksum = A). if checksum has changed (uninitialized != A) don't go any further w/ that page
3. next time page is found, checksum it (checksum = B). if checksum has changed (A != B) don't go any further w/ that page
4. next time page is found, checksum it (checksum = B). if checksum has changed (B == B)...it hasn't, continue processing the page

later if a match is found in the tree (which is sorted by _contents_, i.e. memcmp) we'll attempt to merge the pages which at its very core does:

	if (pages_identical(oldpage, newpage))
		ret = replace_page(vma, oldpage, newpage, orig_pte, newprot);

pages_identical? you guessed it...just does:

	r = memcmp(addr1, addr2, PAGE_SIZE)

thanks,
-chris
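The page life cycle described above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not code from the patch: struct page_track, page_is_stable() and SKETCH_PAGE_SIZE are made-up names, and FNV-1a stands in for the jhash()/jhash2() checksum used by ksm. The point it tries to show is that the checksum only answers "did this one page change since the last scan?", while the merge decision rests on a full memcmp().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for the sketch */

/* Hypothetical per-page bookkeeping; not the structure used in ksm.c. */
struct page_track {
	bool     has_checksum;	/* false until the page has been seen once */
	uint32_t checksum;	/* checksum recorded on the previous scan */
};

/* Stand-in checksum (FNV-1a over the whole page). */
static uint32_t page_checksum(const unsigned char *page)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < SKETCH_PAGE_SIZE; i++) {
		h ^= page[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Record the current checksum and report whether it equals the one from
 * the previous scan. The first sighting counts as "changed" (step 2 in
 * the list above), so a page is only processed further from the second
 * unchanged scan onwards.
 */
static bool page_is_stable(struct page_track *t, const unsigned char *page)
{
	uint32_t sum = page_checksum(page);
	bool stable = t->has_checksum && t->checksum == sum;

	t->has_checksum = true;
	t->checksum = sum;
	return stable;
}

/* The actual equality test before merging: a byte-for-byte comparison. */
static bool pages_identical(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, SKETCH_PAGE_SIZE) == 0;
}
```

A checksum collision in this scheme can at worst make a changing page look unchanged for one round; it cannot cause two different pages to be merged, because pages_identical() compares the full contents.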
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
Jesper Juhl wrote:
> Hi,
>
> On Tue, 31 Mar 2009, Izik Eidus wrote:
> > KSM is a linux driver that allows dynamically sharing identical memory pages between one or more processes.
> >
> > Unlike traditional page sharing that is made at the allocation of the memory, ksm does it dynamically after the memory was created. Memory is periodically scanned; identical pages are identified and merged. The sharing is unnoticeable by the processes that use this memory. (The shared pages are marked as readonly, and in case of a write do_wp_page() takes care to create a new copy of the page.)
> >
> > To find identical pages ksm uses an algorithm that is split into three primary levels:
> >
> > 1) Ksm will start scanning the memory and will calculate a checksum for each page that is registered to be scanned. (In the first round of the scanning, ksm would only calculate this checksum for all the pages.)
>
> One question: calculating a checksum is a fine way to find pages that are likely to be identical

I don't use the checksum as with a hash table; the checksum isn't used to find identical pages by the fact that they have similar data... The checksum is used to let me know that the page has not changed for a while and that it is worth checking for pages identical to it... In the future we will want to use the page table dirty bit for it, as taking a checksum is somewhat expensive.

> , but there is no guarantee that two pages with the same checksum really are identical - there *will* be checksum collisions eventually. So, I really hope that your implementation actually checks that two pages that it finds with identical checksums really are 100% identical by comparing them bit by bit before throwing one away.

We do that :-)

> If you rely only on a checksum then eventually a user will get bitten by a checksum collision and, in the best case, something will crash, and in the worst case, data will silently be corrupted.
>
> Do you rely only on the checksum or do you actually compare pages to check they are 100% identical before sharing?

I do a 100% compare of the pages before I share them.

> I must admit that I have not read through the patch to find the answer, I just read your description and became concerned.

Don't worry, me neither :-)
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
On Thu, 2 Apr 2009, Chris Wright wrote:
> * Jesper Juhl (j...@chaosbits.net) wrote:
> > Do you rely only on the checksum or do you actually compare pages to check they are 100% identical before sharing?
>
> Checksum has absolutely nothing to do w/ finding if two pages match. It's only used as a heuristic to suggest whether a single page has changed. If that page is changing we won't bother trying to find a match for it. Here's an example of the life of a page w.r.t. checksum.
>
> 1. checksum = uninitialized
> 2. first time page is found, checksum it (checksum = A). if checksum has changed (uninitialized != A) don't go any further w/ that page
> 3. next time page is found, checksum it (checksum = B). if checksum has changed (A != B) don't go any further w/ that page
> 4. next time page is found, checksum it (checksum = B). if checksum has changed (B == B)...it hasn't, continue processing the page
>
> later if a match is found in the tree (which is sorted by _contents_, i.e. memcmp) we'll attempt to merge the pages which at its very core does:
>
> 	if (pages_identical(oldpage, newpage))
> 		ret = replace_page(vma, oldpage, newpage, orig_pte, newprot);
>
> pages_identical? you guessed it...just does:
>
> 	r = memcmp(addr1, addr2, PAGE_SIZE)

Thank you for that explanation, it set my mind at ease :-)

--
Jesper Juhl j...@chaosbits.net http://www.chaosbits.net/
Plain text mails only, please http://www.expita.com/nomime.html
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
Anthony Liguori wrote:

Izik Eidus wrote:

I am sending another series of patches for kvm kernel and kvm-userspace that would allow users of kvm to test ksm with it. The kvm patches would apply to Avi's git tree.

Any reason to not take these through upstream QEMU instead of kvm-userspace? In principle, I don't see anything that would prevent normal QEMU from almost making use of this functionality. That would make it one less thing to eventually have to merge...

The changes for kvm-userspace were just provided for testing it... After we have ksm inside the kernel we will send another patch to qemu-devel that will add support for it.

Regards, Anthony Liguori
[PATCH 0/4] ksm - dynamic page sharing driver for linux
KSM is a Linux driver that allows dynamically sharing identical memory pages between one or more processes.

Unlike traditional page sharing, which is set up when the memory is allocated, ksm does it dynamically after the memory was created. Memory is periodically scanned; identical pages are identified and merged. The sharing is unnoticeable by the processes that use this memory. (The shared pages are marked as read-only, and in case of a write do_wp_page() takes care of creating a new copy of the page.)

To find identical pages ksm uses an algorithm that is split into three primary levels:

1) Ksm will start scanning the memory and will calculate a checksum for each page that is registered to be scanned. (In the first round of scanning, ksm only calculates this checksum for all the pages.)

2) Ksm will go over the whole memory again and recalculate the checksum of the pages. Pages that are found to have the same checksum value are considered pages that most likely won't change. Ksm will insert these pages into an RB-tree sorted by page content, called the unstable tree. The tree is called unstable because the page contents might change while the pages are still inside the tree, and therefore the tree would become corrupted. Due to this problem ksm takes two more steps in addition to the checksum calculation:

a) Ksm will throw away and recreate the entire unstable tree each round of memory scanning, so if we have corruption, it will be fixed when we rebuild the tree.

b) Ksm uses an RB-tree, whose balancing is done by node color and not by content, so even if a page gets corrupted, it still takes the same amount of time to search the tree.

3) In addition to the unstable tree, ksm holds another tree called the stable tree. This is an RB-tree sorted by page content whose pages are all write-protected, and therefore it can't get corrupted. Each time ksm finds two identical pages using the unstable tree, it will create a new write-protected shared page, and this page will be inserted into the stable tree and saved there. The stable tree, unlike the unstable tree, is never thrown away, so each page that we find is saved inside it.

Taking into account the three levels described above, the algorithm works like this:

search primary tree (sorted by entire page contents, pages write protected)
- if match found, merge
- if no match found...
  - search secondary tree (sorted by entire page contents, pages not write protected)
    - if match found, merge
      - remove from secondary tree and insert merged page into primary tree
    - if no match found...
      - checksum
        - if checksum hasn't changed
          - insert into secondary tree
        - if it has, store updated checksum (note: first time this page is handled it won't have a checksum, so checksum will appear as changed, so it takes two passes w/ no other matches to get into secondary tree)
          - do not insert into any tree, will see it again on next pass

The basic idea of this algorithm is that even if the unstable tree doesn't promise to find two identical pages in the first round, we will probably find them in the second or the third or the tenth round. Once we have found these two identical pages a single time, we insert them into the stable tree, and there they are protected forever. So the whole idea of the unstable tree is just to build the stable tree, and then we find the identical pages using it.

The current implementation can be improved a lot:
- we don't have to calculate an expensive checksum; we can just use the host dirty bit.
- currently we don't support swapping of shared pages (other pages that are not shared can be swapped (all the pages that we didn't find to be identical to other pages...)).
- walking the tree, we keep calling get_user_pages(); we can optimize this by saving the pfn and using mmu notifiers to know when the virtual address mapping has changed.

We currently scan only programs that were registered to be used by ksm; later we would want to add the ability to tell ksm to scan PIDs (so you can scan closed binary applications as well).

Right now ksm scanning is done by just one thread; support for multiple scanners might be needed.

This driver is very useful for KVM in cases of running multiple guest operating systems of the same type. (For desktop workloads we have achieved more than x2 memory overcommit (more like x3).)

This driver has found users other than KVM, for example CERN. Fons Rademakers: on many-core machines we run one large detector simulation program per core. These simulation programs are identical but each runs in its own process and needs about 2 - 2.5 GB RAM. We typically buy machines with 2GB RAM per core and so have a problem running one of these programs per core. Of
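The per-page decision flow outlined in the cover letter above (stable tree first, then unstable tree, then the checksum gate) can be sketched in userspace C. This is a toy model under loud assumptions: small fixed arrays with linear search stand in for the two content-sorted RB-trees, "merging" is reduced to a return code, the FNV-1a checksum is a stand-in, and every name here (struct page_set, scan_page(), start_round() and so on) is hypothetical rather than taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 64     /* tiny "page" to keep the model small */
#define MAX_PAGES 32

enum scan_result { MERGED_STABLE, MERGED_UNSTABLE, QUEUED_UNSTABLE, SKIPPED_VOLATILE };

struct page {
    unsigned char data[SKETCH_PAGE_SIZE];
    int has_checksum;
    uint32_t checksum;
};

/* Stand-in for an RB-tree sorted by page contents (assumption: array + memcmp). */
struct page_set {
    const struct page *items[MAX_PAGES];
    int count;
};

static struct page_set stable_set, unstable_set;

/* Index of a different page with identical contents, or -1. */
static int find_match(const struct page_set *s, const struct page *p)
{
    for (int i = 0; i < s->count; i++)
        if (s->items[i] != p &&
            memcmp(s->items[i]->data, p->data, SKETCH_PAGE_SIZE) == 0)
            return i;
    return -1;
}

static void set_add(struct page_set *s, const struct page *p)
{
    assert(s->count < MAX_PAGES);
    s->items[s->count++] = p;
}

static void set_remove(struct page_set *s, int idx)
{
    s->items[idx] = s->items[--s->count];
}

static uint32_t page_checksum(const struct page *p)
{
    uint32_t h = 2166136261u;   /* FNV-1a, a stand-in checksum */
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        h ^= p->data[i];
        h *= 16777619u;
    }
    return h;
}

/* The unstable set is thrown away and rebuilt every scan round. */
static void start_round(void)
{
    unstable_set.count = 0;
}

static enum scan_result scan_page(struct page *p)
{
    /* 1. Primary (stable) set: a match means merge immediately. */
    if (find_match(&stable_set, p) >= 0)
        return MERGED_STABLE;

    /* 2. Secondary (unstable) set: merge, then promote to the stable set. */
    int idx = find_match(&unstable_set, p);
    if (idx >= 0) {
        const struct page *partner = unstable_set.items[idx];
        set_remove(&unstable_set, idx);
        set_add(&stable_set, partner);  /* models the new shared page */
        return MERGED_UNSTABLE;
    }

    /* 3. Checksum gate: only pages unchanged since the last pass are queued. */
    uint32_t sum = page_checksum(p);
    if (p->has_checksum && p->checksum == sum) {
        set_add(&unstable_set, p);
        return QUEUED_UNSTABLE;
    }
    p->has_checksum = 1;
    p->checksum = sum;
    return SKIPPED_VOLATILE;
}
```

With two identical pages, the first round only records checksums, the second round queues one page in the unstable set and merges the other against it, and a third identical page seen later is merged directly against the stable set. That matches the "two passes to get into the secondary tree" note in the outline.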
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
Izik Eidus wrote:

I am sending another series of patches for kvm kernel and kvm-userspace that would allow users of kvm to test ksm with it. The kvm patches would apply to Avi's git tree.

Any reason to not take these through upstream QEMU instead of kvm-userspace? In principle, I don't see anything that would prevent normal QEMU from almost making use of this functionality. That would make it one less thing to eventually have to merge...

Regards, Anthony Liguori
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2
Izik Eidus [EMAIL PROTECTED] writes:

(From v1 to v2 the main change is much more documentation)

KSM is a Linux driver that allows dynamically sharing identical memory pages between one or more processes.

Unlike traditional page sharing, which is set up when the memory is allocated, ksm does it dynamically after the memory was created. Memory is periodically scanned; identical pages are identified and merged. The sharing is unnoticeable by the processes that use this memory. (The shared pages are marked as read-only, and in case of a write do_wp_page() takes care of creating a new copy of the page.)

This driver is very useful for KVM in cases of running multiple guest operating systems of the same type.

Hi Izik, the approach used in the driver is commonly known as content based search. There are several variants of it; the most common are: 1: with guest VM support 2: w/o guest VM support. You have implemented the second one, but it seems it was already patented: http://www.google.com/patents?vid=USPAT6789156 I'm not a lawyer but IMHO we have a direct conflict here. From another point of view they have patented the wheel, but at least we have to know about this.

(For desktop workloads we have achieved more than x2 memory overcommit (more like x3).)

This driver has found users other than KVM, for example CERN. Fons Rademakers: "on many-core machines we run one large detector simulation program per core. These simulation programs are identical but each runs in its own process and needs about 2 - 2.5 GB RAM. We typically buy machines with 2GB RAM per core and so have a problem running one of these programs per core. Of the 2 - 2.5 GB about 700MB is identical data in the form of magnetic field maps, detector geometry, etc. Currently people have been trying to start one program, initialize the geometry and field maps and then fork it N times, to have the data shared. With KSM this would be done automatically by the system so it sounded extremely attractive when Andrea presented it."
(We have already started to test KSM on their systems...)

KSM can run as a kernel thread or as a userspace application or both.

example for how to control the kernel thread:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include "ksm.h"

int main(int argc, char *argv[])
{
    int fd;
    int used = 0;
    int fd_start;
    struct ksm_kthread_info info;

    if (argc < 2) {
        fprintf(stderr, "usage: %s {start npages sleep | stop | info}\n", argv[0]);
        exit(1);
    }

    fd = open("/dev/ksm", O_RDWR | O_TRUNC, (mode_t)0600);
    if (fd == -1) {
        fprintf(stderr, "could not open /dev/ksm\n");
        exit(1);
    }

    if (!strncmp(argv[1], "start", strlen(argv[1]))) {
        used = 1;
        /* start reads argv[2], argv[3] and argv[4], so five arguments are needed */
        if (argc < 5) {
            fprintf(stderr, "usage: %s start npages_to_scan max_pages_to_merge sleep\n", argv[0]);
            exit(1);
        }
        info.pages_to_scan = atoi(argv[2]);
        info.max_pages_to_merge = atoi(argv[3]);
        info.sleep = atoi(argv[4]);
        info.flags = ksm_control_flags_run;

        fd_start = ioctl(fd, KSM_START_STOP_KTHREAD, &info);
        if (fd_start == -1) {
            fprintf(stderr, "KSM_START_KTHREAD failed\n");
            exit(1);
        }
        printf("created scanner\n");
    }

    if (!strncmp(argv[1], "stop", strlen(argv[1]))) {
        used = 1;
        info.flags = 0;
        fd_start = ioctl(fd, KSM_START_STOP_KTHREAD, &info);
        printf("stopped scanner\n");
    }

    if (!strncmp(argv[1], "info", strlen(argv[1]))) {
        used = 1;
        ioctl(fd, KSM_GET_INFO_KTHREAD, &info);
        printf("flags %d, pages_to_scan %d npages_merge %d, sleep_time %d\n",
               info.flags, info.pages_to_scan, info.max_pages_to_merge, info.sleep);
    }

    if (!used)
        fprintf(stderr, "unknown command %s\n", argv[1]);

    return 0;
}

example of how to register qemu to ksm (or any userspace application)

diff --git a/qemu/vl.c b/qemu/vl.c
index 4721fdd..7785bf9 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -21,6 +21,7 @@
  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
  * THE SOFTWARE.
  */
+#include "ksm.h"
 #include "hw/hw.h"
 #include "hw/boards.h"
 #include "hw/usb.h"
@@ -5799,6 +5800,37 @@ static void termsig_setup(void)
 #endif
+int ksm_register_memory(void)
+{
+    int fd;
+    int ksm_fd;
+    int r = 1;
+    struct ksm_memory_region ksm_region;
+
+    fd = open("/dev/ksm", O_RDWR | O_TRUNC, (mode_t)0600);
+    if (fd == -1)
+        goto out;
+
+
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2
You have implemented the second one, but it seems it was already patented: http://www.google.com/patents?vid=USPAT6789156 I'm not a lawyer but IMHO we have a direct conflict here. From another point of view they have patented the wheel, but at least we have to know about this.

It's an old idea and appeared for Linux in March 1998: a little project from Philipp Reisner called mergemem.

http://groups.google.com/group/muc.lists.linux-kernel/browse_thread/thread/387af278089c7066?ie=utf-8&oe=utf-8&q=share+identical+pages#b3d4f68fb5dd4f88

So if there is a patent which is relevant (and that's a question for lawyers and legal patent search people), perhaps the Linux Foundation and some of the patent busters could take a look at mergemem and re-examination.

Alan
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2
Quoting Ryota OZAKI:

Hi Izik,

I've tried your patch set, but ksm doesn't work on my machine. I compiled Linux patched with the four patches and configured with KSM and KVM enabled. After booting that kernel, I ran two VMs running Linux using QEMU with the patch from your mail and started the KSM scanner with your script; then the host Linux panicked with the following oops.

Yes, you are right, we are missing pte_unmap(pte); in get_pte()! That affects only 32-bit with highmem, which is why you see it. Thanks for the report, I will fix it for v3. The patch below should fix it (I can't test it now, will test it for v3). Can you report whether it fixes your problem? Thanks.

== BEGINNING of OOPS
kernel BUG at arch/x86/mm/highmem_32.c:87!
invalid opcode: [#1] SMP
last sysfs file: /sys/class/net/vnet-ssh2/address
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in: netconsole autofs4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT iptable_filter ip_tables x_tables loop kvm_intel kvm iTCO_wdt iTCO_vendor_support igb netxen_nic button ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd usbcore [last unloaded: microcode]
Pid: 343, comm: kksmd Not tainted (2.6.28-rc5-linus-head-20081119-sparsemem #1) X7DWA
EIP: 0060:[c041eff9] EFLAGS: 00010206 CPU: 6
EIP is at kmap_atomic_prot+0x7d/0xeb
EAX: c0008d94 EBX: c1ff6240 ECX: 0163 EDX: 7e00
ESI: 0154 EDI: 0055 EBP: f5cdbf10 ESP: f5cdbef8
DS: 007b ES: 007b FS: 00d8 GS: SS: 0068
Process kksmd (pid: 343, ti=f5cda000 task=f617b140 task.ti=f5cda000)
Stack: 7fa12163 f000 c204efbc f50479e8 9eb7e000 c08a34d0 f5cdbf18 c041f07a f5cdbf28 c048339c f5c271e0 f5cdbf30 c04833bc f5cdbfb0 c0483b0d f5cdbf50 c0425845 0064 0009 c08a34d0 f5cdbfb0 c06384c1
Call Trace:
[c041f07a] ? kmap_atomic+0x13/0x15
[c048339c] ? get_pte+0x50/0x63
[c04833bc] ? is_present_pte+0xd/0x1f
[c0483b0d] ? ksm_scan_start+0x9a/0x7ac
[c0425845] ? finish_task_switch+0x29/0xa4
[c06384c1] ? schedule+0x6bf/0x719
[c041b3fc] ? default_spin_lock_flags+0x8/0xc
[c043bffa] ? finish_wait+0x49/0x4e
[c04845f4] ? kthread_ksm_scan_thread+0x0/0xdc
[c048462e] ? kthread_ksm_scan_thread+0x3a/0xdc
[c043bf31] ? autoremove_wake_function+0x0/0x38
[c043be3e] ? kthread+0x40/0x66
[c043bdfe] ? kthread+0x0/0x66
[c0404997] ? kernel_thread_helper+0x7/0x10
Code: 86 00 00 00 64 a1 04 a0 82 c0 6b c0 0d 8d 3c 30 a1 78 b0 77 c0 8d 34 bd 00 00 00 00 89 45 ec a1 0c d0 84 c0 29 f0 83 38 00 74 04 0f 0b eb fe c1 ea 1a 8b 04 d5 80 32 8a c0 83 e0 fc 29 c3 c1 fb
EIP: [c041eff9] kmap_atomic_prot+0x7d/0xeb SS:ESP 0068:f5cdbef8
Kernel panic - not syncing: Fatal exception
== END of OOPS

diff --git a/mm/ksm.c b/mm/ksm.c
index 707be52..e14448a 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -562,6 +562,7 @@ static pte_t *get_pte(struct mm_struct *mm, unsigned long addr)
         goto out;
     ptep = pte_offset_map(pmd, addr);
+    pte_unmap(ptep);
 out:
     return ptep;
 }
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2
Quoting Izik Eidus:

Quoting Ryota OZAKI:

Hi Izik,

I've tried your patch set, but ksm doesn't work on my machine. I compiled Linux patched with the four patches and configured with KSM and KVM enabled. After booting that kernel, I ran two VMs running Linux using QEMU with the patch from your mail and started the KSM scanner with your script; then the host Linux panicked with the following oops.

Yes, you are right, we are missing pte_unmap(pte); in get_pte()! That affects only 32-bit with highmem, which is why you see it. Thanks for the report, I will fix it for v3. The patch below should fix it (I can't test it now, will test it for v3). Can you report whether it fixes your problem? Thanks.

Thinking about what I just did, it is wrong. This patch is the right one (still not tested), so if you are going to apply something, use this one. Thanks.

diff --git a/mm/ksm.c b/mm/ksm.c
index 707be52..c842c29 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -569,14 +569,16 @@ out:
 static int is_present_pte(struct mm_struct *mm, unsigned long addr)
 {
     pte_t *ptep;
+    int r;
     ptep = get_pte(mm, addr);
     if (!ptep)
         return 0;
-    if (pte_present(*ptep))
-        return 1;
-    return 0;
+    r = pte_present(*ptep);
+    pte_unmap(ptep);
+
+    return r;
 }
 #define PAGEHASH_LEN 128
@@ -669,6 +671,7 @@ static int try_to_merge_one_page(struct mm_struct *mm,
     if (!orig_ptep)
         goto out_unlock;
     orig_pte = *orig_ptep;
+    pte_unmap(orig_ptep);
     if (!pte_present(orig_pte))
         goto out_unlock;
     if (page_to_pfn(oldpage) != pte_pfn(orig_pte))
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2
2008/11/20 Izik Eidus [EMAIL PROTECTED]:

Quoting Izik Eidus:

Quoting Ryota OZAKI:

Hi Izik,

I've tried your patch set, but ksm doesn't work on my machine. I compiled Linux patched with the four patches and configured with KSM and KVM enabled. After booting that kernel, I ran two VMs running Linux using QEMU with the patch from your mail and started the KSM scanner with your script; then the host Linux panicked with the following oops.

Yes, you are right, we are missing pte_unmap(pte); in get_pte()! That affects only 32-bit with highmem, which is why you see it. Thanks for the report, I will fix it for v3. The patch below should fix it (I can't test it now, will test it for v3). Can you report whether it fixes your problem? Thanks.

Thinking about what I just did, it is wrong. This patch is the right one (still not tested), so if you are going to apply something, use this one.

Great! Applied the 2nd patch, ksm works with both HIGHMEM enabled and disabled.

Thanks for your quick response,
ozaki-r

Thanks.

diff --git a/mm/ksm.c b/mm/ksm.c
index 707be52..c842c29 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -569,14 +569,16 @@ out:
 static int is_present_pte(struct mm_struct *mm, unsigned long addr)
 {
     pte_t *ptep;
+    int r;
     ptep = get_pte(mm, addr);
     if (!ptep)
         return 0;
-    if (pte_present(*ptep))
-        return 1;
-    return 0;
+    r = pte_present(*ptep);
+    pte_unmap(ptep);
+
+    return r;
 }
 #define PAGEHASH_LEN 128
@@ -669,6 +671,7 @@ static int try_to_merge_one_page(struct mm_struct *mm,
     if (!orig_ptep)
         goto out_unlock;
     orig_pte = *orig_ptep;
+    pte_unmap(orig_ptep);
     if (!pte_present(orig_pte))
         goto out_unlock;
     if (page_to_pfn(oldpage) != pte_pfn(orig_pte))
[PATCH 0/4] ksm - dynamic page sharing driver for linux v2
(From v1 to v2 the main change is much more documentation)

KSM is a Linux driver that allows dynamically sharing identical memory pages between one or more processes.

Unlike traditional page sharing, which is set up when the memory is allocated, ksm does it dynamically after the memory was created. Memory is periodically scanned; identical pages are identified and merged. The sharing is unnoticeable by the processes that use this memory. (The shared pages are marked as read-only, and in case of a write do_wp_page() takes care of creating a new copy of the page.)

This driver is very useful for KVM in cases of running multiple guest operating systems of the same type. (For desktop workloads we have achieved more than x2 memory overcommit (more like x3).)

This driver has found users other than KVM, for example CERN. Fons Rademakers: "on many-core machines we run one large detector simulation program per core. These simulation programs are identical but each runs in its own process and needs about 2 - 2.5 GB RAM. We typically buy machines with 2GB RAM per core and so have a problem running one of these programs per core. Of the 2 - 2.5 GB about 700MB is identical data in the form of magnetic field maps, detector geometry, etc. Currently people have been trying to start one program, initialize the geometry and field maps and then fork it N times, to have the data shared. With KSM this would be done automatically by the system so it sounded extremely attractive when Andrea presented it."

(We have already started to test KSM on their systems...)
KSM can run as a kernel thread or as a userspace application or both.

example for how to control the kernel thread:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include "ksm.h"

int main(int argc, char *argv[])
{
    int fd;
    int used = 0;
    int fd_start;
    struct ksm_kthread_info info;

    if (argc < 2) {
        fprintf(stderr, "usage: %s {start npages sleep | stop | info}\n", argv[0]);
        exit(1);
    }

    fd = open("/dev/ksm", O_RDWR | O_TRUNC, (mode_t)0600);
    if (fd == -1) {
        fprintf(stderr, "could not open /dev/ksm\n");
        exit(1);
    }

    if (!strncmp(argv[1], "start", strlen(argv[1]))) {
        used = 1;
        /* start reads argv[2], argv[3] and argv[4], so five arguments are needed */
        if (argc < 5) {
            fprintf(stderr, "usage: %s start npages_to_scan max_pages_to_merge sleep\n", argv[0]);
            exit(1);
        }
        info.pages_to_scan = atoi(argv[2]);
        info.max_pages_to_merge = atoi(argv[3]);
        info.sleep = atoi(argv[4]);
        info.flags = ksm_control_flags_run;

        fd_start = ioctl(fd, KSM_START_STOP_KTHREAD, &info);
        if (fd_start == -1) {
            fprintf(stderr, "KSM_START_KTHREAD failed\n");
            exit(1);
        }
        printf("created scanner\n");
    }

    if (!strncmp(argv[1], "stop", strlen(argv[1]))) {
        used = 1;
        info.flags = 0;
        fd_start = ioctl(fd, KSM_START_STOP_KTHREAD, &info);
        printf("stopped scanner\n");
    }

    if (!strncmp(argv[1], "info", strlen(argv[1]))) {
        used = 1;
        ioctl(fd, KSM_GET_INFO_KTHREAD, &info);
        printf("flags %d, pages_to_scan %d npages_merge %d, sleep_time %d\n",
               info.flags, info.pages_to_scan, info.max_pages_to_merge, info.sleep);
    }

    if (!used)
        fprintf(stderr, "unknown command %s\n", argv[1]);

    return 0;
}

example of how to register qemu to ksm (or any userspace application)

diff --git a/qemu/vl.c b/qemu/vl.c
index 4721fdd..7785bf9 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -21,6 +21,7 @@
  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
  * THE SOFTWARE.
  */
+#include "ksm.h"
 #include "hw/hw.h"
 #include "hw/boards.h"
 #include "hw/usb.h"
@@ -5799,6 +5800,37 @@ static void termsig_setup(void)
 #endif
+int ksm_register_memory(void)
+{
+    int fd;
+    int ksm_fd;
+    int r = 1;
+    struct ksm_memory_region ksm_region;
+
+    fd = open("/dev/ksm", O_RDWR | O_TRUNC, (mode_t)0600);
+    if (fd == -1)
+        goto out;
+
+    ksm_fd = ioctl(fd, KSM_CREATE_SHARED_MEMORY_AREA);
+    if (ksm_fd == -1)
+        goto out_free;
+
+    ksm_region.npages = phys_ram_size / TARGET_PAGE_SIZE;
+    ksm_region.addr = phys_ram_base;
+    r = ioctl(ksm_fd, KSM_REGISTER_MEMORY_REGION, &ksm_region);
+    if (r)
+        goto out_free1;
+
+    return r;
+
+out_free1:
+    close(ksm_fd);
+out_free:
+    close(fd);
+out:
+    return r;
+}
+
 int main(int argc, char **argv)
 {
 #ifdef CONFIG_GDBSTUB
@@ -6735,6 +6767,8 @@ int main(int argc, char **argv)
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
On Tue, 11 Nov 2008 15:21:37 +0200 Izik Eidus [EMAIL PROTECTED] wrote:

KSM is a Linux driver that allows dynamically sharing identical memory pages between one or more processes. Unlike traditional page sharing, which is set up when the memory is allocated, ksm does it dynamically after the memory was created. Memory is periodically scanned; identical pages are identified and merged. The sharing is unnoticeable by the processes that use this memory. (The shared pages are marked as read-only, and in case of a write do_wp_page() takes care of creating a new copy of the page.) This driver is very useful for KVM, as in cases of running multiple guest operating systems of the same type, many pages are sharable. This driver can be useful for OpenVZ as well.

These benefits should be quantified, please. Also any benefits to any other workloads should be identified and quantified.

The whole approach seems wrong to me. The kernel lost track of these pages and then we run around post-facto trying to fix that up again. Please explain (for the changelog) why the kernel cannot get this right via the usual sharing, refcounting and COWing approaches.
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
Andrew Morton wrote:

The whole approach seems wrong to me. The kernel lost track of these pages and then we run around post-facto trying to fix that up again. Please explain (for the changelog) why the kernel cannot get this right via the usual sharing, refcounting and COWing approaches.

For kvm, the kernel never knew those pages were shared. They are loaded from independent (possibly compressed and encrypted) disk images. These images are different; but some pages happen to be the same because they came from the same installation media.

For OpenVZ the situation is less clear, but if you allow users to independently upgrade their chroots you will eventually arrive at the same scenario (unless of course you apply the same merging strategy at the filesystem level).

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
Andrew Morton wrote:

On Tue, 11 Nov 2008 15:21:37 +0200 Izik Eidus [EMAIL PROTECTED] wrote:

KSM is a Linux driver that allows dynamically sharing identical memory pages between one or more processes. Unlike traditional page sharing, which is set up when the memory is allocated, ksm does it dynamically after the memory was created. Memory is periodically scanned; identical pages are identified and merged. The sharing is unnoticeable by the processes that use this memory. (The shared pages are marked as read-only, and in case of a write do_wp_page() takes care of creating a new copy of the page.) This driver is very useful for KVM, as in cases of running multiple guest operating systems of the same type, many pages are sharable. This driver can be useful for OpenVZ as well.

These benefits should be quantified, please. Also any benefits to any other workloads should be identified and quantified.

Sure. We have used KSM in production for about half a year, and the numbers that came from our QA are: using KSM for desktop (KSM was tested just for the Windows desktop workload) you can run as many as 52 Windows XP guests with 1 GB of RAM each on a server with just 16 GB of RAM. (This is more than 300% overcommit.) The reason is that most of the kernel/DLLs of these guests are shared, and in addition we are sharing the Windows zero pages (Windows keeps zeroing all its free memory, so every time Windows releases memory we take the page back to the host).

There is a slide that gives these numbers, which you can find at: http://kvm.qumranet.com/kvmwiki/KvmForum2008?action=AttachFile&do=get&target=kdf2008_3.pdf (slide 27)

Besides that, I gave a presentation about ksm that can be found at: http://kvm.qumranet.com/kvmwiki/KvmForum2008?action=AttachFile&do=get&target=kdf2008_12.pdf

If more numbers are wanted for other workloads I can test it. (The idea of ksm is to run it slowly, at low priority, and let it merge pages when no one needs the CPU.)
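For readers checking the "more than 300% overcommit" figure, the arithmetic behind it can be written out. This is only a sketch that assumes "overcommit" here means guest memory promised divided by physical host memory; overcommit_percent() is a hypothetical helper, not something from the patch or from the slides.

```c
#include <assert.h>

/*
 * Memory promised to guests as a percentage of physical host memory.
 * All sizes are in megabytes.
 */
static unsigned overcommit_percent(unsigned guests, unsigned guest_mb,
                                   unsigned host_mb)
{
    assert(host_mb > 0);
    return (unsigned)((unsigned long long)guests * guest_mb * 100 / host_mb);
}
```

With the numbers from the message, overcommit_percent(52, 1024, 16384) works out to 52 * 1024 * 100 / 16384 = 325, which is consistent with the quoted "more than 300%".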
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
Avi Kivity wrote:

Andrew Morton wrote:

The whole approach seems wrong to me. The kernel lost track of these pages and then we run around post-facto trying to fix that up again. Please explain (for the changelog) why the kernel cannot get this right via the usual sharing, refcounting and COWing approaches.

For kvm, the kernel never knew those pages were shared. They are loaded from independent (possibly compressed and encrypted) disk images. These images are different; but some pages happen to be the same because they came from the same installation media.

As Avi said, in kvm we cannot know how the guest is going to map its pages, so we can do nothing but scan for the identical pages (you can have shared pages that sit at entirely different offsets inside each guest).

For OpenVZ the situation is less clear, but if you allow users to independently upgrade their chroots you will eventually arrive at the same scenario (unless of course you apply the same merging strategy at the filesystem level).
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
On Tue, 11 Nov 2008 20:48:16 +0200 Avi Kivity [EMAIL PROTECTED] wrote:

Andrew Morton wrote:

The whole approach seems wrong to me. The kernel lost track of these pages and then we run around post-facto trying to fix that up again. Please explain (for the changelog) why the kernel cannot get this right via the usual sharing, refcounting and COWing approaches.

For kvm, the kernel never knew those pages were shared. They are loaded from independent (possibly compressed and encrypted) disk images. These images are different; but some pages happen to be the same because they came from the same installation media.

What userspace-only changes could fix this? Identify the common data, write it to a flat file and mmap it, something like that?

For OpenVZ the situation is less clear, but if you allow users to independently upgrade their chroots you will eventually arrive at the same scenario (unless of course you apply the same merging strategy at the filesystem level).

hm.

There has been the occasional discussion about identifying all-zeroes pages and scavenging them, repointing them at the zero page. Could this infrastructure be used for that? (And how much would we gain from it?)

[I'm looking for reasons why this is more than a muck-up-the-vm-for-kvm thing here ;) ]
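As a rough illustration of the all-zeroes check raised in this message, the detection half is just a scan of the page contents. The sketch below is userspace C with a made-up function name (page_is_zero()); it only shows how such a page could be recognized and says nothing about how the kernel would repoint the mapping at the zero page.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every byte of the buffer is zero, 0 otherwise. */
static int page_is_zero(const unsigned char *page, size_t len)
{
    assert(page != NULL);
    for (size_t i = 0; i < len; i++)
        if (page[i] != 0)
            return 0;
    return 1;
}
```

A content-sorted tree such as the one ksm builds would find these pages anyway, since all zero pages compare equal; a dedicated check like this one simply avoids the tree lookup for that single, very common case.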
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
Andrew Morton wrote:

On Tue, 11 Nov 2008 20:48:16 +0200 Avi Kivity [EMAIL PROTECTED] wrote:

Andrew Morton wrote:

The whole approach seems wrong to me. The kernel lost track of these pages and then we run around post-facto trying to fix that up again. Please explain (for the changelog) why the kernel cannot get this right via the usual sharing, refcounting and COWing approaches.

For kvm, the kernel never knew those pages were shared. They are loaded from independent (possibly compressed and encrypted) disk images. These images are different; but some pages happen to be the same because they came from the same installation media.

What userspace-only changes could fix this? Identify the common data, write it to a flat file and mmap it, something like that?

For OpenVZ the situation is less clear, but if you allow users to independently upgrade their chroots you will eventually arrive at the same scenario (unless of course you apply the same merging strategy at the filesystem level).

hm.

There has been the occasional discussion about identifying all-zeroes pages and scavenging them, repointing them at the zero page. Could this infrastructure be used for that? (And how much would we gain from it?)

[I'm looking for reasons why this is more than a muck-up-the-vm-for-kvm thing here ;) ]

KSM is a separate driver; it doesn't change anything in the VM other than adding two helper functions.
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
On Tue, 11 Nov 2008 21:07:10 +0200 Izik Eidus [EMAIL PROTECTED] wrote:

We have used KSM in production for about half a year, and the numbers that came from our QA are: using KSM for desktop (KSM was tested just for the Windows desktop workload) you can run as many as 52 Windows XP guests with 1 GB of RAM each on a server with just 16 GB of RAM. (This is more than 300% overcommit.) The reason is that most of the kernel/DLLs of these guests are shared, and in addition we are sharing the Windows zero pages (Windows keeps zeroing all its free memory, so every time Windows releases memory we take the page back to the host).

There is a slide that gives these numbers, which you can find at: http://kvm.qumranet.com/kvmwiki/KvmForum2008?action=AttachFile&do=get&target=kdf2008_3.pdf (slide 27)

Besides that, I gave a presentation about ksm that can be found at: http://kvm.qumranet.com/kvmwiki/KvmForum2008?action=AttachFile&do=get&target=kdf2008_12.pdf

OK, 300% isn't chicken feed.

It is quite important that information such as this be prepared, added to the patch changelogs and maintained. For a start, without this basic information, there is no reason for anyone to look at any of the code!
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
Andrew Morton wrote:
>> For kvm, the kernel never knew those pages were shared. They are loaded
>> from independent (possibly compressed and encrypted) disk images. These
>> images are different; but some pages happen to be the same because they
>> came from the same installation media.
> What userspace-only changes could fix this? Identify the common data,
> write it to a flat file and mmap it, something like that?

This was considered. You can't scan the image, because it may be encrypted/compressed/offset (typical images _are_ offset because the first partition starts at sector 63...). The data may come from the network and not a disk image. You can't scan in userspace because the images belong to different users and contain sensitive data. Pages may come from several images (multiple disk images per guest) so you end up with one vma per page.

So you have to scan memory, after the guest has retrieved it from disk/network/manufactured it somehow, decompressed and decrypted it, written it to the offset it wants. You can't scan from userspace since it's sensitive data, and of course the actual merging needs to be done atomically, which can only be done from the holy of holies, the vm.

>> For OpenVZ the situation is less clear, but if you allow users to
>> independently upgrade their chroots you will eventually arrive at the
>> same scenario (unless of course you apply the same merging strategy at
>> the filesystem level).
> hm. There has been the occasional discussion about identifying
> all-zeroes pages and scavenging them, repointing them at the zero page.
> Could this infrastructure be used for that?

Yes, trivially. ksm may be an overkill for this, though.

> (And how much would we gain from it?)

A lot of zeros.

> [I'm looking for reasons why this is more than a
> muck-up-the-vm-for-kvm thing here ;) ]

I sympathize -- us too. Consider the typical multiuser gnome minicomputer with all 150 users reading lwn.net at the same time instead of working. You could share the firefox rendered page cache, reducing memory utilization drastically.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
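The all-zeroes case discussed in this message is the simplest form of page merging, since every candidate page is compared against a single known page. A minimal userspace sketch of the detection step follows. It is not taken from the patch: SKETCH_PAGE_SIZE, page_is_zero and count_zero_pages are names invented for this illustration, the page size is assumed to be 4096 bytes, and the repointing at the zero page (which has to happen inside the vm) is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for this sketch; real code would query the system. */
#define SKETCH_PAGE_SIZE 4096

/* Returns 1 if every byte of the page is zero, 0 otherwise. */
static int page_is_zero(const unsigned char *page)
{
    size_t i;

    for (i = 0; i < SKETCH_PAGE_SIZE; i++)
        if (page[i] != 0)
            return 0;
    return 1;
}

/*
 * Counts the all-zero pages in a region of npages pages starting at mem.
 * Each such page is a candidate for being replaced by the shared zero page.
 */
static size_t count_zero_pages(const unsigned char *mem, size_t npages)
{
    size_t i, zero = 0;

    assert(mem != NULL || npages == 0);
    for (i = 0; i < npages; i++)
        if (page_is_zero(mem + i * SKETCH_PAGE_SIZE))
            zero++;
    return zero;
}
```

A region of three pages in which only the middle page has a non-zero byte would report two zero pages.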
[PATCH 0/4] ksm - dynamic page sharing driver for linux
KSM is a Linux driver that allows dynamically sharing identical memory pages between one or more processes.

Unlike traditional page sharing, which is set up when the memory is allocated, KSM does it dynamically after the memory has been created. Memory is periodically scanned; identical pages are identified and merged. The sharing is unnoticeable by the processes that use this memory. (The shared pages are marked read-only, and on a write do_wp_page() takes care of creating a new copy of the page.)

This driver is very useful for KVM: when running multiple guest operating systems of the same type, many pages are sharable. This driver can be useful for OpenVZ as well.

Right now KSM scans only memory that was registered to be used by it; it does not scan the whole system memory. (This can be changed, but the chances of finding identical pages in a normal Linux system that doesn't run multiple guests appear to be low.)

KSM can run as a kernel thread or as a userspace application, or both (it is allowed to run more than one scanner at a time).
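The "scan, identify identical pages, merge" idea described above can be illustrated with a small userspace sketch. This is not the driver's algorithm: it uses a naive quadratic comparison with memcmp() purely to show what "identical pages" means, SKETCH_PAGE_SIZE and count_mergeable_pages are names invented for this example, and the actual merging (read-only mapping plus copy-on-write) is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed page size for this sketch; real code would query the system. */
#define SKETCH_PAGE_SIZE 4096

/*
 * Counts pages that are byte-identical to an earlier page in the region,
 * i.e. pages that a merging scanner could replace with a reference to a
 * single shared read-only copy.
 */
static size_t count_mergeable_pages(const unsigned char *mem, size_t npages)
{
    size_t i, j, mergeable = 0;

    assert(mem != NULL || npages == 0);
    for (i = 1; i < npages; i++) {
        for (j = 0; j < i; j++) {
            if (memcmp(mem + i * SKETCH_PAGE_SIZE,
                       mem + j * SKETCH_PAGE_SIZE,
                       SKETCH_PAGE_SIZE) == 0) {
                mergeable++;
                break; /* one earlier match is enough */
            }
        }
    }
    return mergeable;
}
```

For four pages with contents A, B, A, A the function reports two mergeable pages, so the region could be backed by two physical pages instead of four.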
example for how to control the kernel thread:

ksmctl.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include "ksm.h"

int main(int argc, char *argv[])
{
    int fd;
    int used = 0;
    int fd_start;
    struct ksm_kthread_info info;

    if (argc < 2) {
        fprintf(stderr,
                "usage: %s {start npages_to_scan npages_max_merge sleep | stop | info}\n",
                argv[0]);
        exit(1);
    }

    fd = open("/dev/ksm", O_RDWR | O_TRUNC, (mode_t)0600);
    if (fd == -1) {
        fprintf(stderr, "could not open /dev/ksm\n");
        exit(1);
    }

    /* start: configure the kernel thread and set it running */
    if (!strncmp(argv[1], "start", strlen(argv[1]))) {
        used = 1;
        if (argc < 5) {
            fprintf(stderr, "usage: %s start npages_to_scan", argv[0]);
            fprintf(stderr, " npages_max_merge sleep\n");
            exit(1);
        }
        info.pages_to_scan = atoi(argv[2]);
        info.max_pages_to_merge = atoi(argv[3]);
        info.sleep = atoi(argv[4]);
        info.running = 1;

        fd_start = ioctl(fd, KSM_START_STOP_KTHREAD, &info);
        if (fd_start == -1) {
            fprintf(stderr, "KSM_START_KTHREAD failed\n");
            exit(1);
        }
        printf("created scanner\n");
    }

    /* stop: same ioctl, with running cleared */
    if (!strncmp(argv[1], "stop", strlen(argv[1]))) {
        used = 1;
        info.running = 0;
        fd_start = ioctl(fd, KSM_START_STOP_KTHREAD, &info);
        if (fd_start == -1) {
            fprintf(stderr, "KSM_START_STOP_KTHREAD failed\n");
            exit(1);
        }
        printf("stopped scanner\n");
    }

    /* info: read back the current kernel thread settings */
    if (!strncmp(argv[1], "info", strlen(argv[1]))) {
        used = 1;
        fd_start = ioctl(fd, KSM_GET_INFO_KTHREAD, &info);
        if (fd_start == -1) {
            fprintf(stderr, "KSM_GET_INFO_KTHREAD failed\n");
            exit(1);
        }
        printf("running %d, pages_to_scan %d pages_max_merge %d, ",
               info.running, info.pages_to_scan, info.max_pages_to_merge);
        printf("sleep_time %d\n", info.sleep);
    }

    if (!used)
        fprintf(stderr, "unknown command %s\n", argv[1]);

    return 0;
}

example of how to register qemu to ksm (or any userspace application)

diff --git a/qemu/vl.c b/qemu/vl.c
index 4721fdd..7785bf9 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -21,6 +21,7 @@
  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
  * THE SOFTWARE.
  */
+#include "ksm.h"
 #include "hw/hw.h"
 #include "hw/boards.h"
 #include "hw/usb.h"
@@ -5799,6 +5800,37 @@ static void termsig_setup(void)

 #endif

+int ksm_register_memory(void)
+{
+    int fd;
+    int ksm_fd;
+    int r = 1;
+    struct ksm_memory_region ksm_region;
+
+    fd = open("/dev/ksm", O_RDWR | O_TRUNC, (mode_t)0600);
+    if (fd == -1)
+        goto out;
+
+    ksm_fd = ioctl(fd, KSM_CREATE_SHARED_MEMORY_AREA);
+    if (ksm_fd == -1)
+        goto out_free;
+
+    ksm_region.npages = phys_ram_size / TARGET_PAGE_SIZE;
+    ksm_region.addr = phys_ram_base;
+    r = ioctl(ksm_fd, KSM_REGISTER_MEMORY_REGION, &ksm_region);
+    if (r)
+        goto out_free1;
+
+    return r;
+
+out_free1:
+    close(ksm_fd);
+out_free:
+    close(fd);
+out:
+    return r;
+}
+
 int main(int argc, char **argv)
 {
 #ifdef CONFIG_GDBSTUB
@@ -6735,6 +6767,8 @@ int main(int argc, char **argv)
     /* init the dynamic translator */
     cpu_exec_init_all(tb_size * 1024 * 1024);

+    ksm_register_memory();
+
     bdrv_init();

     /* we always create the cdrom drive, even