Re: [PATCH 4/4] add ksm kernel shared memory driver.
On Tue, Apr 14, 2009 at 03:09:29PM -0700, Andrew Morton wrote:
> We need a comment here explaining why we can't use the much preferable
> lock_page(). Why can't we use the much preferable lock_page()?

We could, but then we would risk wasting time waiting. It's not worth waiting: we want kksmd to be allowed to keep one CPU (in future more than one, as we scale it for SMP/NUMA) busy at all times running memcmp, and not to schedule (other than for need_resched()), so that it frees memory at the fastest pace possible.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
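The trade-off in the reply above (skip a busy page rather than sleep on it) can be illustrated with a rough userspace sketch. This is not kernel code: pthread_mutex_trylock() merely stands in for a page trylock, and `struct item` and `scan_items()` are hypothetical names invented for the illustration.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a page: a lock plus a "visited" marker. */
struct item {
    pthread_mutex_t lock;
    int scanned;
};

/*
 * Visit every item once. A busy item is skipped rather than waited for,
 * so the scanning thread never sleeps on a lock held by someone else.
 * Returns the number of items that were skipped.
 */
static size_t scan_items(struct item *items, size_t n)
{
    size_t skipped = 0;

    for (size_t i = 0; i < n; i++) {
        if (pthread_mutex_trylock(&items[i].lock) != 0) {
            skipped++;          /* lock is held elsewhere: move on */
            continue;
        }
        items[i].scanned = 1;   /* the real work (e.g. a memcmp) would go here */
        pthread_mutex_unlock(&items[i].lock);
    }
    return skipped;
}
```

A skipped item is simply left for a later pass, which is the cost paid for keeping the scanner busy instead of blocked.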
Re: [PATCH 4/4] add ksm kernel shared memory driver.
On Wed, Apr 15, 2009 at 05:43:03PM -0700, Jeremy Fitzhardinge wrote:
> Shouldn't that be kmap_atomic's job anyway? Otherwise it would be hard to

No, because those are full no-ops in no-highmem kernels. I commented in another email on why I think it's safe, thanks to the wrprotect + SMP TLB flush of the userland PTE.
Re: [PATCH 4/4] add ksm kernel shared memory driver.
Andrea Arcangeli wrote:
> On Wed, Apr 15, 2009 at 05:43:03PM -0700, Jeremy Fitzhardinge wrote:
> > Shouldn't that be kmap_atomic's job anyway? Otherwise it would be hard to
>
> No because those are full noops in no-highmem kernels. I commented in
> other email why I think it's safe thanks to the wrprotect + smp tlb
> flush of the userland PTE.

I think Andrew's query was about data cache synchronization on architectures with a virtually indexed d-cache. On x86 it's a non-issue, but on architectures for which it is an issue, I assume kmap_atomic does any necessary cache flushes, as it does TLB flushes on x86 (which may be none at all, if no mapping actually happens).

J
Re: [PATCH 4/4] add ksm kernel shared memory driver.
Andrew Morton wrote:
> On Thu, 9 Apr 2009 06:58:41 +0300 Izik Eidus iei...@redhat.com wrote:
>
> Confused. In the covering email you indicated that v2 of the patchset
> had abandoned ioctls and had moved the interface to sysfs.

We have abandoned the ioctls that control KSM's behavior (how much CPU it takes, how many kernel pages it may allocate, and so on), but we still use ioctls to register the application memory to be used with KSM.

> It would be good to completely (and briefly) describe KSM's proposed
> userspace interfaces in the changelog or somewhere. I'm a bit confused.

I will post a new, clean description of the KSM API with V4.

> > +static pte_t *get_pte(struct mm_struct *mm, unsigned long addr)
> > +{
> > +        pgd_t *pgd;
> > +        pud_t *pud;
> > +        pmd_t *pmd;
> > +        pte_t *ptep = NULL;
> > +
> > +        pgd = pgd_offset(mm, addr);
> > +        if (!pgd_present(*pgd))
> > +                goto out;
> > +
> > +        pud = pud_offset(pgd, addr);
> > +        if (!pud_present(*pud))
> > +                goto out;
> > +
> > +        pmd = pmd_offset(pud, addr);
> > +        if (!pmd_present(*pmd))
> > +                goto out;
> > +
> > +        ptep = pte_offset_map(pmd, addr);
> > +out:
> > +        return ptep;
> > +}
>
> hm, this looks very generic. Does it duplicate anything which core
> kernel already provides?

I don't think so.

> If not, perhaps core kernel should provide
> this (perhaps after some reorganisation).

A quick grep of the code shows at least two places that could use this function: remove_migration_pte() in migrate.c and page_check_address() in rmap.c.

I will post an inline get_ptep() function with V4; worst case, I will get nacked.

> ...
>
> > +static int rmap_hash_init(void)
> > +{
> > +        if (!rmap_hash_size) {
> > +                struct sysinfo sinfo;
> > +
> > +                si_meminfo(&sinfo);
> > +                rmap_hash_size = sinfo.totalram / 10;
>
> One slot per ten pages of physical memory? Is this too large, too
> small or just right?

That depends heavily on the number of processes / memory regions that will be registered with KSM. It is a module parameter, so the user can change it to whatever value they want.

> > +        }
> > +        nrmaps_hash = rmap_hash_size;
> > +        rmap_hash = vmalloc(nrmaps_hash * sizeof(struct hlist_head));
> > +        if (!rmap_hash)
> > +                return -ENOMEM;
> > +        memset(rmap_hash, 0, nrmaps_hash * sizeof(struct hlist_head));
> > +        return 0;
> > +}
> > +
>
> ...
>
> > +static void break_cow(struct mm_struct *mm, unsigned long addr)
> > +{
> > +        struct page *page[1];
> > +
> > +        down_read(&mm->mmap_sem);
> > +        if (get_user_pages(current, mm, addr, 1, 1, 0, page, NULL)) {
> > +                        put_page(page[0]);
> > +        }
> > +        up_read(&mm->mmap_sem);
> > +}
>
> - unneeded braces around single statement
>
> - that single statement is over-indented.
>
> - and it seems wrong. If get_user_pages() returned, say, -ENOMEM, we
>   end up doing put_page(random-uninitialised-address-from-stack-go-oops)?

Good catch.

> ...
>
> > +static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
> > +                                                struct ksm_memory_region *mem)
> > +{
> > +        struct ksm_mem_slot *slot;
> > +        int ret = -EPERM;
> > +
> > +        slot = kzalloc(sizeof(struct ksm_mem_slot), GFP_KERNEL);
> > +        if (!slot) {
> > +                ret = -ENOMEM;
> > +                goto out;
> > +        }
> > +
> > +        slot->mm = get_task_mm(current);
> > +        if (!slot->mm)
> > +                goto out_free;
> > +        slot->addr = mem->addr;
> > +        slot->npages = mem->npages;
> > +
> > +        down_write(&slots_lock);
> > +
> > +        list_add_tail(&slot->link, &slots);
> > +        list_add_tail(&slot->sma_link, &ksm_sma->sma_slots);
> > +
> > +        up_write(&slots_lock);
> > +        return 0;
> > +
> > +out_free:
> > +        kfree(slot);
> > +out:
> > +        return ret;
> > +}
>
> So this function pins the mm_struct. I wonder what the implications of
> this are.

The mm struct won't go away until the file is closed (the application closes the file descriptor, or the application dies).

> Not much, I guess.
>
> Some comments in the code which explain the object lifecycles would be
> nice.
>
> ...
>
> > +static int memcmp_pages(struct page *page1, struct page *page2)
> > +{
> > +        char *addr1, *addr2;
> > +        int r;
> > +
> > +        addr1 = kmap_atomic(page1, KM_USER0);
> > +        addr2 = kmap_atomic(page2, KM_USER1);
> > +        r = memcmp(addr1, addr2, PAGE_SIZE);
> > +        kunmap_atomic(addr1, KM_USER0);
> > +        kunmap_atomic(addr2, KM_USER1);
> > +        return r;
> > +}
>
> I wonder if this code all does enough cpu cache flushing to be able to
> guarantee that it's looking at valid data. Not my area, and presumably
> not an issue on x86.

Andrea pointed out in a previous reply that, because we run page_wrprotect() on these pages, memcmp_pages() should see stable data.

> ...
>
> > +static int try_to_merge_one_page(struct mm_struct *mm,
> > +                                 struct vm_area_struct *vma,
> > +                                 struct page *oldpage,
> > +                                 struct page *newpage,
> > +
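The break_cow() problem raised above (dropping a page reference even when the pin failed) comes down to checking the return value before touching the output array. The following is a userspace simulation only, under stated assumptions: `struct fake_page`, `fake_get_pages()` and `break_cow_checked()` are hypothetical stand-ins, not the kernel's struct page, get_user_pages() or the fix that was eventually posted.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only a reference count. */
struct fake_page {
    int refcount;
};

/*
 * Hypothetical stand-in for get_user_pages(): returns the number of pages
 * pinned (filling pages[]), or a negative errno, in which case pages[] is
 * left untouched.
 */
static long fake_get_pages(struct fake_page *backing, int fail,
                           struct fake_page **pages)
{
    if (fail)
        return -ENOMEM;
    backing->refcount++;
    pages[0] = backing;
    return 1;
}

/*
 * Drop the reference only when exactly one page was pinned. A negative
 * return is non-zero, so a bare "if (ret)" test would dereference an
 * uninitialised pointer on failure.
 */
static long break_cow_checked(struct fake_page *backing, int fail)
{
    struct fake_page *pages[1] = { NULL };
    long ret = fake_get_pages(backing, fail, pages);

    if (ret == 1)
        pages[0]->refcount--;   /* stands in for put_page(page[0]) */
    return ret;
}
```

On the failure path the reference count is left alone and the error is passed back to the caller.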
Re: [PATCH 4/4] add ksm kernel shared memory driver.
On Thu, 16 Apr 2009 01:37:25 +0300 Izik Eidus iei...@redhat.com wrote:
> Andrew Morton wrote:
> > On Thu, 9 Apr 2009 06:58:41 +0300 Izik Eidus iei...@redhat.com wrote:
> >
> > Confused. In the covering email you indicated that v2 of the patchset
> > had abandoned ioctls and had moved the interface to sysfs.
>
> We have abandoned the ioctls that control the ksm behavior (how much cpu
> it take, how much kernel pages it may allocate and so on...)
> But we still use ioctls to register the application memory to be used
> with ksm.

hm. ioctls make kernel people weep and gnash teeth.

An appropriate interface would be to add new syscalls. But as ksm is an optional thing and can even be modprobed, that doesn't work. And having a driver in mm/ which can be modprobed is kinda neat.

I can't immediately think of a nicer interface. You could always poke numbers into some pseudo-file but to me that seems as ugly, or uglier than an ioctl (others seem to disagree).

Ho hum. Please design the ioctl interface so that it doesn't need any compat handling if poss.
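One common way to meet the "no compat handling" request is to build ioctl argument structures only from fixed-width fields with explicit padding, so that 32-bit and 64-bit userspace see the same layout. The sketch below mirrors the ksm_memory_region layout from the patch under that assumption; it uses <stdint.h> types in place of __u32/__u64 so that it builds outside the kernel tree, and the names `ksm_memory_region_sketch` and `ksm_region_size()` are invented for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace mirror of the ksm_memory_region layout from the patch.
 * The uint32_t/uint64_t types replace __u32/__u64 (an assumption made
 * so the sketch compiles without kernel headers).
 */
struct ksm_memory_region_sketch {
    uint32_t npages;        /* number of pages to share */
    uint32_t pad;           /* explicit padding keeps addr 8-byte aligned */
    uint64_t addr;          /* start of the virtual address range */
    uint64_t reserved_bits; /* reserved for future use */
};

/* With the explicit pad, each field sits at a fixed offset on either ABI. */
_Static_assert(offsetof(struct ksm_memory_region_sketch, addr) == 8,
               "addr must start at byte 8");
_Static_assert(offsetof(struct ksm_memory_region_sketch, reserved_bits) == 16,
               "reserved_bits must start at byte 16");
_Static_assert(sizeof(struct ksm_memory_region_sketch) == 24,
               "structure must be 24 bytes");

/* Small helper so the layout can also be checked at run time. */
static size_t ksm_region_size(void)
{
    return sizeof(struct ksm_memory_region_sketch);
}
```

Without the pad field, a compiler that aligns 64-bit members to 8 bytes would insert hidden padding while one that aligns them to 4 bytes would not, and that mismatch is what usually forces a compat translation layer.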
Re: [PATCH 4/4] add ksm kernel shared memory driver.
On Wed, Apr 15, 2009 at 03:50:58PM -0700, Andrew Morton wrote:
> an optional thing and can even be modprobed, that doesn't work. And
> having a driver in mm/ which can be modprobed is kinda neat.

Agreed. I think madvise, with all its vma split requirements and ksm-unregistering invoked at vma destruction time (under CONFIG_KSM || CONFIG_KSM_MODULE), is a clean approach only if ksm is considered a piece of the core kernel VM. As long as only certain users out there use ksm (i.e. only virtualization servers and LHC computations), the pseudo-char ioctl interface keeps it out of the kernel, so the core kernel MM API remains almost unaffected by ksm.

It's kinda neat that it's external as a self-contained module, but the whole point is that to be self-contained it has to use ioctl.

Another thing is that madvise usually doesn't require mangling sysfs to be effective. madvise without enabling ksm through sysfs would be entirely useless. So doing it as a madvise that returns success and has no effect unless 'root' does something is kind of weird.

Thinking about the absolute worst case: if this really turns out to be the wrong decision, /dev/ksm simply won't exist anymore and no app could ever break, as they will gracefully handle the missing pseudo-char device. They won't run the ioctl and will just continue as if ksm.ko wasn't loaded. As there are only a few (but critically important) apps using KSM, converting them to fall back on madvise is a trivial few-line change (kvm-userland will have 10 more lines to keep opening /dev/ksm before calling madvise if we ever later decide KSM has to become a core kernel VM functionality with madvise or its own per-arch syscall).
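The graceful-fallback argument above can be sketched from the application side. This is a minimal illustration, not code from kvm-userland: `ksm_open_or_skip()` is a hypothetical helper, and the registration ioctls that would follow a successful open are deliberately left out.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to open the KSM control device at `path` (e.g. "/dev/ksm").
 * Returns an open file descriptor, or -1 when the device is missing or
 * inaccessible, in which case the caller just carries on without merging.
 */
static int ksm_open_or_skip(const char *path)
{
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;      /* no KSM available: run unmerged, do not fail */
    return fd;          /* caller keeps the fd open while regions are registered */
}
```

An application written this way behaves the same whether the module is absent today or the device node is removed later in favour of another interface.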
Re: [PATCH 4/4] add ksm kernel shared memory driver.
Andrew Morton wrote:
> > +static pte_t *get_pte(struct mm_struct *mm, unsigned long addr)
> > +{
> > +        pgd_t *pgd;
> > +        pud_t *pud;
> > +        pmd_t *pmd;
> > +        pte_t *ptep = NULL;
> > +
> > +        pgd = pgd_offset(mm, addr);
> > +        if (!pgd_present(*pgd))
> > +                goto out;
> > +
> > +        pud = pud_offset(pgd, addr);
> > +        if (!pud_present(*pud))
> > +                goto out;
> > +
> > +        pmd = pmd_offset(pud, addr);
> > +        if (!pmd_present(*pmd))
> > +                goto out;
> > +
> > +        ptep = pte_offset_map(pmd, addr);
> > +out:
> > +        return ptep;
> > +}
>
> hm, this looks very generic. Does it duplicate anything which core
> kernel already provides? If not, perhaps core kernel should provide
> this (perhaps after some reorganisation).

It is lookup_address() which works on user addresses, and as such is very useful. But it would need to deal with returning a level so it can deal with large pages in usermode, and have some well-defined semantics on whether the caller is responsible for unmapping the returned thing (ie, only if it's a pte).

I implemented this myself a couple of months ago, but I can't find it anywhere...

> > +static int memcmp_pages(struct page *page1, struct page *page2)
> > +{
> > +        char *addr1, *addr2;
> > +        int r;
> > +
> > +        addr1 = kmap_atomic(page1, KM_USER0);
> > +        addr2 = kmap_atomic(page2, KM_USER1);
> > +        r = memcmp(addr1, addr2, PAGE_SIZE);
> > +        kunmap_atomic(addr1, KM_USER0);
> > +        kunmap_atomic(addr2, KM_USER1);
> > +        return r;
> > +}
>
> I wonder if this code all does enough cpu cache flushing to be able to
> guarantee that it's looking at valid data. Not my area, and presumably
> not an issue on x86.

Shouldn't that be kmap_atomic's job anyway? Otherwise it would be hard to use on any virtual-tag/indexed cache machine.

J
Re: [PATCH 4/4] add ksm kernel shared memory driver.
Jeremy Fitzhardinge wrote:
> Andrew Morton wrote:
> > > +static pte_t *get_pte(struct mm_struct *mm, unsigned long addr)
> > > +{
> > > +        pgd_t *pgd;
> > > +        pud_t *pud;
> > > +        pmd_t *pmd;
> > > +        pte_t *ptep = NULL;
> > > +
> > > +        pgd = pgd_offset(mm, addr);
> > > +        if (!pgd_present(*pgd))
> > > +                goto out;
> > > +
> > > +        pud = pud_offset(pgd, addr);
> > > +        if (!pud_present(*pud))
> > > +                goto out;
> > > +
> > > +        pmd = pmd_offset(pud, addr);
> > > +        if (!pmd_present(*pmd))
> > > +                goto out;
> > > +
> > > +        ptep = pte_offset_map(pmd, addr);
> > > +out:
> > > +        return ptep;
> > > +}
> >
> > hm, this looks very generic. Does it duplicate anything which core
> > kernel already provides? If not, perhaps core kernel should provide
> > this (perhaps after some reorganisation).
>
> It is lookup_address() which works on user addresses, and as such is
> very useful.

But ksm needs the pgd offset of an mm struct, not the kernel pgd, so maybe changing it to take the pgd offset would be nice. Another thing: it is x86-only right now, so it would probably need to move out to the common code.

> But it would need to deal with returning a level so it
> can deal with large pages in usermode, and have some well-defined
> semantics on whether the caller is responsible for unmapping the
> returned thing (ie, only if its a pte).
>
> I implemented this myself a couple of months ago, but I can't find it
> anywhere...
>
> > > +static int memcmp_pages(struct page *page1, struct page *page2)
> > > +{
> > > +        char *addr1, *addr2;
> > > +        int r;
> > > +
> > > +        addr1 = kmap_atomic(page1, KM_USER0);
> > > +        addr2 = kmap_atomic(page2, KM_USER1);
> > > +        r = memcmp(addr1, addr2, PAGE_SIZE);
> > > +        kunmap_atomic(addr1, KM_USER0);
> > > +        kunmap_atomic(addr2, KM_USER1);
> > > +        return r;
> > > +}
> >
> > I wonder if this code all does enough cpu cache flushing to be able to
> > guarantee that it's looking at valid data. Not my area, and presumably
> > not an issue on x86.
>
> Shouldn't that be kmap_atomic's job anyway? Otherwise it would be hard
> to use on any virtual-tag/indexed cache machine.
>
> J
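The shape of the get_pte() walk discussed in this subthread (descend one level at a time, bail out with NULL as soon as a level is not present) can be shown with a toy two-level table in userspace. Everything here is a simplified assumption: `struct toy_dir`, `struct toy_leaf` and `toy_get_pte()` are invented names, and real page tables have more levels, locking and mapping rules that this sketch ignores.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_BITS    4u
#define TOY_ENTRIES (1u << TOY_BITS)

/* Toy leaf table: one "pte" value per low-order index. */
struct toy_leaf {
    uint32_t pte[TOY_ENTRIES];
};

/* Toy directory: a NULL slot means "this level is not present". */
struct toy_dir {
    struct toy_leaf *leaf[TOY_ENTRIES];
};

/*
 * Walk the two levels for `addr`. Returns a pointer to the entry, or NULL
 * when the intermediate level is absent, mirroring the early "goto out"
 * exits in the quoted get_pte().
 */
static uint32_t *toy_get_pte(struct toy_dir *dir, unsigned int addr)
{
    unsigned int hi = (addr >> TOY_BITS) & (TOY_ENTRIES - 1);
    unsigned int lo = addr & (TOY_ENTRIES - 1);

    if (!dir->leaf[hi])
        return NULL;
    return &dir->leaf[hi]->pte[lo];
}
```

Returning the level reached, as suggested above for large-page support, would amount to adding an output parameter that records where the walk stopped.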
Re: [PATCH 4/4] add ksm kernel shared memory driver.
On Thu, 9 Apr 2009 06:58:41 +0300 Izik Eidus iei...@redhat.com wrote:

> Ksm is a driver that allows merging identical pages between one or more
> applications in a way invisible to the applications that use it.
> Pages that are merged are marked as read-only and are COWed when any
> application tries to change them.
>
> Ksm is used for cases where using fork() is not suitable;
> one of these cases is where the pages of the application keep changing
> dynamically and the application cannot know in advance which pages are
> going to be identical.
>
> Ksm works by walking over the memory pages of the applications it
> scans in order to find identical pages.
> It uses two sorted data structures, called the stable and unstable trees,
> to find the identical pages efficiently.
>
> When ksm finds two identical pages, it marks them as read-only and merges
> them into a single page. After the pages are marked as read-only and
> merged into one page, Linux will treat these pages as normal
> copy-on-write pages and will fork them when a write access happens to
> them.
>
> Ksm scans only memory areas that were registered to be scanned by it.
>
> Ksm api:
>
> KSM_GET_API_VERSION:
> Give the userspace the api version of the module.
>
> KSM_CREATE_SHARED_MEMORY_AREA:
> Create a shared memory region fd, which later allows the user to register
> the memory region to scan by using:
> KSM_REGISTER_MEMORY_REGION and KSM_REMOVE_MEMORY_REGION
>
> KSM_REGISTER_MEMORY_REGION:
> Register userspace virtual address range to be scanned by ksm.
> This ioctl is using the ksm_memory_region structure:
> ksm_memory_region:
> __u32 npages;
>          number of pages to share inside this memory region.
> __u32 pad;
> __u64 addr:
>          the beginning of the virtual address of this region.
> __u64 reserved_bits;
>          reserved bits for future usage.
>
> KSM_REMOVE_MEMORY_REGION:
> Remove memory region from ksm.
>
> ...
>
> +/* ioctls for /dev/ksm */

Confused. In the covering email you indicated that v2 of the patchset had abandoned ioctls and had moved the interface to sysfs.

It would be good to completely (and briefly) describe KSM's proposed userspace interfaces in the changelog or somewhere. I'm a bit confused.

> ...
>
> +/*
> + * slots_lock protect against removing and adding memory regions while a scanner
> + * is in the middle of scanning.
> + */

"protects"

> +static DECLARE_RWSEM(slots_lock);
> +
> +/* The stable and unstable trees heads. */
> +struct rb_root root_stable_tree = RB_ROOT;
> +struct rb_root root_unstable_tree = RB_ROOT;
> +
> +
> +/* The number of linked list members inside the hash table */
> +static int nrmaps_hash;

A signed type doesn't seem appropriate.

> +/* rmap_hash hash table */
> +static struct hlist_head *rmap_hash;
> +
> +static struct kmem_cache *tree_item_cache;
> +static struct kmem_cache *rmap_item_cache;
> +
> +/* the number of nodes inside the stable tree */
> +static unsigned long nnodes_stable_tree;
> +
> +/* the number of kernel allocated pages outside the stable tree */
> +static unsigned long nkpage_out_tree;
> +
> +static int kthread_sleep; /* sleep time of the kernel thread */
> +static int kthread_pages_to_scan; /* npages to scan for the kernel thread */
> +static int kthread_max_kernel_pages; /* number of unswappable pages allowed */

The name kthread_max_kernel_pages isn't very illuminating.

The use of "kthread" in the identifier makes it look like part of the kthread subsystem.

> +static unsigned long ksm_pages_shared;
> +static struct ksm_scan kthread_ksm_scan;
> +static int ksmd_flags;
> +static struct task_struct *kthread;
> +static DECLARE_WAIT_QUEUE_HEAD(kthread_wait);
> +static DECLARE_RWSEM(kthread_lock);
> +
> +
> ...
>
> +static pte_t *get_pte(struct mm_struct *mm, unsigned long addr)
> +{
> +        pgd_t *pgd;
> +        pud_t *pud;
> +        pmd_t *pmd;
> +        pte_t *ptep = NULL;
> +
> +        pgd = pgd_offset(mm, addr);
> +        if (!pgd_present(*pgd))
> +                goto out;
> +
> +        pud = pud_offset(pgd, addr);
> +        if (!pud_present(*pud))
> +                goto out;
> +
> +        pmd = pmd_offset(pud, addr);
> +        if (!pmd_present(*pmd))
> +                goto out;
> +
> +        ptep = pte_offset_map(pmd, addr);
> +out:
> +        return ptep;
> +}

hm, this looks very generic. Does it duplicate anything which core kernel already provides? If not, perhaps core kernel should provide this (perhaps after some reorganisation).

> ...
>
> +static int rmap_hash_init(void)
> +{
> +        if (!rmap_hash_size) {
> +                struct sysinfo sinfo;
> +
> +                si_meminfo(&sinfo);
> +                rmap_hash_size = sinfo.totalram / 10;

One slot per ten pages of physical memory? Is this too large, too small or just right?

> +        }
> +        nrmaps_hash = rmap_hash_size;
> +        rmap_hash = vmalloc(nrmaps_hash * sizeof(struct hlist_head));
> +        if (!rmap_hash)
> +                return -ENOMEM;
> +        memset(rmap_hash, 0, nrmaps_hash * sizeof(struct hlist_head));
> +        return 0;
> +}
> +
> ...
>
> +static void break_cow(struct mm_struct *mm, unsigned long addr)
> +{
> +        struct page *page[1];
> +
> +
[PATCH 4/4] add ksm kernel shared memory driver.
Ksm is a driver that allows merging identical pages between one or more
applications in a way invisible to the applications that use it.
Pages that are merged are marked as read-only and are COWed when any
application tries to change them.

Ksm is used for cases where using fork() is not suitable;
one of these cases is where the pages of the application keep changing
dynamically and the application cannot know in advance which pages are
going to be identical.

Ksm works by walking over the memory pages of the applications it
scans in order to find identical pages.
It uses two sorted data structures, called the stable and unstable trees,
to find the identical pages efficiently.

When ksm finds two identical pages, it marks them as read-only and merges
them into a single page. After the pages are marked as read-only and
merged into one page, Linux will treat these pages as normal
copy-on-write pages and will fork them when a write access happens to
them.

Ksm scans only memory areas that were registered to be scanned by it.

Ksm api:

KSM_GET_API_VERSION:
Give the userspace the api version of the module.

KSM_CREATE_SHARED_MEMORY_AREA:
Create a shared memory region fd, which later allows the user to register
the memory region to scan by using:
KSM_REGISTER_MEMORY_REGION and KSM_REMOVE_MEMORY_REGION

KSM_REGISTER_MEMORY_REGION:
Register userspace virtual address range to be scanned by ksm.
This ioctl is using the ksm_memory_region structure:
ksm_memory_region:
__u32 npages;
         number of pages to share inside this memory region.
__u32 pad;
__u64 addr:
         the beginning of the virtual address of this region.
__u64 reserved_bits;
         reserved bits for future usage.

KSM_REMOVE_MEMORY_REGION:
Remove memory region from ksm.

Signed-off-by: Izik Eidus iei...@redhat.com
Signed-off-by: Chris Wright chr...@redhat.com
Signed-off-by: Andrea Arcangeli aarca...@redhat.com
---
 include/linux/ksm.h        |   48 ++
 include/linux/miscdevice.h |    1 +
 mm/Kconfig                 |    6 +
 mm/Makefile                |    1 +
 mm/ksm.c                   | 1674
 5 files changed, 1730 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/ksm.h
 create mode 100644 mm/ksm.c

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
new file mode 100644
index 000..2c11e9a
--- /dev/null
+++ b/include/linux/ksm.h
@@ -0,0 +1,48 @@
+#ifndef __LINUX_KSM_H
+#define __LINUX_KSM_H
+
+/*
+ * Userspace interface for /dev/ksm - kvm shared memory
+ */
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+#include <asm/types.h>
+
+#define KSM_API_VERSION 1
+
+#define ksm_control_flags_run 1
+
+/* for KSM_REGISTER_MEMORY_REGION */
+struct ksm_memory_region {
+        __u32 npages; /* number of pages to share */
+        __u32 pad;
+        __u64 addr; /* the begining of the virtual address */
+        __u64 reserved_bits;
+};
+
+#define KSMIO 0xAB
+
+/* ioctls for /dev/ksm */
+
+#define KSM_GET_API_VERSION              _IO(KSMIO,   0x00)
+/*
+ * KSM_CREATE_SHARED_MEMORY_AREA - create the shared memory reagion fd
+ */
+#define KSM_CREATE_SHARED_MEMORY_AREA    _IO(KSMIO,   0x01) /* return SMA fd */
+
+/* ioctls for SMA fds */
+
+/*
+ * KSM_REGISTER_MEMORY_REGION - register virtual address memory area to be
+ * scanned by kvm.
+ */
+#define KSM_REGISTER_MEMORY_REGION       _IOW(KSMIO,  0x20,\
+                                              struct ksm_memory_region)
+/*
+ * KSM_REMOVE_MEMORY_REGION - remove virtual address memory area from ksm.
+ */
+#define KSM_REMOVE_MEMORY_REGION         _IO(KSMIO,   0x21)
+
+#endif
diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index beb6ec9..297c0bb 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -30,6 +30,7 @@
 #define HPET_MINOR 228
 #define FUSE_MINOR 229
 #define KVM_MINOR 232
+#define KSM_MINOR 233
 #define MISC_DYNAMIC_MINOR 255
 
 struct device;
diff --git a/mm/Kconfig b/mm/Kconfig
index b53427a..3f3fd04 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -223,3 +223,9 @@ config HAVE_MLOCKED_PAGE_BIT
 
 config MMU_NOTIFIER
         bool
+
+config KSM
+        tristate "Enable KSM for page sharing"
+        help
+          Enable the KSM kernel module to allow page sharing of equal pages
+          among different tasks.
diff --git a/mm/Makefile b/mm/Makefile
index ec73c68..b885513 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -24,6 +24,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_TMPFS_POSIX_ACL) += shmem_acl.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += debug-pagealloc.o
 obj-$(CONFIG_SLAB) += slab.o
 obj-$(CONFIG_SLUB) += slub.o
diff --git a/mm/ksm.c b/mm/ksm.c
new file mode 100644
index 000..a15a92d
--- /dev/null
+++ b/mm/ksm.c
@@ -0,0 +1,1674 @@
+/*
+ * Memory merging driver for Linux
+ *
+ * This module enables dynamic sharing of identical pages
Re: [PATCH 4/4] add ksm kernel shared memory driver.
Andrey Panin wrote:
> On 094, 04 04, 2009 at 05:35:22PM +0300, Izik Eidus wrote:
>
> SNIP
>
> > +static inline u32 calc_checksum(struct page *page)
> > +{
> > +        u32 checksum;
> > +        void *addr = kmap_atomic(page, KM_USER0);
> > +        checksum = jhash(addr, PAGE_SIZE, 17);
>
> Why jhash2() is not used here ? It's faster and leads to smaller code size.

Because I didn't know; I will check that and change it. Thanks.
(We should really use the in-CPU CRC on Intel Nehalem, and the dirty bit on the rest of the architectures...)

> > +        kunmap_atomic(addr, KM_USER0);
> > +        return checksum;
> > +}
[PATCH 4/4] add ksm kernel shared memory driver.
Ksm is a driver that allows merging identical pages between one or more
applications in a way invisible to the applications that use it.
Pages that are merged are marked as read-only and are COWed when any
application tries to change them.

Ksm is used for cases where using fork() is not suitable;
one of these cases is where the pages of the application keep changing
dynamically and the application cannot know in advance which pages are
going to be identical.

Ksm works by walking over the memory pages of the applications it
scans in order to find identical pages.
It uses two sorted data structures, called the stable and unstable trees,
to find the identical pages efficiently.

When ksm finds two identical pages, it marks them as read-only and merges
them into a single page. After the pages are marked as read-only and
merged into one page, Linux will treat these pages as normal
copy-on-write pages and will fork them when a write access happens to
them.

Ksm scans only memory areas that were registered to be scanned by it.

Ksm api:

KSM_GET_API_VERSION:
Give the userspace the api version of the module.

KSM_CREATE_SHARED_MEMORY_AREA:
Create a shared memory region fd, which later allows the user to register
the memory region to scan by using:
KSM_REGISTER_MEMORY_REGION and KSM_REMOVE_MEMORY_REGION

KSM_REGISTER_MEMORY_REGION:
Register userspace virtual address range to be scanned by ksm.
This ioctl is using the ksm_memory_region structure:
ksm_memory_region:
__u32 npages;
         number of pages to share inside this memory region.
__u32 pad;
__u64 addr:
         the beginning of the virtual address of this region.
__u64 reserved_bits;
         reserved bits for future usage.

KSM_REMOVE_MEMORY_REGION:
Remove memory region from ksm.

Signed-off-by: Izik Eidus iei...@redhat.com
---
 include/linux/ksm.h        |   48 ++
 include/linux/miscdevice.h |    1 +
 mm/Kconfig                 |    6 +
 mm/Makefile                |    1 +
 mm/ksm.c                   | 1668
 5 files changed, 1724 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/ksm.h
 create mode 100644 mm/ksm.c

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
new file mode 100644
index 000..2c11e9a
--- /dev/null
+++ b/include/linux/ksm.h
@@ -0,0 +1,48 @@
+#ifndef __LINUX_KSM_H
+#define __LINUX_KSM_H
+
+/*
+ * Userspace interface for /dev/ksm - kvm shared memory
+ */
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+#include <asm/types.h>
+
+#define KSM_API_VERSION 1
+
+#define ksm_control_flags_run 1
+
+/* for KSM_REGISTER_MEMORY_REGION */
+struct ksm_memory_region {
+        __u32 npages; /* number of pages to share */
+        __u32 pad;
+        __u64 addr; /* the begining of the virtual address */
+        __u64 reserved_bits;
+};
+
+#define KSMIO 0xAB
+
+/* ioctls for /dev/ksm */
+
+#define KSM_GET_API_VERSION              _IO(KSMIO,   0x00)
+/*
+ * KSM_CREATE_SHARED_MEMORY_AREA - create the shared memory reagion fd
+ */
+#define KSM_CREATE_SHARED_MEMORY_AREA    _IO(KSMIO,   0x01) /* return SMA fd */
+
+/* ioctls for SMA fds */
+
+/*
+ * KSM_REGISTER_MEMORY_REGION - register virtual address memory area to be
+ * scanned by kvm.
+ */
+#define KSM_REGISTER_MEMORY_REGION       _IOW(KSMIO,  0x20,\
+                                              struct ksm_memory_region)
+/*
+ * KSM_REMOVE_MEMORY_REGION - remove virtual address memory area from ksm.
+ */
+#define KSM_REMOVE_MEMORY_REGION         _IO(KSMIO,   0x21)
+
+#endif
diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index beb6ec9..297c0bb 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -30,6 +30,7 @@
 #define HPET_MINOR 228
 #define FUSE_MINOR 229
 #define KVM_MINOR 232
+#define KSM_MINOR 233
 #define MISC_DYNAMIC_MINOR 255
 
 struct device;
diff --git a/mm/Kconfig b/mm/Kconfig
index b53427a..3f3fd04 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -223,3 +223,9 @@ config HAVE_MLOCKED_PAGE_BIT
 
 config MMU_NOTIFIER
         bool
+
+config KSM
+        tristate "Enable KSM for page sharing"
+        help
+          Enable the KSM kernel module to allow page sharing of equal pages
+          among different tasks.
diff --git a/mm/Makefile b/mm/Makefile
index ec73c68..b885513 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -24,6 +24,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_TMPFS_POSIX_ACL) += shmem_acl.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += debug-pagealloc.o
 obj-$(CONFIG_SLAB) += slab.o
 obj-$(CONFIG_SLUB) += slub.o
diff --git a/mm/ksm.c b/mm/ksm.c
new file mode 100644
index 000..fb59a08
--- /dev/null
+++ b/mm/ksm.c
@@ -0,0 +1,1668 @@
+/*
+ * Memory merging driver for Linux
+ *
+ * This module enables dynamic sharing of identical pages found in different
+ * memory areas, even if they are not shared by fork()
+ *
+ * Copyright (C)
Re: [PATCH 4/4] add ksm kernel shared memory driver.
Anthony Liguori wrote:
> I'm often afraid of what sort of bugs we'd uncover in kvm if we passed
> the fds around via SCM_RIGHTS and started poking around :-/

kvm checks the mm doesn't change underneath.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [PATCH 4/4] add ksm kernel shared memory driver.
On Wed, Apr 01, 2009 at 09:36:31PM -0500, Anthony Liguori wrote:
> on this behavior to unregister memory regions, you could potentially have
> badness happen in the kernel if ksm attempted to access an invalid memory
> region.

How could you possibly come to this conclusion? If badness could ever happen then the original task with access to /dev/ksm could make the same badness happen in the first place without needing to exec or pass the fd to anybody else with IPC.
Re: [PATCH 4/4] add ksm kernel shared memory driver.
Anthony Liguori wrote:
> Chris Wright wrote:
> > * Anthony Liguori (anth...@codemonkey.ws) wrote:
> > > The ioctl() interface is quite bad for what you're doing. You're
> > > telling the kernel extra information about a VA range in userspace.
> > > That's what madvise is for. You're tweaking simple read/write values of
> > > kernel infrastructure. That's what sysfs is for.
> >
> > I agree re: sysfs (brought it up myself before). As far as madvise vs.
> > ioctl, the one thing that comes from the ioctl is fops->release to
> > automagically unregister memory on exit.
>
> This is precisely why ioctl() is a bad interface. fops->release isn't
> tied to the process but rather tied to the open file. The file can stay
> open long after the process exits either by a fork()'d child inheriting
> the file descriptor or through something more sinister like SCM_RIGHTS.
>
> In fact, a common mistake is to leak file descriptors by not closing
> them when exec()'ing a process. Instead of just delaying a close, if
> you rely on this behavior to unregister memory regions, you could
> potentially have badness happen in the kernel if ksm attempted to access
> an invalid memory region.

How could such badness ever happen in the kernel?
KSM works by virtual addresses! It fetches the pages using get_user_pages(), and the mm struct is protected by get_task_mm(); in addition we take down_read(mmap_sem).

So how could KSM ever access an invalid memory region, unless the host page table or get_task_mm() stopped working?

When someone registers memory for scanning, we do get_task_mm(). The registration goes away when the file is closed, or when the user calls the unregister ioctl to say the memory should no longer be registered.

You can argue about the API, but saying KSM is insecure is a mathematical claim; please show me a scenario!
Re: [PATCH 4/4] add ksm kernel shared memory driver.
KAMEZAWA Hiroyuki wrote:
> On Tue, 31 Mar 2009 15:21:53 +0300 Izik Eidus iei...@redhat.com wrote:
> > kpage is actually what going to be KsmPage - the shared page...
> >
> > Right now this pages are not swappable..., after ksm will be merged we
> > will make this pages swappable as well...
>
> sure.
>
> > > If so, please
> > > - show the amount of kpage
> > > - allow users to set limit for usage of kpages. or preserve kpages at
> > >   boot or by user's command.
> >
> > kpage actually save memory..., and limiting the number of them, would
> > make you limit the number of shared pages...
>
> Ah, I'm working for memory control cgroup. And *KSM* will be out of
> control. It's ok to make the default limit value as INFINITY. but please
> add knobs.

Sure, when I post V2 I will take care of this issue (I will do it after I get a little more review of ksm.c :-))
Re: [PATCH 4/4] add ksm kernel shared memory driver.
Anthony Liguori wrote:
Andrea Arcangeli wrote:
On Tue, Mar 31, 2009 at 10:54:57AM -0500, Anthony Liguori wrote:

You can still disable ksm and simply return ENOSYS for the MADV_ flag. You

Anthony, the biggest problem with madvise() is that it is a real system call API; at this stage of ksm I wouldn't want to commit to API changes in Linux... The ioctl itself is restrictive, and madvise is much more so. Can we defer this issue until after ksm is merged, and after all the big new features that we want to add to ksm are merged? Then the API would be much more stable, and we would be able to ask people on the list about changing the API; but for a new driver that is yet to be merged, it is kind of overkill to add API to Linux.

What do you think?
Re: [PATCH 4/4] add ksm kernel shared memory driver.
Izik Eidus wrote:

Anthony, the biggest problem with madvise() is that it is a real system call API; at this stage of ksm I wouldn't want to commit to API changes in Linux... The ioctl itself is restrictive, and madvise is much more so. Can we defer this issue until after ksm is merged, and after all the big new features that we want to add to ksm are merged? Then the API would be much more stable, and we would be able to ask people on the list about changing the API; but for a new driver that is yet to be merged, it is kind of overkill to add API to Linux. What do you think?

You can't change ABIs after something is merged or you break userspace. So you need to figure out the right ABI first.

Regards, Anthony Liguori
Re: [PATCH 4/4] add ksm kernel shared memory driver.
* Anthony Liguori (anth...@codemonkey.ws) wrote: You can't change ABIs after something is merged or you break userspace. So you need to figure out the right ABI first. Absolutely.
Re: [PATCH 4/4] add ksm kernel shared memory driver.
* Anthony Liguori (anth...@codemonkey.ws) wrote:

The ioctl() interface is quite bad for what you're doing. You're telling the kernel extra information about a VA range in userspace. That's what madvise is for. You're tweaking simple read/write values of kernel infrastructure. That's what sysfs is for.

I agree re: sysfs (brought it up myself before). As far as madvise vs. ioctl, the one thing that comes from the ioctl is fops->release to automagically unregister memory on exit. This needs to be handled anyway if some -p pid is added to add a process after it's running, so less weight there.

thanks, -chris
Re: [PATCH 4/4] add ksm kernel shared memory driver.
Chris Wright wrote:
* Anthony Liguori (anth...@codemonkey.ws) wrote:

The ioctl() interface is quite bad for what you're doing. You're telling the kernel extra information about a VA range in userspace. That's what madvise is for. You're tweaking simple read/write values of kernel infrastructure. That's what sysfs is for.

I agree re: sysfs (brought it up myself before). As far as madvise vs. ioctl, the one thing that comes from the ioctl is fops->release to automagically unregister memory on exit.

This is precisely why ioctl() is a bad interface. fops->release isn't tied to the process but rather tied to the open file. The file can stay open long after the process exits either by a fork()'d child inheriting the file descriptor or through something more sinister like SCM_RIGHTS. In fact, a common mistake is to leak file descriptors by not closing them when exec()'ing a process. Instead of just delaying a close, if you rely on this behavior to unregister memory regions, you could potentially have badness happen in the kernel if ksm attempted to access an invalid memory region.

So you absolutely have to automatically unregister regions in something other than the fops->release handler based on something that's tied to the pid's life cycle. Using an interface like madvise() would force the issue to be dealt with properly from the start :-)

I'm often afraid of what sort of bugs we'd uncover in kvm if we passed the fds around via SCM_RIGHTS and started poking around :-/

Regards, Anthony Liguori

This needs to be handled anyway if some -p pid is added to add a process after it's running, so less weight there.

thanks, -chris
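The point that release follows the open file rather than the process can be observed from userspace with an ordinary pipe. The following is a minimal sketch, not code from the thread: the function name release_waits_for_last_holder() is made up for illustration, and the parent only closes its descriptor instead of exiting, which should have the same effect on the reference held by the fork()'d child.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the write side of a pipe stayed open after the parent
 * closed its own descriptor (a fork()'d child still held a copy) and
 * was only torn down once the child exited; 0 otherwise; -1 on setup
 * failure.
 */
int release_waits_for_last_holder(void)
{
    int data[2], go[2];
    int still_open, released;
    char c;
    ssize_t n;
    pid_t pid;

    if (pipe(data) == -1)
        return -1;
    if (pipe(go) == -1) {
        close(data[0]);
        close(data[1]);
        return -1;
    }

    pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: inherits data[1] without asking for it. */
        close(go[1]);
        close(data[0]);
        /* Block until the parent closes go[1], then exit. */
        if (read(go[0], &c, 1) < 0)
            _exit(1);
        _exit(0);
    }

    close(go[0]);
    close(data[1]); /* the parent's close() does not release the file */

    /* No EOF yet: the child's inherited copy keeps the write side alive. */
    fcntl(data[0], F_SETFL, O_NONBLOCK);
    n = read(data[0], &c, 1);
    still_open = (n == -1 && errno == EAGAIN);

    close(go[1]); /* let the child exit */
    waitpid(pid, NULL, 0);

    /* Only now has the last reference gone away, so read() sees EOF. */
    n = read(data[0], &c, 1);
    released = (n == 0);

    close(data[0]);
    return still_open && released;
}
```

A driver that unregisters memory only from its release handler would, by the same logic, keep a region registered for as long as any inherited or passed copy of the descriptor exists.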
Re: [PATCH 4/4] add ksm kernel shared memory driver.
KAMEZAWA Hiroyuki wrote: On Tue, 31 Mar 2009 02:59:20 +0300 Izik Eidus iei...@redhat.com wrote: Ksm is driver that allow merging identical pages between one or more applications in way unvisible to the application that use it. Pages that are merged are marked as readonly and are COWed when any application try to change them. Ksm is used for cases where using fork() is not suitable, one of this cases is where the pages of the application keep changing dynamicly and the application cannot know in advance what pages are going to be identical. Ksm works by walking over the memory pages of the applications it scan in order to find identical pages. It uses a two sorted data strctures called stable and unstable trees to find in effective way the identical pages. When ksm finds two identical pages, it marks them as readonly and merges them into single one page, after the pages are marked as readonly and merged into one page, linux will treat this pages as normal copy_on_write pages and will fork them when write access will happen to them. Ksm scan just memory areas that were registred to be scanned by it. Ksm api: KSM_GET_API_VERSION: Give the userspace the api version of the module. KSM_CREATE_SHARED_MEMORY_AREA: Create shared memory reagion fd, that latter allow the user to register the memory region to scan by using: KSM_REGISTER_MEMORY_REGION and KSM_REMOVE_MEMORY_REGION KSM_START_STOP_KTHREAD: Return information about the kernel thread, the inforamtion is returned using the ksm_kthread_info structure: ksm_kthread_info: __u32 sleep: number of microsecoends to sleep between each iteration of scanning. __u32 pages_to_scan: number of pages to scan for each iteration of scanning. 
__u32 max_pages_to_merge: maximum number of pages to merge in each iteration of scanning (so even if there are still more pages to scan, we stop this iteration) __u32 flags: flags to control ksmd (right now just ksm_control_flags_run available) KSM_REGISTER_MEMORY_REGION: Register userspace virtual address range to be scanned by ksm. This ioctl is using the ksm_memory_region structure: ksm_memory_region: __u32 npages; number of pages to share inside this memory region. __u32 pad; __u64 addr: the begining of the virtual address of this region. KSM_REMOVE_MEMORY_REGION: Remove memory region from ksm. Signed-off-by: Izik Eidus iei...@redhat.com --- include/linux/ksm.h| 69 +++ include/linux/miscdevice.h |1 + mm/Kconfig |6 + mm/Makefile|1 + mm/ksm.c | 1431 5 files changed, 1508 insertions(+), 0 deletions(-) create mode 100644 include/linux/ksm.h create mode 100644 mm/ksm.c diff --git a/include/linux/ksm.h b/include/linux/ksm.h new file mode 100644 index 000..5776dce --- /dev/null +++ b/include/linux/ksm.h @@ -0,0 +1,69 @@ +#ifndef __LINUX_KSM_H +#define __LINUX_KSM_H + +/* + * Userspace interface for /dev/ksm - kvm shared memory + */ + +#include linux/types.h +#include linux/ioctl.h + +#include asm/types.h + +#define KSM_API_VERSION 1 + +#define ksm_control_flags_run 1 + +/* for KSM_REGISTER_MEMORY_REGION */ +struct ksm_memory_region { + __u32 npages; /* number of pages to share */ + __u32 pad; + __u64 addr; /* the begining of the virtual address */ +__u64 reserved_bits; +}; + +struct ksm_kthread_info { + __u32 sleep; /* number of microsecoends to sleep */ + __u32 pages_to_scan; /* number of pages to scan */ + __u32 flags; /* control flags */ +__u32 pad; +__u64 reserved_bits; +}; + +#define KSMIO 0xAB + +/* ioctls for /dev/ksm */ + +#define KSM_GET_API_VERSION _IO(KSMIO, 0x00) +/* + * KSM_CREATE_SHARED_MEMORY_AREA - create the shared memory reagion fd + */ +#define KSM_CREATE_SHARED_MEMORY_AREA_IO(KSMIO, 0x01) /* return SMA fd */ +/* + * KSM_START_STOP_KTHREAD - control 
the kernel thread scanning speed + * (can stop the kernel thread from working by setting running = 0) + */ +#define KSM_START_STOP_KTHREAD _IOW(KSMIO, 0x02,\ + struct ksm_kthread_info) +/* + * KSM_GET_INFO_KTHREAD - return information about the kernel thread + * scanning speed. + */ +#define KSM_GET_INFO_KTHREAD_IOW(KSMIO, 0x03,\ + struct ksm_kthread_info) + + +/* ioctls for SMA fds */ + +/* + * KSM_REGISTER_MEMORY_REGION - register virtual address memory area to be + * scanned by kvm. + */ +#define KSM_REGISTER_MEMORY_REGION _IOW(KSMIO, 0x20,\ + struct ksm_memory_region) +/* + * KSM_REMOVE_MEMORY_REGION - remove virtual address memory area from ksm. + */ +#define KSM_REMOVE_MEMORY_REGION _IO(KSMIO, 0x21) + +#endif diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h index
Re: [PATCH 4/4] add ksm kernel shared memory driver.
Anthony Liguori wrote: Izik Eidus wrote: Ksm is driver that allow merging identical pages between one or more applications in way unvisible to the application that use it. Pages that are merged are marked as readonly and are COWed when any application try to change them. Ksm is used for cases where using fork() is not suitable, one of this cases is where the pages of the application keep changing dynamicly and the application cannot know in advance what pages are going to be identical. Ksm works by walking over the memory pages of the applications it scan in order to find identical pages. It uses a two sorted data strctures called stable and unstable trees to find in effective way the identical pages. When ksm finds two identical pages, it marks them as readonly and merges them into single one page, after the pages are marked as readonly and merged into one page, linux will treat this pages as normal copy_on_write pages and will fork them when write access will happen to them. Ksm scan just memory areas that were registred to be scanned by it. Ksm api: KSM_GET_API_VERSION: Give the userspace the api version of the module. KSM_CREATE_SHARED_MEMORY_AREA: Create shared memory reagion fd, that latter allow the user to register the memory region to scan by using: KSM_REGISTER_MEMORY_REGION and KSM_REMOVE_MEMORY_REGION KSM_START_STOP_KTHREAD: Return information about the kernel thread, the inforamtion is returned using the ksm_kthread_info structure: ksm_kthread_info: __u32 sleep: number of microsecoends to sleep between each iteration of scanning. __u32 pages_to_scan: number of pages to scan for each iteration of scanning. __u32 max_pages_to_merge: maximum number of pages to merge in each iteration of scanning (so even if there are still more pages to scan, we stop this iteration) __u32 flags: flags to control ksmd (right now just ksm_control_flags_run available) Wouldn't this make more sense as a sysfs interface? 
I believe using ioctl for registering the memory of applications makes it easier. Ksm doesn't have any complicated API that would benefit from sysfs (besides adding more complexity).

That is, the KSM_START_STOP_KTHREAD part, not necessarily the rest of the API.

What do you mean?

Regards, Anthony Liguori
Re: [PATCH 4/4] add ksm kernel shared memory driver.
Izik Eidus wrote:

I believe using ioctl for registering the memory of applications makes it easier.

Yes, I completely agree.

Ksm doesn't have any complicated API that would benefit from sysfs (besides adding more complexity).

That is, the KSM_START_STOP_KTHREAD part, not necessarily the rest of the API.

What do you mean?

The ioctl(KSM_START_STOP_KTHREAD) API is distinct from the rest of the API. Whereas the rest of the API is used by applications to register their memory with KSM, this API is used by ksmctl to allow parameters to be tweaked in userspace. These parameters are just simple values like enable, pages_to_scan, sleep_time. Then there is KSM_GET_INFO_KTHREAD which provides a read interface to these parameters.

You could drop KSM_START_STOP_KTHREAD and KSM_GET_INFO_KTHREAD altogether, and introduce a sysfs hierarchy:

/sysfs/some/path/ksm/{enable,pages_to_scan,sleep_time}

That eliminates the need for ksmctl altogether, cleanly separates the two APIs, and provides a stronger interface.

The main problem with the current API is that it uses a single device to do both the administrative task and the userspace interface. That means that any application that has access to registering its memory with KSM also has the ability to disable KSM. That seems like a security concern to me since registering a memory region ought to be an unprivileged action whereas enabling/disabling KSM ought to be a privileged action.

Regards, Anthony Liguori
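To make the sysfs proposal concrete, here is a rough sketch of what a userspace tool would need in order to read and set one such tunable if it were exposed as a plain text file. The helper names tunable_read() and tunable_write() are hypothetical, and the path is left as a parameter because the thread only gives a placeholder location; nothing here is an existing KSM interface.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Write an unsigned decimal value to a sysfs-style attribute file. */
int tunable_write(const char *path, unsigned long value)
{
    char buf[32];
    int len, fd;
    ssize_t n;

    len = snprintf(buf, sizeof(buf), "%lu\n", value);
    fd = open(path, O_WRONLY | O_TRUNC);
    if (fd == -1)
        return -1;
    n = write(fd, buf, (size_t)len);
    if (close(fd) == -1)
        return -1;
    return n == (ssize_t)len ? 0 : -1;
}

/* Read an unsigned decimal value back from the same kind of file. */
int tunable_read(const char *path, unsigned long *value)
{
    char buf[32];
    char *end;
    ssize_t n;
    int fd;

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    n = read(fd, buf, sizeof(buf) - 1);
    close(fd);
    if (n <= 0)
        return -1;
    buf[n] = '\0';

    errno = 0;
    *value = strtoul(buf, &end, 10);
    if (errno != 0 || end == buf)
        return -1;
    return 0;
}
```

With files like these, access control comes from ordinary file permissions on each attribute, which is the property argued for above: an unprivileged process would simply fail the open() for writing.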
Re: [PATCH 4/4] add ksm kernel shared memory driver.
On Tue, Mar 31, 2009 at 08:31:31AM -0500, Anthony Liguori wrote:

You could drop KSM_START_STOP_KTHREAD and KSM_GET_INFO_KTHREAD altogether, and introduce a sysfs hierarchy: /sysfs/some/path/ksm/{enable,pages_to_scan,sleep_time}

Introducing a sysfs hierarchy sounds like a bit of overkill.

the ability to disable KSM. That seems like a security concern to me since registering a memory region ought to be an unprivileged action whereas enabling/disabling KSM ought to be a privileged action.

sysfs files would then only be writeable by admin, so if we want to allow only admin to start/stop/tune ksm it'd be enough to plug an admin capability check in the ioctl to provide equivalent permissions.

I could imagine converting the enable/pages_to_scan/sleep_time to module params and tweaking them through /sys/module/ksm/parameters, but for enable to work that way, we'd need to intercept the write so we can at least wake up the kksmd daemon, which doesn't seem possible with /sys/module/ksm/parameters, so in the end if we stick to the ioctl for registering regions, it seems simpler to use it for start/stop/tune too.
Re: [PATCH 4/4] add ksm kernel shared memory driver.
Andrea Arcangeli wrote:

the ability to disable KSM. That seems like a security concern to me since registering a memory region ought to be an unprivileged action whereas enabling/disabling KSM ought to be a privileged action.

sysfs files would then only be writeable by admin, so if we want to allow only admin to start/stop/tune ksm it'd be enough to plug an admin capability check in the ioctl to provide equivalent permissions.

Caps are not very granular unless you introduce a new capability. Furthermore, it's a bit more difficult to associate a capability with a user/group. With sysfs, you use file based permissions to control the API. It also fits into things like selinux a lot better.

At the very least, if you insist on not using sysfs, you should have a separate character device that's used for control (like /dev/ksmctl).

Regards, Anthony Liguori

I could imagine converting the enable/pages_to_scan/sleep_time to module params and tweaking them through /sys/module/ksm/parameters, but for enable to work that way, we'd need to intercept the write so we can at least wake up the kksmd daemon, which doesn't seem possible with /sys/module/ksm/parameters, so in the end if we stick to the ioctl for registering regions, it seems simpler to use it for start/stop/tune too.
Re: [PATCH 4/4] add ksm kernel shared memory driver.
On Tue, Mar 31, 2009 at 09:37:17AM -0500, Anthony Liguori wrote:

At the very least, if you insist on not using sysfs, you should have a separate character device that's used for control (like /dev/ksmctl).

I'm fine with using sysfs, that's not the point; if you have to add a ksmctl device, then sysfs is surely better. Besides, ksm would normally be enabled at boot, and tasks jailed by selinux had better not start/stop this thing. If people want /sys/kernel/mm/ksm instead of the start_stop ioctl we surely can add it (provided there's a way to intercept writes to the sysfs file). The problem is that registering memory could also be done with 'echo 0 -1 > /proc/self/ksm' and be inherited by children, it's not just start/stop.

I mean this is more a matter of taste, I'm afraid... Personally I'm more concerned about the API for registering the ram than about the start/stop thing, which I couldn't care less about, so my logic is that as long as this pseudodevice exists, we should use it for everything. If we go away from it, then we should remove it as a whole.
Re: [PATCH 4/4] add ksm kernel shared memory driver.
Andrea Arcangeli wrote:
On Tue, Mar 31, 2009 at 09:37:17AM -0500, Anthony Liguori wrote:

At the very least, if you insist on not using sysfs, you should have a separate character device that's used for control (like /dev/ksmctl).

I'm fine with using sysfs, that's not the point; if you have to add a ksmctl device, then sysfs is surely better. Besides, ksm would normally be enabled at boot, and tasks jailed by selinux had better not start/stop this thing. If people want /sys/kernel/mm/ksm instead of the start_stop ioctl we surely can add it (provided there's a way to intercept writes to the sysfs file). The problem is that registering memory could also be done with 'echo 0 -1 > /proc/self/ksm' and be inherited by children, it's not just start/stop.

I mean this is more a matter of taste, I'm afraid... Personally I'm more concerned about the API for registering the ram than about the start/stop thing, which I couldn't care less about,

I don't think the registering of ram should be done via sysfs. That would be a pretty bad interface IMHO. But I do think the functionality that ksmctl provides along with the security issues I mentioned earlier really suggest that there ought to be a separate API for control vs. registration and that control API would make a lot of sense as a sysfs API.

If you wanted to explore alternative APIs for registration, madvise() seems like the obvious candidate to me. madvise(start, size, MADV_SHARABLE) seems like a pretty obvious API to me.

So combining a sysfs interface for control and an madvise() interface for registration seems like a really nice interface to me.

Regards, Anthony Liguori

so my logic is that as long as this pseudodevice exists, we should use it for everything. If we go away from it, then we should remove it as a whole.
Re: [PATCH 4/4] add ksm kernel shared memory driver.
On Tue, Mar 31, 2009 at 10:09:24AM -0500, Anthony Liguori wrote:

I don't think the registering of ram should be done via sysfs. That would be a pretty bad interface IMHO. But I do think the functionality that ksmctl provides along with the security issues I mentioned earlier really suggest that there ought to be a separate API for control vs. registration and that control API would make a lot of sense as a sysfs API. If you wanted to explore alternative APIs for registration, madvise() seems like the obvious candidate to me. madvise(start, size, MADV_SHARABLE) seems like a pretty obvious API to me.

madvise would sound appropriate to me only if ksm were always built in, which is not the case, as it won't even be built if it's configured to N. Besides, madvise is a SUS-covered syscall, and this is a Linux-specific detail.
Re: [PATCH 4/4] add ksm kernel shared memory driver.
Andrea Arcangeli wrote:
On Tue, Mar 31, 2009 at 10:09:24AM -0500, Anthony Liguori wrote:

I don't think the registering of ram should be done via sysfs. That would be a pretty bad interface IMHO. But I do think the functionality that ksmctl provides along with the security issues I mentioned earlier really suggest that there ought to be a separate API for control vs. registration and that control API would make a lot of sense as a sysfs API. If you wanted to explore alternative APIs for registration, madvise() seems like the obvious candidate to me. madvise(start, size, MADV_SHARABLE) seems like a pretty obvious API to me.

madvise would sound appropriate to me only if ksm were always built in, which is not the case, as it won't even be built if it's configured to N.

You can still disable ksm and simply return ENOSYS for the MADV_ flag. You could even keep it as a module if you liked by separating the madvise bits from the ksm bits. The madvise() bits could just provide the tracking infrastructure for determining which vmas were currently marked as sharable. You could then have ksm as a loadable module that consumed that interface to then perform scanning.

Besides, madvise is a SUS-covered syscall, and this is a Linux-specific detail.

A number of MADV_ flags are Linux specific (like MADV_DOFORK/MADV_DONTFORK).

Regards, Anthony Liguori
Re: [PATCH 4/4] add ksm kernel shared memory driver.
On Tue, Mar 31, 2009 at 10:54:57AM -0500, Anthony Liguori wrote:

You can still disable ksm and simply return ENOSYS for the MADV_ flag. You

-EINVAL if anything; -ENOSYS would tell userland that it shall stop trying to use madvise, including the other MADV_ flags too.

could even keep it as a module if you liked by separating the madvise bits from the ksm bits. The madvise() bits could just provide the tracking infrastructure for determining which vmas were currently marked as sharable. You could then have ksm as a loadable module that consumed that interface to then perform scanning.

What's the point of making ksm a module if part of the ksm code is loaded in the kernel anyway and can't be compiled out? People that say KSM=N in their .config (like embedded systems running with 1M of ram) don't want that tracking overhead compiled into the kernel. Returning -EINVAL would be an option, but again I think madvise is a core syscall for SuS and I don't like those core VM parts returning -EINVAL at will depending on certain kernel modules being loaded.

A number of MADV_ flags are Linux specific (like MADV_DOFORK/MADV_DONTFORK).

But those aren't kernel module related, so they're in line with the standard ones and could be adopted by other OSes. KSM is not a core VM functionality, madvise is a core VM functionality, so I don't see the fit. KSM as an ioctl, or KSM creating /proc/pid/ksm when loaded, sounds fine to me instead. If the open of either one fails, the application won't register. It's up to you to choose KSM=M/N; if you want it as core functionality just build with KSM=Y, but leave others the option to save memory.
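The difference between the two error codes matters to callers, and a small sketch may help. The classifier below is hypothetical (the enum and the name try_advice() are not from the thread); it only relies on madvise() setting errno to EINVAL for an advice value the kernel does not recognise, while ENOSYS would mean the syscall as a whole is unavailable.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for madvise() and MAP_ANONYMOUS under strict modes */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

enum advice_result {
    ADVICE_OK,          /* the kernel accepted the hint */
    ADVICE_UNSUPPORTED, /* EINVAL: this hint (or its arguments) was rejected */
    MADVISE_MISSING,    /* ENOSYS: no madvise() at all, stop trying */
    ADVICE_FAILED       /* anything else (ENOMEM, EAGAIN, ...) */
};

/* Apply one hint and say how a caller should react to the outcome. */
enum advice_result try_advice(void *addr, size_t len, int advice)
{
    if (madvise(addr, len, advice) == 0)
        return ADVICE_OK;

    switch (errno) {
    case EINVAL:
        /* Also returned for misaligned addresses, so callers
         * should pass page-aligned ranges before concluding this. */
        return ADVICE_UNSUPPORTED;
    case ENOSYS:
        return MADVISE_MISSING;
    default:
        return ADVICE_FAILED;
    }
}
```

An application seeing ADVICE_UNSUPPORTED for one flag can keep using the other MADV_ hints, whereas MADVISE_MISSING would reasonably make it give up on madvise() altogether, which is the objection raised above.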
Re: [PATCH 4/4] add ksm kernel shared memory driver.
Andrea Arcangeli wrote:
On Tue, Mar 31, 2009 at 10:54:57AM -0500, Anthony Liguori wrote:

You can still disable ksm and simply return ENOSYS for the MADV_ flag. You

-EINVAL if anything; -ENOSYS would tell userland that it shall stop trying to use madvise, including the other MADV_ flags too.

could even keep it as a module if you liked by separating the madvise bits from the ksm bits. The madvise() bits could just provide the tracking infrastructure for determining which vmas were currently marked as sharable. You could then have ksm as a loadable module that consumed that interface to then perform scanning.

What's the point of making ksm a module if part of the ksm code is loaded in the kernel anyway and can't be compiled out? People that say KSM=N in their .config (like embedded systems running with 1M of ram) don't want that tracking overhead compiled into the kernel.

You have two things here. CONFIG_MEM_SHARABLE and CONFIG_KSM. CONFIG_MEM_SHARABLE cannot be a module. If it's set to =n, then madvise(MADV_SHARABLE) == -ENOSYS. If CONFIG_MEM_SHARABLE=y, then madvise(MADV_SHARABLE) will keep track of all sharable memory regions. Independently of that, CONFIG_KSM can be set to n,m,y. It depends on CONFIG_MEM_SHARABLE and when it's loaded, it consumes the list of sharable vmas.

But honestly, CONFIG_MEM_SHARABLE shouldn't be a lot of code so I don't see why you'd even need to make it configurable.

A number of MADV_ flags are Linux specific (like MADV_DOFORK/MADV_DONTFORK).

But those aren't kernel module related, so they're in line with the standard ones and could be adopted by other OSes. KSM is not a core VM functionality, madvise is a core VM functionality, so I don't see the fit. KSM as an ioctl, or KSM creating /proc/pid/ksm when loaded, sounds fine to me instead. If the open of either one fails, the application won't register. It's up to you to choose KSM=M/N; if you want it as core functionality just build with KSM=Y, but leave others the option to save memory.

The ioctl() interface is quite bad for what you're doing. You're telling the kernel extra information about a VA range in userspace. That's what madvise is for. You're tweaking simple read/write values of kernel infrastructure. That's what sysfs is for.

Regards, Anthony Liguori
Re: [PATCH 4/4] add ksm kernel shared memory driver.
On Tue, Mar 31, 2009 at 11:51:14AM -0500, Anthony Liguori wrote:

You have two things here. CONFIG_MEM_SHARABLE and CONFIG_KSM. CONFIG_MEM_SHARABLE cannot be a module. If it's set to =n, then madvise(MADV_SHARABLE) == -ENOSYS.

Which part wasn't clear: that -ENOSYS tells userland the madvise entry in the syscall table is empty, which is obviously not the case?

If CONFIG_MEM_SHARABLE=y, then madvise(MADV_SHARABLE) will keep track of all sharable memory regions. Independently of that, CONFIG_KSM can be set to n,m,y. It depends on CONFIG_MEM_SHARABLE and when it's loaded, it consumes the list of sharable vmas.

And what do you gain by creating two config params when only one is needed, other than more pain for the poor user doing make oldconfig and being asked a zillion new questions that aren't necessary?

But honestly, CONFIG_MEM_SHARABLE shouldn't be a lot of code so I don't see why you'd even need to make it configurable.

Even if you were to move the registration code into madvise with a -EINVAL retval if KSM was set to N for embedded, CONFIG_KSM would be enough: the registration code would be surrounded by CONFIG_KSM_MODULE || CONFIG_KSM, just like page_wrprotect/replace_page. This CONFIG_MEM_SHARABLE in addition to CONFIG_KSM is beyond what can make sense to me.

The ioctl() interface is quite bad for what you're doing. You're telling the kernel extra information about a VA range in userspace. That's what

The ioctl can be extended to also tell which pid to share without having to specify a VA range, and to have the feature inherited by the child. Not everyone wants to deal with VAs. But my main issue with madvise is that it's core kernel functionality while KSM clearly is not.
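The single-option guard described here could look roughly like the following. This is only a userspace-compilable illustration under stated assumptions: advise_sharable() and ksm_track_region() are invented names, not functions from the patch, and the real kernel code path would of course live inside the madvise implementation.

```c
#include <errno.h>
#include <stddef.h>

#if defined(CONFIG_KSM) || defined(CONFIG_KSM_MODULE)
/* Hypothetical tracking hook, assumed to be provided by the ksm code. */
int ksm_track_region(void *start, size_t len);

int advise_sharable(void *start, size_t len)
{
    return ksm_track_region(start, len);
}
#else
/* KSM=N: no tracking code is compiled in, and the hint is
 * rejected with -EINVAL rather than -ENOSYS. */
int advise_sharable(void *start, size_t len)
{
    (void)start;
    (void)len;
    return -EINVAL;
}
#endif
```

Built without either symbol defined, the stub is all that remains, which is the memory-saving property being argued for; a second CONFIG_MEM_SHARABLE option would add nothing to this arrangement.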
Re: [PATCH 4/4] add ksm kernel shared memory driver.
Hello, I attach below some benchmarks of the new ksm tree algorithm, showing ksm performance in best and worst case scenarios.

---

Here is a program, ksmpages.c, that tries to create the worst case scenario for the ksm tree algorithm.

---
/* ksmpages.c: exercise KSM (C) Red Hat Inc. GPL'd */
#include <stdlib.h>
#include <malloc.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include "ksm.h"

#define SIZE (1UL*1024*1024*1024)
#define PAGE_SIZE 4096
#define PAGES (SIZE/PAGE_SIZE)

int ksm_register_memory(char * p)
{
	int fd;
	int ksm_fd;
	int r = 1;
	struct ksm_memory_region ksm_region;

	fd = open("/dev/ksm", O_RDWR | O_TRUNC, (mode_t)0600);
	if (fd == -1)
		goto out;

	ksm_fd = ioctl(fd, KSM_CREATE_SHARED_MEMORY_AREA);
	if (ksm_fd == -1)
		goto out_free;

	ksm_region.npages = PAGES;
	ksm_region.addr = (unsigned long) p;
	r = ioctl(ksm_fd, KSM_REGISTER_MEMORY_REGION, &ksm_region);
	if (r)
		goto out_free1;

	/* on success both fds stay open for the lifetime of the process */
	return r;

out_free1:
	close(ksm_fd);
out_free:
	close(fd);
out:
	return r;
}

int main(void)
{
	unsigned long page;
	char *p = memalign(PAGE_SIZE, PAGES*PAGE_SIZE);
	if (!p)
		perror("memalign"), exit(1);

	if (ksm_register_memory(p))
		printf("failed to register into ksm, run inside VM\n");
	else
		printf("registered into ksm, run outside VM\n");

	/* make every page differ only in its last sizeof(unsigned long) bytes */
	for (page = 0; page < PAGES; page++) {
		char *ppage;
		ppage = p + page * PAGE_SIZE +
			PAGE_SIZE - sizeof(unsigned long);
		*(unsigned long *)ppage = page;
	}

	pause();

	return 0;
}
---

ksmpages exercises the ksm tree algorithm's worst case, where pages are all equal except for the last bytes, so the memcmp breaks only after having accessed the worst-case amount of memory (i.e. almost 4096 bytes for each level of the stable or unstable tree).
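Why this layout is the worst case for a memcmp-based tree walk can be checked in isolation, without /dev/ksm. The sketch below is an illustration only: PAGE_BYTES, make_worst_case_pages() and first_difference() are names introduced here, and the buffer is zero-filled with calloc() so that the per-page tag is the only difference.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_BYTES 4096

/* Allocate `pages` zeroed pages and store each page's index in its
 * final sizeof(unsigned long) bytes, as ksmpages does. */
char *make_worst_case_pages(size_t pages)
{
    size_t page;
    char *p = calloc(pages, PAGE_BYTES);

    if (!p)
        return NULL;
    for (page = 0; page < pages; page++) {
        unsigned long tag = page;
        memcpy(p + page * PAGE_BYTES + PAGE_BYTES - sizeof(tag),
               &tag, sizeof(tag));
    }
    return p;
}

/* Offset of the first differing byte, or len if the ranges are equal.
 * This is how far a byte-wise comparison must read before it can stop. */
size_t first_difference(const char *a, const char *b, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (a[i] != b[i])
            return i;
    return len;
}
```

Comparing any two distinct pages built this way, first_difference() lands inside the trailing tag, so a comparison has to read nearly the whole page before it can tell them apart.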
Top after running the first copy of ksmpages: PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 16473 andrea20 0 1027m 1.0g 328 S0 25.9 0:01.14 ksmpages Below is vmstat 1 while running a second copy with kksmd running at 100% CPU load: --- 1 0 3104 2806044 60256 4553200 0 0 912 338 0 25 74 0 1 0 3104 2805700 60256 4553200 0 0 676 171 0 27 73 0 1 0 3104 2805452 60264 4552400 036 708 172 0 23 77 0 1 0 3104 2806428 60264 4553200 0 0 787 210 0 25 75 0 1 0 3104 2806212 60264 4552400 0 0 643 132 0 25 75 0 1 0 3104 2805864 60264 4552400 0 0 685 157 0 27 73 0 1 0 3104 2805616 60264 4552400 0 0 640 128 0 23 77 0 1 0 3104 2805368 60264 4552400 0 0 637 131 0 25 75 0 1 0 3104 2804996 60280 4550800 076 704 165 0 25 75 0 2 0 3104 2804748 60280 4552400 0 0 636 131 0 27 73 0 1 0 3104 2804500 60280 4552400 0 0 641 133 0 23 77 0 Here the second copy of ksmpages is started. 2 0 3104 2660544 60280 4552400 0 0 711 178 0 28 72 0 1 0 3104 1754096 60280 4552400 0 0 839 172 1 47 53 0 1G of ram has been allocated and initialized by ksmpages. 1 0 3104 1753848 60280 4552400 0 0 632 122 0 27 73 0 1 0 3104 1753328 60280 4552400 0 0 661 167 0 23 77 0 1 0 3104 1753104 60280 4552400 0 0 635 129 0 25 75 0 1 0 3104 1752856 60280 4552400 0 0 635 127 0 25 75 0 1 0 3104 1752608 60280 4552400 0 0 677 158 0 27 73 0 1 0 3104 1752360 60280 4552400 0 0 636 132 0 23 77 0 1 0 3104 1752112 60280 4552400 0 0 638 133 0 25 75 0 1 0 3104 1751864 60280 4552400 0 0 665 149 0 25 75 0 It takes around 8 seconds for kksmd to complete a full scan of the 1G indexed in the unstable tree plus the refresh of the checksum of the whole 2G registered. 1 0 3104 1758944 60280 4552400 0 0 649 122 0 27 73 0 1 0 3104 1772316 60280 4552400 0 0 660 128 0 23 77 0 1 0 3104 1784668 60280 4552400 0 0 711 159 0 25 75 0 1 0 3104 1796252 60280 4552400 0 0 669 138 0 25 75 0 1 0 3104 1807908 60280 4552400 0 0 653 124 0 27 73 0 1 0 3104 1819044 60280 4552400 0 0 677 148 0 23 77 0 1 0 3104 1829684 60280 4552400 0 0 649
Re: [PATCH 4/4] add ksm kernel shared memory driver.
On Tue, 31 Mar 2009 15:21:53 +0300 Izik Eidus iei...@redhat.com wrote:

> kpage is actually what is going to be KsmPage - the shared page...
>
> Right now these pages are not swappable; after ksm is merged we
> will make these pages swappable as well...

sure.

> > If so, please
> > - show the amount of kpage
> > - allow users to set a limit for usage of kpages, or preserve
> >   kpages at boot or by user's command.
>
> kpage actually saves memory, and limiting the number of them would
> limit the number of shared pages...

Ah, I'm working on the memory control cgroup, and *KSM* will be out of
its control. It's OK to make the default limit value INFINITY, but
please add knobs.

Thanks,
-Kame
[PATCH 4/4] add ksm kernel shared memory driver.
Ksm is a driver that allows merging identical pages between one or more
applications in a way invisible to the applications that use it.
Pages that are merged are marked as read-only and are COWed when any
application tries to change them.

Ksm is used for cases where using fork() is not suitable; one of these
cases is where the pages of the application keep changing dynamically
and the application cannot know in advance which pages are going to be
identical.

Ksm works by walking over the memory pages of the applications it
scans in order to find identical pages.
It uses two sorted data structures, called the stable and unstable
trees, to find the identical pages efficiently.

When ksm finds two identical pages, it marks them as read-only and
merges them into a single page. After the pages are marked as
read-only and merged into one page, Linux treats these pages as normal
copy_on_write pages and will fork them when a write access happens to
them.

Ksm scans only memory areas that were registered to be scanned by it.

Ksm api:

KSM_GET_API_VERSION:
Give the userspace the api version of the module.

KSM_CREATE_SHARED_MEMORY_AREA:
Create a shared memory region fd, that later allows the user to
register the memory region to scan by using:
KSM_REGISTER_MEMORY_REGION and KSM_REMOVE_MEMORY_REGION

KSM_START_STOP_KTHREAD:
Return information about the kernel thread; the information is
returned using the ksm_kthread_info structure:
ksm_kthread_info:
__u32 sleep:
        number of microseconds to sleep between each iteration of
        scanning.

__u32 pages_to_scan:
        number of pages to scan for each iteration of scanning.

__u32 max_pages_to_merge:
        maximum number of pages to merge in each iteration of scanning
        (so even if there are still more pages to scan, we stop this
        iteration)

__u32 flags:
        flags to control ksmd (right now just ksm_control_flags_run
        available)

KSM_REGISTER_MEMORY_REGION:
Register userspace virtual address range to be scanned by ksm.
This ioctl is using the ksm_memory_region structure:
ksm_memory_region:
__u32 npages;
        number of pages to share inside this memory region.
__u32 pad;
__u64 addr:
        the beginning of the virtual address of this region.

KSM_REMOVE_MEMORY_REGION:
Remove memory region from ksm.

Signed-off-by: Izik Eidus iei...@redhat.com
---
 include/linux/ksm.h        |   69 +++
 include/linux/miscdevice.h |    1 +
 mm/Kconfig                 |    6 +
 mm/Makefile                |    1 +
 mm/ksm.c                   | 1431
 5 files changed, 1508 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/ksm.h
 create mode 100644 mm/ksm.c

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
new file mode 100644
index 000..5776dce
--- /dev/null
+++ b/include/linux/ksm.h
@@ -0,0 +1,69 @@
+#ifndef __LINUX_KSM_H
+#define __LINUX_KSM_H
+
+/*
+ * Userspace interface for /dev/ksm - kvm shared memory
+ */
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+#include <asm/types.h>
+
+#define KSM_API_VERSION 1
+
+#define ksm_control_flags_run 1
+
+/* for KSM_REGISTER_MEMORY_REGION */
+struct ksm_memory_region {
+	__u32 npages; /* number of pages to share */
+	__u32 pad;
+	__u64 addr; /* the beginning of the virtual address */
+	__u64 reserved_bits;
+};
+
+struct ksm_kthread_info {
+	__u32 sleep; /* number of microseconds to sleep */
+	__u32 pages_to_scan; /* number of pages to scan */
+	__u32 flags; /* control flags */
+	__u32 pad;
+	__u64 reserved_bits;
+};
+
+#define KSMIO 0xAB
+
+/* ioctls for /dev/ksm */
+
+#define KSM_GET_API_VERSION _IO(KSMIO, 0x00)
+/*
+ * KSM_CREATE_SHARED_MEMORY_AREA - create the shared memory region fd
+ */
+#define KSM_CREATE_SHARED_MEMORY_AREA _IO(KSMIO, 0x01) /* return SMA fd */
+/*
+ * KSM_START_STOP_KTHREAD - control the kernel thread scanning speed
+ * (can stop the kernel thread from working by setting running = 0)
+ */
+#define KSM_START_STOP_KTHREAD _IOW(KSMIO, 0x02,\
+					struct ksm_kthread_info)
+/*
+ * KSM_GET_INFO_KTHREAD - return information about the kernel thread
+ * scanning speed.
+ */
+#define KSM_GET_INFO_KTHREAD _IOW(KSMIO, 0x03,\
+					struct ksm_kthread_info)
+
+
+/* ioctls for SMA fds */
+
+/*
+ * KSM_REGISTER_MEMORY_REGION - register virtual address memory area to be
+ * scanned by kvm.
+ */
+#define KSM_REGISTER_MEMORY_REGION _IOW(KSMIO, 0x20,\
+					struct ksm_memory_region)
+/*
+ * KSM_REMOVE_MEMORY_REGION - remove virtual address memory area from ksm.
+ */
+#define KSM_REMOVE_MEMORY_REGION _IO(KSMIO, 0x21)
+
+#endif
diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index a820f81..6d4f8df 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -29,6 +29,7 @@
Re: [PATCH 4/4] add ksm kernel shared memory driver.
Izik Eidus wrote:

> Ksm is a driver that allows merging identical pages between one or more
> applications in a way invisible to the applications that use it.
> Pages that are merged are marked as read-only and are COWed when any
> application tries to change them.
>
> Ksm is used for cases where using fork() is not suitable; one of these
> cases is where the pages of the application keep changing dynamically
> and the application cannot know in advance which pages are going to be
> identical.
>
> Ksm works by walking over the memory pages of the applications it
> scans in order to find identical pages.
> It uses two sorted data structures, called the stable and unstable
> trees, to find the identical pages efficiently.
>
> When ksm finds two identical pages, it marks them as read-only and
> merges them into a single page. After the pages are marked as
> read-only and merged into one page, Linux treats these pages as normal
> copy_on_write pages and will fork them when a write access happens to
> them.
>
> Ksm scans only memory areas that were registered to be scanned by it.
>
> Ksm api:
>
> KSM_GET_API_VERSION:
> Give the userspace the api version of the module.
>
> KSM_CREATE_SHARED_MEMORY_AREA:
> Create a shared memory region fd, that later allows the user to
> register the memory region to scan by using:
> KSM_REGISTER_MEMORY_REGION and KSM_REMOVE_MEMORY_REGION
>
> KSM_START_STOP_KTHREAD:
> Return information about the kernel thread; the information is
> returned using the ksm_kthread_info structure:
> ksm_kthread_info:
> __u32 sleep:
>         number of microseconds to sleep between each iteration of
>         scanning.
>
> __u32 pages_to_scan:
>         number of pages to scan for each iteration of scanning.
>
> __u32 max_pages_to_merge:
>         maximum number of pages to merge in each iteration of scanning
>         (so even if there are still more pages to scan, we stop this
>         iteration)
>
> __u32 flags:
>         flags to control ksmd (right now just ksm_control_flags_run
>         available)

Wouldn't this make more sense as a sysfs interface? That is, the
KSM_START_STOP_KTHREAD part, not necessarily the rest of the API.

Regards,

Anthony Liguori
Re: [PATCH 4/4] add ksm kernel shared memory driver.
On Tue, 31 Mar 2009 02:59:20 +0300 Izik Eidus iei...@redhat.com wrote:

> Ksm is a driver that allows merging identical pages between one or more
> applications in a way invisible to the applications that use it.
> Pages that are merged are marked as read-only and are COWed when any
> application tries to change them.
>
> Ksm is used for cases where using fork() is not suitable; one of these
> cases is where the pages of the application keep changing dynamically
> and the application cannot know in advance which pages are going to be
> identical.
>
> Ksm works by walking over the memory pages of the applications it
> scans in order to find identical pages.
> It uses two sorted data structures, called the stable and unstable
> trees, to find the identical pages efficiently.
>
> When ksm finds two identical pages, it marks them as read-only and
> merges them into a single page. After the pages are marked as
> read-only and merged into one page, Linux treats these pages as normal
> copy_on_write pages and will fork them when a write access happens to
> them.
>
> Ksm scans only memory areas that were registered to be scanned by it.
>
> Ksm api:
>
> KSM_GET_API_VERSION:
> Give the userspace the api version of the module.
>
> KSM_CREATE_SHARED_MEMORY_AREA:
> Create a shared memory region fd, that later allows the user to
> register the memory region to scan by using:
> KSM_REGISTER_MEMORY_REGION and KSM_REMOVE_MEMORY_REGION
>
> KSM_START_STOP_KTHREAD:
> Return information about the kernel thread; the information is
> returned using the ksm_kthread_info structure:
> ksm_kthread_info:
> __u32 sleep:
>         number of microseconds to sleep between each iteration of
>         scanning.
>
> __u32 pages_to_scan:
>         number of pages to scan for each iteration of scanning.
>
> __u32 max_pages_to_merge:
>         maximum number of pages to merge in each iteration of scanning
>         (so even if there are still more pages to scan, we stop this
>         iteration)
>
> __u32 flags:
>         flags to control ksmd (right now just ksm_control_flags_run
>         available)
>
> KSM_REGISTER_MEMORY_REGION:
> Register userspace virtual address range to be scanned by ksm.
> This ioctl is using the ksm_memory_region structure:
> ksm_memory_region:
> __u32 npages;
>         number of pages to share inside this memory region.
> __u32 pad;
> __u64 addr:
>         the beginning of the virtual address of this region.
>
> KSM_REMOVE_MEMORY_REGION:
> Remove memory region from ksm.
>
> Signed-off-by: Izik Eidus iei...@redhat.com
> ---
>  include/linux/ksm.h        |   69 +++
>  include/linux/miscdevice.h |    1 +
>  mm/Kconfig                 |    6 +
>  mm/Makefile                |    1 +
>  mm/ksm.c                   | 1431
>  5 files changed, 1508 insertions(+), 0 deletions(-)
>  create mode 100644 include/linux/ksm.h
>  create mode 100644 mm/ksm.c
>
> diff --git a/include/linux/ksm.h b/include/linux/ksm.h
> new file mode 100644
> index 000..5776dce
> --- /dev/null
> +++ b/include/linux/ksm.h
> @@ -0,0 +1,69 @@
> +#ifndef __LINUX_KSM_H
> +#define __LINUX_KSM_H
> +
> +/*
> + * Userspace interface for /dev/ksm - kvm shared memory
> + */
> +
> +#include <linux/types.h>
> +#include <linux/ioctl.h>
> +
> +#include <asm/types.h>
> +
> +#define KSM_API_VERSION 1
> +
> +#define ksm_control_flags_run 1
> +
> +/* for KSM_REGISTER_MEMORY_REGION */
> +struct ksm_memory_region {
> +	__u32 npages; /* number of pages to share */
> +	__u32 pad;
> +	__u64 addr; /* the beginning of the virtual address */
> +	__u64 reserved_bits;
> +};
> +
> +struct ksm_kthread_info {
> +	__u32 sleep; /* number of microseconds to sleep */
> +	__u32 pages_to_scan; /* number of pages to scan */
> +	__u32 flags; /* control flags */
> +	__u32 pad;
> +	__u64 reserved_bits;
> +};
> +
> +#define KSMIO 0xAB
> +
> +/* ioctls for /dev/ksm */
> +
> +#define KSM_GET_API_VERSION _IO(KSMIO, 0x00)
> +/*
> + * KSM_CREATE_SHARED_MEMORY_AREA - create the shared memory region fd
> + */
> +#define KSM_CREATE_SHARED_MEMORY_AREA _IO(KSMIO, 0x01) /* return SMA fd */
> +/*
> + * KSM_START_STOP_KTHREAD - control the kernel thread scanning speed
> + * (can stop the kernel thread from working by setting running = 0)
> + */
> +#define KSM_START_STOP_KTHREAD _IOW(KSMIO, 0x02,\
> +					struct ksm_kthread_info)
> +/*
> + * KSM_GET_INFO_KTHREAD - return information about the kernel thread
> + * scanning speed.
> + */
> +#define KSM_GET_INFO_KTHREAD _IOW(KSMIO, 0x03,\
> +					struct ksm_kthread_info)
> +
> +
> +/* ioctls for SMA fds */
> +
> +/*
> + * KSM_REGISTER_MEMORY_REGION - register virtual address memory area to be
> + * scanned by kvm.
> + */
> +#define KSM_REGISTER_MEMORY_REGION _IOW(KSMIO, 0x20,\
> +					struct ksm_memory_region)
> +/*
> + * KSM_REMOVE_MEMORY_REGION - remove virtual address memory area from ksm.
> + */
> +#define KSM_REMOVE_MEMORY_REGION