[PATCH v2] zswap: Update with same-value filled page feature

From: Srividya Desireddy <srividya...@samsung.com>
Date: Wed, 6 Dec 2017 16:29:50 +0530
Subject: [PATCH v2] zswap: Update with same-value filled page feature

Changes since v1:
 Updated to clarify about zswap.same_filled_pages_enabled parameter.

Updated zswap document with details on same-value filled pages
identification feature. The usage of zswap.same_filled_pages_enabled
module parameter is explained.

Signed-off-by: Srividya Desireddy <srividya...@samsung.com>
---
 Documentation/vm/zswap.txt | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/vm/zswap.txt b/Documentation/vm/zswap.txt
index 89fff7d..0b3a114 100644
--- a/Documentation/vm/zswap.txt
+++ b/Documentation/vm/zswap.txt
@@ -98,5 +98,25 @@ request is made for a page in an old zpool, it is uncompressed using its
 original compressor.  Once all pages are removed from an old zpool, the zpool
 and its compressor are freed.
 
+Some of the pages in zswap are same-value filled pages (i.e. contents of the
+page have same value or repetitive pattern). These pages include zero-filled
+pages and they are handled differently. During store operation, a page is
+checked if it is a same-value filled page before compressing it. If true, the
+compressed length of the page is set to zero and the pattern or same-filled
+value is stored.
+
+Same-value filled pages identification feature is enabled by default and can be
+disabled at boot time by setting the "same_filled_pages_enabled" attribute to 0,
+e.g. zswap.same_filled_pages_enabled=0. It can also be enabled and disabled at
+runtime using the sysfs "same_filled_pages_enabled" attribute, e.g.
+
+echo 1 > /sys/module/zswap/parameters/same_filled_pages_enabled
+
+When zswap same-filled page identification is disabled at runtime, it will stop
+checking for the same-value filled pages during store operation. However, the
+existing pages which are marked as same-value filled pages remain stored
+unchanged in zswap until they are either loaded or invalidated.
+
 A debugfs interface is provided for various statistic about pool size, number
-of pages stored, and various counters for the reasons pages are rejected.
+of pages stored, same-value filled pages and various counters for the reasons
+pages are rejected.
--
2.7.4
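The store-time check described above can be illustrated in isolation. The following is a minimal userspace sketch, not the kernel's code: it treats the page as an array of machine words and reports a page as same-filled only when every word equals the first, returning the pattern that would be stored in place of compressed data.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096

/*
 * Sketch of the same-value test: a page is "same-filled" when every
 * machine word in it equals the first word.  Returns nonzero and stores
 * the fill pattern in *value on success.
 */
static int page_same_filled(const void *ptr, unsigned long *value)
{
	const unsigned long *page = ptr;
	size_t pos;

	for (pos = 1; pos < PAGE_SIZE / sizeof(*page); pos++)
		if (page[pos] != page[0])
			return 0;
	*value = page[0];
	return 1;
}
```

A zero-filled page is just the special case where the recovered pattern is 0, which is why the documentation describes zero-filled pages as a subset of same-value filled pages.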
[PATCH] zswap: Update with same-value filled page feature

From: Srividya Desireddy <srividya...@samsung.com>
Date: Wed, 29 Nov 2017 20:23:15 +0530
Subject: [PATCH] zswap: Update with same-value filled page feature

Updated zswap document with details on same-value filled pages
identification feature. The usage of zswap.same_filled_pages_enabled
module parameter is explained.

Signed-off-by: Srividya Desireddy <srividya...@samsung.com>
---
 Documentation/vm/zswap.txt | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/vm/zswap.txt b/Documentation/vm/zswap.txt
index 89fff7d..cc015b5 100644
--- a/Documentation/vm/zswap.txt
+++ b/Documentation/vm/zswap.txt
@@ -98,5 +98,25 @@ request is made for a page in an old zpool, it is uncompressed using its
 original compressor.  Once all pages are removed from an old zpool, the zpool
 and its compressor are freed.
 
+Some of the pages in zswap are same-value filled pages (i.e. contents of the
+page have same value or repetitive pattern). These pages include zero-filled
+pages and they are handled differently. During store operation, a page is
+checked if it is a same-value filled page before compressing it. If true, the
+compressed length of the page is set to zero and the pattern or same-filled
+value is stored.
+
+Same-value filled pages identification feature is enabled by default and can be
+disabled at boot time by setting the "same_filled_pages_enabled" attribute to 0,
+e.g. zswap.same_filled_pages_enabled=0. It can also be enabled and disabled at
+runtime using the sysfs "same_filled_pages_enabled" attribute, e.g.
+
+echo 1 > /sys/module/zswap/parameters/same_filled_pages_enabled
+
+When zswap same-filled page identification is disabled at runtime, it will stop
+checking for the same-value filled pages during store operation. However, the
+existing pages which are marked as same-value filled pages will be loaded or
+invalidated.
+
 A debugfs interface is provided for various statistic about pool size, number
-of pages stored, and various counters for the reasons pages are rejected.
+of pages stored, same-value filled pages and various counters for the reasons
+pages are rejected.
--
2.7.4
[PATCH v2] zswap: Same-filled pages handling

From: Srividya Desireddy <srividya...@samsung.com>
Date: Sat, 18 Nov 2017 18:29:16 +0530
Subject: [PATCH v2] zswap: Same-filled pages handling

Changes since v1:
 Added memset_l instead of for loop.

Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. contents of the page are all same), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.

This patch adds a check in zswap_frontswap_store() to identify a
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and allocation of memory in
zpool. In zswap_frontswap_load(), check if the value of
zswap_entry.length is zero corresponding to the page to be loaded. If
zswap_entry.length is zero, fill the page with the same-filled value.
This saves the decompression time during load.

On an ARM Quad Core 32-bit device with 1.5GB RAM, by launching and
relaunching different applications, out of ~64000 pages stored in
zswap, ~11000 pages were same-value filled pages (including zero-filled
pages) and ~9000 pages were zero-filled pages. An average of 17% of
pages (including zero-filled pages) in zswap are same-value filled
pages and 14% of pages are zero-filled pages. An average of 3% of pages
are same-filled non-zero pages.

The below table shows the execution time profiling with the patch.

                        Baseline  With patch  % Improvement
-----------------------------------------------------------
*Zswap Store Time        26.5ms      18ms         32%
 (of same value pages)
*Zswap Load Time
 (of same value pages)   25.5ms      13ms         49%
-----------------------------------------------------------

On Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of ~360000 pages
stored in zswap, 78000 (~22%) of pages were found to be same-value
filled pages (including zero-filled pages) and 64000 (~17%) are
zero-filled pages. So an average of 5% of pages are same-filled
non-zero pages.

The below table shows the execution time profiling with the patch.

                        Baseline  With patch  % Improvement
-----------------------------------------------------------
*Zswap Store Time         91ms       74ms         19%
 (of same value pages)
*Zswap Load Time          50ms       7.5ms        85%
 (of same value pages)
-----------------------------------------------------------

*The execution times may vary with test device used.

Signed-off-by: Srividya Desireddy <srividya...@samsung.com>
---
 mm/zswap.c | 71 +-
 1 file changed, 66 insertions(+), 5 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index d39581a..1133b4ce 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -49,6 +49,8 @@ static u64 zswap_pool_total_size;
 /* The number of compressed pages currently stored in zswap */
 static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
+/* The number of same-value filled pages currently stored in zswap */
+static atomic_t zswap_same_filled_pages = ATOMIC_INIT(0);
 
 /*
  * The statistics below are not protected from concurrent access for
@@ -116,6 +118,11 @@ module_param_cb(zpool, &zswap_zpool_param_ops, &zswap_zpool_type, 0644);
 static unsigned int zswap_max_pool_percent = 20;
 module_param_named(max_pool_percent, zswap_max_pool_percent, uint, 0644);
 
+/* Enable/disable handling same-value filled pages (enabled by default) */
+static bool zswap_same_filled_pages_enabled = true;
+module_param_named(same_filled_pages_enabled, zswap_same_filled_pages_enabled,
+		   bool, 0644);
+
 /*********************************
 * data structures
 **********************************/
@@ -145,9 +152,10 @@ struct zswap_pool {
  *            be held while changing the refcount.  Since the lock must
  *            be held, there is no reason to also make refcount atomic.
  * length - the length in bytes of the compressed page data.  Needed during
- *          decompression
+ *          decompression. For a same value filled page length is 0.
  * pool - the zswap_pool the entry's data is in
  * handle - zpool allocation handle that stores the compressed page data
+ * value - value of the same-value filled pages which have same content
  */
 struct zswap_entry {
 	struct rb_node rbnode;
@@ -155,7 +163,10 @@ struct zswap_entry {
 	int refcount;
 	unsigned int length;
 	struct zswap_pool *pool;
-	unsigned long handle;
+	union {
+		unsigned long handle;
+		unsigned long value;
+	};
 };
 
 struct zswap_header {
@@ -320,8 +331,12 @@ static void zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry)
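The handle/value union in the diff above can be modeled in a few lines. This is an illustrative toy (struct and function names are not the kernel's): the entry reuses one word-sized slot for either a zpool handle or, when the compressed length is zero, the fill pattern itself.

```c
#include <assert.h>

/*
 * Toy model of the zswap_entry idea: when length == 0 the entry holds a
 * same-filled pattern in 'value' instead of a zpool handle, so no extra
 * field (and no pool memory) is needed for same-filled pages.
 */
struct toy_entry {
	unsigned int length;		/* compressed length; 0 => same-filled */
	union {
		unsigned long handle;	/* zpool allocation handle */
		unsigned long value;	/* fill pattern when length == 0 */
	};
};

static int toy_entry_same_filled(const struct toy_entry *e)
{
	return e->length == 0;
}
```

Because a same-filled page never has a zpool allocation, overlaying the two fields is safe: exactly one interpretation is valid at a time, selected by the length.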
Re: [PATCH] zswap: Same-filled pages handling

On Wed, Oct 19, 2017 at 6:38 AM, Matthew Wilcox wrote:
> On Thu, Oct 19, 2017 at 12:31:18AM +0300, Timofey Titovets wrote:
>> > +static void zswap_fill_page(void *ptr, unsigned long value)
>> > +{
>> > +	unsigned int pos;
>> > +	unsigned long *page;
>> > +
>> > +	page = (unsigned long *)ptr;
>> > +	if (value == 0)
>> > +		memset(page, 0, PAGE_SIZE);
>> > +	else {
>> > +		for (pos = 0; pos < PAGE_SIZE / sizeof(*page); pos++)
>> > +			page[pos] = value;
>> > +	}
>> > +}
>>
>> Same here, but with memcpy().
>
> No.  Use memset_l which is optimised for this specific job.

I have tested this patch using the memset_l() function in
zswap_fill_page() on an x86 64-bit system with 2GB RAM. The performance
remains the same, but memset_l() might be optimised in future.

@Seth Jennings/Dan Streetman: Should I use the memset_l() function in
this patch?
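memset_l() is a kernel-internal helper (it stores a long-sized pattern, with architectures free to optimise it), so it is not available in userspace. As a rough portable stand-in, with a hypothetical name, the semantics under discussion look like this:

```c
#include <stddef.h>
#include <string.h>

/*
 * Userspace stand-in for the kernel's memset_l(): store 'value' into
 * 'count' consecutive unsigned longs starting at 'dst'.  The zero case
 * falls back to memset(), mirroring the fast path in the patch above.
 */
static void fill_words(unsigned long *dst, unsigned long value, size_t count)
{
	size_t i;

	if (value == 0) {
		memset(dst, 0, count * sizeof(*dst));
		return;
	}
	for (i = 0; i < count; i++)
		dst[i] = value;
}
```

The point of Wilcox's suggestion is that a single memset_l() call replaces both branches: the kernel implementation can pick the best strategy (including the all-zero case) without the caller special-casing it.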
Re: [PATCH] zswap: Same-filled pages handling

On Wed, Oct 18, 2017 at 7:41 PM, Matthew Wilcox wrote:
> On Wed, Oct 18, 2017 at 04:33:43PM +0300, Timofey Titovets wrote:
>> 2017-10-18 15:34 GMT+03:00 Matthew Wilcox <wi...@infradead.org>:
>> > On Wed, Oct 18, 2017 at 10:48:32AM +0000, Srividya Desireddy wrote:
>> >> +static void zswap_fill_page(void *ptr, unsigned long value)
>> >> +{
>> >> +	unsigned int pos;
>> >> +	unsigned long *page;
>> >> +
>> >> +	page = (unsigned long *)ptr;
>> >> +	if (value == 0)
>> >> +		memset(page, 0, PAGE_SIZE);
>> >> +	else {
>> >> +		for (pos = 0; pos < PAGE_SIZE / sizeof(*page); pos++)
>> >> +			page[pos] = value;
>> >> +	}
>> >> +}
>> >
>> > I think you meant:
>> >
>> > static void zswap_fill_page(void *ptr, unsigned long value)
>> > {
>> > 	memset_l(ptr, value, PAGE_SIZE / sizeof(unsigned long));
>> > }
>>
>> IIRC kernel have special zero page, and if i understand correctly.
>> You can map all zero pages to that zero page and not touch zswap completely.
>> (Your situation look like some KSM case (i.e. KSM can handle pages
>> with same content), but i'm not sure if that applicable there)
>
> You're confused by the word "same".  What Srividya meant was that the
> page is filled with a pattern, eg 0xfffefffefffefffe..., not that it is
> the same as any other page.

In the kernel there is a special zero page, empty_zero_page, which is in
general allocated in the paging_init() function to map all zero pages.
But same-value-filled pages, including zero pages, exist in memory
because applications may be initializing the allocated pages with a
value and not using them; or the actual content written to the memory
pages during execution itself is same-value, in the case of multimedia
data for example.

I had earlier posted a patch with a similar implementation of the KSM
concept for zswap:
https://lkml.org/lkml/2016/8/17/171
https://lkml.org/lkml/2017/2/17/612

- Srividya
[PATCH] zswap: Same-filled pages handling

From: Srividya Desireddy <srividya...@samsung.com>
Date: Wed, 18 Oct 2017 15:39:02 +0530
Subject: [PATCH] zswap: Same-filled pages handling

Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. contents of the page are all same), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.

This patch adds a check in zswap_frontswap_store() to identify a
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and allocation of memory in
zpool. In zswap_frontswap_load(), check if the value of
zswap_entry.length is zero corresponding to the page to be loaded. If
zswap_entry.length is zero, fill the page with the same-filled value.
This saves the decompression time during load.

On an ARM Quad Core 32-bit device with 1.5GB RAM, by launching and
relaunching different applications, out of ~64000 pages stored in
zswap, ~11000 pages were same-value filled pages (including zero-filled
pages) and ~9000 pages were zero-filled pages. An average of 17% of
pages (including zero-filled pages) in zswap are same-value filled
pages and 14% of pages are zero-filled pages. An average of 3% of pages
are same-filled non-zero pages.

The below table shows the execution time profiling with the patch.

                        Baseline  With patch  % Improvement
-----------------------------------------------------------
*Zswap Store Time        26.5ms      18ms         32%
 (of same value pages)
*Zswap Load Time
 (of same value pages)   25.5ms      13ms         49%
-----------------------------------------------------------

On Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of ~360000 pages
stored in zswap, 78000 (~22%) of pages were found to be same-value
filled pages (including zero-filled pages) and 64000 (~17%) are
zero-filled pages. So an average of 5% of pages are same-filled
non-zero pages.

The below table shows the execution time profiling with the patch.

                        Baseline  With patch  % Improvement
-----------------------------------------------------------
*Zswap Store Time         91ms       74ms         19%
 (of same value pages)
*Zswap Load Time          50ms       7.5ms        85%
 (of same value pages)
-----------------------------------------------------------

*The execution times may vary with test device used.

Signed-off-by: Srividya Desireddy <srividya...@samsung.com>
---
 mm/zswap.c | 77 ++
 1 file changed, 72 insertions(+), 5 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index d39581a..4dd8b89 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -49,6 +49,8 @@ static u64 zswap_pool_total_size;
 /* The number of compressed pages currently stored in zswap */
 static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
+/* The number of same-value filled pages currently stored in zswap */
+static atomic_t zswap_same_filled_pages = ATOMIC_INIT(0);
 
 /*
  * The statistics below are not protected from concurrent access for
@@ -116,6 +118,11 @@ static int zswap_compressor_param_set(const char *,
 static unsigned int zswap_max_pool_percent = 20;
 module_param_named(max_pool_percent, zswap_max_pool_percent, uint, 0644);
 
+/* Enable/disable handling same-value filled pages (enabled by default) */
+static bool zswap_same_filled_pages_enabled = true;
+module_param_named(same_filled_pages_enabled, zswap_same_filled_pages_enabled,
+		   bool, 0644);
+
 /*********************************
 * data structures
 **********************************/
@@ -145,9 +152,10 @@ struct zswap_pool {
  *            be held while changing the refcount.  Since the lock must
  *            be held, there is no reason to also make refcount atomic.
  * length - the length in bytes of the compressed page data.  Needed during
- *          decompression
+ *          decompression. For a same value filled page length is 0.
  * pool - the zswap_pool the entry's data is in
  * handle - zpool allocation handle that stores the compressed page data
+ * value - value of the same-value filled pages which have same content
  */
 struct zswap_entry {
 	struct rb_node rbnode;
@@ -155,7 +163,10 @@ struct zswap_entry {
 	int refcount;
 	unsigned int length;
 	struct zswap_pool *pool;
-	unsigned long handle;
+	union {
+		unsigned long handle;
+		unsigned long value;
+	};
 };
 
 struct zswap_header {
@@ -320,8 +331,12 @@ static void zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry)
  */
 static void zswap_free_entry(struct zswap_entry *entry)
 {
-
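The load-path half of the commit message (fill instead of decompress when length is zero) can be sketched as a toy function. Names here are illustrative, not the kernel's; the real code reads the entry under the tree lock and decompresses via the zpool handle in the non-zero case.

```c
#include <stddef.h>

#define PAGE_SIZE 4096
#define NWORDS (PAGE_SIZE / sizeof(unsigned long))

/*
 * Toy load-path sketch: a zero length means the page was stored as a
 * same-filled pattern, so rebuild it by writing the saved pattern into
 * every word, skipping decompression entirely.  Returns 0 when the
 * page was reconstructed from a pattern, -1 when the caller would have
 * to decompress from the zpool handle instead.
 */
static int toy_load(unsigned int length, unsigned long value,
		    unsigned long *page)
{
	size_t i;

	if (length != 0)
		return -1;	/* real code decompresses from the zpool */
	for (i = 0; i < NWORDS; i++)
		page[i] = value;
	return 0;
}
```

Skipping decompression on this path is where the reported ~49-85% load-time improvement for same-value pages comes from: a plain word fill is far cheaper than running the decompressor.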
[PATCH v2] zswap: Zero-filled pages handling
On Thu, Jul 6, 2017 at 3:32 PM, Dan Streetman wrote: > On Thu, Jul 6, 2017 at 5:29 AM, Srividya Desireddy > wrote: >> On Wed, Jul 6, 2017 at 10:49 AM, Sergey Senozhatsky wrote: >>> On (07/02/17 20:28), Seth Jennings wrote: >>>> On Sun, Jul 2, 2017 at 9:19 AM, Srividya Desireddy >>>> > Zswap is a cache which compresses the pages that are being swapped out >>>> > and stores them into a dynamically allocated RAM-based memory pool. >>>> > Experiments have shown that around 10-20% of pages stored in zswap >>>> > are zero-filled pages (i.e. contents of the page are all zeros), but >>>> > these pages are handled as normal pages by compressing and allocating >>>> > memory in the pool. >>>> >>>> I am somewhat surprised that this many anon pages are zero filled. >>>> >>>> If this is true, then maybe we should consider solving this at the >>>> swap level in general, as we can de-dup zero pages in all swap >>>> devices, not just zswap. >>>> >>>> That being said, this is a fair small change and I don't see anything >>>> objectionable. However, I do think the better solution would be to do >>> this at a higher level. >>> >> >> Thank you for your suggestion. It is a better solution to handle >> zero-filled pages before swapping-out to zswap. Since, Zram is already >> handles Zero pages internally, I considered to handle within Zswap. >> In a long run, we can work on it to commonly handle zero-filled anon >> pages. >> >>> zero-filled pages are just 1 case. in general, it's better >>> to handle pages that are memset-ed with the same value (e.g. >>> memset(page, 0x01, page_size)). which includes, but not >>> limited to, 0x00. zram does it. >>> >>> -ss >> >> It is a good solution to extend zero-filled pages handling to same value >> pages. I will work on to identify the percentage of same value pages >> excluding zero-filled pages in Zswap and will get back. > > Yes, this sounds like a good modification to the patch. 
> Also, unless
> anyone else disagrees, it may be good to control this with a module
> param - in case anyone has a use case that they know won't be helped
> by this, and the extra overhead of checking each page is wasteful.
> Probably should default to enabled.
>
>> - Srividya

I have made changes to the patch to handle same-value filled pages. I
tested on an ARM Quad Core 32-bit device with 1.5GB RAM by launching and
relaunching different applications. After the test, out of ~64000 pages
stored in zswap, ~11000 pages were same-value filled pages (including
zero-filled pages) and ~9000 pages were zero-filled pages. An average of
17% of pages (including zero-filled pages) in zswap are same-value filled
pages and 14% of pages are zero-filled pages. An average of 3% of pages
are same-filled non-zero pages.

The below table shows the execution time profiling with the patch.

                         Baseline    With patch    % Improvement
 *Zswap Store Time        26.5ms       18ms            32%
  (of same value pages)
 *Zswap Load Time         25.5ms       13ms            49%
  (of same value pages)

On an Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of ~360000 pages stored
in zswap, 78000 (~22%) pages were found to be same-value filled pages
(including zero-filled pages) and 64000 (~17%) were zero-filled pages. So
an average of 5% of pages are same-filled non-zero pages.

The below table shows the execution time profiling with the patch.

                         Baseline    With patch    % Improvement
 *Zswap Store Time         91ms         74ms            19%
  (of same value pages)
 *Zswap Load Time          50ms         7.5ms           85%
  (of same value pages)

*The execution times may vary with the test device used.

I will send this patch handling same-value filled pages along with a
module param to control it (default being enabled).

- Srividya
Re: [PATCH v2] zswap: Zero-filled pages handling
On Wed, Jul 6, 2017 at 10:49 AM, Sergey Senozhatsky wrote:
> On (07/02/17 20:28), Seth Jennings wrote:
>> On Sun, Jul 2, 2017 at 9:19 AM, Srividya Desireddy
>> > Zswap is a cache which compresses the pages that are being swapped out
>> > and stores them into a dynamically allocated RAM-based memory pool.
>> > Experiments have shown that around 10-20% of pages stored in zswap
>> > are zero-filled pages (i.e. contents of the page are all zeros), but
>> > these pages are handled as normal pages by compressing and allocating
>> > memory in the pool.
>>
>> I am somewhat surprised that this many anon pages are zero filled.
>>
>> If this is true, then maybe we should consider solving this at the
>> swap level in general, as we can de-dup zero pages in all swap
>> devices, not just zswap.
>>
>> That being said, this is a fairly small change and I don't see anything
>> objectionable. However, I do think the better solution would be to do
>> this at a higher level.
>

Thank you for your suggestion. It is a better solution to handle
zero-filled pages before swapping out to zswap. Since zram already
handles zero pages internally, I considered handling this within zswap.
In the long run, we can work on a common way to handle zero-filled anon
pages.

> zero-filled pages are just 1 case. in general, it's better
> to handle pages that are memset-ed with the same value (e.g.
> memset(page, 0x01, page_size)). which includes, but not
> limited to, 0x00. zram does it.
>
> -ss

It is a good solution to extend zero-filled page handling to same-value
pages. I will work on identifying the percentage of same-value pages,
excluding zero-filled pages, in zswap and will get back.

- Srividya
[PATCH v2] zswap: Zero-filled pages handling
From: Srividya Desireddy <srividya...@samsung.com>
Date: Sun, 2 Jul 2017 19:15:37 +0530
Subject: [PATCH v2] zswap: Zero-filled pages handling

Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
zero-filled pages (i.e. contents of the page are all zeros), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.

This patch adds a check in zswap_frontswap_store() to identify a
zero-filled page before compressing it. If the page is a zero-filled
page, set zswap_entry.zeroflag and skip the compression of the page and
the allocation of memory in the zpool. In zswap_frontswap_load(), check
if the zeroflag is set for the page in zswap_entry. If the flag is set,
memset the page with zero. This saves the decompression time during
load.

On an Ubuntu PC with 2GB RAM, while executing kernel build and other
test scripts, ~15% of pages in zswap were zero pages. With a multimedia
workload, more than 20% of zswap pages were found to be zero pages. On
an ARM Quad Core 32-bit device with 1.5GB RAM, an average of 10% zero
pages were found in zswap (an average of 5000 zero pages found out of
~50000 pages stored in zswap) on launching and relaunching 15
applications. The launch time of the applications improved by ~3%.

Test Parameters        Baseline    With patch    Improvement
Total RAM              1343MB      1343MB
Available RAM          451MB       445MB         -6MB
Avg. Memfree           69MB        70MB          1MB
Avg. Swap Used         226MB       215MB         -11MB
Avg. App entry time    644msec     623msec       3%

With the patch, every page swapped to zswap is checked for being a zero
page, and for all zero pages the compression and memory allocation
operations are skipped. Overall there is an improvement of 30% in zswap
store time. In the case of non-zero pages there is no overhead during
zswap page load.
For zero pages there is an improvement of more than 60% in the zswap
load time, as the zero-page decompression is avoided. The below table
shows the execution time profiling of the patch.

Zswap Store Operation    Baseline    With patch    % Improvement
* Zero page check          --          22.5ms
  (for non-zero pages)
* Zero page check          --          24ms
  (for zero pages)
* Compression time         55ms        --
  (of zero pages)
* Allocation time          14ms        --
  (to store compressed zero pages)
Total                      69ms        46.5ms        32%

Zswap Load Operation     Baseline    With patch    % Improvement
* Decompression time       30.4ms      --
  (of zero pages)
* Zero page check +        --          10.04ms
  memset operation
  (of zero pages)
Total                      30.4ms      10.04ms       66%

*The execution times may vary with the test device used.

Signed-off-by: Srividya Desireddy <srividya...@samsung.com>
---
 mm/zswap.c | 46 ++
 1 file changed, 42 insertions(+), 4 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index eedc278..edc584b 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -49,6 +49,8 @@ static u64 zswap_pool_total_size;
 /* The number of compressed pages currently stored in zswap */
 static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
+/* The number of zero filled pages swapped out to zswap */
+static atomic_t zswap_zero_pages = ATOMIC_INIT(0);

 /*
  * The statistics below are not protected from concurrent access for
@@ -145,7 +147,7 @@ struct zswap_pool {
  *	be held while changing the refcount. Since the lock must
  *	be held, there is no reason to also make refcount atomic.
  * length - the length in bytes of the compressed page data. Needed during
- *	decompression
+ *	decompression. For a zero page length is 0.
  * pool - the zswap_pool the entry's data is in
  * handle - zpool allocation handle that stores the compressed page data
  */
@@ -320,8 +322,12 @@ static void zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry)
  */
 static void zswap_free_entry(struct zswap_entry *entry)
 {
-	zpool_free(entry->pool->zpool, entry->handle);
-	zswap_pool_put(entry->pool);
+	if (!entry->length)
+		atomic_dec(&zswap_zero_pages);
+	else {
+		zpool_free(entry->pool->zpool, entry->handle);
+		zswap_pool_put(entry->pool);
+	}
 	zswap_entry_cache_free(entry);
 	atomic_dec(&zswap_stored_pages);
 	zswap_update_total_size();
@@ -956,6 +962,19 @@ static int zswap_shrink
[PATCH] zswap: Zero-filled pages handling
On Sat, Mar 4, 2017 at 02:55 AM, Dan Streetman <ddstr...@ieee.org> wrote:
> On Sat, Feb 25, 2017 at 12:18 PM, Sarbojit Ganguly
> <unixman.linux...@gmail.com> wrote:
>> On 25 February 2017 at 20:12, Srividya Desireddy
>> <srividya...@samsung.com> wrote:
>>> From: Srividya Desireddy <srividya...@samsung.com>
>>> Date: Thu, 23 Feb 2017 15:04:06 +0530
>>> Subject: [PATCH] zswap: Zero-filled pages handling
>
> your email is base64-encoded; please send plain text emails.
>
>>> Zswap is a cache which compresses the pages that are being swapped out
>>> and stores them into a dynamically allocated RAM-based memory pool.
>>> Experiments have shown that around 10-20% of pages stored in zswap
>>> are zero-filled pages (i.e. contents of the page are all zeros), but
>
> 20%? that's a LOT of zero pages...which seems like applications are
> wasting a lot of memory. what kind of workload are you testing with?
>

I have tested this patch with different workloads on different devices.
On an Ubuntu PC with 2GB RAM, while executing kernel build and other
test scripts, ~15% of pages in zswap were zero pages. With a multimedia
workload, more than 20% of zswap pages were found to be zero pages. On
an ARM Quad Core 32-bit device with 1.5GB RAM, an average of 10% zero
pages were found on launching and relaunching 15 applications.

>>> these pages are handled as normal pages by compressing and allocating
>>> memory in the pool.
>>>
>>> This patch adds a check in zswap_frontswap_store() to identify a
>>> zero-filled page before compression of the page. If the page is a
>>> zero-filled page, set zswap_entry.zeroflag and skip the compression
>>> of the page and allocation of memory in zpool. In
>>> zswap_frontswap_load(), check if the zeroflag is set for the page in
>>> zswap_entry. If the flag is set, memset the page with zero. This
>>> saves the decompression time during load.
>>>
>>> The overall overhead caused to check for a zero-filled page is very
>>> minimal when compared to the time saved by avoiding compression and
>>> allocation in case of zero-filled pages. Although the compressed size
>>> of a zero-filled page is very small, with this patch the load time of
>>> a zero-filled page is reduced by 80% when compared to baseline.
>>
>> Is it possible to share the benchmark details?
>
> Was there an answer to this?
>

This patch was tested on an ARM Quad Core 32-bit device with 1.5GB RAM
by launching and relaunching different applications. With the patch, an
average of 5000 zero pages were found in zswap out of the ~50000 pages
stored in zswap, and application launch time improved by ~3%.

Test Parameters        Baseline    With patch    Improvement
Total RAM              1343MB      1343MB
Available RAM          451MB       445MB         -6MB
Avg. Memfree           69MB        70MB          1MB
Avg. Swap Used         226MB       215MB         -11MB
Avg. App entry time    644msec     623msec       3%

With the patch, every page swapped to zswap is checked for being a zero
page, and for all zero pages the compression and memory allocation
operations are skipped. Overall there is an improvement of 30% in zswap
store time. In the case of non-zero pages there is no overhead during
zswap page load. For zero pages there is an improvement of more than 60%
in the zswap load time, as the zero-page decompression is avoided.

The below table shows the execution time profiling of the patch.

Zswap Store Operation    Baseline    With patch    % Improvement
* Zero page check          --          22.5ms
  (for non-zero pages)
* Zero page check          --          24ms
  (for zero pages)
* Compression time         55ms        --
  (of zero pages)
* Allocation time          14ms        --
  (to store compressed zero pages)
Total                      69ms        46.5ms        32%

Zswap Load Operation     Baseline    With patch    % Improvement
* Decompression time       30.4ms      --
  (of zero pages)
* Zero page check +        --          10.04ms
  memset operation
  (of zero pages)
Total                      30.4ms      10.04ms       66%

*The execution times may vary with the test device used.
>>
>>> Signed-off-by: Srividya Desireddy <srividya...@samsung.com>
>>> ---
>>> mm/zswap.c | 48 +---
>>> 1 file changed, 45 insertions(+), 3 deleti
[PATCH] zswap: Zero-filled pages handling
From: Srividya Desireddy <srividya...@samsung.com>
Date: Thu, 23 Feb 2017 15:04:06 +0530
Subject: [PATCH] zswap: Zero-filled pages handling

Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
zero-filled pages (i.e. contents of the page are all zeros), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.

This patch adds a check in zswap_frontswap_store() to identify a
zero-filled page before compressing it. If the page is a zero-filled
page, set zswap_entry.zeroflag and skip the compression of the page and
the allocation of memory in the zpool. In zswap_frontswap_load(), check
if the zeroflag is set for the page in zswap_entry. If the flag is set,
memset the page with zero. This saves the decompression time during
load.

The overall overhead caused by checking for a zero-filled page is very
minimal when compared to the time saved by avoiding compression and
allocation in the case of zero-filled pages. Although the compressed
size of a zero-filled page is very small, with this patch the load time
of a zero-filled page is reduced by 80% when compared to baseline.
Signed-off-by: Srividya Desireddy <srividya...@samsung.com>
---
 mm/zswap.c | 48 +---
 1 file changed, 45 insertions(+), 3 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 067a0d6..a574008 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -49,6 +49,8 @@ static u64 zswap_pool_total_size;
 /* The number of compressed pages currently stored in zswap */
 static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
+/* The number of zero filled pages swapped out to zswap */
+static atomic_t zswap_zero_pages = ATOMIC_INIT(0);

 /*
  * The statistics below are not protected from concurrent access for
@@ -140,6 +142,8 @@ struct zswap_pool {
  *	decompression
  * pool - the zswap_pool the entry's data is in
  * handle - zpool allocation handle that stores the compressed page data
+ * zeroflag - the flag is set if the content of the page is filled with
+ *	zeros
  */
 struct zswap_entry {
 	struct rb_node rbnode;
@@ -148,6 +152,7 @@ struct zswap_entry {
 	unsigned int length;
 	struct zswap_pool *pool;
 	unsigned long handle;
+	unsigned char zeroflag;
 };

 struct zswap_header {
@@ -236,6 +241,7 @@ static struct zswap_entry *zswap_entry_cache_alloc(gfp_t gfp)
 	if (!entry)
 		return NULL;
 	entry->refcount = 1;
+	entry->zeroflag = 0;
 	RB_CLEAR_NODE(&entry->rbnode);
 	return entry;
 }
@@ -306,8 +312,12 @@ static void zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry)
  */
 static void zswap_free_entry(struct zswap_entry *entry)
 {
-	zpool_free(entry->pool->zpool, entry->handle);
-	zswap_pool_put(entry->pool);
+	if (entry->zeroflag)
+		atomic_dec(&zswap_zero_pages);
+	else {
+		zpool_free(entry->pool->zpool, entry->handle);
+		zswap_pool_put(entry->pool);
+	}
 	zswap_entry_cache_free(entry);
 	atomic_dec(&zswap_stored_pages);
 	zswap_update_total_size();
@@ -877,6 +887,19 @@ static int zswap_shrink(void)
 	return ret;
 }

+static int zswap_is_page_zero_filled(void *ptr)
+{
+	unsigned int pos;
+	unsigned long *page;
+
+	page = (unsigned long *)ptr;
+	for (pos = 0; pos != PAGE_SIZE / sizeof(*page); pos++) {
+		if (page[pos])
+			return 0;
+	}
+
+	return 1;
+}
+
 /*
  * frontswap hooks
 **/
@@ -917,6 +940,15 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
 		goto reject;
 	}

+	src = kmap_atomic(page);
+	if (zswap_is_page_zero_filled(src)) {
+		kunmap_atomic(src);
+		entry->offset = offset;
+		entry->zeroflag = 1;
+		atomic_inc(&zswap_zero_pages);
+		goto insert_entry;
+	}
+
 	/* if entry is successfully added, it keeps the reference */
 	entry->pool = zswap_pool_current_get();
 	if (!entry->pool) {
@@ -927,7 +959,6 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
 	/* compress */
 	dst = get_cpu_var(zswap_dstmem);
 	tfm = *get_cpu_ptr(entry->pool->tfm);
-	src = kmap_atomic(page);
 	ret = crypto_comp_compress(tfm, src, PAGE_SIZE, dst, &dlen);
 	kunmap_atomic(src);
 	put_cpu_ptr(entry->pool->tfm);
@@ -961,6 +992,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
 	entry->handle = handle;
 	entry->length = dlen;

+insert_entry:
 	/* map */
 	spin_lock(&tree->lock);
 	do {
@@ -1013,6 +1045,13 @@ static int zswap_f
[PATCH 0/4] zswap: Optimize compressed pool memory utilization
Could you please review this patch series and update if any corrections
are needed in the patch-set.
-Srividya

On Fri, Aug 19, 2016 at 11:04 AM, Srividya Desireddy wrote:
> On 17 August 2016 at 18:08, Pekka Enberg wrote:
>> On Wed, Aug 17, 2016 at 1:03 PM, Srividya Desireddy wrote:
>>> This series of patches optimizes the memory utilized by zswap for
>>> storing the swapped out pages.
>>>
>>> Zswap is a cache which compresses the pages that are being swapped out
>>> and stores them in a dynamically allocated RAM-based memory pool.
>>> Experiments have shown that around 10-15% of pages stored in zswap are
>>> duplicates, which results in 10-12% more RAM required to store these
>>> duplicate compressed pages. Around 10-20% of pages stored in zswap
>>> are zero-filled pages, but these pages are handled as normal pages by
>>> compressing and allocating memory in the pool.
>>>
>>> The following patch-set optimizes the memory utilized by zswap by
>>> avoiding the storage of duplicate pages and zero-filled pages in the
>>> zswap compressed memory pool.
>>>
>>> Patch 1/4: zswap: Share zpool memory of duplicate pages
>>> This patch shares the compressed pool memory of duplicate pages. When
>>> a new page is requested for swap-out to zswap, search for an identical
>>> page among the pages already stored in zswap. If an identical page is
>>> found, share the compressed page data of the identical page with the
>>> new page. This avoids allocating memory in the compressed pool for a
>>> duplicate page. This feature was tested on devices with 1GB, 2GB and
>>> 3GB RAM by executing performance tests under low memory conditions.
>>> Around 15-20% of the pages swapped are duplicates of pages existing in
>>> zswap, resulting in a 15% saving of zswap memory pool when compared to
>>> the baseline version.
>>>
>>> Test Parameters        Baseline    With patch    Improvement
>>> Total RAM              955MB       955MB
>>> Available RAM          254MB       269MB         15MB
>>> Avg. App entry time    2.469sec    2.207sec      7%
>>> Avg. App close time    1.151sec    1.085sec      6%
>>> Apps launched in 1sec  5           12            7
>>>
>>> There is little overhead in the zswap store function due to the search
>>> operation for finding duplicate pages. However, if a duplicate page is
>>> found, it saves the compression and allocation time of the page. The
>>> average overhead per zswap_frontswap_store() call on the experimental
>>> device is 9us. There is no overhead in case of the
>>> zswap_frontswap_load() operation.
>>>
>>> Patch 2/4: zswap: Enable/disable sharing of duplicate pages at runtime
>>> This patch adds a module parameter to enable or disable the sharing of
>>> duplicate zswap pages at runtime.
>>>
>>> Patch 3/4: zswap: Zero-filled pages handling
>>> This patch checks if a page to be stored in zswap is a zero-filled
>>> page (i.e. contents of the page are all zeros). If such a page is
>>> found, compression and allocation of memory for the compressed page
>>> are avoided and the page is instead just marked as a zero-filled page.
>>> Although the compressed size of a zero-filled page using the LZO
>>> compressor is very small (52 bytes including the zswap_header), this
>>> patch saves compression and allocation time during the store operation
>>> and decompression time during the zswap load operation for zero-filled
>>> pages. Experiments have shown that around 10-20% of pages stored in
>>> zswap are zero-filled.
>>
>> Aren't zero-filled pages already handled by patch 1/4 as their
>> contents match? So the overall memory saving is 52 bytes?
>>
>> - Pekka
>
> Thanks for the quick reply.
>
> Zero-filled pages can also be handled by patch 1/4. It performs a
> search for a duplicate page among the existing pages stored in zswap.
> It has been observed that the average search time to identify duplicate
> zero-filled pages (using patch 1/4) is almost three times that of
> checking all pages for zero-fill.
>
> Also, in the case of patch 1/4, the zswap_frontswap_load() operation
> requires the compressed zero-filled page to be decompressed. The
> zswap_frontswap_load() function in patch 3/4 just fills the page with
> zeros while loading a zero-filled page and is faster than
> decompression.
>
> - Srividya
[PATCH 3/4] zswap: Zero-filled pages handling
On 17 August 2016 at 18:02, Pekka Enberg <penb...@kernel.org> wrote:
> On Wed, Aug 17, 2016 at 1:18 PM, Srividya Desireddy
> <srividya...@samsung.com> wrote:
>>> This patch adds a check in zswap_frontswap_store() to identify a
>>> zero-filled page before compression of the page. If the page is a
>>> zero-filled page, set zswap_entry.zeroflag and skip the compression
>>> of the page and the allocation of memory in the zpool. In
>>> zswap_frontswap_load(), check if the zeroflag is set for the page in
>>> its zswap_entry. If the flag is set, memset the page with zero. This
>>> saves the decompression time during load.
>>>
>>> The overall overhead caused by the zero-filled page check is minimal
>>> when compared to the time saved by avoiding compression and
>>> allocation in the case of zero-filled pages. The load time of a
>>> zero-filled page is reduced by 80% when compared to baseline.
>
> On Wed, Aug 17, 2016 at 3:25 PM, Pekka Enberg <penb...@kernel.org> wrote:
>> AFAICT, that's an overall improvement only if there are a lot of
>> zero-filled pages because it's just overhead for pages that we *need*
>> to compress, no? So I suppose the question is, are there a lot of
>> zero-filled pages that we need to swap and why is that the case?
>
> I suppose reading your cover letter would have been helpful before
> sending out my email:
>
> "Experiments have shown that around 10-15% of pages stored in zswap are
> duplicates which results in 10-12% more RAM required to store these
> duplicate compressed pages."
>
> But I still don't understand why we have zero-filled pages that we are
> swapping out.
>
> - Pekka

Zero-filled pages exist in memory because applications may initialize
the allocated pages with zeros and never use them, or because the actual
content written to the memory pages during execution is itself zeros.
The existing page reclamation path in the kernel does not check for
zero-filled pages in the anonymous LRU lists before swapping out.

- Srividya
[PATCH 3/4] zswap: Zero-filled pages handling
On 17 August 2016 at 17:55, Pekka Enberg <penb...@kernel.org> wrote:
> On Wed, Aug 17, 2016 at 1:18 PM, Srividya Desireddy
> <srividya...@samsung.com> wrote:
>> @@ -1314,6 +1347,13 @@ static int zswap_frontswap_load(unsigned type,
>> pgoff_t offset,
>> 	}
>> 	spin_unlock(&tree->lock);
>>
>> +	if (entry->zeroflag) {
>> +		dst = kmap_atomic(page);
>> +		memset(dst, 0, PAGE_SIZE);
>> +		kunmap_atomic(dst);
>> +		goto freeentry;
>> +	}
>
> Don't we need the same thing in zswap_writeback_entry() for the
> ZSWAP_SWAPCACHE_NEW case?

Zero-filled pages are neither compressed nor stored in the zpool memory.
No zpool handle is created for zero-filled pages, hence they cannot be
picked for eviction/writeback to the swap device.

- Srividya

>
>> +
>> 	/* decompress */
>> 	dlen = PAGE_SIZE;
>> 	src = (u8 *)zpool_map_handle(entry->pool->zpool,
>> 				     entry->zhandle->handle,
>> @@ -1327,6 +1367,7 @@ static int zswap_frontswap_load(unsigned type,
>> pgoff_t offset,
>> 	zpool_unmap_handle(entry->pool->zpool, entry->zhandle->handle);
>> 	BUG_ON(ret);
>
> - Pekka
[PATCH 0/4] zswap: Optimize compressed pool memory utilization
On 17 August 2016 at 18:08, Pekka Enberg <penb...@kernel.org> wrote:
> On Wed, Aug 17, 2016 at 1:03 PM, Srividya Desireddy
> <srividya...@samsung.com> wrote:
>> This series of patches optimizes the memory utilized by zswap for
>> storing the swapped out pages.
>>
>> Zswap is a cache which compresses the pages that are being swapped out
>> and stores them in a dynamically allocated RAM-based memory pool.
>> Experiments have shown that around 10-15% of pages stored in zswap are
>> duplicates, which results in 10-12% more RAM required to store these
>> duplicate compressed pages. Around 10-20% of pages stored in zswap
>> are zero-filled pages, but these pages are handled as normal pages by
>> compressing and allocating memory in the pool.
>>
>> The following patch-set optimizes the memory utilized by zswap by
>> avoiding the storage of duplicate pages and zero-filled pages in the
>> zswap compressed memory pool.
>>
>> Patch 1/4: zswap: Share zpool memory of duplicate pages
>> This patch shares the compressed pool memory of duplicate pages. When
>> a new page is requested for swap-out to zswap, search for an identical
>> page among the pages already stored in zswap. If an identical page is
>> found, share the compressed page data of the identical page with the
>> new page. This avoids allocating memory in the compressed pool for a
>> duplicate page. This feature was tested on devices with 1GB, 2GB and
>> 3GB RAM by executing performance tests under low memory conditions.
>> Around 15-20% of the pages swapped are duplicates of pages existing in
>> zswap, resulting in a 15% saving of zswap memory pool when compared to
>> the baseline version.
>>
>> Test Parameters        Baseline    With patch    Improvement
>> Total RAM              955MB       955MB
>> Available RAM          254MB       269MB         15MB
>> Avg. App entry time    2.469sec    2.207sec      7%
>> Avg. App close time    1.151sec    1.085sec      6%
>> Apps launched in 1sec  5           12            7
>>
>> There is little overhead in the zswap store function due to the search
>> operation for finding duplicate pages. However, if a duplicate page is
>> found, it saves the compression and allocation time of the page. The
>> average overhead per zswap_frontswap_store() call on the experimental
>> device is 9us. There is no overhead in case of the
>> zswap_frontswap_load() operation.
>>
>> Patch 2/4: zswap: Enable/disable sharing of duplicate pages at runtime
>> This patch adds a module parameter to enable or disable the sharing of
>> duplicate zswap pages at runtime.
>>
>> Patch 3/4: zswap: Zero-filled pages handling
>> This patch checks if a page to be stored in zswap is a zero-filled
>> page (i.e. contents of the page are all zeros). If such a page is
>> found, compression and allocation of memory for the compressed page
>> are avoided and the page is instead just marked as a zero-filled page.
>> Although the compressed size of a zero-filled page using the LZO
>> compressor is very small (52 bytes including the zswap_header), this
>> patch saves compression and allocation time during the store operation
>> and decompression time during the zswap load operation for zero-filled
>> pages. Experiments have shown that around 10-20% of pages stored in
>> zswap are zero-filled.
>
> Aren't zero-filled pages already handled by patch 1/4 as their
> contents match? So the overall memory saving is 52 bytes?
>
> - Pekka

Thanks for the quick reply.

Zero-filled pages can also be handled by patch 1/4. It performs a search
for a duplicate page among the existing pages stored in zswap. It has
been observed that the average search time to identify duplicate
zero-filled pages (using patch 1/4) is almost three times that of
checking all pages for zero-fill.

Also, in the case of patch 1/4, the zswap_frontswap_load() operation
requires the compressed zero-filled page to be decompressed. The
zswap_frontswap_load() function in patch 3/4 just fills the page with
zeros while loading a zero-filled page and is faster than decompression.

- Srividya
[PATCH 4/4] zswap: Update document with sharing of duplicate pages feature
From: Srividya Desireddy <srividya...@samsung.com>
Date: Wed, 17 Aug 2016 14:34:41 +0530
Subject: [PATCH 4/4] zswap: Update document with sharing of duplicate pages
 feature

Updated the zswap document with details on the sharing of duplicate swap
pages feature. The usage of the zswap.same_page_sharing module parameter
is explained.

Signed-off-by: Srividya Desireddy <srividya...@samsung.com>
---
 Documentation/vm/zswap.txt | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/vm/zswap.txt b/Documentation/vm/zswap.txt
index 89fff7d..cf11807 100644
--- a/Documentation/vm/zswap.txt
+++ b/Documentation/vm/zswap.txt
@@ -98,5 +98,23 @@ request is made for a page in an old zpool, it is uncompressed using its
 original compressor.  Once all pages are removed from an old zpool, the zpool
 and its compressor are freed.
 
+Some of the pages swapped to zswap have the same content as pages already
+stored in zswap. These pages are compressed and stored in the zpool memory.
+The same page sharing feature enables the duplicate pages to share the same
+compressed zpool memory. This helps in reducing the zpool memory allocated
+by zswap to store compressed pages.
+
+The same page sharing feature is disabled by default and can be enabled at
+boot time by setting the "same_page_sharing" attribute to 1, e.g.
+zswap.same_page_sharing=1. It can also be enabled and disabled at runtime
+using the sysfs "same_page_sharing" attribute, e.g.
+
+echo 1 > /sys/module/zswap/parameters/same_page_sharing
+
+When zswap same page sharing is disabled at runtime, it will stop sharing
+the new duplicate pages that are being swapped out. However, the existing
+duplicate pages will keep sharing the compressed memory pool until they are
+swapped in or invalidated.
+
 A debugfs interface is provided for various statistic about pool size, number
 of pages stored, and various counters for the reasons pages are rejected.
-- 
1.7.9.5
[PATCH 2/4] zswap: Enable or disable sharing of duplicate pages at runtime
From: Srividya Desireddy <srividya...@samsung.com>
Date: Wed, 17 Aug 2016 14:32:24 +0530
Subject: [PATCH 2/4] zswap: Enable or disable sharing of duplicate pages at
 runtime

Enable or disable the sharing of duplicate zswap pages at runtime. To
enable sharing of duplicate zswap pages, set the 'same_page_sharing'
sysfs attribute. By default it is disabled. In zswap_frontswap_store(),
duplicate pages are searched in zswap only when same_page_sharing is
set. When zswap same page sharing is disabled at runtime, it will stop
sharing the new duplicate pages. However, the existing duplicate pages
will keep sharing the compressed memory pool until they are faulted back
or invalidated.

Signed-off-by: Srividya Desireddy <srividya...@samsung.com>
---
 mm/zswap.c | 42 +++++++++++++++++++++++-------------------
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index f7efede..ae39c77 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -116,6 +116,10 @@ module_param_cb(zpool, &zswap_zpool_param_ops, &zswap_zpool_type, 0644);
 static unsigned int zswap_max_pool_percent = 20;
 module_param_named(max_pool_percent, zswap_max_pool_percent, uint, 0644);
 
+/* Enable/disable zswap same page sharing feature (disabled by default) */
+static bool zswap_same_page_sharing;
+module_param_named(same_page_sharing, zswap_same_page_sharing, bool, 0644);
+
 /*********************************
 * data structures
 **********************************/
@@ -1180,20 +1184,22 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
 
 	src = kmap_atomic(page);
-	checksum = jhash2((const u32 *)src, PAGE_SIZE / 4, 17);
-	spin_lock(&tree->lock);
-	zhandle = zswap_same_page_search(tree, src, checksum);
-	if (zhandle) {
-		entry->offset = offset;
-		entry->zhandle = zhandle;
-		entry->pool = zhandle->pool;
-		entry->zhandle->ref_count++;
+	if (zswap_same_page_sharing) {
+		checksum = jhash2((const u32 *)src, PAGE_SIZE / 4, 17);
+		spin_lock(&tree->lock);
+		zhandle = zswap_same_page_search(tree, src, checksum);
+		if (zhandle) {
+			entry->offset = offset;
+			entry->zhandle = zhandle;
+			entry->pool = zhandle->pool;
+			entry->zhandle->ref_count++;
+			spin_unlock(&tree->lock);
+			kunmap_atomic(src);
+			atomic_inc(&zswap_duplicate_pages);
+			goto insert_entry;
+		}
 		spin_unlock(&tree->lock);
-		kunmap_atomic(src);
-		atomic_inc(&zswap_duplicate_pages);
-		goto insert_entry;
 	}
-	spin_unlock(&tree->lock);
 
 	/* if entry is successfully added, it keeps the reference */
 	entry->pool = zswap_pool_current_get();
@@ -1245,12 +1251,14 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
 	entry->zhandle = zhandle;
 	entry->zhandle->handle = handle;
 	entry->zhandle->length = dlen;
-	entry->zhandle->checksum = checksum;
-	entry->zhandle->pool = entry->pool;
-	spin_lock(&tree->lock);
-	ret = zswap_handle_rb_insert(&tree->zhandleroot, entry->zhandle,
+	if (zswap_same_page_sharing) {
+		entry->zhandle->checksum = checksum;
+		entry->zhandle->pool = entry->pool;
+		spin_lock(&tree->lock);
+		ret = zswap_handle_rb_insert(&tree->zhandleroot, entry->zhandle,
 				);
-	spin_unlock(&tree->lock);
+		spin_unlock(&tree->lock);
+	}
 
 insert_entry:
 	/* map */
-- 
1.7.9.5
[PATCH 3/4] zswap: Zero-filled pages handling
From: Srividya Desireddy <srividya...@samsung.com>
Date: Wed, 17 Aug 2016 14:34:14 +0530
Subject: [PATCH 3/4] zswap: Zero-filled pages handling

This patch adds a check in zswap_frontswap_store() to identify a
zero-filled page before compressing it. If the page is a zero-filled
page, set zswap_entry.zeroflag and skip the compression of the page and
the allocation of memory in zpool. In zswap_frontswap_load(), check if
the zeroflag is set for the page in zswap_entry. If the flag is set,
memset the page with zero. This saves the decompression time during
load.

The overall overhead caused by the zero-filled page check is minimal
compared to the time saved by avoiding compression and allocation for
zero-filled pages. The load time of a zero-filled page is reduced by
80% when compared to baseline.

Signed-off-by: Srividya Desireddy <srividya...@samsung.com>
---
 mm/zswap.c | 58 ++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 50 insertions(+), 8 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index ae39c77..d0c3f96 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -58,6 +58,9 @@ static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
 */
 static atomic_t zswap_duplicate_pages = ATOMIC_INIT(0);
 
+/* The number of zero filled pages swapped out to zswap */
+static atomic_t zswap_zero_pages = ATOMIC_INIT(0);
+
 /*
 * The statistics below are not protected from concurrent access for
 * performance reasons so they may not be a 100% accurate. However,
@@ -172,6 +175,8 @@ struct zswap_handle {
 *            be held, there is no reason to also make refcount atomic.
 * pool - the zswap_pool the entry's data is in
 * zhandle - pointer to struct zswap_handle
+ * zeroflag - the flag is set if the content of the page is filled with
+ *            zeros
 */
 struct zswap_entry {
 	struct rb_node rbnode;
@@ -179,6 +184,7 @@ struct zswap_entry {
 	int refcount;
 	struct zswap_pool *pool;
 	struct zswap_handle *zhandle;
+	unsigned char zeroflag;
 };
 
 struct zswap_header {
@@ -269,6 +275,7 @@ static struct zswap_entry *zswap_entry_cache_alloc(gfp_t gfp)
 	if (!entry)
 		return NULL;
 	entry->refcount = 1;
+	entry->zeroflag = 0;
 	entry->zhandle = NULL;
 	RB_CLEAR_NODE(&entry->rbnode);
 	return entry;
@@ -477,13 +484,17 @@ static bool zswap_handle_is_unique(struct zswap_handle *zhandle)
 */
 static void zswap_free_entry(struct zswap_entry *entry)
 {
-	if (zswap_handle_is_unique(entry->zhandle)) {
-		zpool_free(entry->pool->zpool, entry->zhandle->handle);
-		zswap_handle_cache_free(entry->zhandle);
-		zswap_pool_put(entry->pool);
-	} else {
-		entry->zhandle->ref_count--;
-		atomic_dec(&zswap_duplicate_pages);
+	if (entry->zeroflag)
+		atomic_dec(&zswap_zero_pages);
+	else {
+		if (zswap_handle_is_unique(entry->zhandle)) {
+			zpool_free(entry->pool->zpool, entry->zhandle->handle);
+			zswap_handle_cache_free(entry->zhandle);
+			zswap_pool_put(entry->pool);
+		} else {
+			entry->zhandle->ref_count--;
+			atomic_dec(&zswap_duplicate_pages);
+		}
 	}
 	zswap_entry_cache_free(entry);
 	atomic_dec(&zswap_stored_pages);
@@ -1140,6 +1151,21 @@ static int zswap_shrink(void)
 	return ret;
 }
 
+static int zswap_is_page_zero_filled(void *ptr)
+{
+	unsigned int pos;
+	unsigned long *page;
+
+	page = (unsigned long *)ptr;
+
+	for (pos = 0; pos != PAGE_SIZE / sizeof(*page); pos++) {
+		if (page[pos])
+			return 0;
+	}
+
+	return 1;
+}
+
 /*********************************
 * frontswap hooks
 **********************************/
@@ -1183,6 +1209,13 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
 	}
 
 	src = kmap_atomic(page);
+	if (zswap_is_page_zero_filled(src)) {
+		kunmap_atomic(src);
+		entry->offset = offset;
+		entry->zeroflag = 1;
+		atomic_inc(&zswap_zero_pages);
+		goto insert_entry;
+	}
 	if (zswap_same_page_sharing) {
 		checksum = jhash2((const u32 *)src, PAGE_SIZE / 4, 17);
@@ -1314,6 +1347,13 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
 	}
 	spin_unlock(&tree->lock);
 
+	if (entry->zeroflag) {
+		dst = kmap_atomic(page);
+		memset(dst, 0, PAGE_SIZE);
+		kunmap_atomic(dst);
+		goto freeentry;
+	}
+
 	/* decompress */
 	dlen = PAGE_SIZE;
 	src = (u8 *)zpool_map_handle(entry->pool->zpool, entry->zhandle->handle,
@@ -1327,6 +1367,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
[PATCH 1/4] zswap: Share zpool memory of duplicate pages
From: Srividya Desireddy <srividya...@samsung.com>
Date: Wed, 17 Aug 2016 14:31:01 +0530
Subject: [PATCH 1/4] zswap: Share zpool memory of duplicate pages

This patch shares the compressed pool memory of duplicate pages and
reduces the compressed pool memory utilized by zswap.

For each page requested for swap-out to zswap, calculate a 32-bit
checksum of the page. Search for duplicate pages by comparing the
checksum of the new page with existing pages. Compare the contents of
the pages if the checksum matches. If the contents also match, then
share the compressed data of the existing page with the new page.
Increment the reference count to track the number of pages sharing the
compressed page in zpool.

If a duplicate page is not found, then treat the new page as a 'unique'
page in zswap. Compress the new page and store the compressed data in
the zpool. Insert the unique page in the red-black tree which is
balanced based on the 32-bit checksum value of the page.

Signed-off-by: Srividya Desireddy <srividya...@samsung.com>
---
 mm/zswap.c | 265 ++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 248 insertions(+), 17 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 275b22c..f7efede 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -41,6 +41,7 @@
 #include <linux/swapops.h>
 #include <linux/writeback.h>
 #include <linux/pagemap.h>
+#include <linux/jhash.h>
 
 /*********************************
 * statistics
@@ -51,6 +52,13 @@ static u64 zswap_pool_total_size;
 static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
 
 /*
+ * The number of swapped out pages which are identified as duplicate
+ * to the existing zswap pages. Compression and storing of these pages
+ * is avoided.
+ */
+static atomic_t zswap_duplicate_pages = ATOMIC_INIT(0);
+
+/*
 * The statistics below are not protected from concurrent access for
 * performance reasons so they may not be a 100% accurate. However,
 * they do provide useful information on roughly how many times a
@@ -123,6 +131,28 @@ struct zswap_pool {
 };
 
 /*
+ * struct zswap_handle
+ * This structure contains the metadata for tracking single zpool handle
+ * allocation.
+ *
+ * rbnode - links the zswap_handle into red-black tree
+ * checksum - 32-bit checksum value of the page swapped to zswap
+ * ref_count - number of pages sharing this handle
+ * length - the length in bytes of the compressed page data.
+ *          Needed during decompression.
+ * handle - zpool allocation handle that stores the compressed page data.
+ * pool - the zswap_pool the entry's data is in.
+ */
+struct zswap_handle {
+	struct rb_node rbnode;
+	u32 checksum;
+	u16 ref_count;
+	unsigned int length;
+	unsigned long handle;
+	struct zswap_pool *pool;
+};
+
+/*
 * struct zswap_entry
 *
 * This structure contains the metadata for tracking a single compressed
@@ -136,18 +166,15 @@ struct zswap_pool {
 *            for the zswap_tree structure that contains the entry must
 *            be held while changing the refcount. Since the lock must
 *            be held, there is no reason to also make refcount atomic.
- * length - the length in bytes of the compressed page data. Needed during
- *          decompression
 * pool - the zswap_pool the entry's data is in
- * handle - zpool allocation handle that stores the compressed page data
+ * zhandle - pointer to struct zswap_handle
 */
 struct zswap_entry {
 	struct rb_node rbnode;
 	pgoff_t offset;
 	int refcount;
-	unsigned int length;
 	struct zswap_pool *pool;
-	unsigned long handle;
+	struct zswap_handle *zhandle;
 };
 
 struct zswap_header {
@@ -161,6 +188,8 @@ struct zswap_header {
 */
 struct zswap_tree {
 	struct rb_root rbroot;
+	struct rb_root zhandleroot;
+	void *buffer;
 	spinlock_t lock;
 };
 
@@ -236,6 +265,7 @@ static struct zswap_entry *zswap_entry_cache_alloc(gfp_t gfp)
 	if (!entry)
 		return NULL;
 	entry->refcount = 1;
+	entry->zhandle = NULL;
 	RB_CLEAR_NODE(&entry->rbnode);
 	return entry;
 }
@@ -246,6 +276,39 @@ static void zswap_entry_cache_free(struct zswap_entry *entry)
 }
 
 /*********************************
+* zswap handle functions
+**********************************/
+static struct kmem_cache *zswap_handle_cache;
+
+static int __init zswap_handle_cache_create(void)
+{
+	zswap_handle_cache = KMEM_CACHE(zswap_handle, 0);
+	return zswap_handle_cache == NULL;
+}
+
+static void __init zswap_handle_cache_destroy(void)
+{
+	kmem_cache_destroy(zswap_handle_cache);
+}
+
+static struct zswap_handle *zswap_handle_cache_alloc(gfp_t gfp)
+{
+	struct zswap_handle *zhandle;
+
+	zhandle = kmem_cache_alloc(zswap_handle_cache, gfp);
+	if (!zhandle)
+		return NULL;
+	zhandle->ref_count = 1;
+	RB_CLEAR_NODE(&zhandle->rbnode);
+	return zhandle;
+}
+
+static void zswap_handle_cache_free(struct zswap_handle *zhandle)
+{
+	kmem_cache_free(zswap_handle_cache, zhandle);
+}
[PATCH 0/4] zswap: Optimize compressed pool memory utilization
This series of patches optimizes the memory utilized by zswap for
storing the swapped out pages. Zswap is a cache which compresses the
pages that are being swapped out and stores them in a dynamically
allocated RAM-based memory pool. Experiments have shown that around
10-15% of the pages stored in zswap are duplicates, which results in
10-12% more RAM being required to store these duplicate compressed
pages. Around 10-20% of the pages stored in zswap are zero-filled
pages, but these pages are handled like normal pages, with compression
and memory allocation in the pool. The following patch-set optimizes
the memory utilized by zswap by avoiding the storage of duplicate pages
and zero-filled pages in the zswap compressed memory pool.

Patch 1/4: zswap: Share zpool memory of duplicate pages
This patch shares the compressed pool memory of duplicate pages. When a
new page is requested for swap-out to zswap, search for an identical
page among the pages already stored in zswap. If an identical page is
found, share the compressed page data of the identical page with the
new page. This avoids the allocation of memory in the compressed pool
for a duplicate page.

This feature was tested on devices with 1GB, 2GB and 3GB RAM by
executing performance tests under low memory conditions. Around 15-20%
of the pages swapped are duplicates of pages existing in zswap,
resulting in a 15% saving of the zswap memory pool when compared to the
baseline version.

Test Parameters        Baseline    With patch    Improvement
Total RAM              955MB       955MB
Available RAM          254MB       269MB         15MB
Avg. App entry time    2.469sec    2.207sec      7%
Avg. App close time    1.151sec    1.085sec      6%
Apps launched in 1sec  5           12            7

There is a small overhead in the zswap store function due to the search
operation for finding duplicate pages. However, if a duplicate page is
found, it saves the compression and allocation time for that page. The
average overhead per zswap_frontswap_store() call on the experimental
device is 9us. There is no overhead in the zswap_frontswap_load()
operation.

Patch 2/4: zswap: Enable/disable sharing of duplicate pages at runtime
This patch adds a module parameter to enable or disable the sharing of
duplicate zswap pages at runtime.

Patch 3/4: zswap: Zero-filled pages handling
This patch checks if a page to be stored in zswap is a zero-filled page
(i.e. the contents of the page are all zeros). If such a page is found,
the compression and the allocation of memory for the compressed page
are avoided, and instead the page is just marked as a zero-filled page.
Although the compressed size of a zero-filled page using the LZO
compressor is small (52 bytes, including the zswap_header), this patch
saves the compression and allocation time during the store operation
and the decompression time during the load operation for zero-filled
pages. Experiments have shown that around 10-20% of the pages stored in
zswap are zero-filled.

Patch 4/4: Update document with sharing of duplicate pages feature
In this patch the zswap document is updated with information on the
sharing of duplicate swap pages feature.

 Documentation/vm/zswap.txt |  18 +++
 mm/zswap.c                 | 315 +---
 2 files changed, 316 insertions(+), 17 deletions(-)