Re: [PATCH v2 0/5] mm: migrate zbud pages
On Tue, 2013-10-01 at 16:04 -0500, Seth Jennings wrote:
> Yes, it is very similar. I'm beginning to like aspects of this patch
> more as I explore this issue more.
>
> At first, I balked at the idea of yet another abstraction layer, but it
> is very hard to avoid unless you want to completely collapse zswap and
> zbud into one another and dissolve the layering. Then you could do a
> direct swap_offset -> address mapping.

After discussion with Tomasz Stanislawski, we had an idea of merging the
trees (zswap's rbtree and the zbud radix tree added in these patches)
into one tree in the zbud layer. This would simplify the design (if
migration were added, of course). The idea looks like:

1. Get rid of the red-black tree in zswap.
2. Add a radix tree to zbud (or use the radix tree from address_space).
   - Use the offset (from swp_entry) as the index into the radix tree.
   - Store the zbud page (struct page) in the tree.
3. With both buddies filled, one zbud page would be put into the radix
   tree twice.
4. The zbud API would look like:
   zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp, pgoff_t offset)
   zbud_free(struct zbud_pool *pool, pgoff_t offset)
   zbud_map(struct zbud_pool *pool, pgoff_t offset)
   etc.
5. zbud_map()/zbud_unmap() would be a little more complex than now, as
   they would take over some code from zswap (finding the offset in the
   tree).
6. The radix tree would be used for:
   - finding an entry by offset (for zswap_frontswap_load() and others),
   - migration.
7. In the case of migration colliding with zbud_map()/zbud_unmap(), the
   locking could be limited (in comparison to my patch). Calling
   zbud_map() would mark a page "dirty". During migration, if the page
   was "dirtied", the migration would fail with EAGAIN. Of course,
   migration won't start if a zbud buddy is mapped.

What do you think about this?

Best regards,
Krzysztof
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 0/5] mm: migrate zbud pages
On Mon, Sep 30, 2013 at 10:28:46AM +0200, Krzysztof Kozlowski wrote:
> On Fri, 2013-09-27 at 17:00 -0500, Seth Jennings wrote:
> > I have to say that when I first came up with the idea, I was thinking
> > the address space would be at the zswap layer and the radix slots would
> > hold zbud handles, not struct page pointers.
> >
> > However, as I have discovered today, this is problematic when it comes
> > to reclaim and migration and serializing access.
> >
> > I wanted to do as much as possible in the zswap layer since anything
> > done in the zbud layer would need to be duplicated in any other future
> > allocator that zswap wanted to support.
> >
> > Unfortunately, zbud abstracts away the struct page and that visibility
> > is needed to properly do what we are talking about.
> >
> > So maybe it is inevitable that this will need to be in the zbud code
> > with the radix tree slots pointing to struct pages after all.
>
> To me it looks very similar to the solution proposed in my patches.

Yes, it is very similar. I'm beginning to like aspects of this patch
more as I explore this issue more.

At first, I balked at the idea of yet another abstraction layer, but it
is very hard to avoid unless you want to completely collapse zswap and
zbud into one another and dissolve the layering. Then you could do a
direct swap_offset -> address mapping.

> The difference is that you wish to use the offset as the radix tree
> index. I thought about this earlier but it imposed two problems:
>
> 1. A generalized handle (instead of an offset) may be more suitable
> when zbud is used by other drivers (e.g. zram).
>
> 2. It requires redesigning the zswap architecture around
> zswap_frontswap_store() in the case of a duplicated insertion.
> Currently, when storing a page, zswap:
> - allocates a zbud page,
> - stores the new data in it,
> - checks whether it is a duplicated page (same offset present in the
>   rbtree),
> - if yes (duplicated), then zswap frees the previous entry.
> The problem here lies in allocating a zbud page under the same offset.
> This step would replace the old data (because we are using the same
> offset in the radix tree).

Yes, but the offset is always going to be the key at the top layer
because that is what the swap subsystem uses. So we'd have to have a
swap_offset -> handle -> address translation (2 abstraction layers), the
first of which would need to deal with the duplicate-store issue.

Seth

> In my opinion, using a zbud handle is in this case more flexible.
>
> Best regards,
> Krzysztof
>
> > I like the idea of masking the bit into the struct page pointer to
> > indicate which buddy maps to the offset.
> >
> > There is a twist here in that, unlike a normal page cache tree, we can
> > have two offsets pointing at different buddies in the same frame,
> > which means we'll have to do some custom stuff for migration.
> >
> > The rabbit hole I was going down today has come to an end, so I'll
> > take a fresh look next week.
> >
> > Thanks for your ideas and discussion! Maybe we can make zswap/zbud an
> > upstanding MM citizen yet!
> >
> > Seth
> >
> > > >> In case of zbud, there are two swap offsets pointing to
> > > >> the same page. There might be more if zsmalloc is used.
> > > >> What is worse, it is possible that one swap entry could
> > > >> point to data that crosses a page boundary.
> > > >
> > > > We just won't set page->index since it doesn't have a good meaning
> > > > in our case. Swap cache pages also don't use index, although it
> > > > seems to me that they could, since there is a 1:1 mapping of a swap
> > > > cache page to a swap offset and the index field isn't being used
> > > > for anything else. But I digress...
> > >
> > > OK.
> > >
> > > >> Of course, one could try to modify MM to support
> > > >> multiple mappings of a page in the radix tree.
> > > >> But I think that MM guys will consider this a hack
> > > >> and they will not accept it.
> > > >
> > > > Yes, it will require some changes to the MM to handle zbud pages on
> > > > the LRU. I'm thinking that it won't be too intrusive, depending on
> > > > how we choose to mark zbud pages.
> > >
> > > Anyway, I think that zswap should use two index engines.
> > > I mean index in the database meaning.
> > > One index is used to translate a swap_entry to a compressed page.
> > > And another one to be used for reclaim and migration by MM;
> > > probably address_space is the best choice.
> > > Zbud would be responsible for keeping consistency
> > > between the mentioned indexes.
> > >
> > > Regards,
> > > Tomasz Stanislawski
> > >
> > > > Seth
> > >
> > > >> Regards,
> > > >> Tomasz Stanislawski
> > > >>
> > > >>> --
> > > >>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > > >>> the body to majord...@kvack.org. For more info on Linux MM,
> > > >>> see: http://www.linux-mm.org/ .
> > > >>> Don't email: em...@kvack.org
Re: [PATCH v2 0/5] mm: migrate zbud pages
On Fri, 2013-09-27 at 17:00 -0500, Seth Jennings wrote:
> I have to say that when I first came up with the idea, I was thinking
> the address space would be at the zswap layer and the radix slots would
> hold zbud handles, not struct page pointers.
>
> However, as I have discovered today, this is problematic when it comes
> to reclaim and migration and serializing access.
>
> I wanted to do as much as possible in the zswap layer since anything
> done in the zbud layer would need to be duplicated in any other future
> allocator that zswap wanted to support.
>
> Unfortunately, zbud abstracts away the struct page and that visibility
> is needed to properly do what we are talking about.
>
> So maybe it is inevitable that this will need to be in the zbud code
> with the radix tree slots pointing to struct pages after all.

To me it looks very similar to the solution proposed in my patches. The
difference is that you wish to use the offset as the radix tree index.
I thought about this earlier but it imposed two problems:

1. A generalized handle (instead of an offset) may be more suitable
when zbud is used by other drivers (e.g. zram).

2. It requires redesigning the zswap architecture around
zswap_frontswap_store() in the case of a duplicated insertion.
Currently, when storing a page, zswap:
- allocates a zbud page,
- stores the new data in it,
- checks whether it is a duplicated page (same offset present in the
  rbtree),
- if yes (duplicated), then zswap frees the previous entry.
The problem here lies in allocating a zbud page under the same offset.
This step would replace the old data (because we are using the same
offset in the radix tree).

In my opinion, using a zbud handle is in this case more flexible.

Best regards,
Krzysztof

> I like the idea of masking the bit into the struct page pointer to
> indicate which buddy maps to the offset.
>
> There is a twist here in that, unlike a normal page cache tree, we can
> have two offsets pointing at different buddies in the same frame,
> which means we'll have to do some custom stuff for migration.
>
> The rabbit hole I was going down today has come to an end, so I'll
> take a fresh look next week.
>
> Thanks for your ideas and discussion! Maybe we can make zswap/zbud an
> upstanding MM citizen yet!
>
> Seth
>
> > >> In case of zbud, there are two swap offsets pointing to
> > >> the same page. There might be more if zsmalloc is used.
> > >> What is worse, it is possible that one swap entry could
> > >> point to data that crosses a page boundary.
> > >
> > > We just won't set page->index since it doesn't have a good meaning
> > > in our case. Swap cache pages also don't use index, although it
> > > seems to me that they could, since there is a 1:1 mapping of a swap
> > > cache page to a swap offset and the index field isn't being used
> > > for anything else. But I digress...
> >
> > OK.
> >
> > >> Of course, one could try to modify MM to support
> > >> multiple mappings of a page in the radix tree.
> > >> But I think that MM guys will consider this a hack
> > >> and they will not accept it.
> > >
> > > Yes, it will require some changes to the MM to handle zbud pages on
> > > the LRU. I'm thinking that it won't be too intrusive, depending on
> > > how we choose to mark zbud pages.
> >
> > Anyway, I think that zswap should use two index engines.
> > I mean index in the database meaning.
> > One index is used to translate a swap_entry to a compressed page.
> > And another one to be used for reclaim and migration by MM;
> > probably address_space is the best choice.
> > Zbud would be responsible for keeping consistency
> > between the mentioned indexes.
> >
> > Regards,
> > Tomasz Stanislawski
> >
> > > Seth
> >
> > >> Regards,
> > >> Tomasz Stanislawski
Re: [PATCH v2 0/5] mm: migrate zbud pages
On 09/28/2013 06:00 AM, Seth Jennings wrote:
> On Fri, Sep 27, 2013 at 12:16:37PM +0200, Tomasz Stanislawski wrote:
>> On 09/25/2013 11:57 PM, Seth Jennings wrote:
>>> On Wed, Sep 25, 2013 at 07:09:50PM +0200, Tomasz Stanislawski wrote:
>>>>> I just had an idea this afternoon to potentially kill both these
>>>>> birds with one stone: Replace the rbtree in zswap with an
>>>>> address_space.
>>>>>
>>>>> Each swap type would have its own page_tree to organize the
>>>>> compressed objects by type and offset (a radix tree is more suited
>>>>> for this anyway) and a_ops that could be called by
>>>>> shrink_page_list() (writepage) or the migration code (migratepage).
>>>>>
>>>>> Then zbud pages could be put on the normal LRU list, maybe at the
>>>>> beginning of the inactive LRU so they would live for another cycle
>>>>> through the list, then be reclaimed in the normal way with the
>>>>> mapping->a_ops->writepage() pointing to a zswap_writepage()
>>>>> function that would decompress the pages and call
>>>>> __swap_writepage() on them.
>>>>>
>>>>> This might actually do away with the explicit pool size too, as the
>>>>> compressed pool pages wouldn't be outside the control of the MM
>>>>> anymore.
>>>>>
>>>>> I'm just starting to explore this but I think it has promise.
>>>>>
>>>>> Seth
>>>>
>>>> Hi Seth,
>>>> There is a problem with the proposed idea.
>>>> The radix tree used by 'struct address_space' is a part of
>>>> a bigger data structure.
>>>> The radix tree is used to translate an offset to a page.
>>>> That is ok for zswap. But struct page has a field named 'index'.
>>>> The MM assumes that this index is the offset in the radix tree
>>>> where one can find the page. A lot is done by MM to sustain
>>>> this consistency.
>>>
>>> Yes, this is how it is for page cache pages. However, the MM is able
>>> to work differently with anonymous pages. In the case of an anonymous
>>> page, the mapping field points to an anon_vma struct, or, if ksm is
>>> enabled and dedup'ing the page, a private ksm tracking structure. If
>>> the anonymous page is fully unmapped and resides only in the swap
>>> cache, the page mapping is NULL. So there is precedent for the fields
>>> to mean other things.
>>
>> Hi Seth,
>> You are right that page->mapping is NULL for pages in the swap cache,
>> but page_mapping() is not NULL in such a case. The mapping is taken
>> from struct address_space swapper_spaces[]. It is still an address
>> space, and it should preserve the constraints for struct
>> address_space. The same happens for page->index and page_index().
>>
>>> The question is how to mark and identify zbud pages among the other
>>> page types that will be on the LRU. There are many ways. The question
>>> is what is the best and most acceptable way.
>>
>> If you consider hacking, I have some ideas how an address_space could
>> be utilized for zbud.
>> One solution would be using tags in a radix tree. Every entry in a
>> radix tree can have a few bits assigned to it. Currently 3 bits are
>> supported:
>>
>> From include/linux/fs.h:
>> #define PAGECACHE_TAG_DIRTY     0
>> #define PAGECACHE_TAG_WRITEBACK 1
>> #define PAGECACHE_TAG_TOWRITE   2
>>
>> You could add a new bit or utilize one of the existing ones.
>>
>> The other idea is to use a trick from RB trees and scatter-gather
>> lists. I mean using the last bits of pointers to keep some metadata.
>> Values of 'struct page *' variables are aligned to the pointer
>> alignment, which is 4 for 32-bit CPUs and 8 for 64-bit ones (not
>> sure). This means that one could use the last bit of a page pointer
>> in a radix tree to track whether a swap entry refers to the lower or
>> the higher part of a zbud page.
>> I think it is serious hacking/obfuscation, but it may work with a
>> minimal amount of changes to MM. Adding only (x & ~3) while
>> extracting the page pointer is probably enough.
>>
>> What do you think about this idea?
>
> I think it is a good one.
>
> I have to say that when I first came up with the idea, I was thinking
> the address space would be at the zswap layer and the radix slots would
> hold zbud handles, not struct page pointers.
>
> However, as I have discovered today, this is problematic when it comes
> to reclaim and migration and serializing access.
>
> I wanted to do as much as possible in the zswap layer since anything
> done in the zbud layer would need to be duplicated in any other future
> allocator that zswap wanted to support.
>
> Unfortunately, zbud abstracts away the struct page and that visibility
> is needed to properly do what we are talking about.
>
> So maybe it is inevitable that this will need to be in the zbud code
> with the radix tree slots pointing to struct pages after all.

But in this way, zswap_frontswap_load() can't find the zswap_entry. We
still need the rbtree in the current zswap.

> I like the idea of masking the bit into the struct page pointer to
> indicate which buddy maps to the offset.

I have no idea why
Re: [PATCH v2 0/5] mm: migrate zbud pages
On Fri, Sep 27, 2013 at 12:16:37PM +0200, Tomasz Stanislawski wrote:
> On 09/25/2013 11:57 PM, Seth Jennings wrote:
>> On Wed, Sep 25, 2013 at 07:09:50PM +0200, Tomasz Stanislawski wrote:
>>>> I just had an idea this afternoon to potentially kill both these
>>>> birds with one stone: Replace the rbtree in zswap with an
>>>> address_space.
>>>>
>>>> Each swap type would have its own page_tree to organize the
>>>> compressed objects by type and offset (a radix tree is more suited
>>>> for this anyway) and a_ops that could be called by
>>>> shrink_page_list() (writepage) or the migration code (migratepage).
>>>>
>>>> Then zbud pages could be put on the normal LRU list, maybe at the
>>>> beginning of the inactive LRU so they would live for another cycle
>>>> through the list, then be reclaimed in the normal way with the
>>>> mapping->a_ops->writepage() pointing to a zswap_writepage() function
>>>> that would decompress the pages and call __swap_writepage() on them.
>>>>
>>>> This might actually do away with the explicit pool size too, as the
>>>> compressed pool pages wouldn't be outside the control of the MM
>>>> anymore.
>>>>
>>>> I'm just starting to explore this but I think it has promise.
>>>>
>>>> Seth
>>>
>>> Hi Seth,
>>> There is a problem with the proposed idea.
>>> The radix tree used by 'struct address_space' is a part of
>>> a bigger data structure.
>>> The radix tree is used to translate an offset to a page.
>>> That is ok for zswap. But struct page has a field named 'index'.
>>> The MM assumes that this index is the offset in the radix tree
>>> where one can find the page. A lot is done by MM to sustain
>>> this consistency.
>>
>> Yes, this is how it is for page cache pages. However, the MM is able
>> to work differently with anonymous pages. In the case of an anonymous
>> page, the mapping field points to an anon_vma struct, or, if ksm is
>> enabled and dedup'ing the page, a private ksm tracking structure. If
>> the anonymous page is fully unmapped and resides only in the swap
>> cache, the page mapping is NULL. So there is precedent for the fields
>> to mean other things.
>
> Hi Seth,
> You are right that page->mapping is NULL for pages in the swap cache,
> but page_mapping() is not NULL in such a case. The mapping is taken
> from struct address_space swapper_spaces[]. It is still an address
> space, and it should preserve the constraints for struct address_space.
> The same happens for page->index and page_index().
>
>> The question is how to mark and identify zbud pages among the other
>> page types that will be on the LRU. There are many ways. The question
>> is what is the best and most acceptable way.
>
> If you consider hacking, I have some ideas how an address_space could
> be utilized for zbud.
> One solution would be using tags in a radix tree. Every entry in a
> radix tree can have a few bits assigned to it. Currently 3 bits are
> supported:
>
> From include/linux/fs.h:
> #define PAGECACHE_TAG_DIRTY     0
> #define PAGECACHE_TAG_WRITEBACK 1
> #define PAGECACHE_TAG_TOWRITE   2
>
> You could add a new bit or utilize one of the existing ones.
>
> The other idea is to use a trick from RB trees and scatter-gather
> lists. I mean using the last bits of pointers to keep some metadata.
> Values of 'struct page *' variables are aligned to the pointer
> alignment, which is 4 for 32-bit CPUs and 8 for 64-bit ones (not sure).
> This means that one could use the last bit of a page pointer in a radix
> tree to track whether a swap entry refers to the lower or the higher
> part of a zbud page.
> I think it is serious hacking/obfuscation, but it may work with a
> minimal amount of changes to MM. Adding only (x & ~3) while extracting
> the page pointer is probably enough.
>
> What do you think about this idea?

I think it is a good one.

I have to say that when I first came up with the idea, I was thinking
the address space would be at the zswap layer and the radix slots would
hold zbud handles, not struct page pointers.

However, as I have discovered today, this is problematic when it comes
to reclaim and migration and serializing access.

I wanted to do as much as possible in the zswap layer since anything
done in the zbud layer would need to be duplicated in any other future
allocator that zswap wanted to support.

Unfortunately, zbud abstracts away the struct page and that visibility
is needed to properly do what we are talking about.

So maybe it is inevitable that this will need to be in the zbud code
with the radix tree slots pointing to struct pages after all.

I like the idea of masking the bit into the struct page pointer to
indicate which buddy maps to the offset.

There is a twist here in that, unlike a normal page cache tree, we can
have two offsets pointing at different buddies in the same frame, which
means we'll have to do some custom stuff for migration.

The rabbit hole I was going down today has come to an end, so I'll take
a fresh look next week.
Re: [PATCH v2 0/5] mm: migrate zbud pages
On 09/25/2013 11:57 PM, Seth Jennings wrote: > On Wed, Sep 25, 2013 at 07:09:50PM +0200, Tomasz Stanislawski wrote: >>> I just had an idea this afternoon to potentially kill both these birds with one stone: Replace the rbtree in zswap with an address_space. >>> >>> Each swap type would have its own page_tree to organize the compressed objects by type and offset (radix tree is more suited for this anyway) and a_ops that could be called by shrink_page_list() (writepage) or the migration code (migratepage). >>> >>> Then zbud pages could be put on the normal LRU list, maybe at the beginning of the inactive LRU so they would live for another cycle through the list, then be reclaimed in the normal way with the mapping->a_ops->writepage() pointing to a zswap_writepage() function that would decompress the pages and call __swap_writepage() on them. >>> >>> This might actually do away with the explicit pool size too as the compressed pool pages wouldn't be outside the control of the MM anymore. >>> >>> I'm just starting to explore this but I think it has promise. >>> >>> Seth >> >> Hi Seth, >> There is a problem with the proposed idea. >> The radix tree used in 'struct address_space' is part of a bigger data structure. >> The radix tree is used to translate an offset to a page. >> That is ok for zswap. But struct page has a field named 'index'. >> The MM assumes that this index is an offset in the radix tree where one can find the page. A lot is done by MM to sustain this consistency. > > Yes, this is how it is for page cache pages. However, the MM is able to work differently with anonymous pages. In the case of an anonymous page, the mapping field points to an anon_vma struct, or, if ksm is enabled and dedup'ing the page, a private ksm tracking structure. If the anonymous page is fully unmapped and resides only in the swap cache, the page mapping is NULL. > So there is precedent for the fields to mean other things.

Hi Seth, You are right that page->mapping is NULL for pages in swap_cache but page_mapping() is not NULL in such a case. The mapping is taken from struct address_space swapper_spaces[]. It is still an address space, and it should preserve the constraints of struct address_space. The same happens for page->index and page_index().

> > The question is how to mark and identify zbud pages among the other page types that will be on the LRU. There are many ways. The question is what is the best and most acceptable way.

If you consider hacking, I have some idea how address_space could be utilized for ZBUD. One solution would be using tags in a radix tree. Every entry in a radix tree can have a few bits assigned to it. Currently 3 bits are supported: From include/linux/fs.h #define PAGECACHE_TAG_DIRTY 0 #define PAGECACHE_TAG_WRITEBACK 1 #define PAGECACHE_TAG_TOWRITE 2 You could add a new bit or utilize one of the existing ones.

The other idea is to use a trick from RB trees and scatter-gather lists. I mean using the last bits of pointers to keep some metadata. Values of 'struct page *' variables are aligned to the pointer alignment, which is 4 for 32-bit CPUs and 8 for 64-bit ones (not sure). This means that one could use the last bit of the page pointer in a radix tree to track whether a swap entry refers to the lower or the higher part of a ZBUD page. I think it is serious hacking/obfuscation but it may work with a minimal amount of changes to MM. Adding only (x & ~3) while extracting the page pointer is probably enough. What do you think about this idea?

>> In case of zbud, there are two swap offsets pointing to the same page. There might be more if zsmalloc is used. >> What is worse, it is possible that one swap entry could point to data that crosses a page boundary. > > We just won't set page->index since it doesn't have a good meaning in our case. Swap cache pages also don't use index, although it seems to me that they could, since there is a 1:1 mapping of a swap cache page to a swap offset and the index field isn't being used for anything else. > But I digress...

OK.

>> Of course, one could try to modify MM to support multiple mappings of a page in the radix tree. >> But I think that MM guys will consider this as a hack and they will not accept it. > > Yes, it will require some changes to the MM to handle zbud pages on the LRU. I'm thinking that it won't be too intrusive, depending on how we choose to mark zbud pages.

Anyway, I think that zswap should use two index engines. I mean index in the database sense. One index is used to translate a swap_entry to a compressed page. The other one is used for reclaim and migration by the MM; probably address_space is the best choice. Zbud would be responsible for keeping consistency between the mentioned indexes. Regards, Tomasz Stanislawski > Seth
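The pointer-tagging trick described above can be sketched in a few lines. Below is a hypothetical user-space illustration (the struct page here is a stand-in and the helper names are invented, not kernel API): because page pointers are at least 4-byte aligned, bit 0 of a radix-tree slot is free to record which zbud buddy a swap offset refers to, and (x & ~3) recovers the clean pointer.

```c
#include <assert.h>
#include <stdint.h>

/* User-space sketch, not kernel code: struct page is a stand-in and the
 * zbud_* helpers are hypothetical names.  Pointers to struct page are at
 * least 4-byte aligned, so bit 0 of a radix-tree slot can say whether a
 * swap offset refers to the lower or the upper buddy of a zbud page. */

struct page { void *pad[8]; };  /* aligned at least as strictly as void* */

enum zbud_buddy { ZBUD_LOWER = 0, ZBUD_UPPER = 1 };

/* Pack the buddy bit into the value stored in the radix-tree slot. */
void *zbud_encode_slot(struct page *page, enum zbud_buddy buddy)
{
    assert(((uintptr_t)page & 3) == 0);  /* relies on pointer alignment */
    return (void *)((uintptr_t)page | (uintptr_t)buddy);
}

/* Strip the metadata bits to get the page pointer back: (x & ~3). */
struct page *zbud_slot_page(void *slot)
{
    return (struct page *)((uintptr_t)slot & ~(uintptr_t)3);
}

/* Read the buddy bit back out of the slot. */
enum zbud_buddy zbud_slot_buddy(void *slot)
{
    return (enum zbud_buddy)((uintptr_t)slot & 1);
}
```

Two slots tagged ZBUD_LOWER and ZBUD_UPPER can then name the two halves of the same frame while both decode to the same page pointer.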
Re: [PATCH v2 0/5] mm: migrate zbud pages
On Fri, Sep 27, 2013 at 12:16:37PM +0200, Tomasz Stanislawski wrote: On 09/25/2013 11:57 PM, Seth Jennings wrote: On Wed, Sep 25, 2013 at 07:09:50PM +0200, Tomasz Stanislawski wrote: I just had an idea this afternoon to potentially kill both these birds with one stone: Replace the rbtree in zswap with an address_space. Each swap type would have its own page_tree to organize the compressed objects by type and offset (radix tree is more suited for this anyway) and a_ops that could be called by shrink_page_list() (writepage) or the migration code (migratepage). Then zbud pages could be put on the normal LRU list, maybe at the beginning of the inactive LRU so they would live for another cycle through the list, then be reclaimed in the normal way with the mapping->a_ops->writepage() pointing to a zswap_writepage() function that would decompress the pages and call __swap_writepage() on them. This might actually do away with the explicit pool size too as the compressed pool pages wouldn't be outside the control of the MM anymore. I'm just starting to explore this but I think it has promise. Seth

Hi Seth, There is a problem with the proposed idea. The radix tree used in 'struct address_space' is part of a bigger data structure. The radix tree is used to translate an offset to a page. That is ok for zswap. But struct page has a field named 'index'. The MM assumes that this index is an offset in the radix tree where one can find the page. A lot is done by MM to sustain this consistency.

Yes, this is how it is for page cache pages. However, the MM is able to work differently with anonymous pages. In the case of an anonymous page, the mapping field points to an anon_vma struct, or, if ksm is enabled and dedup'ing the page, a private ksm tracking structure. If the anonymous page is fully unmapped and resides only in the swap cache, the page mapping is NULL. So there is precedent for the fields to mean other things.

Hi Seth, You are right that page->mapping is NULL for pages in swap_cache but page_mapping() is not NULL in such a case. The mapping is taken from struct address_space swapper_spaces[]. It is still an address space, and it should preserve the constraints of struct address_space. The same happens for page->index and page_index(). The question is how to mark and identify zbud pages among the other page types that will be on the LRU. There are many ways. The question is what is the best and most acceptable way. If you consider hacking, I have some idea how address_space could be utilized for ZBUD. One solution would be using tags in a radix tree. Every entry in a radix tree can have a few bits assigned to it. Currently 3 bits are supported: From include/linux/fs.h #define PAGECACHE_TAG_DIRTY 0 #define PAGECACHE_TAG_WRITEBACK 1 #define PAGECACHE_TAG_TOWRITE 2 You could add a new bit or utilize one of the existing ones. The other idea is to use a trick from RB trees and scatter-gather lists. I mean using the last bits of pointers to keep some metadata. Values of 'struct page *' variables are aligned to the pointer alignment, which is 4 for 32-bit CPUs and 8 for 64-bit ones (not sure). This means that one could use the last bit of the page pointer in a radix tree to track whether a swap entry refers to the lower or the higher part of a ZBUD page. I think it is serious hacking/obfuscation but it may work with a minimal amount of changes to MM. Adding only (x & ~3) while extracting the page pointer is probably enough. What do you think about this idea?

I think it is a good one. I have to say that when I first came up with the idea, I was thinking the address space would be at the zswap layer and the radix slots would hold zbud handles, not struct page pointers. However, as I have discovered today, this is problematic when it comes to reclaim and migration and serializing access. I wanted to do as much as possible in the zswap layer since anything done in the zbud layer would need to be duplicated in any other future allocator that zswap wanted to support. Unfortunately, zbud abstracts away the struct page and that visibility is needed to properly do what we are talking about. So maybe it is inevitable that this will need to be in the zbud code with the radix tree slots pointing to struct pages after all. I like the idea of masking the bit into the struct page pointer to indicate which buddy maps to the offset. There is a twist here in that, unlike a normal page cache tree, we can have two offsets pointing at different buddies in the same frame, which means we'll have to do some custom stuff for migration. The rabbit hole I was going down today has come to an end so I'll take a fresh look next week. Thanks for your ideas and discussion! Maybe we can make zswap/zbud an upstanding MM citizen yet! Seth
Re: [PATCH v2 0/5] mm: migrate zbud pages
On 09/28/2013 06:00 AM, Seth Jennings wrote: On Fri, Sep 27, 2013 at 12:16:37PM +0200, Tomasz Stanislawski wrote: On 09/25/2013 11:57 PM, Seth Jennings wrote: On Wed, Sep 25, 2013 at 07:09:50PM +0200, Tomasz Stanislawski wrote: I just had an idea this afternoon to potentially kill both these birds with one stone: Replace the rbtree in zswap with an address_space. Each swap type would have its own page_tree to organize the compressed objects by type and offset (radix tree is more suited for this anyway) and a_ops that could be called by shrink_page_list() (writepage) or the migration code (migratepage). Then zbud pages could be put on the normal LRU list, maybe at the beginning of the inactive LRU so they would live for another cycle through the list, then be reclaimed in the normal way with the mapping->a_ops->writepage() pointing to a zswap_writepage() function that would decompress the pages and call __swap_writepage() on them. This might actually do away with the explicit pool size too as the compressed pool pages wouldn't be outside the control of the MM anymore. I'm just starting to explore this but I think it has promise. Seth

Hi Seth, There is a problem with the proposed idea. The radix tree used in 'struct address_space' is part of a bigger data structure. The radix tree is used to translate an offset to a page. That is ok for zswap. But struct page has a field named 'index'. The MM assumes that this index is an offset in the radix tree where one can find the page. A lot is done by MM to sustain this consistency.

Yes, this is how it is for page cache pages. However, the MM is able to work differently with anonymous pages. In the case of an anonymous page, the mapping field points to an anon_vma struct, or, if ksm is enabled and dedup'ing the page, a private ksm tracking structure. If the anonymous page is fully unmapped and resides only in the swap cache, the page mapping is NULL. So there is precedent for the fields to mean other things.

Hi Seth, You are right that page->mapping is NULL for pages in swap_cache but page_mapping() is not NULL in such a case. The mapping is taken from struct address_space swapper_spaces[]. It is still an address space, and it should preserve the constraints of struct address_space. The same happens for page->index and page_index(). The question is how to mark and identify zbud pages among the other page types that will be on the LRU. There are many ways. The question is what is the best and most acceptable way. If you consider hacking, I have some idea how address_space could be utilized for ZBUD. One solution would be using tags in a radix tree. Every entry in a radix tree can have a few bits assigned to it. Currently 3 bits are supported: From include/linux/fs.h #define PAGECACHE_TAG_DIRTY 0 #define PAGECACHE_TAG_WRITEBACK 1 #define PAGECACHE_TAG_TOWRITE 2 You could add a new bit or utilize one of the existing ones. The other idea is to use a trick from RB trees and scatter-gather lists. I mean using the last bits of pointers to keep some metadata. Values of 'struct page *' variables are aligned to the pointer alignment, which is 4 for 32-bit CPUs and 8 for 64-bit ones (not sure). This means that one could use the last bit of the page pointer in a radix tree to track whether a swap entry refers to the lower or the higher part of a ZBUD page. I think it is serious hacking/obfuscation but it may work with a minimal amount of changes to MM. Adding only (x & ~3) while extracting the page pointer is probably enough. What do you think about this idea?

I think it is a good one. I have to say that when I first came up with the idea, I was thinking the address space would be at the zswap layer and the radix slots would hold zbud handles, not struct page pointers. However, as I have discovered today, this is problematic when it comes to reclaim and migration and serializing access. I wanted to do as much as possible in the zswap layer since anything done in the zbud layer would need to be duplicated in any other future allocator that zswap wanted to support. Unfortunately, zbud abstracts away the struct page and that visibility is needed to properly do what we are talking about. So maybe it is inevitable that this will need to be in the zbud code with the radix tree slots pointing to struct pages after all.

But in this way, zswap_frontswap_load() can't find the zswap_entry. We still need the rbtree in current zswap.

I like the idea of masking the bit into the struct page pointer to indicate which buddy maps to the offset.

I have no idea why we need this. My idea is to connect the zbud page with an address_space and add the zbud page to the LRU list only, without any radix tree. zswap_entry can still be in an rbtree, or maybe changed to a radix tree. There is sample code in my previous email. -- Regards, -Bob
Re: [PATCH v2 0/5] mm: migrate zbud pages
On Wed, Sep 25, 2013 at 07:09:50PM +0200, Tomasz Stanislawski wrote: > > I just had an idea this afternoon to potentially kill both these birds with one stone: Replace the rbtree in zswap with an address_space. > > > > Each swap type would have its own page_tree to organize the compressed objects by type and offset (radix tree is more suited for this anyway) and a_ops that could be called by shrink_page_list() (writepage) or the migration code (migratepage). > > > > Then zbud pages could be put on the normal LRU list, maybe at the beginning of the inactive LRU so they would live for another cycle through the list, then be reclaimed in the normal way with the mapping->a_ops->writepage() pointing to a zswap_writepage() function that would decompress the pages and call __swap_writepage() on them. > > > > This might actually do away with the explicit pool size too as the compressed pool pages wouldn't be outside the control of the MM anymore. > > > > I'm just starting to explore this but I think it has promise. > > > > Seth > > Hi Seth, > There is a problem with the proposed idea. > The radix tree used in 'struct address_space' is part of a bigger data structure. > The radix tree is used to translate an offset to a page. > That is ok for zswap. But struct page has a field named 'index'. > The MM assumes that this index is an offset in the radix tree where one can find the page. A lot is done by MM to sustain this consistency.

Yes, this is how it is for page cache pages. However, the MM is able to work differently with anonymous pages. In the case of an anonymous page, the mapping field points to an anon_vma struct, or, if ksm is enabled and dedup'ing the page, a private ksm tracking structure. If the anonymous page is fully unmapped and resides only in the swap cache, the page mapping is NULL. So there is precedent for the fields to mean other things.

The question is how to mark and identify zbud pages among the other page types that will be on the LRU. There are many ways. The question is what is the best and most acceptable way.

> In case of zbud, there are two swap offsets pointing to > the same page. There might be more if zsmalloc is used. > What is worse, it is possible that one swap entry could > point to data that crosses a page boundary.

We just won't set page->index since it doesn't have a good meaning in our case. Swap cache pages also don't use index, although it seems to me that they could, since there is a 1:1 mapping of a swap cache page to a swap offset and the index field isn't being used for anything else. But I digress...

> Of course, one could try to modify MM to support > multiple mappings of a page in the radix tree. > But I think that MM guys will consider this as a hack > and they will not accept it.

Yes, it will require some changes to the MM to handle zbud pages on the LRU. I'm thinking that it won't be too intrusive, depending on how we choose to mark zbud pages. Seth > Regards, > Tomasz Stanislawski
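One possible marking scheme, sketched below as a user-space toy model (zbud_space, zbud_mark_page and PageZbud are invented names, not existing kernel symbols), is to point page->mapping of every zbud page at a dedicated sentinel address_space, so identifying a zbud page on the LRU is a single pointer comparison, in the same spirit as the mapping overloading the MM already does for anon and ksm pages:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space sketch with hypothetical names, not kernel code: mark zbud
 * pages by pointing page->mapping at a dedicated sentinel address_space,
 * then identify them with a pointer comparison. */

struct address_space { const char *name; };
struct page { struct address_space *mapping; };

static struct address_space zbud_space = { "zbud" };

/* Called when a page is handed to zbud, before it goes on the LRU. */
void zbud_mark_page(struct page *page)
{
    page->mapping = &zbud_space;
}

/* PageZbud()-style test usable by LRU scanning code. */
bool PageZbud(const struct page *page)
{
    return page->mapping == &zbud_space;
}
```

The appeal of this choice is that LRU walkers already inspect page->mapping to distinguish page types, so no page flag needs to be allocated.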
Re: [PATCH v2 0/5] mm: migrate zbud pages
> I just had an idea this afternoon to potentially kill both these birds with one stone: Replace the rbtree in zswap with an address_space. > > Each swap type would have its own page_tree to organize the compressed objects by type and offset (radix tree is more suited for this anyway) and a_ops that could be called by shrink_page_list() (writepage) or the migration code (migratepage). > > Then zbud pages could be put on the normal LRU list, maybe at the beginning of the inactive LRU so they would live for another cycle through the list, then be reclaimed in the normal way with the mapping->a_ops->writepage() pointing to a zswap_writepage() function that would decompress the pages and call __swap_writepage() on them. > > This might actually do away with the explicit pool size too as the compressed pool pages wouldn't be outside the control of the MM anymore. > > I'm just starting to explore this but I think it has promise. > > Seth

Hi Seth, There is a problem with the proposed idea. The radix tree used in 'struct address_space' is part of a bigger data structure. The radix tree is used to translate an offset to a page. That is ok for zswap. But struct page has a field named 'index'. The MM assumes that this index is an offset in the radix tree where one can find the page. A lot is done by MM to sustain this consistency.

In case of zbud, there are two swap offsets pointing to the same page. There might be more if zsmalloc is used. What is worse, it is possible that one swap entry could point to data that crosses a page boundary.

Of course, one could try to modify MM to support multiple mappings of a page in the radix tree. But I think that MM guys will consider this as a hack and they will not accept it. Regards, Tomasz Stanislawski
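Tomasz's objection can be made concrete with a toy model (user-space sketch with invented names; a plain array stands in for the radix tree): two swap offsets resolve to the same zbud page, while the single page->index field can mirror only one of them, breaking the consistency the MM expects between index and tree slot.

```c
#include <assert.h>
#include <stddef.h>

/* User-space sketch with hypothetical names, not kernel code: a zbud page
 * holds two compressed objects, so two distinct swap offsets resolve to
 * the same struct page, and the single page->index field cannot name both
 * tree slots at once. */

#define NR_OFFSETS 16

struct page { unsigned long index; };     /* one reverse index only */

static struct page *radix[NR_OFFSETS];    /* offset -> page, standing in
                                             for an address_space page_tree */

static void tree_insert(unsigned long offset, struct page *page)
{
    radix[offset] = page;
    page->index = offset;  /* MM convention: index mirrors the tree slot */
}

static struct page *tree_lookup(unsigned long offset)
{
    return radix[offset];
}
```

Inserting the same page at a second offset silently invalidates page->index for the first one, which is exactly the consistency the MM works hard to sustain.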
Re: [PATCH v2 0/5] mm: migrate zbud pages
On Tue, Sep 24, 2013 at 5:20 PM, Krzysztof Kozlowski wrote: > Hi, > > On Mon, 2013-09-23 at 17:07 -0500, Seth Jennings wrote: >> On Tue, Sep 17, 2013 at 02:59:24PM +0800, Bob Liu wrote: >> > Mel mentioned several problems about zswap/zbud in thread "[PATCH v6 0/5] zram/zsmalloc promotion". >> > >> > Like "it's clunky as hell and the layering between zswap and zbud is twisty" and "I think I brought up its stalling behaviour during review when it was being merged. It would have been preferable if writeback could be initiated in batches and then waited on at the very least.. It's worse that it uses __swap_writepage directly instead of going through a writepage ops. It would have been better if zbud pages existed on the LRU and written back with an address space ops and properly handled asynchronous writeback." >> > >> > So I think it would be better if we can address those issues at first and it would be easier to address these issues before adding more new features. Welcome any ideas. >> >> I just had an idea this afternoon to potentially kill both these birds with one stone: Replace the rbtree in zswap with an address_space. >> >> Each swap type would have its own page_tree to organize the compressed objects by type and offset (radix tree is more suited for this anyway) and a_ops that could be called by shrink_page_list() (writepage) or the migration code (migratepage). >> >> Then zbud pages could be put on the normal LRU list, maybe at the beginning of the inactive LRU so they would live for another cycle through the list, then be reclaimed in the normal way with the mapping->a_ops->writepage() pointing to a zswap_writepage() function that would decompress the pages and call __swap_writepage() on them. > How exactly can the address space be used here? Do you want to point to zbud pages in address_space.page_tree? If yes, then which index should be used?
I didn't get the point either. I think introducing an address_space is
enough:

1. zbud.c:

static const struct address_space_operations zbud_aops = {
	.writepage = zswap_writepage,
};

static struct address_space zbud_space = {
	.a_ops = &zbud_aops,
};

zbud_alloc()
{
	zbud_page = alloc_page(...);
	zbud_page->mapping = &zbud_space;
	set_page_private(zbud_page, (unsigned long)pool);
	lru_add_anon(zbud_page);
}

2. zswap.c:

static int zswap_writepage(struct page *page, struct writeback_control *wbc)
{
	handle = encode_handle(page_address(page), FIRST);
	zswap_writeback_entry(pool, handle);
	handle = encode_handle(page_address(page), LAST);
	zswap_writeback_entry(pool, handle);
}

Of course it may need a lot of work before the core MM subsystem can
maintain zbud pages. But this way we can get rid of the clunky reclaim
layer and integrate zswap closely with the core MM subsystem, which knows
better how many zbud pages can be used and when zbud page reclaim should
be triggered.

--
Regards,
--Bob
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 0/5] mm: migrate zbud pages
Hi,

On pon, 2013-09-23 at 17:07 -0500, Seth Jennings wrote:
> On Tue, Sep 17, 2013 at 02:59:24PM +0800, Bob Liu wrote:
> > Mel mentioned several problems about zswap/zbud in the thread "[PATCH v6
> > 0/5] zram/zsmalloc promotion".
> >
> > Like "it's clunky as hell and the layering between zswap and zbud is
> > twisty" and "I think I brought up its stalling behaviour during review
> > when it was being merged. It would have been preferable if writeback
> > could be initiated in batches and then waited on at the very least.
> > It's worse that it uses __swap_writepage directly instead of going
> > through a writepage op. It would have been better if zbud pages
> > existed on the LRU, were written back with address space ops and
> > properly handled asynchronous writeback."
> >
> > So I think it would be better if we address those issues first; it
> > would be easier to address them before adding more new features. Any
> > ideas are welcome.
>
> I just had an idea this afternoon that could potentially kill both these
> birds with one stone: replace the rbtree in zswap with an address_space.
>
> Each swap type would have its own page_tree to organize the compressed
> objects by type and offset (a radix tree is better suited for this
> anyway) and a_ops that could be called by shrink_page_list() (writepage)
> or the migration code (migratepage).
>
> Then zbud pages could be put on the normal LRU list, maybe at the
> beginning of the inactive LRU so they would live for another cycle
> through the list, then be reclaimed in the normal way, with
> mapping->a_ops->writepage() pointing to a zswap_writepage() function
> that would decompress the pages and call __swap_writepage() on them.

How exactly can the address space be used here? Do you want to point to
zbud pages in address_space.page_tree? If so, which index should be used?
Best regards,
Krzysztof
Re: [PATCH v2 0/5] mm: migrate zbud pages
On Tue, Sep 17, 2013 at 02:59:24PM +0800, Bob Liu wrote:
> Mel mentioned several problems about zswap/zbud in the thread "[PATCH v6
> 0/5] zram/zsmalloc promotion".
>
> Like "it's clunky as hell and the layering between zswap and zbud is
> twisty" and "I think I brought up its stalling behaviour during review
> when it was being merged. It would have been preferable if writeback
> could be initiated in batches and then waited on at the very least.
> It's worse that it uses __swap_writepage directly instead of going
> through a writepage op. It would have been better if zbud pages
> existed on the LRU, were written back with address space ops and
> properly handled asynchronous writeback."
>
> So I think it would be better if we address those issues first; it
> would be easier to address them before adding more new features. Any
> ideas are welcome.

I just had an idea this afternoon that could potentially kill both these
birds with one stone: replace the rbtree in zswap with an address_space.

Each swap type would have its own page_tree to organize the compressed
objects by type and offset (a radix tree is better suited for this
anyway) and a_ops that could be called by shrink_page_list() (writepage)
or the migration code (migratepage).

Then zbud pages could be put on the normal LRU list, maybe at the
beginning of the inactive LRU so they would live for another cycle
through the list, then be reclaimed in the normal way, with
mapping->a_ops->writepage() pointing to a zswap_writepage() function
that would decompress the pages and call __swap_writepage() on them.

This might actually do away with the explicit pool size too, as the
compressed pool pages would no longer be outside the control of the MM.

I'm just starting to explore this but I think it has promise.
Seth
Re: [PATCH v2 0/5] mm: migrate zbud pages
On Tue, Sep 17, 2013 at 02:59:24PM +0800, Bob Liu wrote:
> Hi Krzysztof,
>
> On 09/11/2013 04:58 PM, Krzysztof Kozlowski wrote:
> > Hi,
> >
> > Currently zbud pages are not movable and they cannot be allocated from
> > the CMA (Contiguous Memory Allocator) region. These patches add
> > migration of zbud pages.
>
> I agree that migration of zbud pages is important so that the system
> does not suffer from order-0 page fragmentation; it can also help page
> compaction, huge pages, etc.
>
> But after looking at [patch 4/5], I found it makes zbud very
> complicated. I'd prefer to add this migration feature later, once the
> current zswap/zbud is good enough and more stable.

I agree with this. We are also looking to add zsmalloc as an option. It
would be nice to come up with a solution that works for both (any)
allocator that zswap uses.

> Mel mentioned several problems about zswap/zbud in the thread "[PATCH v6
> 0/5] zram/zsmalloc promotion".
>
> Like "it's clunky as hell and the layering between zswap and zbud is
> twisty" and "I think I brought up its stalling behaviour during review
> when it was being merged. It would have been preferable if writeback
> could be initiated in batches and then waited on at the very least.
> It's worse that it uses __swap_writepage directly instead of going
> through a writepage op. It would have been better if zbud pages
> existed on the LRU, were written back with address space ops and
> properly handled asynchronous writeback."

Yes, the layering between zswap and zbud is wonky and should be addressed
before adding new layers.

> So I think it would be better if we address those issues first; it
> would be easier to address them before adding more new features. Any
> ideas are welcome.

Agreed.
Seth
Re: [PATCH v2 0/5] mm: migrate zbud pages
Hi Krzysztof,

On 09/11/2013 04:58 PM, Krzysztof Kozlowski wrote:
> Hi,
>
> Currently zbud pages are not movable and they cannot be allocated from
> the CMA (Contiguous Memory Allocator) region. These patches add
> migration of zbud pages.

I agree that migration of zbud pages is important so that the system does
not suffer from order-0 page fragmentation; it can also help page
compaction, huge pages, etc.

But after looking at [patch 4/5], I found it makes zbud very complicated.
I'd prefer to add this migration feature later, once the current
zswap/zbud is good enough and more stable.

Mel mentioned several problems about zswap/zbud in the thread "[PATCH v6
0/5] zram/zsmalloc promotion".

Like "it's clunky as hell and the layering between zswap and zbud is
twisty" and "I think I brought up its stalling behaviour during review
when it was being merged. It would have been preferable if writeback
could be initiated in batches and then waited on at the very least.
It's worse that it uses __swap_writepage directly instead of going
through a writepage op. It would have been better if zbud pages existed
on the LRU, were written back with address space ops and properly handled
asynchronous writeback."

So I think it would be better if we address those issues first; it would
be easier to address them before adding more new features. Any ideas are
welcome.

--
Regards,
-Bob