Re: [PATCH v2 0/5] mm: migrate zbud pages

2013-10-03 Thread Krzysztof Kozlowski
On Tue, 2013-10-01 at 16:04 -0500, Seth Jennings wrote:
> Yes, it is very similar.  I'm beginning to like aspects of this patch
> more as I explore this issue more.
> 
> At first, I balked at the idea of yet another abstraction layer, but it
> is very hard to avoid unless you want to completely collapse zswap and
> zbud into one another and dissolve the layering.  Then you could do a
> direct swap_offset -> address mapping.

After a discussion with Tomasz Stanislawski, we had the idea of merging the
two trees (zswap's rbtree and the radix tree added to zbud in these patches)
into a single tree in the zbud layer.

This would simplify the design (if migration were added, of course).

The idea looks like this:
1. Get rid of the red-black tree in zswap.
2. Add a radix tree to zbud (or use the radix tree from an address_space).
 - Use the offset (from the swp_entry) as the index into the radix tree.
 - Store the zbud page (struct page) in the tree.
3. With both buddies filled, one zbud page would be put in the radix tree
twice.
4. The zbud API would look like:
zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp, pgoff_t offset)
zbud_free(struct zbud_pool *pool, pgoff_t offset)
zbud_map(struct zbud_pool *pool, pgoff_t offset)
etc.
5. zbud_map()/zbud_unmap() would be a little more complex than now, as they
would take over some code from zswap (finding the offset in the tree).
6. The radix tree would be used for:
 - finding an entry by offset (for zswap_frontswap_load() and others),
 - migration.
7. If migration collided with zbud_map()/zbud_unmap(), the locking could be
lighter than in my patch. Calling zbud_map() would mark the page "dirty";
if the page was dirtied during migration, the migration would fail with
-EAGAIN. Of course, migration wouldn't start at all if a zbud buddy was
mapped.
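
To make point 4 a bit more concrete, here is a rough, untested sketch of
how the pool could carry the offset-keyed tree. All names and details here
are illustrative only, not a real implementation:

#include <linux/radix-tree.h>
#include <linux/spinlock.h>
#include <linux/mm_types.h>

/*
 * With both buddies in use, the same struct page is inserted under two
 * offsets; the low bit of the slot value could record which buddy the
 * offset refers to.
 */
struct zbud_pool {
        spinlock_t lock;
        struct radix_tree_root offset_tree;     /* offset -> struct page * */
        /* ... free lists, LRU list, stats ... */
};

/* Look up the zbud page backing @offset; NULL if nothing is stored. */
static struct page *zbud_find_page(struct zbud_pool *pool, pgoff_t offset)
{
        void *slot;

        spin_lock(&pool->lock);
        slot = radix_tree_lookup(&pool->offset_tree, offset);
        spin_unlock(&pool->lock);

        /* mask off a possible buddy-tag bit in the slot's low bits */
        return (struct page *)((unsigned long)slot & ~1UL);
}

zbud_free(pool, offset) would then just delete the radix tree entry (twice
for a fully used page), and migration could walk the same tree.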


What do you think about this?


Best regards,
Krzysztof


Re: [PATCH v2 0/5] mm: migrate zbud pages

2013-10-01 Thread Seth Jennings
On Mon, Sep 30, 2013 at 10:28:46AM +0200, Krzysztof Kozlowski wrote:
> On Fri, 2013-09-27 at 17:00 -0500, Seth Jennings wrote:
> > I have to say that when I first came up with the idea, I was thinking
> > the address space would be at the zswap layer and the radix slots would
> > hold zbud handles, not struct page pointers.
> > 
> > However, as I have discovered today, this is problematic when it comes
> > to reclaim and migration and serializing access.
> > 
> > I wanted to do as much as possible in the zswap layer since anything
> > done in the zbud layer would need to be duplicated in any other future
> > allocator that zswap wanted to support.
> > 
> > Unfortunately, zbud abstracts away the struct page and that visibility
> > is needed to properly do what we are talking about.
> > 
> > So maybe it is inevitable that this will need to be in the zbud code
> > with the radix tree slots pointing to struct pages after all.
> 
> To me it looks very similar to the solution proposed in my patches.

Yes, it is very similar.  I'm beginning to like aspects of this patch
more as I explore this issue more.

At first, I balked at the idea of yet another abstraction layer, but it
is very hard to avoid unless you want to completely collapse zswap and
zbud into one another and dissolve the layering.  Then you could do a
direct swap_offset -> address mapping.

> The
> difference is that you wish to use the offset as the radix tree index.
> I thought about this earlier, but it posed two problems:
> 
> 1. A generalized handle (instead of an offset) may be more suitable when
> zbud is used in other drivers (e.g. zram).
> 
> 2. It requires redesigning the zswap architecture around
> zswap_frontswap_store() to handle duplicated insertions. Currently, when
> storing a page, zswap:
>  - allocates a zbud page,
>  - stores the new data in it,
>  - checks whether it is a duplicated page (same offset already present in
> the rbtree),
>  - if so (a duplicate), frees the previous entry.
> The problem lies in allocating a zbud page under the same offset: that
> step would replace the old data (because we would be using the same
> offset in the radix tree).

Yes, but the offset is always going to be the key at the top layer
because that is what the swap subsystem uses.  So we'd have to have a
swap_offset -> handle -> address translation (two abstraction layers), the
first of which would need to deal with the duplicate store issue.
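
Roughly like this, as an illustrative sketch only (zswap_entry_handle() is
a made-up helper standing in for the first layer):

/*
 * Layer 1 resolves the swap offset to a zbud handle; layer 2 maps the
 * handle to an address.  Layer 1 is where duplicate stores must be
 * handled.  Not real code.
 */
static void *load_compressed(struct zswap_tree *tree, struct zbud_pool *pool,
                             pgoff_t offset)
{
        unsigned long handle;

        handle = zswap_entry_handle(tree, offset);  /* swap_offset -> handle */
        if (!handle)
                return NULL;

        return zbud_map(pool, handle);              /* handle -> address */
}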

Seth

> 
> In my opinion, using the zbud handle is more flexible in this case.
> 
> 
> Best regards,
> Krzysztof
> 
> > I like the idea of masking the bit into the struct page pointer to
> > indicate which buddy maps to the offset.
> > 
> > There is a twist here in that, unlike a normal page cache tree, we can
> > have two offsets pointing at different buddies in the same frame
> > which means we'll have to do some custom stuff for migration.
> > 
> > The rabbit hole I was going down today has come to an end so I'll take a
> > fresh look next week.
> > 
> > Thanks for your ideas and discussion! Maybe we can make zswap/zbud an
> > upstanding MM citizen yet!
> > 
> > Seth
> > 
> > > 
> > > >>
> > > >> In case of zbud, there are two swap offsets pointing to
> > > >> the same page. There might be more if zsmalloc is used.
> > > >> What is worse, it is possible that one swap entry could
> > > >> point to data that crosses a page boundary.
> > > > 
> > > > We just won't set page->index since it doesn't have a good meaning in
> > > > our case.  Swap cache pages also don't use index, although it seems to
> > > > me that they could since there is a 1:1 mapping of a swap cache page to
> > > > a swap offset and the index field isn't being used for anything else.
> > > > But I digress...
> > > 
> > > OK.
> > > 
> > > > 
> > > >>
> > > >> Of course, one could try to modify the MM to support
> > > >> multiple mappings of a page in the radix tree.
> > > >> But I think that the MM guys will consider this a hack
> > > >> and they will not accept it.
> > > > 
> > > > Yes, it will require some changes to the MM to handle zbud pages on the
> > > > LRU.  I'm thinking that it won't be too intrusive, depending on how we
> > > > choose to mark zbud pages.
> > > > 
> > > 
> > > Anyway, I think that zswap should use two index engines,
> > > 'index' in the database sense.
> > > One index is used to translate a swap_entry to a compressed page.
> > > And another one to be used for reclaim and migration by the MM;
> > > an address_space is probably the best choice.
> > > Zbud would be responsible for keeping the two indexes
> > > consistent.
> > > 
> > > Regards,
> > > Tomasz Stanislawski
> > > 
> > > > Seth
> > > > 
> > > >>
> > > >> Regards,
> > > >> Tomasz Stanislawski

Re: [PATCH v2 0/5] mm: migrate zbud pages

2013-09-30 Thread Krzysztof Kozlowski
On Fri, 2013-09-27 at 17:00 -0500, Seth Jennings wrote:
> I have to say that when I first came up with the idea, I was thinking
> the address space would be at the zswap layer and the radix slots would
> hold zbud handles, not struct page pointers.
> 
> However, as I have discovered today, this is problematic when it comes
> to reclaim and migration and serializing access.
> 
> I wanted to do as much as possible in the zswap layer since anything
> done in the zbud layer would need to be duplicated in any other future
> allocator that zswap wanted to support.
> 
> Unfortunately, zbud abstracts away the struct page and that visibility
> is needed to properly do what we are talking about.
> 
> So maybe it is inevitable that this will need to be in the zbud code
> with the radix tree slots pointing to struct pages after all.

To me it looks very similar to the solution proposed in my patches. The
difference is that you wish to use the offset as the radix tree index.
I thought about this earlier, but it posed two problems:

1. A generalized handle (instead of an offset) may be more suitable when
zbud is used in other drivers (e.g. zram).

2. It requires redesigning the zswap architecture around
zswap_frontswap_store() to handle duplicated insertions. Currently, when
storing a page, zswap:
 - allocates a zbud page,
 - stores the new data in it,
 - checks whether it is a duplicated page (same offset already present in
the rbtree),
 - if so (a duplicate), frees the previous entry.
The problem lies in allocating a zbud page under the same offset: that step
would replace the old data (because we would be using the same offset in
the radix tree).
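
Schematically, the current flow looks like this. The helper names follow
zswap, but this is a simplified sketch, not the actual code:

/*
 * Simplified shape of zswap_frontswap_store()'s duplicate handling.
 * The new data already sits in its own zbud allocation before the
 * duplicate check runs; with an offset-keyed radix tree, the new
 * allocation would already collide with the old entry at alloc time.
 */
static int store_entry(struct zswap_tree *tree, pgoff_t offset,
                       struct zswap_entry *entry)
{
        struct zswap_entry *dupentry;

        spin_lock(&tree->lock);
        dupentry = zswap_rb_search(&tree->rbroot, offset);
        if (dupentry) {
                /* duplicated page: free the previous entry for this offset */
                zswap_rb_erase(&tree->rbroot, dupentry);
                zswap_free_entry(tree, dupentry);
        }
        zswap_rb_insert(&tree->rbroot, entry);
        spin_unlock(&tree->lock);

        return 0;
}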

In my opinion, using the zbud handle is more flexible in this case.


Best regards,
Krzysztof

> I like the idea of masking the bit into the struct page pointer to
> indicate which buddy maps to the offset.
> 
> There is a twist here in that, unlike a normal page cache tree, we can
> have two offsets pointing at different buddies in the same frame
> which means we'll have to do some custom stuff for migration.
> 
> The rabbit hole I was going down today has come to an end so I'll take a
> fresh look next week.
> 
> Thanks for your ideas and discussion! Maybe we can make zswap/zbud an
> upstanding MM citizen yet!
> 
> Seth
> 
> > 
> > >>
> > >> In case of zbud, there are two swap offsets pointing to
> > >> the same page. There might be more if zsmalloc is used.
> > >> What is worse, it is possible that one swap entry could
> > >> point to data that crosses a page boundary.
> > > 
> > > We just won't set page->index since it doesn't have a good meaning in
> > > our case.  Swap cache pages also don't use index, although it seems to
> > > me that they could since there is a 1:1 mapping of a swap cache page to
> > > a swap offset and the index field isn't being used for anything else.
> > > But I digress...
> > 
> > OK.
> > 
> > > 
> > >>
> > >> Of course, one could try to modify the MM to support
> > >> multiple mappings of a page in the radix tree.
> > >> But I think that the MM guys will consider this a hack
> > >> and they will not accept it.
> > > 
> > > Yes, it will require some changes to the MM to handle zbud pages on the
> > > LRU.  I'm thinking that it won't be too intrusive, depending on how we
> > > choose to mark zbud pages.
> > > 
> > 
> > Anyway, I think that zswap should use two index engines,
> > 'index' in the database sense.
> > One index is used to translate a swap_entry to a compressed page.
> > And another one to be used for reclaim and migration by the MM;
> > an address_space is probably the best choice.
> > Zbud would be responsible for keeping the two indexes
> > consistent.
> > 
> > Regards,
> > Tomasz Stanislawski
> > 
> > > Seth
> > > 
> > >>
> > >> Regards,
> > >> Tomasz Stanislawski


Re: [PATCH v2 0/5] mm: migrate zbud pages

2013-09-27 Thread Bob Liu


On 09/28/2013 06:00 AM, Seth Jennings wrote:
> On Fri, Sep 27, 2013 at 12:16:37PM +0200, Tomasz Stanislawski wrote:
>> On 09/25/2013 11:57 PM, Seth Jennings wrote:
>>> On Wed, Sep 25, 2013 at 07:09:50PM +0200, Tomasz Stanislawski wrote:
>>>>> I just had an idea this afternoon to potentially kill both these
>>>>> birds with one stone: Replace the rbtree in zswap with an
>>>>> address_space.
>>>>>
>>>>> Each swap type would have its own page_tree to organize the
>>>>> compressed objects by type and offset (radix tree is more suited
>>>>> for this anyway) and a_ops that could be called by
>>>>> shrink_page_list() (writepage) or the migration code (migratepage).
>>>>>
>>>>> Then zbud pages could be put on the normal LRU list, maybe at the
>>>>> beginning of the inactive LRU so they would live for another cycle
>>>>> through the list, then be reclaimed in the normal way with the
>>>>> mapping->a_ops->writepage() pointing to a zswap_writepage() function
>>>>> that would decompress the pages and call __swap_writepage() on them.
>>>>>
>>>>> This might actually do away with the explicit pool size too as the
>>>>> compressed pool pages wouldn't be outside the control of the MM
>>>>> anymore.
>>>>>
>>>>> I'm just starting to explore this but I think it has promise.
>>>>>
>>>>> Seth
>>>>>

>>>> Hi Seth,
>>>> There is a problem with the proposed idea.
>>>> The radix tree used in 'struct address_space' is part of
>>>> a bigger data structure.
>>>> The radix tree is used to translate an offset to a page.
>>>> That is ok for zswap. But struct page has a field named 'index'.
>>>> The MM assumes that this index is the offset in the radix tree
>>>> where one can find the page. A lot is done by the MM to sustain
>>>> this consistency.
>>>
>>> Yes, this is how it is for page cache pages.  However, the MM is able to
>>> work differently with anonymous pages.  In the case of an anonymous
>>> page, the mapping field points to an anon_vma struct, or, if ksm is
>>> enabled and dedup'ing the page, a private ksm tracking structure.  If
>>> the anonymous page is fully unmapped and resides only in the swap cache,
>>> the page mapping is NULL.  So there is precedent for the fields to mean
>>> other things.
>>
>> Hi Seth,
>> You are right that page->mapping is NULL for pages in swap_cache but
>> page_mapping() is not NULL in such a case. The mapping is taken from
>> struct address_space swapper_spaces[]. It is still an address space,
>> and it should preserve constraints for struct address_space.
>> The same happens for page->index and page_index().
>>
>>>
>>> The question is how to mark and identify zbud pages among the other page
>>> types that will be on the LRU.  There are many ways.  The question is
>>> what is the best and most acceptable way.
>>>
>>
>> If you consider hacking, I have some idea how an address_space could be
>> utilized for ZBUD.
>> One solution would be using tags in a radix tree. Every entry in a radix
>> tree can have a few bits assigned to it. Currently 3 bits are supported:
>>
>> From include/linux/fs.h:
>> #define PAGECACHE_TAG_DIRTY     0
>> #define PAGECACHE_TAG_WRITEBACK 1
>> #define PAGECACHE_TAG_TOWRITE   2
>>
>> You could add a new bit or utilize one of the existing ones.
>>
>> The other idea is to use a trick from RB trees and scatter-gather lists,
>> i.e. using the last bits of pointers to keep some metadata.
>> Values of 'struct page *' variables are aligned to the pointer alignment,
>> which is 4 for 32-bit CPUs and 8 for 64-bit ones (not sure). This means
>> that one could use the last bit of the page pointer in a radix tree to
>> track whether a swap entry refers to the lower or the upper part of a
>> ZBUD page.
>> I think it is serious hacking/obfuscation, but it may work with a minimal
>> amount of changes to MM. Adding only (x & ~3) while extracting the page
>> pointer is probably enough.
>>
>> What do you think about this idea?
> 
> I think it is a good one.
> 
> I have to say that when I first came up with the idea, I was thinking
> the address space would be at the zswap layer and the radix slots would
> hold zbud handles, not struct page pointers.
> 
> However, as I have discovered today, this is problematic when it comes
> to reclaim and migration and serializing access.
> 
> I wanted to do as much as possible in the zswap layer since anything
> done in the zbud layer would need to be duplicated in any other future
> allocator that zswap wanted to support.
> 
> Unfortunately, zbud abstracts away the struct page and that visibility
> is needed to properly do what we are talking about.
> 
> So maybe it is inevitable that this will need to be in the zbud code
> with the radix tree slots pointing to struct pages after all.
> 

But in this way, zswap_frontswap_load() can't find zswap_entry. We still
need the rbtree in current zswap.

> I like the idea of masking the bit into the struct page pointer to
> indicate which buddy maps to the offset.
> 

I have no idea why we need this.
My idea is to connect the zbud page with an address_space and add the zbud
page to the LRU list only, without any radix tree.

zswap_entry can still be in the rbtree, or maybe be changed to a radix tree.
There is sample code in my previous email.

-- 
Regards,
-Bob

Re: [PATCH v2 0/5] mm: migrate zbud pages

2013-09-27 Thread Seth Jennings
On Fri, Sep 27, 2013 at 12:16:37PM +0200, Tomasz Stanislawski wrote:
> On 09/25/2013 11:57 PM, Seth Jennings wrote:
> > On Wed, Sep 25, 2013 at 07:09:50PM +0200, Tomasz Stanislawski wrote:
> >>> I just had an idea this afternoon to potentially kill both these birds 
> >>> with one
> >>> stone: Replace the rbtree in zswap with an address_space.
> >>>
> >>> Each swap type would have its own page_tree to organize the compressed 
> >>> objects
> >>> by type and offset (radix tree is more suited for this anyway) and a_ops 
> >>> that
> >>> could be called by shrink_page_list() (writepage) or the migration code
> >>> (migratepage).
> >>>
> >>> Then zbud pages could be put on the normal LRU list, maybe at the 
> >>> beginning of
> >>> the inactive LRU so they would live for another cycle through the list, 
> >>> then be
> >>> reclaimed in the normal way with the mapping->a_ops->writepage() pointing 
> >>> to a
> >>> zswap_writepage() function that would decompress the pages and call
> >>> __swap_writepage() on them.
> >>>
> >>> This might actually do away with the explicit pool size too as the 
> >>> compressed
> >>> pool pages wouldn't be outside the control of the MM anymore.
> >>>
> >>> I'm just starting to explore this but I think it has promise.
> >>>
> >>> Seth
> >>>
> >>
> >> Hi Seth,
> >> There is a problem with the proposed idea.
> >> The radix tree used in 'struct address_space' is part of
> >> a bigger data structure.
> >> The radix tree is used to translate an offset to a page.
> >> That is ok for zswap. But struct page has a field named 'index'.
> >> The MM assumes that this index is the offset in the radix tree
> >> where one can find the page. A lot is done by the MM to sustain
> >> this consistency.
> > 
> > Yes, this is how it is for page cache pages.  However, the MM is able to
> > work differently with anonymous pages.  In the case of an anonymous
> > page, the mapping field points to an anon_vma struct, or, if ksm is
> > enabled and dedup'ing the page, a private ksm tracking structure.  If
> > the anonymous page is fully unmapped and resides only in the swap cache,
> > the page mapping is NULL.  So there is precedent for the fields to mean
> > other things.
> 
> Hi Seth,
> You are right that page->mapping is NULL for pages in swap_cache but
> page_mapping() is not NULL in such a case. The mapping is taken from
> struct address_space swapper_spaces[]. It is still an address space,
> and it should preserve constraints for struct address_space.
> The same happens for page->index and page_index().
> 
> > 
> > The question is how to mark and identify zbud pages among the other page
> > types that will be on the LRU.  There are many ways.  The question is
> > what is the best and most acceptable way.
> > 
> 
> If you consider hacking, I have some idea how an address_space could be
> utilized for ZBUD.
> One solution would be using tags in a radix tree. Every entry in a radix
> tree can have a few bits assigned to it. Currently 3 bits are supported:
> 
> From include/linux/fs.h:
> #define PAGECACHE_TAG_DIRTY     0
> #define PAGECACHE_TAG_WRITEBACK 1
> #define PAGECACHE_TAG_TOWRITE   2
> 
> You could add a new bit or utilize one of the existing ones.
> 
> The other idea is to use a trick from RB trees and scatter-gather lists,
> i.e. using the last bits of pointers to keep some metadata.
> Values of 'struct page *' variables are aligned to the pointer alignment,
> which is 4 for 32-bit CPUs and 8 for 64-bit ones (not sure). This means
> that one could use the last bit of the page pointer in a radix tree to
> track whether a swap entry refers to the lower or the upper part of a
> ZBUD page.
> I think it is serious hacking/obfuscation, but it may work with a minimal
> amount of changes to MM. Adding only (x & ~3) while extracting the page
> pointer is probably enough.
> 
> What do you think about this idea?

I think it is a good one.

I have to say that when I first came up with the idea, I was thinking
the address space would be at the zswap layer and the radix slots would
hold zbud handles, not struct page pointers.

However, as I have discovered today, this is problematic when it comes
to reclaim and migration and serializing access.

I wanted to do as much as possible in the zswap layer since anything
done in the zbud layer would need to be duplicated in any other future
allocator that zswap wanted to support.

Unfortunately, zbud abstracts away the struct page and that visibility
is needed to properly do what we are talking about.

So maybe it is inevitable that this will need to be in the zbud code
with the radix tree slots pointing to struct pages after all.

I like the idea of masking the bit into the struct page pointer to
indicate which buddy maps to the offset.

There is a twist here in that, unlike a normal page cache tree, we can
have two offsets pointing at different buddies in the same frame
which means we'll have to do some custom stuff for migration.

The rabbit hole I was going down today has come to an end so I'll take a
fresh look next week.

Thanks for your ideas and discussion! Maybe we can make zswap/zbud an
upstanding MM citizen yet!

Seth

Re: [PATCH v2 0/5] mm: migrate zbud pages

2013-09-27 Thread Tomasz Stanislawski
On 09/25/2013 11:57 PM, Seth Jennings wrote:
> On Wed, Sep 25, 2013 at 07:09:50PM +0200, Tomasz Stanislawski wrote:
>>> I just had an idea this afternoon to potentially kill both these birds with 
>>> one
>>> stone: Replace the rbtree in zswap with an address_space.
>>>
>>> Each swap type would have its own page_tree to organize the compressed 
>>> objects
>>> by type and offset (radix tree is more suited for this anyway) and a_ops 
>>> that
>>> could be called by shrink_page_list() (writepage) or the migration code
>>> (migratepage).
>>>
>>> Then zbud pages could be put on the normal LRU list, maybe at the beginning 
>>> of
>>> the inactive LRU so they would live for another cycle through the list, 
>>> then be
>>> reclaimed in the normal way with the mapping->a_ops->writepage() pointing 
>>> to a
>>> zswap_writepage() function that would decompress the pages and call
>>> __swap_writepage() on them.
>>>
>>> This might actually do away with the explicit pool size too as the 
>>> compressed
>>> pool pages wouldn't be outside the control of the MM anymore.
>>>
>>> I'm just starting to explore this but I think it has promise.
>>>
>>> Seth
>>>
>>
>> Hi Seth,
>> There is a problem with the proposed idea.
>> The radix tree used in 'struct address_space' is part of
>> a bigger data structure.
>> The radix tree is used to translate an offset to a page.
>> That is ok for zswap. But struct page has a field named 'index'.
>> The MM assumes that this index is the offset in the radix tree
>> where one can find the page. A lot is done by the MM to sustain
>> this consistency.
> 
> Yes, this is how it is for page cache pages.  However, the MM is able to
> work differently with anonymous pages.  In the case of an anonymous
> page, the mapping field points to an anon_vma struct, or, if ksm is
> enabled and dedup'ing the page, a private ksm tracking structure.  If
> the anonymous page is fully unmapped and resides only in the swap cache,
> the page mapping is NULL.  So there is precedent for the fields to mean
> other things.

Hi Seth,
You are right that page->mapping is NULL for pages in swap_cache but
page_mapping() is not NULL in such a case. The mapping is taken from
struct address_space swapper_spaces[]. It is still an address space,
and it should preserve constraints for struct address_space.
The same happens for page->index and page_index().

> 
> The question is how to mark and identify zbud pages among the other page
> types that will be on the LRU.  There are many ways.  The question is
> what is the best and most acceptable way.
> 

If you consider hacking, I have some idea how an address_space could be
utilized for ZBUD.
One solution would be using tags in a radix tree. Every entry in a radix tree
can have a few bits assigned to it. Currently 3 bits are supported:

From include/linux/fs.h:
#define PAGECACHE_TAG_DIRTY     0
#define PAGECACHE_TAG_WRITEBACK 1
#define PAGECACHE_TAG_TOWRITE   2

You could add a new bit or utilize one of the existing ones.
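
For example, a hypothetical fourth tag could mark a slot as a zbud entry
(RADIX_TREE_MAX_TAGS would need to be raised for this; sketch only):

#include <linux/fs.h>           /* struct address_space, PAGECACHE_TAG_* */
#include <linux/radix-tree.h>

#define PAGECACHE_TAG_ZBUD      3       /* hypothetical new tag */

/* Query the tag without dereferencing the stored pointer. */
static bool slot_is_zbud(struct address_space *mapping, pgoff_t offset)
{
        return radix_tree_tag_get(&mapping->page_tree, offset,
                                  PAGECACHE_TAG_ZBUD) != 0;
}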

The other idea is to use a trick from RB trees and scatter-gather lists,
i.e. using the last bits of pointers to keep some metadata.
Values of 'struct page *' variables are aligned to the pointer alignment,
which is 4 for 32-bit CPUs and 8 for 64-bit ones (not sure). This means that
one could use the last bit of the page pointer in a radix tree to track
whether a swap entry refers to the lower or the upper part of a ZBUD page.
I think it is serious hacking/obfuscation, but it may work with a minimal
amount of changes to MM. Adding only (x & ~3) while extracting the page
pointer is probably enough.
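
As a sketch, with invented names (nothing here exists in the kernel):

#include <linux/mm_types.h>

/*
 * struct page pointers are at least 4-byte aligned, so the low bits of
 * a radix tree slot are free to encode which buddy of a ZBUD page a
 * given swap entry refers to.
 */
#define ZBUD_BUDDY_MASK 3UL     /* low bits freed by pointer alignment */

static inline void *zbud_tag_slot(struct page *page, unsigned long buddy)
{
        return (void *)((unsigned long)page | (buddy & ZBUD_BUDDY_MASK));
}

static inline struct page *zbud_slot_to_page(void *slot)
{
        return (struct page *)((unsigned long)slot & ~ZBUD_BUDDY_MASK);
}

static inline unsigned long zbud_slot_to_buddy(void *slot)
{
        return (unsigned long)slot & ZBUD_BUDDY_MASK;
}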

What do you think about this idea?

>>
>> In case of zbud, there are two swap offsets pointing to
>> the same page. There might be more if zsmalloc is used.
>> What is worse, it is possible that one swap entry could
>> point to data that crosses a page boundary.
> 
> We just won't set page->index since it doesn't have a good meaning in
> our case.  Swap cache pages also don't use index, although it seems to
> me that they could since there is a 1:1 mapping of a swap cache page to
> a swap offset and the index field isn't being used for anything else.
> But I digress...

OK.

> 
>>
>> Of course, one could try to modify the MM to support
>> multiple mappings of a page in the radix tree.
>> But I think that the MM guys will consider this a hack
>> and they will not accept it.
> 
> Yes, it will require some changes to the MM to handle zbud pages on the
> LRU.  I'm thinking that it won't be too intrusive, depending on how we
> choose to mark zbud pages.
> 

Anyway, I think that zswap should use two index engines,
'index' in the database sense.
One index is used to translate a swap_entry to a compressed page.
And another one to be used for reclaim and migration by the MM;
an address_space is probably the best choice.
Zbud would be responsible for keeping the two indexes consistent.
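
The rough shape I have in mind, with invented names, just to show the
split:

#include <linux/rbtree.h>
#include <linux/spinlock.h>
#include <linux/fs.h>

/*
 * Sketch only: two indexes kept consistent by zbud.
 *  - handle_tree: swap_entry/offset -> compressed object (zswap loads)
 *  - mapping:     offset -> struct page (MM reclaim and migration)
 */
struct zbud_pool {
        spinlock_t              lock;           /* protects both indexes */
        struct rb_root          handle_tree;    /* index 1: data lookups */
        struct address_space    *mapping;       /* index 2: reclaim/migration */
};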

Regards,
Tomasz Stanislawski

> Seth
> 
>>
>> Regards,
>> Tomasz Stanislawski

Re: [PATCH v2 0/5] mm: migrate zbud pages

2013-09-25 Thread Seth Jennings
On Wed, Sep 25, 2013 at 07:09:50PM +0200, Tomasz Stanislawski wrote:
> > I just had an idea this afternoon to potentially kill both these birds with 
> > one
> > stone: Replace the rbtree in zswap with an address_space.
> > 
> > Each swap type would have its own page_tree to organize the compressed 
> > objects
> > by type and offset (radix tree is more suited for this anyway) and a_ops 
> > that
> > could be called by shrink_page_list() (writepage) or the migration code
> > (migratepage).
> > 
> > Then zbud pages could be put on the normal LRU list, maybe at the beginning 
> > of
> > the inactive LRU so they would live for another cycle through the list, 
> > then be
> > reclaimed in the normal way with the mapping->a_ops->writepage() pointing 
> > to a
> > zswap_writepage() function that would decompress the pages and call
> > __swap_writepage() on them.
> > 
> > This might actually do away with the explicit pool size too as the 
> > compressed
> > pool pages wouldn't be outside the control of the MM anymore.
> > 
> > I'm just starting to explore this but I think it has promise.
> > 
> > Seth
> > 
> 
> Hi Seth,
> There is a problem with the proposed idea.
> The radix tree used in 'struct address_space' is part of
> a bigger data structure.
> The radix tree is used to translate an offset to a page.
> That is ok for zswap. But struct page has a field named 'index'.
> The MM assumes that this index is the offset in the radix tree
> where one can find the page. A lot is done by the MM to sustain
> this consistency.

Yes, this is how it is for page cache pages.  However, the MM is able to
work differently with anonymous pages.  In the case of an anonymous
page, the mapping field points to an anon_vma struct, or, if ksm is
enabled and dedup'ing the page, a private ksm tracking structure.  If
the anonymous page is fully unmapped and resides only in the swap cache,
the page mapping is NULL.  So there is precedent for the fields to mean
other things.

The question is how to mark and identify zbud pages among the other page
types that will be on the LRU.  There are many ways.  The question is
what is the best and most acceptable way.

> 
> In case of zbud, there are two swap offsets pointing to
> the same page. There might be more if zsmalloc is used.
> What is worse, it is possible that one swap entry could
> point to data that crosses a page boundary.

We just won't set page->index since it doesn't have a good meaning in
our case.  Swap cache pages also don't use index, although it seems to
me that they could since there is a 1:1 mapping of a swap cache page to
a swap offset and the index field isn't being used for anything else.
But I digress...

> 
> Of course, one could try to modify the MM to support
> multiple mappings of a page in the radix tree.
> But I think that the MM guys will consider this a hack
> and they will not accept it.

Yes, it will require some changes to the MM to handle zbud pages on the
LRU.  I'm thinking that it won't be too intrusive, depending on how we
choose to mark zbud pages.
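
For reference, the wiring I have in mind looks something like this (sketch
only; none of these zswap functions exist yet):

#include <linux/fs.h>
#include <linux/writeback.h>
#include <linux/migrate.h>

/* Would decompress the entry and push it out via __swap_writepage(). */
static int zswap_writepage(struct page *page, struct writeback_control *wbc);
/* Would move a zbud page, fixing up the radix tree slots of both buddies. */
static int zswap_migratepage(struct address_space *mapping,
                             struct page *newpage, struct page *page,
                             enum migrate_mode mode);

static const struct address_space_operations zswap_aops = {
        .writepage      = zswap_writepage,
        .migratepage    = zswap_migratepage,
};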

Seth

> 
> Regards,
> Tomasz Stanislawski


Re: [PATCH v2 0/5] mm: migrate zbud pages

2013-09-25 Thread Tomasz Stanislawski
> I just had an idea this afternoon to potentially kill both these birds with 
> one
> stone: Replace the rbtree in zswap with an address_space.
> 
> Each swap type would have its own page_tree to organize the compressed objects
> by type and offset (radix tree is more suited for this anyway) and a_ops that
> could be called by shrink_page_list() (writepage) or the migration code
> (migratepage).
> 
> Then zbud pages could be put on the normal LRU list, maybe at the beginning of
> the inactive LRU so they would live for another cycle through the list, then 
> be
> reclaimed in the normal way with the mapping->a_ops->writepage() pointing to a
> zswap_writepage() function that would decompress the pages and call
> __swap_writepage() on them.
> 
> This might actually do away with the explicit pool size too as the compressed
> pool pages wouldn't be outside the control of the MM anymore.
> 
> I'm just starting to explore this but I think it has promise.
> 
> Seth
> 

Hi Seth,
There is a problem with the proposed idea.
The radix tree used by 'struct address_space' is part of
a bigger data structure.
The radix tree is used to translate an offset to a page.
That is ok for zswap. But struct page has a field named 'index'.
The MM assumes that this index is the offset in the radix tree
where one can find the page. A lot is done by the MM to sustain
this consistency.

In case of zbud, there are two swap offsets pointing to
the same page. There might be more if zsmalloc is used.
What is worse, it is possible that one swap entry could
point to data that crosses a page boundary.
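
To make the aliasing concrete: with both buddies of a zbud page in use,
a radix tree keyed by swap offset would carry two slots pointing at one
struct page.  A minimal sketch using the kernel radix tree API, where
zbud_tree and zbud_index_page() are invented for illustration:

#include <linux/mm_types.h>
#include <linux/radix-tree.h>

static RADIX_TREE(zbud_tree, GFP_KERNEL);	/* swap offset -> struct page */

/* Index one zbud page under the offsets of both stored buddies, so
 * two lookups alias the same struct page. */
static int zbud_index_page(struct page *page, pgoff_t first_offset,
			   pgoff_t last_offset)
{
	int err;

	err = radix_tree_insert(&zbud_tree, first_offset, page);
	if (err)
		return err;

	err = radix_tree_insert(&zbud_tree, last_offset, page);
	if (err)
		radix_tree_delete(&zbud_tree, first_offset);
	return err;
}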

Of course, one could try to modify the MM to support
multiple mappings of a page in the radix tree.
But I think that the MM guys will consider this a hack
and will not accept it.

Regards,
Tomasz Stanislawski





Re: [PATCH v2 0/5] mm: migrate zbud pages

2013-09-24 Thread Bob Liu
On Tue, Sep 24, 2013 at 5:20 PM, Krzysztof Kozlowski
<k.kozlow...@samsung.com> wrote:
> Hi,
>
> On pon, 2013-09-23 at 17:07 -0500, Seth Jennings wrote:
>> On Tue, Sep 17, 2013 at 02:59:24PM +0800, Bob Liu wrote:
>> > Mel mentioned several problems about zswap/zbud in thread "[PATCH v6
>> > 0/5] zram/zsmalloc promotion".
>> >
>> > Like "it's clunky as hell and the layering between zswap and zbud is
>> > twisty" and "I think I brought up its stalling behaviour during review
>> > when it was being merged. It would have been preferable if writeback
>> > could be initiated in batches and then waited on at the very least..
>> >  It's worse that it uses __swap_writepage directly instead of going
>> > through a writepage ops.  It would have been better if zbud pages
>> > existed on the LRU and written back with an address space ops and
>> > properly handled asynchronous writeback."
>> >
>> > So I think it would be better if we addressed those issues first;
>> > it would be easier to do so before adding more new features.
>> > Any ideas are welcome.
>>
>> I just had an idea this afternoon to potentially kill both these birds with 
>> one
>> stone: Replace the rbtree in zswap with an address_space.
>>
>> Each swap type would have its own page_tree to organize the compressed 
>> objects
>> by type and offset (radix tree is more suited for this anyway) and a_ops that
>> could be called by shrink_page_list() (writepage) or the migration code
>> (migratepage).
>>
>> Then zbud pages could be put on the normal LRU list, maybe at the beginning 
>> of
>> the inactive LRU so they would live for another cycle through the list, then 
>> be
>> reclaimed in the normal way with the mapping->a_ops->writepage() pointing to 
>> a
>> zswap_writepage() function that would decompress the pages and call
>> __swap_writepage() on them.
>
> How exactly can the address space be used here? Do you want to point
> to zbud pages in address_space.page_tree? If yes, then which index
> should be used?
>

I didn't get the point either. I think introducing an address_space is
enough.

1. zbud.c:

static const struct address_space_operations zbud_aops = {
	.writepage = zswap_writepage,
};

struct address_space zbud_space = {
	.a_ops = &zbud_aops,
};

zbud_alloc() {
	struct page *zbud_page = alloc_page(GFP_KERNEL);

	/* Mark the page as a zbud page via its mapping, remember the
	 * owning pool, and put it on the LRU. */
	zbud_page->mapping = &zbud_space;
	set_page_private(zbud_page, (unsigned long)pool);
	lru_add_anon(zbud_page);
}

2. zswap.c

static int zswap_writepage(struct page *page, struct writeback_control *wbc)
{
	/* Write back both buddies held in this zbud page. */
	handle = encode_handle(page_address(page), FIRST);
	zswap_writeback_entry(pool, handle);

	handle = encode_handle(page_address(page), LAST);
	zswap_writeback_entry(pool, handle);
}

Of course it may need lots of work before the core MM subsystem can
maintain zbud pages.
But in this way we can get rid of the clunky reclaiming layer and
integrate zswap closely with the core MM subsystem, which knows better
how many zbud pages can be used and when zbud page reclaim should be
triggered.

-- 
Regards,
--Bob


Re: [PATCH v2 0/5] mm: migrate zbud pages

2013-09-24 Thread Krzysztof Kozlowski
Hi,

On pon, 2013-09-23 at 17:07 -0500, Seth Jennings wrote:
> On Tue, Sep 17, 2013 at 02:59:24PM +0800, Bob Liu wrote:
> > Mel mentioned several problems about zswap/zbud in thread "[PATCH v6
> > 0/5] zram/zsmalloc promotion".
> > 
> > Like "it's clunky as hell and the layering between zswap and zbud is
> > twisty" and "I think I brought up its stalling behaviour during review
> > when it was being merged. It would have been preferable if writeback
> > could be initiated in batches and then waited on at the very least..
> >  It's worse that it uses __swap_writepage directly instead of going
> > through a writepage ops.  It would have been better if zbud pages
> > existed on the LRU and written back with an address space ops and
> > properly handled asynchronous writeback."
> > 
> > So I think it would be better if we addressed those issues first;
> > it would be easier to do so before adding more new features.
> > Any ideas are welcome.
> 
> I just had an idea this afternoon to potentially kill both these birds with 
> one
> stone: Replace the rbtree in zswap with an address_space.
> 
> Each swap type would have its own page_tree to organize the compressed objects
> by type and offset (radix tree is more suited for this anyway) and a_ops that
> could be called by shrink_page_list() (writepage) or the migration code
> (migratepage).
> 
> Then zbud pages could be put on the normal LRU list, maybe at the beginning of
> the inactive LRU so they would live for another cycle through the list, then 
> be
> reclaimed in the normal way with the mapping->a_ops->writepage() pointing to a
> zswap_writepage() function that would decompress the pages and call
> __swap_writepage() on them.

How exactly can the address space be used here? Do you want to point
to zbud pages in address_space.page_tree? If yes, then which index
should be used?


Best regards,
Krzysztof





Re: [PATCH v2 0/5] mm: migrate zbud pages

2013-09-23 Thread Seth Jennings
On Tue, Sep 17, 2013 at 02:59:24PM +0800, Bob Liu wrote:
> Mel mentioned several problems about zswap/zbud in thread "[PATCH v6
> 0/5] zram/zsmalloc promotion".
> 
> Like "it's clunky as hell and the layering between zswap and zbud is
> twisty" and "I think I brought up its stalling behaviour during review
> when it was being merged. It would have been preferable if writeback
> could be initiated in batches and then waited on at the very least..
>  It's worse that it uses __swap_writepage directly instead of going
> through a writepage ops.  It would have been better if zbud pages
> existed on the LRU and written back with an address space ops and
> properly handled asynchronous writeback."
> 
> So I think it would be better if we addressed those issues first;
> it would be easier to do so before adding more new features.
> Any ideas are welcome.

I just had an idea this afternoon to potentially kill both these birds with one
stone: Replace the rbtree in zswap with an address_space.

Each swap type would have its own page_tree to organize the compressed objects
by type and offset (radix tree is more suited for this anyway) and a_ops that
could be called by shrink_page_list() (writepage) or the migration code
(migratepage).

Then zbud pages could be put on the normal LRU list, maybe at the beginning of
the inactive LRU so they would live for another cycle through the list, then be
reclaimed in the normal way with the mapping->a_ops->writepage() pointing to a
zswap_writepage() function that would decompress the pages and call
__swap_writepage() on them.
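
A sketch of how that wiring might look; zswap_buddy_count() and
zswap_decompress_entry() are invented helper names, while
__swap_writepage() and end_swap_bio_write() match the kernel's existing
interfaces:

#include <linux/fs.h>
#include <linux/swap.h>
#include <linux/writeback.h>

/* Hypothetical helpers: how many objects this zbud page holds, and a
 * freshly allocated, decompressed copy of one of them. */
int zswap_buddy_count(struct page *zbud_page);
struct page *zswap_decompress_entry(struct page *zbud_page, int buddy);

/* Called via mapping->a_ops->writepage() from shrink_page_list(). */
static int zswap_writepage(struct page *page, struct writeback_control *wbc)
{
	int i, err;

	for (i = 0; i < zswap_buddy_count(page); i++) {
		struct page *uncompressed = zswap_decompress_entry(page, i);

		if (!uncompressed)
			return -ENOMEM;
		err = __swap_writepage(uncompressed, wbc,
				       end_swap_bio_write);
		if (err)
			return err;
	}
	return 0;
}

static const struct address_space_operations zswap_aops = {
	.writepage = zswap_writepage,
};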

This might actually do away with the explicit pool size too as the compressed
pool pages wouldn't be outside the control of the MM anymore.

I'm just starting to explore this but I think it has promise.

Seth



Re: [PATCH v2 0/5] mm: migrate zbud pages

2013-09-23 Thread Seth Jennings
On Tue, Sep 17, 2013 at 02:59:24PM +0800, Bob Liu wrote:
> Hi Krzysztof,
> 
> On 09/11/2013 04:58 PM, Krzysztof Kozlowski wrote:
> > Hi,
> > 
> > Currently zbud pages are not movable and they cannot be allocated from CMA
> > (Contiguous Memory Allocator) region. These patches add migration of zbud 
> > pages.
> > 
> 
> I agree that the migration of zbud pages is important so that system
> will not enter order-0 page fragmentation and can be helpful for page
> compaction/huge pages etc..
> 
> But after I looked at [patch 4/5], I found it will make zbud very
> complicated.
> I'd prefer to defer this migration feature until the current
> zswap/zbud becomes good enough and more stable.

I agree with this.  We are also looking to add zsmalloc as an option.
It would be nice to come up with a solution that works for both (really,
any) allocators that zswap uses.

> 
> Mel mentioned several problems about zswap/zbud in thread "[PATCH v6
> 0/5] zram/zsmalloc promotion".
> 
> Like "it's clunky as hell and the layering between zswap and zbud is
> twisty" and "I think I brought up its stalling behaviour during review
> when it was being merged. It would have been preferable if writeback
> could be initiated in batches and then waited on at the very least..
>  It's worse that it uses __swap_writepage directly instead of going
> through a writepage ops.  It would have been better if zbud pages
> existed on the LRU and written back with an address space ops and
> properly handled asynchronous writeback."

Yes, the layering in zswap vs zbud is wonky and should be addressed
before adding new layers.

> 
> So I think it would be better if we addressed those issues first;
> it would be easier to do so before adding more new features.
> Any ideas are welcome.

Agreed.

Seth



Re: [PATCH v2 0/5] mm: migrate zbud pages

2013-09-17 Thread Bob Liu
Hi Krzysztof,

On 09/11/2013 04:58 PM, Krzysztof Kozlowski wrote:
> Hi,
> 
> Currently zbud pages are not movable and they cannot be allocated from CMA
> (Contiguous Memory Allocator) region. These patches add migration of zbud 
> pages.
> 

I agree that the migration of zbud pages is important so that the
system will not suffer from order-0 page fragmentation, and it can be
helpful for page compaction, huge pages, etc.

But after I looked at [patch 4/5], I found it will make zbud very
complicated.
I'd prefer to defer this migration feature until the current
zswap/zbud becomes good enough and more stable.

Mel mentioned several problems about zswap/zbud in thread "[PATCH v6
0/5] zram/zsmalloc promotion".

Like "it's clunky as hell and the layering between zswap and zbud is
twisty" and "I think I brought up its stalling behaviour during review
when it was being merged. It would have been preferable if writeback
could be initiated in batches and then waited on at the very least..
 It's worse that it uses __swap_writepage directly instead of going
through a writepage ops.  It would have been better if zbud pages
existed on the LRU and written back with an address space ops and
properly handled asynchronous writeback."

So I think it would be better if we addressed those issues first;
it would be easier to do so before adding more new features.
Any ideas are welcome.

-- 
Regards,
-Bob

