Re: [RFC v4 0/3] Support volatile for anonymous range

2013-01-03 Thread Minchan Kim
On Fri, Dec 28, 2012 at 09:24:53AM +0900, Kamezawa Hiroyuki wrote:
> (2012/12/26 12:46), Minchan Kim wrote:
> >Hi Kame,
> >
> >What are you doing this holiday season? :)
> >I can't believe you're sitting in front of a computer.
> >
> Honestly, my holiday starts tomorrow ;) (and lasts until 1/5 next year.)
> 
> >>
> >>Hm, by the way, does the user need to attach pages to the process by causing
> >>page faults (as you do with memset()) before calling mvolatile()?
> >
> >For effectiveness, Yes.
> >
> 
> Isn't it better to trigger the page faults with get_user_pages() in mvolatile()?
> Triggering page faults in userland just seems to increase the burden on apps.

It seems you misunderstood. Firstly, this patch's goal is to minimize
minor faults + page allocation + memset_zero on anon pages where possible.

If someone (like an allocator) calls madvise(DONTNEED)/munmap on a range
of garbage-collected memory, the VM zaps all the ptes, so if the user
tries to reuse that range, we can't avoid the above overheads.

mvolatile avoids them by not zapping ptes while memory pressure isn't
severe, while still letting the VM discard the pages without swapping
them out if memory pressure does happen.

So, GUP in mvolatile isn't necessary.
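The overhead described above can be observed with existing calls only. The following is a minimal Linux-only sketch (not part of the patch; mvolatile is not used because it is only proposed here) that counts minor faults via getrusage() when a private anonymous range is first touched, touched again while its ptes are still present, and touched again after madvise(MADV_DONTNEED) has zapped them. Exact counts depend on the kernel and environment, so treat them as indicative.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults taken by this process so far. */
static long minor_faults(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

/* Write one byte per page of [p, p + len) and return the minor faults it cost. */
static long touch_and_count(unsigned char *p, size_t len, size_t page)
{
    long before = minor_faults();
    for (size_t off = 0; off < len; off += page)
        ((volatile unsigned char *)p)[off] = 1;
    return minor_faults() - before;
}

struct fault_report {
    long first_touch;     /* fresh mapping: fault + allocate + zero each page */
    long retouch;         /* ptes still present: expected to be near zero */
    long after_dontneed;  /* ptes zapped by MADV_DONTNEED: faults again */
};

/* Returns 0 on success, -1 if mmap or madvise fails. */
static int measure_dontneed_refault(size_t npages, struct fault_report *r)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    r->first_touch = touch_and_count(p, len, page);
    r->retouch = touch_and_count(p, len, page);
    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }
    r->after_dontneed = touch_and_count(p, len, page);

    munmap(p, len);
    return 0;
}
```

On a typical kernel, first_touch and after_dontneed would be expected to come out close to npages while retouch stays near zero; that gap is the cost mvolatile tries to avoid while memory pressure is low.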

> 
> >>
> >>I think your approach is interesting, anyway.
> >
> >Thanks for your interest, Kame.
> >
> >あけましておめでとう (Happy New Year!).
> >
> 
> Happy New Year to you, too.
> 
> Thanks,
> -Kame
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: em...@kvack.org

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v4 0/3] Support volatile for anonymous range

2012-12-27 Thread Kamezawa Hiroyuki

(2012/12/26 12:46), Minchan Kim wrote:
>Hi Kame,
>
>What are you doing this holiday season? :)
>I can't believe you're sitting in front of a computer.
>
Honestly, my holiday starts tomorrow ;) (and lasts until 1/5 next year.)

>>
>>Hm, by the way, does the user need to attach pages to the process by causing
>>page faults (as you do with memset()) before calling mvolatile()?
>
>For effectiveness, Yes.
>

Isn't it better to trigger the page faults with get_user_pages() in mvolatile()?
Triggering page faults in userland just seems to increase the burden on apps.

>>
>>I think your approach is interesting, anyway.
>
>Thanks for your interest, Kame.
>
>あけましておめでとう (Happy New Year!).
>

Happy New Year to you, too.

Thanks,
-Kame


Re: [RFC v4 0/3] Support volatile for anonymous range

2012-12-25 Thread Minchan Kim
Hi Kame,

What are you doing this holiday season? :)
I can't believe you're sitting in front of a computer.

On Wed, Dec 26, 2012 at 11:37:02AM +0900, Kamezawa Hiroyuki wrote:
> (2012/12/18 15:47), Minchan Kim wrote:
> > This is still an RFC because we need more input from user-space
> > people and discussion about the interface/reclaim policy of volatile
> > pages, and I want to expand this concept to tmpfs volatile ranges
> > if it is possible without a big performance drop for anonymous volatile
> > ranges (let's define our terms: anon volatile vs. tmpfs volatile? John?)
> > 
> > NOTE: I didn't consider THP/KSM, so you should disable them for testing.
> > 
> > I hope for more input from user-space allocator people, and for them to
> > test the patch with their allocators, because getting real value might
> > need a design change in arena management.
> > 
> > Changelog from v4
> > 
> >   * Add new system call mvolatile/mnovolatile
> >   * Add sigbus when user try to access volatile range
> >   * Rebased on v3.7
> >   * Applied bug fix from John Stultz, Thanks!
> > 
> > Changelog from v3
> > 
> >   * Remove madvise(addr, length, MADV_NOVOLATILE).
> >   * Add a vmstat entry for the number of discarded volatile pages
> >   * Discard volatile pages without promotion in the reclaim path
> > 
> > This is based on v3.7
> > 
> > - What's mvolatile(addr, length)?
> > 
> >It's a hint the user delivers to the kernel so that the kernel can
> >*discard* pages in the range at any time.
> > 
> 
> Does this work for both PRIVATE and SHARED mappings?

Yes.

> 
> What happens at fork()? Are VOLATILE ranges copied?

The child vma just inherits the VM_VOLATILE flag.
If a page is shared like that, it can be discarded only when
all vmas pointing to the page are VM_VOLATILE.
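The discard rule for pages shared after fork() can be written as a simple predicate. This is a toy userspace model, not kernel code: struct mapping_ref and page_discardable() are hypothetical names, and a real implementation would walk the page's reverse mappings rather than an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one vma that maps a given page. */
struct mapping_ref {
    bool vm_volatile;   /* stands in for the proposed VM_VOLATILE vma flag */
};

/* A shared page may be discarded only if every vma mapping it is volatile
 * (vacuously true when n == 0 in this model). */
static bool page_discardable(const struct mapping_ref *maps, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!maps[i].vm_volatile)
            return false;
    return true;
}
```

So if the parent later unmarks its copy of the range, pages still shared with the child stop being discardable for both.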

> 
> 
> > - What happens if the user accesses a page (i.e., virtual address)
> >discarded by the kernel?
> > 
> >The user may get a SIGBUS.
> > 
> > - What should the user do to avoid SIGBUS?
> >He should call mnovolatile(addr, length) before accessing a range
> >that was marked with mvolatile.
> > 
> Will mnovolatile() return whether the range was discarded or not?

Absolutely.
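Because mnovolatile() reports whether the range was purged, a cache can use the explicit unmark/access/mark pattern without any signal handler. The sketch below only models that control flow in plain userspace C: model_mvolatile(), model_mnovolatile() and model_reclaim() are hypothetical stand-ins for the proposed system calls and for kernel reclaim, not the patch's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define CACHE_BYTES 64

/* Hypothetical model of one volatile range. */
struct vrange {
    char data[CACHE_BYTES];
    bool is_volatile;   /* marked with (model) mvolatile */
    bool purged;        /* contents discarded by (model) reclaim */
};

static void model_mvolatile(struct vrange *r) { r->is_volatile = true; }

/* Returns true if the contents were purged while the range was volatile. */
static bool model_mnovolatile(struct vrange *r)
{
    bool was_purged = r->purged;
    r->is_volatile = false;
    r->purged = false;
    return was_purged;
}

/* Reclaim may only discard ranges that are currently volatile. */
static void model_reclaim(struct vrange *r)
{
    if (r->is_volatile) {
        memset(r->data, 0, sizeof r->data);
        r->purged = true;
    }
}

static int regenerations;   /* how often the expensive rebuild ran */

static void regenerate(struct vrange *r)
{
    strcpy(r->data, "expensive-to-rebuild contents");
    regenerations++;
}

/* unmark -> rebuild if purged -> access -> mark again */
static char first_byte(struct vrange *r)
{
    if (model_mnovolatile(r))
        regenerate(r);
    char c = r->data[0];   /* safe: the range is non-volatile here */
    model_mvolatile(r);
    return c;
}
```

The rebuild only runs when the unmark call reports a purge; otherwise the old data is reused as-is.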

> 
> What should the user do in the signal handler?

It depends on the use case.
Please read John's mail: http://lwn.net/Articles/518130/
Quoting from the link:
"
But one interesting new tweak on this design, suggested by Taras
Glek and others at Mozilla, is as follows:

Instead of leaving volatile data access as being undefined, when
accessing volatile data, either the data expected will be returned
if it has not been purged, or the application will get a SIGBUS when
it accesses volatile data that has been purged.

Everything else remains the same (error on marking non-volatile
if data was purged, etc). This model allows applications to avoid
having to unmark volatile data when it wants to access it, then
immediately re-mark it as volatile when it's done. It is in effect
"lazy" with its marking, allowing the kernel to hit it with a signal
when it gets unlucky and touches purged data. From the signal handler,
the application can note the address it faulted on, unmark the range,
and regenerate the needed data before returning to execution.

Since this approach avoids the more explicit unmark/access/mark
pattern, it avoids the extra overhead required to ensure data is
non-volatile before being accessed.

However, if applications don't want to deal with handling the
SIGBUS, they can use the more straightforward (but more costly)
unmark/access/mark pattern in the same way as my earlier proposals.

This allows folks to balance the cost vs complexity in their
application appropriately.

So that's a general overview of how the idea I'm proposing could
be used.
"

> Can all the expected operations be done in a signal-safe manner?
> (IOW, can the user do enough work easily without taking any locks in userland?)

It depends on the design of the user application, but some user-space folks
want it, so I think it can be done well enough. In particular, Android has
used this via ashmem, which is another interface for the same goal but works
only on tmpfs pages, whereas mine works on normal anonymous pages; the goal
is to support both.
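The lazy pattern from John's mail (fault, fix up the range in the SIGBUS handler, then let the access retry) can be tried without the patch, because Linux already raises SIGBUS when a MAP_SHARED file mapping is accessed beyond end-of-file. In this sketch that stands in for a purged volatile page, and growing the file with ftruncate(), which POSIX lists as async-signal-safe, stands in for "unmark the range and regenerate the data". It is an analogy under those assumptions, not mvolatile/mnovolatile behaviour; the point is that the handler takes no locks.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static int g_fd = -1;
static long g_page;
static volatile sig_atomic_t g_faults;
static volatile uintptr_t g_fault_page;   /* page-aligned faulting address */

/* SIGBUS handler: record where we faulted, then make that page valid.
 * Only async-signal-safe calls (ftruncate, _exit) are used; no locks. */
static void on_sigbus(int sig, siginfo_t *si, void *ctx)
{
    (void)sig;
    (void)ctx;
    g_fault_page = (uintptr_t)si->si_addr & ~((uintptr_t)g_page - 1);
    g_faults++;
    if (ftruncate(g_fd, 2 * g_page) != 0)   /* "regenerate": grow file to 2 pages */
        _exit(1);
    /* Returning re-executes the faulting access, which now succeeds. */
}

/* Returns 1 if the access beyond EOF faulted once, was repaired in the
 * handler, and then completed; 0 otherwise. */
static int lazy_sigbus_demo(void)
{
    char path[] = "/tmp/volatile-sigbus-XXXXXX";
    g_page = sysconf(_SC_PAGESIZE);
    g_fd = mkstemp(path);
    if (g_fd < 0)
        return 0;
    unlink(path);
    if (ftruncate(g_fd, g_page) != 0)       /* file backs only page 0 */
        return 0;

    unsigned char *p = mmap(NULL, 2 * g_page, PROT_READ | PROT_WRITE,
                            MAP_SHARED, g_fd, 0);
    if (p == MAP_FAILED)
        return 0;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, NULL) != 0)
        return 0;

    volatile unsigned char *second = p + g_page;
    second[0] = 42;                         /* beyond EOF: SIGBUS, then retried */

    int ok = second[0] == 42 && g_faults == 1 &&
             g_fault_page == (uintptr_t)(p + g_page);

    munmap(p, 2 * g_page);
    close(g_fd);
    signal(SIGBUS, SIG_DFL);
    return ok;
}
```

With real volatile ranges the handler would instead unmark the faulting range and rebuild its contents, but the shape (record si_addr, repair, return) would be the same.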

> 
> > - What happens if the user accesses a page (i.e., virtual address) that
> >wasn't discarded by the kernel?
> > 
> >The user sees the old data without a page fault.
> > 
> 
> What happens when the user calls mvolatile() on an mlock()'d range, or
> calls mlock() on an mvolatile()'d range?

-EINVAL

> 
> Hm, by the way, does the user need to attach pages to the process by causing
> page faults (as you do with memset()) before calling mvolatile()?

For effectiveness, Yes.

> 
> I think your approach is interesting, anyway.

Thanks for your interest, Kame.

あけましておめでとう (Happy New Year!).

> 
> Thanks,
> -Kame
> 
> 
> > - What's different from madvise(DONTNEED)?
> > 
> >System call semantics
> > 
> >DONTNEED guarantees that the user always sees zero-filled pages after
> >calling madvise, while with mvolatile the user may see old data or
> >encounter SIGBUS.
> > 
> >
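The DONTNEED side of this comparison can be checked directly. This Linux-only sketch writes a pattern into a private anonymous mapping, calls madvise(MADV_DONTNEED), and confirms the next read sees zero-filled pages. The zero-fill guarantee shown applies to private anonymous mappings; mvolatile itself is not exercised, since it is only proposed in this RFC.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns true if, after MADV_DONTNEED, every byte of a previously
 * written private anonymous range reads back as zero. */
static bool dontneed_zero_fills(size_t npages)
{
    size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return false;

    memset(p, 0xAB, len);                    /* old data the caller no longer needs */
    bool ok = madvise(p, len, MADV_DONTNEED) == 0;

    for (size_t i = 0; ok && i < len; i++)   /* refault: fresh zero pages */
        if (p[i] != 0)
            ok = false;

    munmap(p, len);
    return ok;
}
```

Under the proposed mvolatile, the same read would instead return the old 0xAB bytes if the pages had not been purged, or raise SIGBUS if they had.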

Re: [RFC v4 0/3] Support volatile for anonymous range

2012-12-25 Thread Kamezawa Hiroyuki
(2012/12/18 15:47), Minchan Kim wrote:
> This is still an RFC because we need more input from user-space
> people and discussion about the interface/reclaim policy of volatile
> pages, and I want to expand this concept to tmpfs volatile ranges
> if it is possible without a big performance drop for anonymous volatile
> ranges (let's define our terms: anon volatile vs. tmpfs volatile? John?)
> 
> NOTE: I didn't consider THP/KSM, so you should disable them for testing.
> 
> I hope for more input from user-space allocator people, and for them to
> test the patch with their allocators, because getting real value might
> need a design change in arena management.
> 
> Changelog from v4
> 
>   * Add new system call mvolatile/mnovolatile
>   * Add sigbus when user try to access volatile range
>   * Rebased on v3.7
>   * Applied bug fix from John Stultz, Thanks!
> 
> Changelog from v3
> 
>   * Remove madvise(addr, length, MADV_NOVOLATILE).
>   * Add a vmstat entry for the number of discarded volatile pages
>   * Discard volatile pages without promotion in the reclaim path
> 
> This is based on v3.7
> 
> - What's mvolatile(addr, length)?
> 
>It's a hint the user delivers to the kernel so that the kernel can
>*discard* pages in the range at any time.
> 

Does this work for both PRIVATE and SHARED mappings?

What happens at fork()? Are VOLATILE ranges copied?


> - What happens if the user accesses a page (i.e., virtual address)
>discarded by the kernel?
> 
>The user may get a SIGBUS.
> 
> - What should the user do to avoid SIGBUS?
>He should call mnovolatile(addr, length) before accessing a range
>that was marked with mvolatile.
> 
Will mnovolatile() return whether the range was discarded or not?

What should the user do in the signal handler?
Can all the expected operations be done in a signal-safe manner?
(IOW, can the user do enough work easily without taking any locks in userland?)

> - What happens if the user accesses a page (i.e., virtual address) that
>wasn't discarded by the kernel?
> 
>The user sees the old data without a page fault.
> 

What happens when the user calls mvolatile() on an mlock()'d range, or
calls mlock() on an mvolatile()'d range?

Hm, by the way, does the user need to attach pages to the process by causing
page faults (as you do with memset()) before calling mvolatile()?

I think your approach is interesting, anyway.

Thanks,
-Kame


> - What's different from madvise(DONTNEED)?
> 
>System call semantics
> 
>DONTNEED guarantees that the user always sees zero-filled pages after
>calling madvise, while with mvolatile the user may see old data or
>encounter SIGBUS.
> 
>Internal implementation
> 
>madvise(DONTNEED) has to zap all mapped pages in the range, so its
>overhead grows linearly with the number of mapped pages. Even worse,
>if the user then writes to the zapped pages, a page fault + page
>allocation + memset has to happen.
> 
>mvolatile just sets a flag on the range (i.e., the VMA) instead of
>zapping all of the ptes in the vma, so it doesn't touch the ptes at all.
> 
> - What's the benefit compared to DONTNEED?
> 
>1. The system call overhead is smaller, because mvolatile just sets
>   a flag on the VMA instead of zapping all the pages in the range, so
>   the overhead should be very small.
> 
>2. It has a chance to eliminate overheads (e.g., pte zapping + page fault
>   + page allocation + memset(PAGE_SIZE)) if memory pressure isn't
>   severe.
> 
>3. It has the potential to zap all ptes and free the pages if memory
>   pressure is severe, so the reclaim overhead could disappear - TODO
> 
> - Isn't there any drawback?
> 
>madvise(DONTNEED) doesn't need an exclusive mmap_sem, so concurrent page
>faults from other threads are allowed. But m[no]volatile needs an
>exclusive mmap_sem, so other threads would be blocked if they try to
>access not-yet-mapped pages. That's why I designed m[no]volatile to keep
>its overhead as small as possible.
> 
>It could increase max rss usage, because madvise(DONTNEED) frees pages
>immediately when the system call is issued, while mvolatile delays that
>until memory pressure happens; so if the max rss increase causes severe
>memory pressure, the system would suffer. First of all, the allocator
>needs some balancing logic for that, or the kernel might handle it by
>zapping pages even though the user called mvolatile, if memory pressure
>is severe.
>The problem is how we know that memory pressure is severe.
>One solution is to check whether kswapd is active or not. Another is
>Anton's mempressure, so the allocator can handle it.
> 
> - What's the target?
> 
>Firstly, user-space allocators like ptmalloc and tcmalloc, or the heap
>management of virtual machines like Dalvik. It also comes in handy for
>embedded systems which don't have a swap device and so can't reclaim
>anonymous pages. By discarding instead of swapping out, it could be used
>on non-swap systems. For that, we have to age the anon lru list even
>though we don't have swap, because I don't want to discard volatile pages
>at top priority when memory pressure happens, as volatile in this patch
>means

Re: [RFC v4 0/3] Support volatile for anonymous range

2012-12-19 Thread Minchan Kim
On Tue, Dec 18, 2012 at 10:27:46AM -0800, Arun Sharma wrote:
> On 12/17/12 10:47 PM, Minchan Kim wrote:
> 
> >I hope for more input from user-space allocator people, and for them to
> >test the patch with their allocators, because getting real value might
> >need a design change in arena management.
> 
> jemalloc knows how to handle MADV_FREE on platforms that support it.
> This looks similar (we'll need a SIGBUS handler that does the right
> thing = zero the page + mark it as non-volatile in the common case).

That doesn't work, because in the malloc case it's too late to mark the
range non-volatile in the signal handler.

For example,
free(P1-P4) -> mvolatile(P1-P4) -> VM discard(P3) -> alloc(P1-P4) ->
use P1 -> VM discard(P1) -> use P3 -> SIGBUS -> mark nonvolatile ->
lost P1.

So we should call mnovolatile before handing the free space to the user.
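The counter-example can be replayed step by step. The following is a toy simulation, not allocator or kernel code: pages P1 to P4 carry a volatile flag and a value, vm_discard() models reclaim purging a still-volatile page, and touch() models a user write that "raises SIGBUS" (returns false) on a purged page. It assumes, as Arun's "zero the page" handler does, that a purged page comes back zero-filled once its range is non-volatile, and compares the lazy handler with unmarking before malloc hands the range out.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 4   /* P1..P4 live at indices 0..3 */

/* Toy model of one page in the freed range. */
struct page_state {
    bool is_volatile;
    bool purged;
    int value;
};

static void mvolatile_all(struct page_state *pg)
{
    for (int i = 0; i < NPAGES; i++)
        pg[i].is_volatile = true;
}

/* Model of mnovolatile: clear the flag; a purged page is assumed to come
 * back as a fresh zero-filled page once the range is non-volatile. */
static void mnovolatile_all(struct page_state *pg)
{
    for (int i = 0; i < NPAGES; i++) {
        pg[i].is_volatile = false;
        pg[i].purged = false;
    }
}

/* Model of reclaim: only pages that are still volatile can be discarded. */
static void vm_discard(struct page_state *pg, int i)
{
    if (pg[i].is_volatile) {
        pg[i].purged = true;
        pg[i].value = 0;
    }
}

/* Model of a user write: returns false ("SIGBUS") on a purged page. */
static bool touch(struct page_state *pg, int i, int v)
{
    if (pg[i].purged)
        return false;
    pg[i].value = v;
    return true;
}

/* Replays the sequence above and returns what P1 holds at the end.
 * The user wrote 111 into P1, so anything else means P1 was lost. */
static int replay(bool mnovolatile_before_alloc)
{
    struct page_state pg[NPAGES] = {{false, false, 0}};

    mvolatile_all(pg);                 /* free() + mvolatile(P1-P4) */
    vm_discard(pg, 2);                 /* VM discards P3 */
    if (mnovolatile_before_alloc)
        mnovolatile_all(pg);           /* unmark before malloc returns the range */

    touch(pg, 0, 111);                 /* use P1 */
    vm_discard(pg, 0);                 /* VM discards P1 if it is still volatile */
    if (!touch(pg, 2, 333)) {          /* use P3 -> "SIGBUS" */
        mnovolatile_all(pg);           /* lazy handler marks non-volatile: too late */
        touch(pg, 2, 333);
    }
    return pg[0].value;
}
```

replay(false) ends with P1 zeroed, because P1 was still volatile when reclaim ran; replay(true) keeps the 111 written to P1.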

> 
> All of this of course assumes that apps madvise the kernel through
> APIs exposed by the malloc implementation - not via a raw syscall.
> 
> In other words, some new user space code needs to be written to test

Agreed. I might want to design a new allocator around these system calls
if existing allocators can't use them efficiently, because it might need
a design change in the allocator. MADV_FREE/MADV_DONTNEED isn't cheap,
because it enumerates the ptes/page descriptors in the range to mark
something, so I guess allocators avoid calling such advice system calls
frequently, and even when they do, they want to batch a big range.
Just my guess.

But mvolatile/mnovolatile is cheaper, so you can call it more frequently
on smaller ranges, so the VM could easily have easy-to-reclaim pages.
Another benefit of mvolatile is that it can change its behavior when
memory pressure is severe, zapping all pages like DONTNEED, so it could
work very flexibly.
The downside of that approach is that calling it on small ranges can
increase the number of VMAs, so we might need to tune the VMA size.
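The VMA-count downside can be observed with an existing call that also changes per-range vma flags. This sketch uses mprotect() as a stand-in for the proposed mvolatile (an assumption: both split a vma when applied to only part of it) and counts lines in /proc/self/maps before and after changing every other page of an 8-page anonymous mapping.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Number of vmas in this process, i.e. lines in /proc/self/maps. */
static int count_vmas(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;
    int lines = 0, c;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            lines++;
    fclose(f);
    return lines;
}

/* Change the protection of every other page of an npages mapping and
 * report the vma count before and after. Returns 0 on success. */
static int split_alternate_pages(size_t npages, int *before, int *after)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, npages * page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    *before = count_vmas();
    for (size_t i = 0; i < npages; i += 2)
        if (mprotect(p + i * page, page, PROT_READ) != 0) {
            munmap(p, npages * page);
            return -1;
        }
    *after = count_vmas();

    munmap(p, npages * page);
    return 0;
}
```

On a typical kernel the count should rise by about seven here, since one vma becomes eight; a volatile-marking call applied to many small ranges would be expected to fragment vmas in a similar way.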

> this out fully. Sounds feasible though.

Thanks!

> 
>  -Arun

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v4 0/3] Support volatile for anonymous range

2012-12-19 Thread Minchan Kim
On Tue, Dec 18, 2012 at 10:27:46AM -0800, Arun Sharma wrote:
 On 12/17/12 10:47 PM, Minchan Kim wrote:
 
 I hope more inputs from user-space allocator people and test patch
 with their allocator because it might need design change of arena
 management for getting real vaule.
 
 jemalloc knows how to handle MADV_FREE on platforms that support it.
 This looks similar (we'll need a SIGBUS handler that does the right
 thing = zero the page + mark it as non-volatile in the common case).

Don't work because it's too late to mark it as non-volatile in signal
handler in case of malloc.

For example,
free(P1-P4) - mvolatile(P1-P4) - VM discard(P3) - alloc(P1-P4) -
use P1 - VM discard(P1) - use P3 - SIGBUS - mark nonvolatile -
lost P1.

So, we should call mnovolatile before giving the free space to user.

 
 All of this of course assumes that apps madvise the kernel through
 APIs exposed by the malloc implementation - not via a raw syscall.
 
 In other words, some new user space code needs to be written to test

Agreed. I might want to design new allocator with this system calls if
existing allocators cannot use this system calls efficiently because it
might need allocator's design change. MADV_FREE/MADV_DONTNEED isn't cheap
due to enumerating ptes/page descriptors in that range to mark something
so I guess allocator avoids frequent calling of the such advise system call
and even if they call it, they want to call the big range as batch.
Just my imagine.

But mvolatile/mnovolatile is cheaper so you can call it more frequently
with smaller range so VM could have easy-reclaimable pages easily.
Another benefit of the mvolatile is it can change the behavior when memory
pressure is severe where it can zap all pages like DONTNEED so it could
work very flexible.
The downside of that approach is that if we call it with small range,
it can increase the number of VMA so we might tune point for VMA size.

 this out fully. Sounds feasible though.

Thanks!

 
  -Arun
 
 --
 To unsubscribe, send a message with 'unsubscribe linux-mm' in
 the body to majord...@kvack.org.  For more info on Linux MM,
 see: http://www.linux-mm.org/ .
 Don't email: a href=mailto:d...@kvack.org; em...@kvack.org /a

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v4 0/3] Support volatile for anonymous range

2012-12-18 Thread Arun Sharma

On 12/17/12 10:47 PM, Minchan Kim wrote:


>I hope for more input from user-space allocator people, and for them to
>test the patch with their allocators, because getting real value might
>need a design change in arena management.


jemalloc knows how to handle MADV_FREE on platforms that support it. 
This looks similar (we'll need a SIGBUS handler that does the right 
thing = zero the page + mark it as non-volatile in the common case).


All of this of course assumes that apps madvise the kernel through APIs 
exposed by the malloc implementation - not via a raw syscall.


In other words, some new user space code needs to be written to test 
this out fully. Sounds feasible though.


 -Arun