Re: [RFC PATCH 0/4] Support vranges on files

2013-04-09 Thread Minchan Kim
On Tue, Apr 09, 2013 at 03:36:20PM -0700, John Stultz wrote:
> On 04/08/2013 10:07 PM, Minchan Kim wrote:
> >On Mon, Apr 08, 2013 at 08:27:50PM -0700, John Stultz wrote:
> >>marked volatile, it should remain volatile until someone who has the
> >>file open marks it as non-volatile.  The only time we clear the
> >>volatility is when the file is closed by all users.
> >Yes. We need to clear volatile ranges when the file is closed
> >by all users. That's what we need, and it resolves my concern.
> 
> Ok, sorry this wasn't more clear. In all the implementations I've
> pushed, the volatility only persists as long as someone holds the
> file open. Once it's closed by all users, the volatility is cleared.

I have now confirmed it in your implementation.
Sorry for the confusion; I hadn't looked into your code in detail. :(

> 
> Hopefully that calms your worries here. :)

Yep.

> 
> 
> 
> >>I think the concern about surprising an application that isn't
> >>expecting volatility is odd, since if an application jumped in and
> >>punched a hole in the data, that could surprise other applications
> >>as well.  If you're going to use a file that can be shared,
> >>applications have to deal with potential changes to that file by
> >>others.
> >True. My concern is delayed punching when no client holds the fd, and
> >there is no interface to detect whether some range of a file is volatile
> >or not. It means anyone who maps a file shared could encounter SIGBUS
> >even though he makes a best effort to check it with lsof before using it.
> 
> I'll grant the SIGBUS semantics create the potential for stranger
> behavior than usual, but I think the use cases are still attractive
> enough to try to make it work.

Indeed.

> 
> 
> >>To me, the value in using volatile ranges on the file data is
> >>exactly because the file data can be shared. So it makes sense to me
> >>to have the volatility state be like the data in the file. I guess
> >>the only exception in my case is that if all the references to a
> >>file are closed, we can clear the volatility (since we don't have a
> >>sane way for the volatility to persist past that point).
> >Agreed, if you clear the volatility when the file is closed by
> >all stakeholders.
> 
> Agreed.
> 
> 
> >>One question that might help resolve this: Would having some sort of
> >>volatility checking interface be helpful in easing your concern
> >>about applications being surprised by volatility?
> >If we can provide the above, I think we don't need such an interface
> >until someone wants it with a reasonable rationale.
> 
> Sure, I just wanted to know if you saw a need right away. For now we
> can leave it be.
> 
> >>True. And performance needs to be good if this hinting interface is
> >>to be used easily. Although I worry about performance trumping sane
> >>semantics. So let me try to implement the desired behavior and we
> >>can measure the difference.
> >NP. But keep in mind that mmap_sem was really terrible for performance
> >when I ran an experiment (i.e., concurrent page faults by many threads
> >while a thread calls mmap).
> >I guess the primary reason is CONFIG_MUTEX_SPIN_ON_OWNER.
> >So at least, we should avoid it by introducing a new mode like
> >VOLATILE_ANON|VOLATILE_FILE|VOLATILE_BOTH if we want to
> >support mvrange-file and the mvrange interface is the thing userland
> >people really want, although ashmem has used an fd-based model.
> 
> The VOLATILE_ANON|VOLATILE_FILE|VOLATILE_BOTH may be an interesting
> compromise.
> 
> Though, if one marks a VOLATILE_ANON range on an address that's an
> mmaped file, how do we detect this and provide a sane error value
> without checking the vmas?
> 

Should we check the vma?
If there is a conflict with an existing vrange type, just return -EINVAL?
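
One possible reading of this suggestion is sketched below: look at whether
the mapping is file-backed and reject a mode that excludes it. Everything
here is an assumption made for illustration. The thread names the
VOLATILE_* modes but gives no values, and struct vma_info and
vrange_check_mode() are hypothetical stand-ins, not kernel interfaces.
Note that this check does need the vma, which is exactly the cost being
questioned above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical mode flags; the thread names them but gives no values. */
enum vrange_mode {
    VOLATILE_ANON = 1 << 0,
    VOLATILE_FILE = 1 << 1,
    VOLATILE_BOTH = VOLATILE_ANON | VOLATILE_FILE,
};

/* Hypothetical stand-in for the one fact we need from a vma. */
struct vma_info {
    int file_backed;   /* nonzero if the mapping is backed by a file */
};

/*
 * Return 0 if the requested mode is compatible with the mapping,
 * -EINVAL on a conflict (e.g. VOLATILE_ANON on a file-backed mapping).
 */
int vrange_check_mode(const struct vma_info *vma, int mode)
{
    assert(vma);

    int needed = vma->file_backed ? VOLATILE_FILE : VOLATILE_ANON;

    if (mode & ~VOLATILE_BOTH)      /* unknown bits are rejected */
        return -EINVAL;
    if (!(mode & needed))           /* mode excludes this mapping type */
        return -EINVAL;
    return 0;
}
```

With this rule, VOLATILE_BOTH is accepted on either kind of mapping, while
VOLATILE_ANON on a file-backed mapping fails with -EINVAL.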

> 
> thanks
> -john
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/4] Support vranges on files

2013-04-09 Thread John Stultz

On 04/08/2013 10:07 PM, Minchan Kim wrote:
>On Mon, Apr 08, 2013 at 08:27:50PM -0700, John Stultz wrote:
>>marked volatile, it should remain volatile until someone who has the
>>file open marks it as non-volatile.  The only time we clear the
>>volatility is when the file is closed by all users.
>Yes. We need to clear volatile ranges when the file is closed
>by all users. That's what we need, and it resolves my concern.

Ok, sorry this wasn't more clear. In all the implementations I've
pushed, the volatility only persists as long as someone holds the file
open. Once it's closed by all users, the volatility is cleared.

Hopefully that calms your worries here. :)

>>I think the concern about surprising an application that isn't
>>expecting volatility is odd, since if an application jumped in and
>>punched a hole in the data, that could surprise other applications
>>as well.  If you're going to use a file that can be shared,
>>applications have to deal with potential changes to that file by
>>others.
>True. My concern is delayed punching when no client holds the fd, and
>there is no interface to detect whether some range of a file is volatile
>or not. It means anyone who maps a file shared could encounter SIGBUS
>even though he makes a best effort to check it with lsof before using it.

I'll grant the SIGBUS semantics create the potential for stranger
behavior than usual, but I think the use cases are still attractive
enough to try to make it work.

>>To me, the value in using volatile ranges on the file data is
>>exactly because the file data can be shared. So it makes sense to me
>>to have the volatility state be like the data in the file. I guess
>>the only exception in my case is that if all the references to a
>>file are closed, we can clear the volatility (since we don't have a
>>sane way for the volatility to persist past that point).
>Agreed, if you clear the volatility when the file is closed by
>all stakeholders.

Agreed.

>>One question that might help resolve this: Would having some sort of
>>volatility checking interface be helpful in easing your concern
>>about applications being surprised by volatility?
>If we can provide the above, I think we don't need such an interface
>until someone wants it with a reasonable rationale.

Sure, I just wanted to know if you saw a need right away. For now we can
leave it be.

>>True. And performance needs to be good if this hinting interface is
>>to be used easily. Although I worry about performance trumping sane
>>semantics. So let me try to implement the desired behavior and we
>>can measure the difference.
>NP. But keep in mind that mmap_sem was really terrible for performance
>when I ran an experiment (i.e., concurrent page faults by many threads
>while a thread calls mmap).
>I guess the primary reason is CONFIG_MUTEX_SPIN_ON_OWNER.
>So at least, we should avoid it by introducing a new mode like
>VOLATILE_ANON|VOLATILE_FILE|VOLATILE_BOTH if we want to
>support mvrange-file and the mvrange interface is the thing userland
>people really want, although ashmem has used an fd-based model.

The VOLATILE_ANON|VOLATILE_FILE|VOLATILE_BOTH may be an interesting
compromise.

Though, if one marks a VOLATILE_ANON range on an address that's an
mmaped file, how do we detect this and provide a sane error value
without checking the vmas?

thanks
-john



Re: [RFC PATCH 0/4] Support vranges on files

2013-04-08 Thread Minchan Kim
On Mon, Apr 08, 2013 at 08:27:50PM -0700, John Stultz wrote:
> On 04/08/2013 07:18 PM, Minchan Kim wrote:
> >On Mon, Apr 08, 2013 at 05:36:42PM -0700, John Stultz wrote:
> >>On 04/07/2013 05:46 PM, Minchan Kim wrote:
> >>>Hello John,
> >>>
> >>>As you know, userland people wanted to handle vrange with an mmaped
> >>>pointer rather than an fd, and to see SIGBUS, so I thought more about
> >>>the semantics of vrange and want to make them very clear and easy.
> >>>So I suggest the semantics below (of course, they're not rock solid).
> >>>
> >>> mvrange(start_addr, length, mode, behavior)
> >>>
> >>>It's the same as what I suggested lately but with a different name, just
> >>>adding the prefix "m". It's a per-process model (i.e., mm_struct vrange),
> >>>so if the process exits, the "volatility" isn't valid any more.
> >>>That isn't a problem for anonymous pages but could be for file-vrange,
> >>>so let's introduce fvrange to cover the problem.
> >>>
> >>> fvrange(int fd, start_offset, length, mode, behavior)
> >>>
> >>>First of all, let's look at mvrange from the anonymous and file page POV.
> >>>
> >>>1) anon-mvrange
> >>>
> >>>A page in a volatile range will be purged only if all of the processes
> >>>have marked the range as volatile.
> >>>
> >>>If process A calls mvrange and then forks, the vrange could be copied
> >>>from parent to child, so not-yet-COWed pages could be purged
> >>>unless either one of the two processes marks NO_VOLATILE explicitly.
> >>>
> >>>Of course, a COWed page could be purged easily because there is no link
> >>>any more.
> >>Ack. This seems reasonable.
> >>
> >>
> >>>2) file-mvrange
> >>>
> >>>A page in a volatile range will be purged only if all of the processes
> >>>that mapped the page marked it as volatile AND no process has mapped
> >>>the page as "private". IOW, all of the processes that mapped the page
> >>>should map it as "shared" for purging.
> >>>
> >>>So, all of the processes should mark each address range in their own
> >>>process context if they want to collaborate on a shared mapped file,
> >>>and guarantee that no process has mapped the range as "private".
> >>>
> >>>Of course, the volatility state will be terminated when the process is gone.
> >>This case doesn't seem ideal to me, but is sort of how the current
> >>code works to avoid the complexity of dealing with memory volatile
> >>ranges that cross page types (file/anonymous). Although the current
> >>code just doesn't purge file pages marked with mvrange().
> >Personally, I don't think it's to avoid the complexity of implementation.
> >I thought explicit declaration of volatility on a range before using it
> >would be clearer for the userspace programmer.
> >Otherwise, he can encounter SIGBUS and get confused easily.
> >
> >Frankly speaking, I don't like volatility remaining permanently even
> >after the relevant processes go away; it could make processes using the
> >file much more error-prone and hard to debug.
> 
> So this may be a contentious point we'll have to work out.
> 
> Could you maybe describe some use cases you envision where someone
> would want to mark pages volatile on a file that could be
> accidentally shared? Or how you think the per-mm sense of volatility
> would be beneficial in those use-cases?

My concern is the following:

1. Process A calls mvrange for file F.
2. Process A is killed by someone or by its own bug.
3. Process B maps F shared into its address space.
4. Memory pressure happens.
5. Process B is killed by SIGBUS, but Process B really can't know why it
   was killed because it can't know of anyone who opened F except itself.
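
The scenario above, together with the rule agreed on elsewhere in this
thread (volatility is cleared once the file is closed by all users), can
be modelled with a small reference count. This is only a sketch under
stated assumptions: struct vfile and the vfile_*() helpers are
hypothetical names invented for illustration, a single flag stands in for
a set of volatile ranges, and a process being killed is treated simply as
a close.

```c
#include <assert.h>

/* Hypothetical per-file state; not the kernel's address_space. */
struct vfile {
    int open_count;     /* how many users currently hold the file open */
    int volatile_set;   /* nonzero while a range is marked volatile */
};

void vfile_open(struct vfile *f)
{
    assert(f);
    f->open_count++;
}

void vfile_mark_volatile(struct vfile *f)
{
    assert(f && f->open_count > 0);  /* only a holder can mark */
    f->volatile_set = 1;
}

/* Closing (or dying, which closes the fd) drops one reference. */
void vfile_close(struct vfile *f)
{
    assert(f && f->open_count > 0);
    if (--f->open_count == 0)
        f->volatile_set = 0;    /* last user gone: clear volatility */
}

/* The kernel may purge only while the range is still volatile. */
int vfile_may_purge(const struct vfile *f)
{
    assert(f);
    return f->volatile_set;
}
```

Under this model, if Process A marks the range and dies before Process B
opens F, the volatility is already gone when B arrives, so memory
pressure cannot purge B's pages. If B had F open before A died, the range
stays volatile, which is the coordinated-sharing case.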
> 
> The use cases I envision where volatility would be used are when any
> sharing would be coordinated between processes.
> Again, that producer/consumer example from before where the empty
> portion of a very large circular buffer could be made volatile,
> scaling the actual memory usage to the actual need.
> 
> And really the same concern would likely apply in the common case
> when multiple applications mmap (shared) a file, but use fvrange()
> to mark the data as volatile. This is exactly the use case the
> Android ashmem interface works for. In that case, once the data is

I don't know the Android ashmem interface well, but if it works as I
mentioned earlier, I don't think it's a good interface.

> marked volatile, it should remain volatile until someone who has the
> file open marks it as non-volatile.  The only time we clear the
> volatility is when the file is closed by all users.

Yes. We need to clear volatile ranges when the file is closed
by all users. That's what we need, and it resolves my concern.

> 
> I think the concern about surprising an application that isn't
> expecting volatility is odd, since if an application jumped in and
> punched a hole in the data, that could surprise other applications
> as well.  If you're going to use a file that can be shared,
> applications have to deal with potential changes to that file by
> others.

True. My concern is delayed punching without any client of fd and
there is no interface to detect some range 

Re: [RFC PATCH 0/4] Support vranges on files

2013-04-08 Thread John Stultz

On 04/08/2013 07:18 PM, Minchan Kim wrote:
>On Mon, Apr 08, 2013 at 05:36:42PM -0700, John Stultz wrote:
>>On 04/07/2013 05:46 PM, Minchan Kim wrote:
>>>Hello John,
>>>
>>>As you know, userland people wanted to handle vrange with an mmaped
>>>pointer rather than an fd, and to see SIGBUS, so I thought more about
>>>the semantics of vrange and want to make them very clear and easy.
>>>So I suggest the semantics below (of course, they're not rock solid).
>>>
>>> mvrange(start_addr, length, mode, behavior)
>>>
>>>It's the same as what I suggested lately but with a different name, just
>>>adding the prefix "m". It's a per-process model (i.e., mm_struct vrange),
>>>so if the process exits, the "volatility" isn't valid any more.
>>>That isn't a problem for anonymous pages but could be for file-vrange,
>>>so let's introduce fvrange to cover the problem.
>>>
>>> fvrange(int fd, start_offset, length, mode, behavior)
>>>
>>>First of all, let's look at mvrange from the anonymous and file page POV.
>>>
>>>1) anon-mvrange
>>>
>>>A page in a volatile range will be purged only if all of the processes
>>>have marked the range as volatile.
>>>
>>>If process A calls mvrange and then forks, the vrange could be copied
>>>from parent to child, so not-yet-COWed pages could be purged
>>>unless either one of the two processes marks NO_VOLATILE explicitly.
>>>
>>>Of course, a COWed page could be purged easily because there is no link
>>>any more.
>>Ack. This seems reasonable.



>>>2) file-mvrange
>>>
>>>A page in a volatile range will be purged only if all of the processes
>>>that mapped the page marked it as volatile AND no process has mapped
>>>the page as "private". IOW, all of the processes that mapped the page
>>>should map it as "shared" for purging.
>>>
>>>So, all of the processes should mark each address range in their own
>>>process context if they want to collaborate on a shared mapped file,
>>>and guarantee that no process has mapped the range as "private".
>>>
>>>Of course, the volatility state will be terminated when the process is gone.
>>This case doesn't seem ideal to me, but is sort of how the current
>>code works to avoid the complexity of dealing with memory volatile
>>ranges that cross page types (file/anonymous). Although the current
>>code just doesn't purge file pages marked with mvrange().
>Personally, I don't think it's to avoid the complexity of implementation.
>I thought explicit declaration of volatility on a range before using it
>would be clearer for the userspace programmer.
>Otherwise, he can encounter SIGBUS and get confused easily.
>
>Frankly speaking, I don't like volatility remaining permanently even
>after the relevant processes go away; it could make processes using the
>file much more error-prone and hard to debug.


So this may be a contentious point we'll have to work out.

Could you maybe describe some use cases you envision where someone would 
want to mark pages volatile on a file that could be accidentally shared? 
Or how you think the per-mm sense of volatility would be beneficial in 
those use-cases?


The use cases I envision where volatility would be used are when any 
sharing would be coordinated between processes.
Again, that producer/consumer example from before where the empty 
portion of a very large circular buffer could be made volatile, scaling 
the actual memory usage to the actual need.
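
To make the producer/consumer example concrete, the sketch below computes
which part of a circular buffer is currently empty, i.e. the candidate
region for being marked volatile. It is an illustration only: struct
range and ring_free_ranges() are hypothetical names, offsets are in
bytes, and a real caller would presumably have to round the ranges
inward to page boundaries before passing them to whatever volatile-range
call ends up existing.

```c
#include <assert.h>
#include <stddef.h>

/* A byte range [off, off + len) inside the buffer. */
struct range {
    size_t off;
    size_t len;
};

/*
 * Compute the unused part of a circular buffer of `cap` bytes.
 * `tail` is the consumer's read offset, `used` the number of bytes
 * queued. The free space wraps at most once, so it is reported as
 * up to two ranges; the return value is how many were filled in.
 */
int ring_free_ranges(size_t cap, size_t tail, size_t used,
                     struct range out[2])
{
    assert(cap > 0 && tail < cap && used <= cap);

    size_t free_len = cap - used;
    size_t head = (tail + used) % cap;   /* producer's write offset */

    if (free_len == 0)
        return 0;                        /* buffer full: nothing to mark */
    if (head + free_len <= cap) {        /* free space does not wrap */
        out[0] = (struct range){ head, free_len };
        return 1;
    }
    out[0] = (struct range){ head, cap - head };
    out[1] = (struct range){ 0, free_len - (cap - head) };
    return 2;
}
```

As the consumer drains data the free ranges grow, and the producer would
mark a range non-volatile again before writing into it, so memory use
follows the amount of queued data rather than the buffer size.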


And really the same concern would likely apply in the common case when 
multiple applications mmap (shared) a file, but use fvrange() to mark 
the data as volatile. This is exactly the use case the Android ashmem 
interface works for. In that case, once the data is marked volatile, it 
should remain volatile until someone who has the file open marks it as 
non-volatile.  The only time we clear the volatility is when the file is 
closed by all users.


I think the concern about surprising an application that isn't expecting 
volatility is odd, since if an application jumped in and punched a hole 
in the data, that could surprise other applications as well.  If you're 
going to use a file that can be shared, applications have to deal with 
potential changes to that file by others.


To me, the value in using volatile ranges on the file data is exactly 
because the file data can be shared. So it makes sense to me to have the 
volatility state be like the data in the file. I guess the only 
exception in my case is that if all the references to a file are closed, 
we can clear the volatility (since we don't have a sane way for the 
volatility to persist past that point).


One question that might help resolve this: Would having some sort of 
volatility checking interface be helpful in easing your concern about 
applications being surprised by volatility?




>Anyway, do you agree with my suggestion that "we should not purge any page
>if a process is currently using it non-shared (i.e., private)"?

Yes, or if we do purge any pages, they should not affect the private
mapped pages (in other words, the COW link should be broken - as the
backing page has in-effect been written to by purging).

>>I'd much prefer file-mvrange calls to behave identically to fvrange calls.
>>
>>The important point here is that the kernel doesn't *have* to purge

Re: [RFC PATCH 0/4] Support vranges on files

2013-04-08 Thread Minchan Kim
On Mon, Apr 08, 2013 at 05:36:42PM -0700, John Stultz wrote:
> On 04/07/2013 05:46 PM, Minchan Kim wrote:
> >Hello John,
> >
> >As you know, userland people wanted to handle vrange with an mmaped
> >pointer rather than an fd, and to see SIGBUS, so I thought more about
> >the semantics of vrange and want to make them very clear and easy.
> >So I suggest the semantics below (of course, they're not rock solid).
> >
> > mvrange(start_addr, length, mode, behavior)
> >
> >It's the same as what I suggested lately but with a different name, just
> >adding the prefix "m". It's a per-process model (i.e., mm_struct vrange),
> >so if the process exits, the "volatility" isn't valid any more.
> >That isn't a problem for anonymous pages but could be for file-vrange,
> >so let's introduce fvrange to cover the problem.
> >
> > fvrange(int fd, start_offset, length, mode, behavior)
> >
> >First of all, let's look at mvrange from the anonymous and file page POV.
> >
> >1) anon-mvrange
> >
> >A page in a volatile range will be purged only if all of the processes
> >have marked the range as volatile.
> >
> >If process A calls mvrange and then forks, the vrange could be copied
> >from parent to child, so not-yet-COWed pages could be purged
> >unless either one of the two processes marks NO_VOLATILE explicitly.
> >
> >Of course, a COWed page could be purged easily because there is no link
> >any more.
> 
> Ack. This seems reasonable.
> 
> 
> >2) file-mvrange
> >
> >A page in a volatile range will be purged only if all of the processes
> >that mapped the page marked it as volatile AND no process has mapped
> >the page as "private". IOW, all of the processes that mapped the page
> >should map it as "shared" for purging.
> >
> >So, all of the processes should mark each address range in their own
> >process context if they want to collaborate on a shared mapped file,
> >and guarantee that no process has mapped the range as "private".
> >
> >Of course, the volatility state will be terminated when the process is gone.
> 
> This case doesn't seem ideal to me, but is sort of how the current
> code works to avoid the complexity of dealing with memory volatile
> ranges that cross page types (file/anonymous). Although the current
> code just doesn't purge file pages marked with mvrange().

Personally, I don't think it's to avoid the complexity of implementation.
I thought explicit declaration of volatility on a range before using it
would be clearer for the userspace programmer.
Otherwise, he can encounter SIGBUS and get confused easily.

Frankly speaking, I don't like volatility remaining permanently even
after the relevant processes go away; it could make processes using the
file much more error-prone and hard to debug.

Anyway, do you agree with my suggestion that "we should not purge any page
if a process is currently using it non-shared (i.e., private)"?
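
The file-mvrange rule proposed in this message (purge only if every
process mapping the page has marked it volatile and none maps it
private) can be written as a simple predicate. This is a sketch of the
stated rule only, not of any posted patch: struct page_mapping and
file_page_may_purge() are hypothetical, and treating an unmapped page as
not purgeable is an added assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of one process's mapping of a file page. */
struct page_mapping {
    int shared;           /* nonzero for a shared mapping, 0 for private */
    int marked_volatile;  /* nonzero if this process marked the range */
};

/*
 * A page may be purged only if every mapper uses a shared mapping
 * and every mapper has marked the range volatile.
 */
int file_page_may_purge(const struct page_mapping *maps, size_t n)
{
    assert(maps || n == 0);

    if (n == 0)
        return 0;   /* assumption: no mappers means nothing was marked */
    for (size_t i = 0; i < n; i++) {
        if (!maps[i].shared || !maps[i].marked_volatile)
            return 0;
    }
    return 1;
}
```

A single private mapper, or a single shared mapper that has not marked
the range, is enough to keep the page from being purged.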

> 
> I'd much prefer file-mvrange calls to behave identically to fvrange calls.
> 
> The important point here is that the kernel doesn't *have* to purge
> anything ever. It's the kernel's discretion as to which volatile
> pages to purge when. So it's easier for now to simply not purge file

Right.

> pages marked volatile via mvolatile.

NP, but we should at least write down a rough description. Otherwise a user
may try to use it on file-backed pages, get disappointed, and then be
reluctant to use it any more. :)

I'm not saying we should write down an implementation-specific description,
but I'd like to tell users at least whether the new system call can affect
anonymous pages, file pages, or both, from the beginning. Just a hope.

> 
> There is, however, the inconsistency that file pages marked volatile
> via fvrange and then marked non-volatile via mvrange() might still
> be purged. That is broken in my mind, and still needs to be
> addressed. The easiest out is probably just to return an error if
> any of the mvrange calls cover file pages. But I'd really like a

It needs vma enumeration and mmap_sem read-lock.
It could hurt anon-vrange performance severely.

> better fix.

Another idea is that we can move the per-mm vrange element to the
address_space when the process goes away, if the element covers a
file-backed vma. But I'm still not at all sure whether we should keep it
persistent.

> 
> 
> >3) fvrange
> >
> >It's the same as 2), but the volatility state could be persistent in the
> >address_space until someone calls fvrange(NO_VOLATILE).
> >So it could remove the weakness of 2).
> >What do you think about the above semantics?
> 
> 
> I'd still like mvrange() calls on shared mapped files to be stored
> on the address_space.
> 
> 
> >If you don't have any objection, we could implement it. I think 1) and 2)
> >could be handled by my base code for anon-vrange handling, with some
> >tweaking for file-vrange, and 3) would need your new patches in address_space.
> 
> I think we can get it sorted out. It might just take a few iterations.

Sure!

-- 
Kind regards,
Minchan Kim

Re: [RFC PATCH 0/4] Support vranges on files

2013-04-08 Thread John Stultz

On 04/07/2013 05:46 PM, Minchan Kim wrote:
>Hello John,
>
>As you know, userland people wanted to handle vrange with an mmaped
>pointer rather than an fd, and to see SIGBUS, so I thought more about
>the semantics of vrange and want to make them very clear and easy.
>So I suggest the semantics below (of course, they're not rock solid).
>
> mvrange(start_addr, length, mode, behavior)
>
>It's the same as what I suggested lately but with a different name, just
>adding the prefix "m". It's a per-process model (i.e., mm_struct vrange),
>so if the process exits, the "volatility" isn't valid any more.
>That isn't a problem for anonymous pages but could be for file-vrange,
>so let's introduce fvrange to cover the problem.
>
> fvrange(int fd, start_offset, length, mode, behavior)
>
>First of all, let's look at mvrange from the anonymous and file page POV.
>
>1) anon-mvrange
>
>A page in a volatile range will be purged only if all of the processes
>have marked the range as volatile.
>
>If process A calls mvrange and then forks, the vrange could be copied
>from parent to child, so not-yet-COWed pages could be purged
>unless either one of the two processes marks NO_VOLATILE explicitly.
>
>Of course, a COWed page could be purged easily because there is no link
>any more.

Ack. This seems reasonable.



> 2) file-mvrange
>
> A page in a volatile range will be purged only if all of the processes
> mapping the page marked it as volatile AND no process maps the page
> as "private". IOW, all of the processes mapping the page must map it
> as "shared" for purging.
>
> So, all of the processes should mark each address range in their own
> process context if they want to collaborate on a shared mapped file, and
> guarantee that no process maps the range as "private".
>
> Of course, the volatility state ends when the process is gone.


This case doesn't seem ideal to me, but it is sort of how the current code
works, to avoid the complexity of dealing with memory volatile ranges
that cross page types (file/anonymous). Although the current code just
doesn't purge file pages marked with mvrange().


I'd much prefer file-mvrange calls to behave identically to fvrange calls.

The important point here is that the kernel doesn't *have* to purge
anything ever. It's at the kernel's discretion which volatile pages to
purge and when. So it's easier for now to simply not purge file pages marked
volatile via mvolatile.


There is, however, the inconsistency that file pages marked volatile via
fvrange() and then marked non-volatile via mvrange() might still be
purged. That is broken in my mind, and still needs to be addressed. The
easiest way out is probably just to return an error if any of the mvrange
calls cover file pages. But I'd really like a better fix.
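
The "easiest way out" mentioned above could look roughly like the sketch below. It is a userspace model, not kernel code: struct vma_model, mvrange_reject_file_pages() and the choice of -EINVAL as the error are all assumptions made for illustration, and overflow of start + len is ignored.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: covers the addresses [start, end). */
struct vma_model {
    unsigned long start;
    unsigned long end;
    bool file_backed;   /* true if the mapping is backed by a file */
};

/*
 * Return 0 if [start, start + len) touches only anonymous mappings,
 * or -EINVAL (an assumed error code) if any overlapping mapping is
 * file-backed. Every mapping is inspected on each call.
 */
int mvrange_reject_file_pages(const struct vma_model *vmas, size_t n,
                              unsigned long start, unsigned long len)
{
    unsigned long end = start + len;

    for (size_t i = 0; i < n; i++) {
        bool overlaps = vmas[i].start < end && start < vmas[i].end;

        if (overlaps && vmas[i].file_backed)
            return -EINVAL;
    }
    return 0;
}
```

The check has to look at the mappings covering the range on every call, which is the cost to weigh against its simplicity.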




> 3) fvrange
>
> It's the same as 2), but the volatility state could be persistent in the
> address_space until someone calls fvrange(NO_VOLATILE).
> So it could remove the weakness of 2).
>
> What do you think about the above semantics?



I'd still like mvrange() calls on shared mapped files to be stored on 
the address_space.




> If you don't have any problem with it, we could implement it. I think 1) and
> 2) could be handled by my base code for anon-vrange handling, with tweaks for
> file-vrange, and we'd need your new patches in address_space to handle 3).


I think we can get it sorted out. It might just take a few iterations.

thanks
-john



Re: [RFC PATCH 0/4] Support vranges on files

2013-04-08 Thread John Stultz

On 04/08/2013 07:18 PM, Minchan Kim wrote:

> On Mon, Apr 08, 2013 at 05:36:42PM -0700, John Stultz wrote:
> > On 04/07/2013 05:46 PM, Minchan Kim wrote:
> > > Hello John,
> > >
> > > As you know, userland people wanted to handle vrange with an mmaped
> > > pointer rather than an fd and to see the SIGBUS, so I thought more
> > > about the semantics of vrange and want to make them very clear and easy.
> > > So I suggest the semantics below (of course, they're not rock solid).
> > >
> > >  mvrange(start_addr, length, mode, behavior)
> > >
> > > It's the same as what I suggested lately but with a different name, just
> > > adding the prefix "m". It's a per-process model (i.e., mm_struct vrange),
> > > so if the process exits, the "volatility" isn't valid any more.
> > > That isn't a problem for anonymous pages but could be for file-vrange, so
> > > let's introduce fvrange to cover the problem.
> > >
> > >  fvrange(int fd, start_offset, length, mode, behavior)
> > >
> > > First of all, let's look at mvrange from the anonymous and file page POV.
> > >
> > > 1) anon-mvrange
> > >
> > > A page in a volatile range will be purged only if all of the processes
> > > marked the range as volatile.
> > >
> > > If a process calls mvrange and then forks, the vrange could be copied
> > > from parent to child, so not-yet-COWed pages could be purged
> > > unless either of the two processes marks NO_VOLATILE explicitly.
> > >
> > > Of course, a COWed page could be purged easily because there is no link
> > > any more.
> >
> > Ack. This seems reasonable.
> >
> > > 2) file-mvrange
> > >
> > > A page in a volatile range will be purged only if all of the processes
> > > mapping the page marked it as volatile AND no process maps the page
> > > as "private". IOW, all of the processes mapping the page must map it
> > > as "shared" for purging.
> > >
> > > So, all of the processes should mark each address range in their own
> > > process context if they want to collaborate on a shared mapped file, and
> > > guarantee that no process maps the range as "private".
> > >
> > > Of course, the volatility state ends when the process is gone.
> >
> > This case doesn't seem ideal to me, but it is sort of how the current
> > code works, to avoid the complexity of dealing with memory volatile
> > ranges that cross page types (file/anonymous). Although the current
> > code just doesn't purge file pages marked with mvrange().
>
> Personally, I don't think it's about avoiding implementation complexity.
> I thought explicitly declaring volatility on a range before using it would be
> clearer for the userspace programmer.
> Otherwise, he can encounter SIGBUS and easily get confused.
>
> Frankly speaking, I don't like volatility remaining permanently even after the
> relevant processes go away; it could make processes using the file
> much more error-prone and hard to debug.


So this is maybe a contentious point we'll have to work out.

Could you maybe describe some use cases you envision where someone would
want to mark pages volatile on a file that could be accidentally shared?
Or how you think the per-mm sense of volatility would be beneficial in
those use-cases?


The use cases I envision where volatility would be used are when any 
sharing would be coordinated between processes.
Again, that producer/consumer example from before where the empty 
portion of a very large circular buffer could be made volatile, scaling 
the actual memory usage to the actual need.


And really the same concern would likely apply in the common case when 
multiple applications mmap (shared) a file, but use fvrange() to mark 
the data as volatile. This is exactly the use case the Android ashmem 
interface works for. In that case, once the data is marked volatile, it 
should remain volatile until someone who has the file open marks it as 
non-volatile.  The only time we clear the volatility is when the file is 
closed by all users.
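
The lifetime rule described above can be sketched as a small userspace model. Everything here (struct file_model and the file_open/file_mark/file_close helpers) is hypothetical and only illustrates the rule; it tracks a single range per file for brevity and is not the code from the patches.

```c
#include <stdbool.h>

/* Hypothetical model of a shared file with one tracked range. */
struct file_model {
    int open_count;       /* how many users currently hold the file open */
    bool range_volatile;  /* volatility is stored with the file, not the process */
};

/* A user opens the file. */
void file_open(struct file_model *f)
{
    f->open_count++;
}

/* Any user holding the file open may set or clear the volatility. */
void file_mark(struct file_model *f, bool make_volatile)
{
    f->range_volatile = make_volatile;
}

/* Closing by the last user is the only implicit way volatility is cleared. */
void file_close(struct file_model *f)
{
    if (f->open_count > 0 && --f->open_count == 0)
        f->range_volatile = false;
}
```

In this model, a process that marks the range volatile and then exits leaves the range volatile only while some other user still holds the file open; a process that opens the file after everyone has closed it starts with a non-volatile range.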


I think the concern about surprising an application that isn't expecting 
volatility is odd, since if an application jumped in and punched a hole 
in the data, that could surprise other applications as well.  If you're 
going to use a file that can be shared, applications have to deal with 
potential changes to that file by others.


To me, the value in using volatile ranges on the file data is exactly 
because the file data can be shared. So it makes sense to me to have the 
volatility state be like the data in the file. I guess the only 
exception in my case is that if all the references to a file are closed, 
we can clear the volatility (since we don't have a sane way for the 
volatility to persist past that point).


One question that might help resolve this: Would having some sort of 
volatility checking interface be helpful in easing your concern about 
applications being surprised by volatility?




> Anyway, do you agree with my suggestion that we should not purge any page if
> a process is currently using it with a non-shared (i.e., private) mapping?


Yes, or if we do purge any pages, they should not affect the private 
mapped pages (in other words, the COW link should be broken - as the 
backing page has in-effect been written to by purging).




> > I'd much prefer file-mvrange calls to behave identically to fvrange calls.
> >
> > The important point here is that the kernel doesn't *have* to purge
> > anything 

Re: [RFC PATCH 0/4] Support vranges on files

2013-04-08 Thread Minchan Kim
On Mon, Apr 08, 2013 at 08:27:50PM -0700, John Stultz wrote:
> On 04/08/2013 07:18 PM, Minchan Kim wrote:
> > On Mon, Apr 08, 2013 at 05:36:42PM -0700, John Stultz wrote:
> > > On 04/07/2013 05:46 PM, Minchan Kim wrote:
> > > > Hello John,
> > > >
> > > > As you know, userland people wanted to handle vrange with an mmaped
> > > > pointer rather than an fd and to see the SIGBUS, so I thought more
> > > > about the semantics of vrange and want to make them very clear and easy.
> > > > So I suggest the semantics below (of course, they're not rock solid).
> > > >
> > > >  mvrange(start_addr, length, mode, behavior)
> > > >
> > > > It's the same as what I suggested lately but with a different name, just
> > > > adding the prefix "m". It's a per-process model (i.e., mm_struct vrange),
> > > > so if the process exits, the "volatility" isn't valid any more.
> > > > That isn't a problem for anonymous pages but could be for file-vrange, so
> > > > let's introduce fvrange to cover the problem.
> > > >
> > > >  fvrange(int fd, start_offset, length, mode, behavior)
> > > >
> > > > First of all, let's look at mvrange from the anonymous and file page POV.
> > > >
> > > > 1) anon-mvrange
> > > >
> > > > A page in a volatile range will be purged only if all of the processes
> > > > marked the range as volatile.
> > > >
> > > > If a process calls mvrange and then forks, the vrange could be copied
> > > > from parent to child, so not-yet-COWed pages could be purged
> > > > unless either of the two processes marks NO_VOLATILE explicitly.
> > > >
> > > > Of course, a COWed page could be purged easily because there is no link
> > > > any more.
> > > Ack. This seems reasonable.
> > >
> > > > 2) file-mvrange
> > > >
> > > > A page in a volatile range will be purged only if all of the processes
> > > > mapping the page marked it as volatile AND no process maps the page
> > > > as "private". IOW, all of the processes mapping the page must map it
> > > > as "shared" for purging.
> > > >
> > > > So, all of the processes should mark each address range in their own
> > > > process context if they want to collaborate on a shared mapped file, and
> > > > guarantee that no process maps the range as "private".
> > > >
> > > > Of course, the volatility state ends when the process is gone.
> > > This case doesn't seem ideal to me, but it is sort of how the current
> > > code works, to avoid the complexity of dealing with memory volatile
> > > ranges that cross page types (file/anonymous). Although the current
> > > code just doesn't purge file pages marked with mvrange().
> > Personally, I don't think it's about avoiding implementation complexity.
> > I thought explicitly declaring volatility on a range before using it would be
> > clearer for the userspace programmer.
> > Otherwise, he can encounter SIGBUS and easily get confused.
> >
> > Frankly speaking, I don't like volatility remaining permanently even after the
> > relevant processes go away; it could make processes using the file
> > much more error-prone and hard to debug.
>
> So this is maybe a contentious point we'll have to work out.
>
> Could you maybe describe some use cases you envision where someone
> would want to mark pages volatile on a file that could be
> accidentally shared? Or how you think the per-mm sense of volatility
> would be beneficial in those use-cases?

My concern is the following:

1. Process A calls mvrange for file F.
2. Process A is killed by someone or by its own bug.
3. Process B maps F as shared into its address space.
4. Memory pressure happens.
5. Process B is killed by SIGBUS, but Process B really can't know why it
   was killed because it can't know of anyone who opened F except itself.
 
> The use cases I envision where volatility would be used are when any
> sharing would be coordinated between processes.
> Again, that producer/consumer example from before where the empty
> portion of a very large circular buffer could be made volatile,
> scaling the actual memory usage to the actual need.
>
> And really the same concern would likely apply in the common case
> when multiple applications mmap (shared) a file, but use fvrange()
> to mark the data as volatile. This is exactly the use case the
> Android ashmem interface works for. In that case, once the data is

I don't know the Android ashmem interface well, but if it works as I
mentioned earlier, I think it's not a good interface.

> marked volatile, it should remain volatile until someone who has the
> file open marks it as non-volatile.  The only time we clear the
> volatility is when the file is closed by all users.

Yes. We need the volatile ranges to be cleared when the file is closed
by all users. That's what we need, and it blows my concern away.

>
> I think the concern about surprising an application that isn't
> expecting volatility is odd, since if an application jumped in and
> punched a hole in the data, that could surprise other applications
> as well.  If you're going to use a file that can be shared,
> applications have to deal with potential changes to that file by
> others.

True. My concern is delayed punching when no client holds the fd, and
that there is no interface to detect whether some range of a file is in
the volatile state or not. It means anyone who mapped a file as shared
could encounter SIGBUS even though he makes a best effort to check with
lsof before using it.

>
> To me, the value in using volatile ranges on the file data is
> exactly because 

Re: [RFC PATCH 0/4] Support vranges on files

2013-04-07 Thread Minchan Kim
Hello John,

As you know, userland people wanted to handle vrange with an mmaped
pointer rather than an fd and to see the SIGBUS, so I thought more
about the semantics of vrange and want to make them very clear and easy.
So I suggest the semantics below (of course, they're not rock solid).

mvrange(start_addr, length, mode, behavior)

It's the same as what I suggested lately but with a different name, just
adding the prefix "m". It's a per-process model (i.e., mm_struct vrange),
so if the process exits, the "volatility" isn't valid any more.
That isn't a problem for anonymous pages but could be for file-vrange, so
let's introduce fvrange to cover the problem.

fvrange(int fd, start_offset, length, mode, behavior)

First of all, let's look at mvrange from the anonymous and file page POV.

1) anon-mvrange

A page in a volatile range will be purged only if all of the processes
marked the range as volatile.

If a process calls mvrange and then forks, the vrange could be copied
from parent to child, so not-yet-COWed pages could be purged
unless either of the two processes marks NO_VOLATILE explicitly.

Of course, a COWed page could be purged easily because there is no link
any more.

2) file-mvrange

A page in a volatile range will be purged only if all of the processes
mapping the page marked it as volatile AND no process maps the page
as "private". IOW, all of the processes mapping the page must map it
as "shared" for purging.

So, all of the processes should mark each address range in their own
process context if they want to collaborate on a shared mapped file, and
guarantee that no process maps the range as "private".

Of course, the volatility state ends when the process is gone.

3) fvrange

It's the same as 2), but the volatility state could be persistent in the
address_space until someone calls fvrange(NO_VOLATILE).
So it could remove the weakness of 2).

What do you think about the above semantics?

If you don't have any problem with it, we could implement it. I think 1) and
2) could be handled by my base code for anon-vrange handling, with tweaks for
file-vrange, and we'd need your new patches in address_space to handle 3).
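
The purge conditions in 1) and 2) can be restated as two small predicates. This is only a userspace sketch of the rules as written above: struct page_user and both function names are hypothetical, and treating a page with no users as not purgeable under the per-process model is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one process that has the page mapped. */
struct page_user {
    bool maps_private;     /* mapped "private" rather than "shared" */
    bool marked_volatile;  /* this process marked the range volatile via mvrange */
};

/* 1) anon-mvrange: every process sharing the page must have marked it. */
bool anon_page_purgeable(const struct page_user *users, size_t n)
{
    if (n == 0)
        return false;  /* assumption: no users means no per-process volatility */
    for (size_t i = 0; i < n; i++)
        if (!users[i].marked_volatile)
            return false;
    return true;
}

/* 2) file-mvrange: as above, and no process may map the page private. */
bool file_page_purgeable(const struct page_user *users, size_t n)
{
    if (n == 0)
        return false;  /* volatility ended with the last process */
    for (size_t i = 0; i < n; i++)
        if (users[i].maps_private || !users[i].marked_volatile)
            return false;
    return true;
}
```

Under 3), the difference would be that the volatile mark is kept with the file rather than in each page_user entry, so it does not disappear when a marking process exits.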

On Fri, Apr 05, 2013 at 04:55:04PM +0900, Minchan Kim wrote:
> Hi John,
> 
> On Thu, Apr 04, 2013 at 10:37:52AM -0700, John Stultz wrote:
> > On 04/03/2013 11:55 PM, Minchan Kim wrote:
> > >On Wed, Apr 03, 2013 at 04:52:19PM -0700, John Stultz wrote:
> > >>Next we introduce a parallel fvrange() syscall for creating
> > >>volatile ranges directly against files.
> > >Okay. It seems you want to replace the ashmem interface with fvrange.
> > >I doubt we have to eat a system call slot. Can't we add "int fd"
> > >to the vrange system call without reinventing the wheel?
> > 
> > Sure, that would be doable. I just added the new syscall to make the
> > differences in functionality clear.
> > Once the subtleties are understood, we can condense things down if
> > we think it's best.
> 
> Fair enough.
> 
> > 
> > 
> > >>And finally, we change the range purging logic to be able to
> > >>handle both anonymous and file volatile ranges.
> > >Okay. Then, what are the semantics of file-vrange?
> > >
> > >There is a file F. Process A mapped some part of the file into its
> > >address space. Then, Process B calls fvrange on the same part.
> > >As I looked over your code, it purges the range although process B
> > >is using it now. Right? Is that your intention? Maybe it isn't.
> > 
> > Not sure if your example has a typo and you meant "process A is
> > using it"?  If so, yes. The point is the volatility is shared and
> > consistent across all users of the file, in the same way the data in
> > the file is shared. If process B punched a hole in the file, process
> > A would see the effect immediately. With volatile ranges, the hole
> > punching is just delayed and possibly done later by the kernel, in
> > effect on behalf of process B, so the behavior is the same.
> > 
> > Consider the case where we could have two processes mmap a tmpfs
> > file in order to create a circular buffer shared between them. You
> > could then have a producer/consumer relationship with two processes
> > where any data not between the head & tail offsets were marked
> > volatile. The producer would mark tail+size non-volatile, write the
> > data, and update the tail offset. The consumer would read data from
> > the head offset, mark the just-read range as volatile, and update
> > the offset.
> > 
> > In this example, the producer would be the only process to mark data
> > non-volatile, while the consumer would be the only one marking
> > ranges volatile. Thus the state of volatility would need to be an
> > attribute of the file, not the process, in the same way the shared
> > data is.
> > 
> > Is that clear?
> 
> Yes, I got your point that you meant shared mapping.
> Let's enumerate more examples.
> 
> 1. Process A mapped FILE A with MAP_SHARED
>    Process B mapped FILE A with MAP_SHARED
>    Process C calls fvrange
>    Discard all pages of process A and B -> Makes sense to me.
> 
> 2. Process A mapped FILE A with 

Re: [RFC PATCH 0/4] Support vranges on files

2013-04-05 Thread Minchan Kim
Hi John,

On Thu, Apr 04, 2013 at 10:37:52AM -0700, John Stultz wrote:
> On 04/03/2013 11:55 PM, Minchan Kim wrote:
> >On Wed, Apr 03, 2013 at 04:52:19PM -0700, John Stultz wrote:
> >>Next we introduce a parallel fvrange() syscall for creating
> >>volatile ranges directly against files.
> >Okay. It seems you want to replace the ashmem interface with fvrange.
> >I doubt we have to eat a system call slot. Can't we add "int fd"
> >to the vrange system call without reinventing the wheel?
> 
> Sure, that would be doable. I just added the new syscall to make the
> differences in functionality clear.
> Once the subtleties are understood, we can condense things down if
> we think it's best.

Fair enough.

> 
> 
> >>And finally, we change the range purging logic to be able to
> >>handle both anonymous and file volatile ranges.
> >Okay. Then, what are the semantics of file-vrange?
> >
> >There is a file F. Process A has mapped some part of the file into its
> >address space. Then process B calls fvrange on the same part.
> >As I read your code, it purges the range even though process B
> >is using it now. Right? Is that your intention? Maybe it isn't.
> 
> Not sure if your example has a typo and you meant "process A is
> using it"?  If so, yes. The point is the volatility is shared and
> consistent across all users of the file, in the same way the data in
> the file is shared. If process B punched a hole in the file, process
> A would see the effect immediately. With volatile ranges, the hole
> punching is just delayed and possibly done later by the kernel, in
> effect on behalf of process B, so the behavior is the same.
> 
> Consider the case where we could have two processes mmap a tmpfs
> file in order to create a circular buffer shared between them. You
> could then have a producer/consumer relationship with two processes
> where any data not between the head & tail offsets were marked
> volatile. The producer would mark tail+size non-volatile, write the
> data, and update the tail offset. The consumer would read data from
> the head offset, mark the just-read range as volatile, and update
> the offset.
> 
> In this example, the producer would be the only process to mark data
> non-volatile, while the consumer would be the only one marking
> ranges volatile. Thus the state of volatility would need to be an
> attribute of the file, not the process, in the same way the shared
> data is.
> 
> Is that clear?

Yes, I got your point that you meant shared mapping.
Let's enumerate more examples.

1. Process A mapped FILE A with MAP_SHARED
   Process B mapped FILE A with MAP_SHARED
   Process C calls fvrange
   Discard all pages of process A and B -> Makes sense to me.

2. Process A mapped FILE A with MAP_PRIVATE and is using it read-only
   Process B mapped FILE A with MAP_PRIVATE and is using it write-only
   Process C calls fvrange

   What happens? I expect process A loses all its pages while process B
   keeps its COWed pages.

3. Process A mapped FILE A with MAP_PRIVATE and is using it read/write
   Process C calls fvrange

   Non-COWed pages in process A are lost while COWed pages are kept.
   A mix.

Are all of the above your intention?
It would be much clearer if you wrote down the semantics you intend
for privately mapped and shared mapped files. ;-)

> 
> 
> 
> >Let's define fvrange's semantics to be the same as anon-vrange's.
> >If at least one process is using the range as non-volatile,
> >we shouldn't purge it at all.
> 
> So this I'm not in agreement with.

I got your point.

> 
> Anonymous pages are for the most part not shared, except via COW.
> And for the COW case, yes, I agree, we shouldn't purge those pages.
> 
> Similarly (and I have yet to handle this in the code), for private
> mapped files, those pages shouldn't be purged either (or purging
> them shouldn't affect the private mapped pages - not sure which
> direction to go here).

Yep. It's questionable.
It seems fallocate() hole punching removes non-COWed pages even though
they are mapped privately, unless I missed something in the code.
If I'm right, that looks very strange to me: COWed pages remain
in memory while not-yet-COWed pages are discarded. :(
Hmm.

> 
> But for shared mapped files, we need to keep the volatility state
> shared as well.
> 
> 
> >>Now there are some quirks still to be resolved with the approach
> >>used here. The biggest one being the vrange() call can't be used to
> >>create volatile ranges against mmapped files. Instead only the
> >Why?
> 
> As explained above, the volatility is shared like the data. The
> current vrange() code creates per-mm volatile ranges, which aren't
> shared.

Strictly speaking, I think we could do it with only per-mm volatile ranges.
But the concern with that approach, as you mention below, is that we
would have to iterate over every process's mm_struct in system call
context to check. Of course, I don't like that; it's a bad design.

> 
> 
> >
> >>fvrange() can be used to create file backed volatile ranges.
> 

Re: [RFC PATCH 0/4] Support vranges on files

2013-04-04 Thread John Stultz

On 04/03/2013 11:55 PM, Minchan Kim wrote:
>On Wed, Apr 03, 2013 at 04:52:19PM -0700, John Stultz wrote:
>>Next we introduce a parallel fvrange() syscall for creating
>>volatile ranges directly against files.
>Okay. It seems you want to replace the ashmem interface with fvrange.
>I doubt we need to eat a system call slot. Can't we add "int fd"
>to the vrange system call without reinventing the wheel?


Sure, that would be doable. I just added the new syscall to make the 
differences in functionality clear.
Once the subtleties are understood, we can condense things down if we 
think it's best.




>>And finally, we change the range purging logic to be able to
>>handle both anonymous and file volatile ranges.
>Okay. Then, what are the semantics of file-vrange?
>
>There is a file F. Process A has mapped some part of the file into its
>address space. Then process B calls fvrange on the same part.
>As I read your code, it purges the range even though process B
>is using it now. Right? Is that your intention? Maybe it isn't.


Not sure if your example has a typo and you meant "process A is 
using it"?  If so, yes. The point is the volatility is shared and 
consistent across all users of the file, in the same way the data in the 
file is shared. If process B punched a hole in the file, process A would 
see the effect immediately. With volatile ranges, the hole punching is 
just delayed and possibly done later by the kernel, in effect on behalf 
of process B, so the behavior is the same.


Consider the case where we could have two processes mmap a tmpfs file in 
order to create a circular buffer shared between them. You could then 
have a producer/consumer relationship with two processes where any data 
not between the head & tail offsets were marked volatile. The producer 
would mark tail+size non-volatile, write the data, and update the tail 
offset. The consumer would read data from the head offset, mark the 
just-read range as volatile, and update the offset.


In this example, the producer would be the only process to mark data 
non-volatile, while the consumer would be the only one marking ranges 
volatile. Thus the state of volatility would need to be an attribute of 
the file, not the process, in the same way the shared data is.
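
To make the circular-buffer example concrete, here is a minimal userspace sketch in C. It does not call fvrange(), which exists only in this patchset; instead it models the shared file as a struct in which the volatility flags sit next to the data, so that a producer and a consumer working on the same object see the same state. The names ring_file, ring_produce and ring_consume, and the fixed slot layout, are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SLOTS 4          /* ring capacity, in fixed-size slots */
#define SLOT_SIZE 16

/* Stand-in for the shared tmpfs file: data AND volatility live together. */
struct ring_file {
    char data[SLOTS][SLOT_SIZE];
    bool is_volatile[SLOTS];   /* hypothetical per-file volatility state */
    size_t head;               /* next slot to read  */
    size_t tail;               /* next slot to write */
};

static void ring_init(struct ring_file *f)
{
    memset(f, 0, sizeof(*f));
    for (size_t i = 0; i < SLOTS; i++)
        f->is_volatile[i] = true;   /* nothing between head and tail yet */
}

/* Producer: mark the slot non-volatile, write, then advance tail. */
static bool ring_produce(struct ring_file *f, const char *msg)
{
    size_t next = (f->tail + 1) % SLOTS;

    if (next == f->head)
        return false;                       /* ring full */
    f->is_volatile[f->tail] = false;        /* where "mark non-volatile" would go */
    strncpy(f->data[f->tail], msg, SLOT_SIZE - 1);
    f->data[f->tail][SLOT_SIZE - 1] = '\0';
    f->tail = next;
    return true;
}

/* Consumer: read, mark the just-read slot volatile, then advance head. */
static bool ring_consume(struct ring_file *f, char *out)
{
    if (f->head == f->tail)
        return false;                       /* ring empty */
    memcpy(out, f->data[f->head], SLOT_SIZE);
    f->is_volatile[f->head] = true;         /* where "mark volatile" would go */
    f->head = (f->head + 1) % SLOTS;
    return true;
}
```

Only ring_produce() clears a flag and only ring_consume() sets one, yet both rely on the other side's updates. If the flags were kept in per-process state instead of in ring_file, neither side could see the other's marking, which is the point of the paragraph above.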


Is that clear?




>Let's define fvrange's semantics to be the same as anon-vrange's.
>If at least one process is using the range as non-volatile,
>we shouldn't purge it at all.


So this I'm not in agreement with.

Anonymous pages are for the most part not shared, except via COW. And 
for the COW case, yes, I agree, we shouldn't purge those pages.


Similarly (and I have yet to handle this in the code), for private 
mapped files, those pages shouldn't be purged either (or purging them 
shouldn't affect the private mapped pages - not sure which direction to 
go here).


But for shared mapped files, we need to keep the volatility state shared 
as well.




>>Now there are some quirks still to be resolved with the approach
>>used here. The biggest one being the vrange() call can't be used to
>>create volatile ranges against mmapped files. Instead only the
>Why?


As explained above, the volatility is shared like the data. The current 
vrange() code creates per-mm volatile ranges, which aren't shared.






>>fvrange() can be used to create file backed volatile ranges.
>I couldn't understand your point. It would be better for me to explain
>my thinking first; then you can point out what I am missing.
>Look below.
>
>>This could be overcome by iterating across all the process VMAs to
>>determine if they're anonymous or file based, and if file-based,
>>create a VMA sized volatile range on the mapping pointed to by the
>>VMA.
>It's needed only when we start to discard pages. Simply put, it belongs
>to the reclaim path, NOT the system call path, so it's not a problem.


The reason we can't defer this to only the reclaim path is this: if 
volatile ranges on shared mappings are stored in the mm_struct, and 
process A sets up a volatile range on a shared mapping but stores the 
volatility in its own mm, then when process B wants to clear the 
volatility on that range, process B would have to iterate over all 
processes that have those file vmas mapped and change them.


Additionally if process A sets up a volatile range on a shared mapped 
file, then quits, the volatility state dies with that process.


Either way, it's not just a simple matter of handling data on your own 
mm_struct. That's fine for the process' own anonymous memory, but 
doesn't work for shared file mappings.






>>But this would have downsides, as Minchan has been clear that he wants
>>to optimize the vrange() calls so that it is very cheap to create and
>>destroy volatile ranges. Having simple per-process ranges be created
>>means we don't have to iterate across the vmas in the range to
>>determine if they're anonymous or file backed. Instead the current
>>vrange() code just creates per process ranges (which may or may not
>>cover mmapped file data), but will only purge anonymous pages in
>>that range. This keeps the vrange() 

Re: [RFC PATCH 0/4] Support vranges on files

2013-04-04 Thread Minchan Kim
Hey John,

First of all, I should confess I only glanced at your code and a few
questions popped up. If I've missed something, please slap me.

On Wed, Apr 03, 2013 at 04:52:19PM -0700, John Stultz wrote:
> This patchset is against Minchan's vrange work here:
>   https://lkml.org/lkml/2013/3/12/105
> 
> Extending it to support volatile ranges on files. In effect
> providing the same functionality of my earlier file based
> volatile range patches on-top of Minchan's anonymous volatile
> range work.
> 
> Volatile ranges on files are different than on anonymous memory,
> because the volatility state can be shared between multiple
> applications. This makes storing the volatile ranges exclusively
> in the mm_struct (or in vmas as in Minchan's earlier work)
> inappropriate.
> 
> The patchset starts with some minor cleanup.
> 
> Then we introduce the idea of a vrange_root, which provides an
> interval-tree root and a lock to protect the tree. This structure
> can then be stored in the mm_struct or in an address_space. Then the
> same infrastructure can be used to manage volatile ranges on both
> anonymous and file backed memory.

Thanks for the above two patches. It is a nice cleanup.

> 
> Next we introduce a parallel fvrange() syscall for creating
> volatile ranges directly against files.

Okay. It seems you want to replace the ashmem interface with fvrange.
I doubt we need to eat a system call slot. Can't we add "int fd"
to the vrange system call without reinventing the wheel?

> 
> And finally, we change the range purging logic to be able to
> handle both anonymous and file volatile ranges.

Okay. Then, what are the semantics of file-vrange?

There is a file F. Process A has mapped some part of the file into its
address space. Then process B calls fvrange on the same part.
As I read your code, it purges the range even though process B
is using it now. Right? Is that your intention? Maybe it isn't.

Let's define fvrange's semantics to be the same as anon-vrange's.
If at least one process is using the range as non-volatile,
we shouldn't purge it at all.

So your [4/4] should atomically check all processes that have the page
mapped. You could do that with i_mmap_mutex and vrange_lock,
and fold the logic into try_to_discard_vpage.

> 
> Now there are some quirks still to be resolved with the approach
> used here. The biggest one being the vrange() call can't be used to
> create volatile ranges against mmapped files. Instead only the

Why?

> fvrange() can be used to create file backed volatile ranges.

I couldn't understand your point. It would be better for me to explain
my thinking first; then you can point out what I am missing.
Look below.

> 
> This could be overcome by iterating across all the process VMAs to
> determine if they're anonymous or file based, and if file-based,
> create a VMA sized volatile range on the mapping pointed to by the
> VMA.

It's needed only when we start to discard pages. Simply put, it belongs
to the reclaim path, NOT the system call path, so it's not a problem.

> 
> But this would have downsides, as Minchan has been clear that he wants
> to optimize the vrange() calls so that it is very cheap to create and
> destroy volatile ranges. Having simple per-process ranges be created
> means we don't have to iterate across the vmas in the range to
> determine if they're anonymous or file backed. Instead the current
> vrange() code just creates per process ranges (which may or may not
> cover mmapped file data), but will only purge anonymous pages in
> that range. This keeps the vrange() call cheap.

Right.

> 
> Additionally, just creating or destroying a single range is very
> simple to do, and requires a fixed amount of memory known up front.
> Thus we can allocate needed data prior to making any modifications.
> 
> But if we were to create a range that crosses anonymous and file
> backed pages, it must create or destroy multiple per-process or
> per-file ranges. This could require an unknown number of allocations,

This is the part where I fail to parse your point.

> opening the possibility of getting an ENOMEM half-way through the
> operation, leaving the volatile range partially created or destroyed.
> 
> So to keep this simple for this first pass, for now we have two
> syscalls for two types of volatile ranges.


My idea is as follows:

vrange(fd, start, len, mode, behavior)

A) fd = 0

1) system call context - vrange system call registers new vrange
   in mm_struct.
2) Add new vrange into LRU
3) reclaim context - walk the rmap to confirm that all processes have
   marked the range volatile -> discard

B) fd = 1

1) system call context - vrange system call registers new vrange
   in address_space
2) Add new vrange into LRU
3) reclaim context - walk the rmap to confirm that all processes have
   marked the range volatile -> discard

What's the problem in this logic?
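
The two cases can be sketched in userspace C as follows. This is only an illustration under stated assumptions: it reads case A as "no file given" and case B as "a file is given" (the mail literally writes fd = 0 and fd = 1), it replaces the rmap walk with a plain array of per-mapper flags, and the names mock_mm, mock_file, pick_root, register_range and can_discard are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for a container of volatile ranges. */
struct vrange_root { int nr_ranges; };

struct mock_mm   { struct vrange_root vroot; };  /* per-process owner */
struct mock_file { struct vrange_root vroot; };  /* per-file owner    */

/* Step 1: register on the file's root if a file is given, else on the mm's. */
static struct vrange_root *pick_root(struct mock_mm *mm, struct mock_file *file)
{
    return file ? &file->vroot : &mm->vroot;
}

static void register_range(struct mock_mm *mm, struct mock_file *file)
{
    pick_root(mm, file)->nr_ranges++;
}

/*
 * Step 3 (reclaim): discard only if every mapper has marked the
 * range volatile. The array stands in for what an rmap walk would find.
 */
static bool can_discard(const bool *mapper_is_volatile, size_t nr_mappers)
{
    for (size_t i = 0; i < nr_mappers; i++)
        if (!mapper_is_volatile[i])
            return false;
    return true;
}
```

Note that can_discard() encodes the "all mappers must agree" rule proposed in this mail, which is a different rule from a single shared per-file state.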

> 
> Let me know if you have any thoughts or comments. I'm sure there's
> plenty of room for improvement here.
> 
> In the meantime I'll be playing with some different approaches 

[RFC PATCH 0/4] Support vranges on files

2013-04-03 Thread John Stultz
This patchset is against Minchan's vrange work here:
https://lkml.org/lkml/2013/3/12/105

Extending it to support volatile ranges on files. In effect
providing the same functionality of my earlier file based
volatile range patches on-top of Minchan's anonymous volatile
range work.

Volatile ranges on files are different than on anonymous memory,
because the volatility state can be shared between multiple
applications. This makes storing the volatile ranges exclusively
in the mm_struct (or in vmas as in Minchan's earlier work)
inappropriate.

The patchset starts with some minor cleanup.

Then we introduce the idea of a vrange_root, which provides an
interval-tree root and a lock to protect the tree. This structure
can then be stored in the mm_struct or in an address_space. Then the
same infrastructure can be used to manage volatile ranges on both
anonymous and file backed memory.
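
As a rough illustration of that idea, the userspace sketch below embeds one root type in two different owners. It is not the structure from the patches: a linked list stands in for the interval tree, a pthread mutex stands in for the kernel lock, and mock_mm, mock_mapping, vroot_add and vroot_overlaps are names invented here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct vrange {                 /* one volatile range: [start, end) */
    unsigned long start, end;
    struct vrange *next;
};

struct vrange_root {            /* container plus lock, owner-agnostic */
    struct vrange *ranges;      /* linked list standing in for the interval tree */
    pthread_mutex_t lock;
};

/* Mock owners: the same root type is embedded in both. */
struct mock_mm      { struct vrange_root vroot; };  /* per-process */
struct mock_mapping { struct vrange_root vroot; };  /* per-file    */

static void vroot_init(struct vrange_root *r)
{
    r->ranges = NULL;
    pthread_mutex_init(&r->lock, NULL);
}

/* Add [start, end) to a root; returns false on allocation failure. */
static bool vroot_add(struct vrange_root *r, unsigned long start,
                      unsigned long end)
{
    struct vrange *v = malloc(sizeof(*v));

    if (!v)
        return false;
    v->start = start;
    v->end = end;
    pthread_mutex_lock(&r->lock);
    v->next = r->ranges;
    r->ranges = v;
    pthread_mutex_unlock(&r->lock);
    return true;
}

/* True if any range in the root overlaps [start, end). */
static bool vroot_overlaps(struct vrange_root *r, unsigned long start,
                           unsigned long end)
{
    bool hit = false;

    pthread_mutex_lock(&r->lock);
    for (struct vrange *v = r->ranges; v; v = v->next) {
        if (v->start < end && start < v->end) {
            hit = true;
            break;
        }
    }
    pthread_mutex_unlock(&r->lock);
    return hit;
}
```

Because vroot_add() and vroot_overlaps() take only a root pointer, the same code serves both owners. That is the reuse the paragraph above describes.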

Next we introduce a parallel fvrange() syscall for creating
volatile ranges directly against files.

And finally, we change the range purging logic to be able to
handle both anonymous and file volatile ranges.

Now there are some quirks still to be resolved with the approach
used here. The biggest one being the vrange() call can't be used to
create volatile ranges against mmapped files. Instead only the
fvrange() can be used to create file backed volatile ranges.

This could be overcome by iterating across all the process VMAs to
determine if they're anonymous or file based, and if file-based,
create a VMA sized volatile range on the mapping pointed to by the
VMA.

But this would have downsides, as Minchan has been clear that he wants
to optimize the vrange() calls so that it is very cheap to create and
destroy volatile ranges. Having simple per-process ranges be created
means we don't have to iterate across the vmas in the range to
determine if they're anonymous or file backed. Instead the current
vrange() code just creates per process ranges (which may or may not
cover mmapped file data), but will only purge anonymous pages in
that range. This keeps the vrange() call cheap.

Additionally, just creating or destroying a single range is very
simple to do, and requires a fixed amount of memory known up front.
Thus we can allocate needed data prior to making any modifications.

But if we were to create a range that crosses anonymous and file
backed pages, the call would have to create or destroy multiple
per-process or per-file ranges. This could require an unknown number
of allocations, opening the possibility of getting an ENOMEM half-way
through the operation, leaving the volatile range partially created
or destroyed.

So to keep this simple for this first pass, for now we have two
syscalls for two types of volatile ranges.

Let me know if you have any thoughts or comments. I'm sure there's
plenty of room for improvement here.

In the meantime I'll be playing with some different approaches to
try to handle single volatile ranges that cross file and anonymous
vmas.

The entire queue, both Minchan's changes and mine, can be found here:
git://git.linaro.org/people/jstultz/android-dev.git dev/vrange-minchan

thanks
-john

Cc: linux...@kvack.org
Cc: Michael Kerrisk 
Cc: Arun Sharma 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Cc: Dave Hansen 
Cc: Rik van Riel 
Cc: Neil Brown 
Cc: Mike Hommey 
Cc: Taras Glek 
Cc: KOSAKI Motohiro 
Cc: KAMEZAWA Hiroyuki 
Cc: Jason Evans 
Cc: san...@google.com
Cc: Paul Turner 
Cc: Johannes Weiner 
Cc: Michel Lespinasse 
Cc: Andrew Morton 
Cc: Minchan Kim 


John Stultz (4):
  vrange: Make various vrange.c local functions static
  vrange: Introduce vrange_root to make vrange structures more flexible
  vrange: Support fvrange() syscall for file based volatile ranges
  vrange: Enable purging of file backed volatile ranges

 arch/x86/syscalls/syscall_64.tbl |    1 +
 fs/file_table.c                  |    5 +
 fs/inode.c                       |    2 +
 fs/proc/task_mmu.c               |   10 +-
 include/linux/fs.h               |    2 +
 include/linux/mm_types.h         |    4 +-
 include/linux/vrange.h           |   60 ---
 include/linux/vrange_types.h     |   22 +++
 kernel/fork.c                    |    2 +-
 mm/vrange.c                      |  334 ++
 10 files changed, 308 insertions(+), 134 deletions(-)
 create mode 100644 include/linux/vrange_types.h

-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH 0/4] Support vranges on files

2013-04-03 Thread John Stultz
This patchset is against Minchan's vrange work here:
https://lkml.org/lkml/2013/3/12/105

It extends that work to support volatile ranges on files, in effect
providing the same functionality as my earlier file-based
volatile range patches on top of Minchan's anonymous volatile
range work.

Volatile ranges on files are different than on anonymous memory,
because the volatility state can be shared between multiple
applications. This makes storing the volatile ranges exclusively
in the mm_struct (or in vmas as in Minchan's earlier work)
inappropriate.

The patchset starts with some minor cleanup.

Then we introduce the idea of a vrange_root, which provides an
interval-tree root and a lock to protect the tree. This structure
can then be stored in the mm_struct or in an address_space, so the
same infrastructure can be used to manage volatile ranges on both
anonymous and file-backed memory.
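
As a rough user-space illustration of that idea (not the code from
the patches), the sketch below embeds the same root type in two
different owners. All names other than vrange_root are hypothetical,
a linked list stands in for the interval tree, and a C11 atomic flag
stands in for the kernel lock.

```c
#include <stdatomic.h>
#include <stddef.h>

/* One volatile range covering [start, end], inclusive. */
struct vrange {
	unsigned long start, end;
	struct vrange *next;	/* a list stands in for the interval tree */
};

/* Assumed shape: a tree root plus the lock that protects it. */
struct vrange_root {
	struct vrange *head;
	atomic_flag lock;	/* spin flag standing in for the kernel lock */
};

/* Hypothetical owners: one root per process, one per file mapping. */
struct mm_sketch      { struct vrange_root vroot; };
struct mapping_sketch { struct vrange_root vroot; };

void vrange_root_init(struct vrange_root *r)
{
	r->head = NULL;
	atomic_flag_clear(&r->lock);
}

static void vroot_lock(struct vrange_root *r)
{
	while (atomic_flag_test_and_set(&r->lock))
		;	/* spin until the flag is released */
}

static void vroot_unlock(struct vrange_root *r)
{
	atomic_flag_clear(&r->lock);
}

/* Link a caller-allocated range into the root. */
void vrange_add(struct vrange_root *r, struct vrange *v)
{
	vroot_lock(r);
	v->next = r->head;
	r->head = v;
	vroot_unlock(r);
}

/* Return 1 if [start, end] overlaps any range held by the root. */
int vrange_overlaps(struct vrange_root *r, unsigned long start,
		    unsigned long end)
{
	struct vrange *v;
	int hit = 0;

	vroot_lock(r);
	for (v = r->head; v; v = v->next) {
		if (v->start <= end && start <= v->end) {
			hit = 1;
			break;
		}
	}
	vroot_unlock(r);
	return hit;
}
```

The point of the sketch is only that vrange_add() and
vrange_overlaps() take a root and never need to know whether it
belongs to a process or to a file.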

Next we introduce a parallel fvrange() syscall for creating
volatile ranges directly against files.

And finally, we change the range purging logic to be able to
handle both anonymous and file volatile ranges.

Now there are some quirks still to be resolved with the approach
used here. The biggest one is that the vrange() call can't be used to
create volatile ranges against mmapped files. Instead, only
fvrange() can be used to create file-backed volatile ranges.

This could be overcome by iterating across all the process VMAs to
determine if they're anonymous or file based, and if file-based,
create a VMA sized volatile range on the mapping pointed to by the
VMA.

But this would have downsides, as Minchan has been clear that he wants
to optimize the vrange() calls so that it is very cheap to create and
destroy volatile ranges. Creating simple per-process ranges
means we don't have to iterate across the vmas in the range to
determine if they're anonymous or file backed. Instead the current
vrange() code just creates per-process ranges (which may or may not
cover mmapped file data), but will only purge anonymous pages in
that range. This keeps the vrange() call cheap.

Additionally, just creating or destroying a single range is very
simple to do, and requires a fixed amount of memory known up front.
Thus we can allocate needed data prior to making any modifications.

But if we were to create a range that crosses anonymous and file
backed pages, the call would have to create or destroy multiple
per-process or per-file ranges. This could require an unknown number
of allocations, opening the possibility of getting an ENOMEM half-way
through the operation, leaving the volatile range partially created
or destroyed.
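
To make the failure mode concrete, here is a small user-space sketch
(again, not the patch code). The types, the helper names and the
alloc_budget knob used to simulate allocation failure are all
hypothetical. The single-root add allocates before it modifies
anything, so it either succeeds or leaves the root untouched; the
naive multi-root add allocates as it goes and can stop half-way.

```c
#include <errno.h>
#include <stdlib.h>

struct range {
	unsigned long start, end;
	struct range *next;
};

struct root {
	struct range *head;
};

/* Test knob: allocations left before failing; -1 means unlimited. */
int alloc_budget = -1;

struct range *range_alloc(void)
{
	if (alloc_budget == 0)
		return NULL;	/* simulated out-of-memory */
	if (alloc_budget > 0)
		alloc_budget--;
	return malloc(sizeof(struct range));
}

/*
 * Single root: exactly one node is needed, so allocate it first.
 * On failure nothing has been modified yet.
 */
int add_single(struct root *r, unsigned long start, unsigned long end)
{
	struct range *n = range_alloc();

	if (!n)
		return -ENOMEM;
	n->start = start;
	n->end = end;
	n->next = r->head;
	r->head = n;
	return 0;
}

/*
 * Several roots, handled naively: allocate while walking. If a later
 * allocation fails, the earlier roots have already been changed.
 */
int add_across_naive(struct root **roots, int nroots,
		     unsigned long start, unsigned long end)
{
	int i, ret;

	for (i = 0; i < nroots; i++) {
		ret = add_single(roots[i], start, end);
		if (ret)
			return ret;	/* roots[0..i-1] stay modified */
	}
	return 0;
}
```

With alloc_budget set to 1 and two roots, add_across_naive() returns
-ENOMEM after the first root has already gained a range, which is the
partially-created state described above.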

So to keep this simple for this first pass, for now we have two
syscalls for two types of volatile ranges.

Let me know if you have any thoughts or comments. I'm sure there's
plenty of room for improvement here.

In the meantime I'll be playing with some different approaches to
try to handle single volatile ranges that cross file and anonymous
vmas.

The entire queue, both Minchan's changes and mine, can be found here:
git://git.linaro.org/people/jstultz/android-dev.git dev/vrange-minchan

thanks
-john

Cc: linux...@kvack.org
Cc: Michael Kerrisk mtk.manpa...@gmail.com
Cc: Arun Sharma asha...@fb.com
Cc: Mel Gorman m...@csn.ul.ie
Cc: Hugh Dickins hu...@google.com
Cc: Dave Hansen d...@sr71.net
Cc: Rik van Riel r...@redhat.com
Cc: Neil Brown ne...@suse.de
Cc: Mike Hommey m...@glandium.org
Cc: Taras Glek tg...@mozilla.com
Cc: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
Cc: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com
Cc: Jason Evans j...@fb.com
Cc: san...@google.com
Cc: Paul Turner p...@google.com
Cc: Johannes Weiner han...@cmpxchg.org
Cc: Michel Lespinasse wal...@google.com
Cc: Andrew Morton a...@linux-foundation.org
Cc: Minchan Kim minc...@kernel.org


John Stultz (4):
  vrange: Make various vrange.c local functions static
  vrange: Introduce vrange_root to make vrange structures more flexible
  vrange: Support fvrange() syscall for file based volatile ranges
  vrange: Enable purging of file backed volatile ranges

 arch/x86/syscalls/syscall_64.tbl |    1 +
 fs/file_table.c                  |    5 +
 fs/inode.c                       |    2 +
 fs/proc/task_mmu.c               |   10 +-
 include/linux/fs.h               |    2 +
 include/linux/mm_types.h         |    4 +-
 include/linux/vrange.h           |   60 ---
 include/linux/vrange_types.h     |   22 +++
 kernel/fork.c                    |    2 +-
 mm/vrange.c                      |  334 ++
 10 files changed, 308 insertions(+), 134 deletions(-)
 create mode 100644 include/linux/vrange_types.h

-- 
1.7.10.4
