Re: [RFC PATCH 0/4] Support vranges on files
On Tue, Apr 09, 2013 at 03:36:20PM -0700, John Stultz wrote:
> On 04/08/2013 10:07 PM, Minchan Kim wrote:
> >On Mon, Apr 08, 2013 at 08:27:50PM -0700, John Stultz wrote:
> >>marked volatile, it should remain volatile until someone who has the
> >>file open marks it as non-volatile. The only time we clear the
> >>volatility is when the file is closed by all users.
> >Yes. We need to clear the volatile ranges when the file is closed
> >by all users. That's what we need, and it blows my concern away.
>
> Ok, sorry this wasn't more clear. In all the implementations I've
> pushed, the volatility only persists as long as someone holds the
> file open. Once it's closed by all users, the volatility is cleared.

I have now confirmed it with your implementation.
Sorry for the confusion from not looking into your code in detail. :(

>
> Hopefully that calms your worries here. :)

Yep.

>
> >>I think the concern about surprising an application that isn't
> >>expecting volatility is odd, since if an application jumped in and
> >>punched a hole in the data, that could surprise other applications
> >>as well. If you're going to use a file that can be shared,
> >>applications have to deal with potential changes to that file by
> >>others.
> >True. My concern is delayed punching without any client of the fd, and
> >that there is no interface to detect whether some range of a file is in
> >the volatile state or not. It means anyone who mapped a file as shared
> >could encounter SIGBUS even though he makes a best effort to check it
> >with lsof before using it.
>
> I'll grant the SIGBUS semantics create the potential for stranger
> behavior than usual, but I think the use cases are still attractive
> enough to try to make it work.

Indeed.

>
> >>To me, the value in using volatile ranges on the file data is
> >>exactly because the file data can be shared. So it makes sense to me
> >>to have the volatility state be like the data in the file. I guess
> >>the only exception in my case is that if all the references to a
> >>file are closed, we can clear the volatility (since we don't have a
> >>sane way for the volatility to persist past that point).
> >Agreed, if you clear out the volatility when the file is closed by
> >all stakeholders.
>
> Agreed.
>
> >>One question that might help resolve this: Would having some sort of
> >>volatility checking interface be helpful in easing your concern
> >>about applications being surprised by volatility?
> >If we can provide the above, I think we don't need such an interface
> >until someone wants it with reasonable logic.
>
> Sure, I just wanted to know if you saw a need right away. For now we
> can leave it be.
>
> >>True. And performance needs to be good if this hinting interface is
> >>to be used easily. Although I worry about performance trumping sane
> >>semantics. So let me try to implement the desired behavior and we
> >>can measure the difference.
> >NP. But keep in mind that mmap_sem was really terrible for performance
> >when I ran an experiment (ie, concurrent page faults by many threads
> >while a thread calls mmap).
> >I guess the primary reason is CONFIG_MUTEX_SPIN_ON_OWNER.
> >So at least, we should avoid it by introducing a new mode like
> >VOLATILE_ANON|VOLATILE_FILE|VOLATILE_BOTH if we want to
> >support mvrange-file, and the mvrange interface was the thing userland
> >people really want, although ashmem has used an fd-based model.
>
> The VOLATILE_ANON|VOLATILE_FILE|VOLATILE_BOTH may be an interesting
> compromise.
>
> Though, if one marks a VOLATILE_ANON range on an address that's an
> mmaped file, how do we detect this and provide a sane error value
> without checking the vmas?
>

Should we check the vma? If there is a conflict with the existing vrange
type, just return -EINVAL?

>
> thanks
> -john
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: em...@kvack.org

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: [RFC PATCH 0/4] Support vranges on files
On 04/08/2013 10:07 PM, Minchan Kim wrote:
>On Mon, Apr 08, 2013 at 08:27:50PM -0700, John Stultz wrote:
>>marked volatile, it should remain volatile until someone who has the
>>file open marks it as non-volatile. The only time we clear the
>>volatility is when the file is closed by all users.
>Yes. We need to clear the volatile ranges when the file is closed
>by all users. That's what we need, and it blows my concern away.

Ok, sorry this wasn't more clear. In all the implementations I've
pushed, the volatility only persists as long as someone holds the
file open. Once it's closed by all users, the volatility is cleared.

Hopefully that calms your worries here. :)

>>I think the concern about surprising an application that isn't
>>expecting volatility is odd, since if an application jumped in and
>>punched a hole in the data, that could surprise other applications
>>as well. If you're going to use a file that can be shared,
>>applications have to deal with potential changes to that file by
>>others.
>True. My concern is delayed punching without any client of the fd, and
>that there is no interface to detect whether some range of a file is in
>the volatile state or not. It means anyone who mapped a file as shared
>could encounter SIGBUS even though he makes a best effort to check it
>with lsof before using it.

I'll grant the SIGBUS semantics create the potential for stranger
behavior than usual, but I think the use cases are still attractive
enough to try to make it work.

>>To me, the value in using volatile ranges on the file data is
>>exactly because the file data can be shared. So it makes sense to me
>>to have the volatility state be like the data in the file. I guess
>>the only exception in my case is that if all the references to a
>>file are closed, we can clear the volatility (since we don't have a
>>sane way for the volatility to persist past that point).
>Agreed, if you clear out the volatility when the file is closed by
>all stakeholders.

Agreed.

>>One question that might help resolve this: Would having some sort of
>>volatility checking interface be helpful in easing your concern
>>about applications being surprised by volatility?
>If we can provide the above, I think we don't need such an interface
>until someone wants it with reasonable logic.

Sure, I just wanted to know if you saw a need right away. For now we
can leave it be.

>>True. And performance needs to be good if this hinting interface is
>>to be used easily. Although I worry about performance trumping sane
>>semantics. So let me try to implement the desired behavior and we
>>can measure the difference.
>NP. But keep in mind that mmap_sem was really terrible for performance
>when I ran an experiment (ie, concurrent page faults by many threads
>while a thread calls mmap).
>I guess the primary reason is CONFIG_MUTEX_SPIN_ON_OWNER.
>So at least, we should avoid it by introducing a new mode like
>VOLATILE_ANON|VOLATILE_FILE|VOLATILE_BOTH if we want to
>support mvrange-file, and the mvrange interface was the thing userland
>people really want, although ashmem has used an fd-based model.

The VOLATILE_ANON|VOLATILE_FILE|VOLATILE_BOTH may be an interesting
compromise.

Though, if one marks a VOLATILE_ANON range on an address that's an
mmaped file, how do we detect this and provide a sane error value
without checking the vmas?

thanks
-john

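[Editorial sketch] The open question above is what error a mode/mapping mismatch should produce; the reply earlier in this thread suggests -EINVAL. A minimal sketch of that convention follows. It assumes the kind of mapping is already known, which is exactly the vma check John would like to avoid, so it illustrates only the error behavior, not a cheap way to detect the conflict. The flag values, `enum mapping_kind`, and `vrange_check_mode()` are hypothetical names for illustration; the thread names the three modes but never defines them.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag values: the thread names these modes but gives no values. */
enum vrange_mode {
    VOLATILE_ANON = 1 << 0,
    VOLATILE_FILE = 1 << 1,
    VOLATILE_BOTH = VOLATILE_ANON | VOLATILE_FILE,
};

/* Hypothetical stand-in for what a vma lookup would report. */
enum mapping_kind { MAPPING_ANON, MAPPING_FILE };

/*
 * Return 0 if `mode` may be applied to a mapping of this kind,
 * -EINVAL on an empty mode, unknown bits, or a kind/mode conflict.
 */
int vrange_check_mode(enum mapping_kind kind, int mode)
{
    int needed = (kind == MAPPING_ANON) ? VOLATILE_ANON : VOLATILE_FILE;

    if (mode == 0 || (mode & ~VOLATILE_BOTH))
        return -EINVAL;     /* no mode given, or bits we do not know */
    if (!(mode & needed))
        return -EINVAL;     /* e.g. VOLATILE_ANON on a file mapping */
    return 0;
}
```

With these assumptions, `vrange_check_mode(MAPPING_FILE, VOLATILE_ANON)` fails with -EINVAL, while VOLATILE_BOTH is accepted for either kind of mapping.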
Re: [RFC PATCH 0/4] Support vranges on files
On Mon, Apr 08, 2013 at 08:27:50PM -0700, John Stultz wrote:
> On 04/08/2013 07:18 PM, Minchan Kim wrote:
> >On Mon, Apr 08, 2013 at 05:36:42PM -0700, John Stultz wrote:
> >>On 04/07/2013 05:46 PM, Minchan Kim wrote:
> >>>Hello John,
> >>>
> >>>As you know, userland people wanted to handle vrange with an mmaped
> >>>pointer rather than fd-based, and to see the SIGBUS, so I thought more
> >>>about the semantics of vrange and want to make them very clear and easy.
> >>>So I suggest the semantics below (of course, they're not rock solid).
> >>>
> >>>        mvrange(start_addr, length, mode, behavior)
> >>>
> >>>It's the same as what I suggested lately but with a different name,
> >>>just adding the prefix "m". It's a per-process model (ie, mm_struct
> >>>vrange), so if the process exits, "volatility" isn't valid any more.
> >>>It isn't a problem for anonymous pages but could be for file-vrange,
> >>>so let's introduce fvrange to cover the problem.
> >>>
> >>>        fvrange(int fd, start_offset, length, mode, behavior)
> >>>
> >>>First of all, let's look at mvrange from the anonymous and file page
> >>>POV.
> >>>
> >>>1) anon-mvrange
> >>>
> >>>A page in a volatile range will be purged only if all of the processes
> >>>marked the range as volatile.
> >>>
> >>>If process A calls mvrange and then forks, the vrange could be copied
> >>>from parent to child, so not-yet-COWed pages could be purged unless
> >>>either of the two processes marks NO_VOLATILE explicitly.
> >>>
> >>>Of course, a COWed page could be purged easily because there is no
> >>>link any more.
> >>Ack. This seems reasonable.
> >>
> >>
> >>>2) file-mvrange
> >>>
> >>>A page in a volatile range will be purged only if all of the processes
> >>>that mapped the page marked it as volatile AND no process has mapped
> >>>the page as "private". IOW, every process that mapped the page should
> >>>map it as "shared" for purging.
> >>>
> >>>So all of the processes should mark each address range in their own
> >>>process context if they want to collaborate on a shared mapped file,
> >>>and guarantee that no process has mapped the range as "private".
> >>>
> >>>Of course, the volatility state will be terminated when the process is
> >>>gone.
> >>This case doesn't seem ideal to me, but is sort of how the current
> >>code works to avoid the complexity of dealing with memory volatile
> >>ranges that cross page types (file/anonymous). Although the current
> >>code just doesn't purge file pages marked with mvrange().
> >Personally, I don't think it's to avoid the complexity of implementation.
> >I thought an explicit declaration of volatility on a range before using
> >it would be clearer for the userspace programmer.
> >Otherwise, he can encounter SIGBUS and get confused easily.
> >
> >Frankly speaking, I don't like volatility remaining permanently even
> >after the relevant processes go away; it could make processes using the
> >file much more error-prone and hard to debug.
>
> So this is maybe a contentious point we'll have to work out.
>
> Maybe you could describe some use cases you envision where someone
> would want to mark pages volatile on a file that could be
> accidentally shared? Or how you think the per-mm sense of volatility
> would be beneficial in those use-cases?

My concern is the following:

1. Process A calls mvrange for file F.
2. Process A is killed by someone or by its own BUG.
3. Process B maps F as shared in its address space.
4. Memory pressure happens.
5. Process B is killed by SIGBUS, but Process B really can't know why
   it was killed because it can't know of anyone who opened F except
   itself.

>
> The use cases I envision where volatility would be used are when any
> sharing would be coordinated between processes.
> Again, that producer/consumer example from before where the empty
> portion of a very large circular buffer could be made volatile,
> scaling the actual memory usage to the actual need.
>
> And really the same concern would likely apply in the common case
> when multiple applications mmap (shared) a file, but use fvrange()
> to mark the data as volatile. This is exactly the use case the
> Android ashmem interface works for. In that case, once the data is

I don't know the Android ashmem interface well, but if it works as I
mentioned earlier, I think it's not a good interface.

> marked volatile, it should remain volatile until someone who has the
> file open marks it as non-volatile. The only time we clear the
> volatility is when the file is closed by all users.

Yes. We need to clear the volatile ranges when the file is closed
by all users. That's what we need, and it blows my concern away.

>
> I think the concern about surprising an application that isn't
> expecting volatility is odd, since if an application jumped in and
> punched a hole in the data, that could surprise other applications
> as well. If you're going to use a file that can be shared,
> applications have to deal with potential changes to that file by
> others.

True. My concern is delayed punching without any client of the fd, and
that there is no interface to detect whether some range

Re: [RFC PATCH 0/4] Support vranges on files
On 04/08/2013 07:18 PM, Minchan Kim wrote:
>On Mon, Apr 08, 2013 at 05:36:42PM -0700, John Stultz wrote:
>>On 04/07/2013 05:46 PM, Minchan Kim wrote:
>>>Hello John,
>>>
>>>As you know, userland people wanted to handle vrange with an mmaped
>>>pointer rather than fd-based, and to see the SIGBUS, so I thought more
>>>about the semantics of vrange and want to make them very clear and easy.
>>>So I suggest the semantics below (of course, they're not rock solid).
>>>
>>>        mvrange(start_addr, length, mode, behavior)
>>>
>>>It's the same as what I suggested lately but with a different name,
>>>just adding the prefix "m". It's a per-process model (ie, mm_struct
>>>vrange), so if the process exits, "volatility" isn't valid any more.
>>>It isn't a problem for anonymous pages but could be for file-vrange,
>>>so let's introduce fvrange to cover the problem.
>>>
>>>        fvrange(int fd, start_offset, length, mode, behavior)
>>>
>>>First of all, let's look at mvrange from the anonymous and file page
>>>POV.
>>>
>>>1) anon-mvrange
>>>
>>>A page in a volatile range will be purged only if all of the processes
>>>marked the range as volatile.
>>>
>>>If process A calls mvrange and then forks, the vrange could be copied
>>>from parent to child, so not-yet-COWed pages could be purged unless
>>>either of the two processes marks NO_VOLATILE explicitly.
>>>
>>>Of course, a COWed page could be purged easily because there is no
>>>link any more.
>>Ack. This seems reasonable.
>>
>>>2) file-mvrange
>>>
>>>A page in a volatile range will be purged only if all of the processes
>>>that mapped the page marked it as volatile AND no process has mapped
>>>the page as "private". IOW, every process that mapped the page should
>>>map it as "shared" for purging.
>>>
>>>So all of the processes should mark each address range in their own
>>>process context if they want to collaborate on a shared mapped file,
>>>and guarantee that no process has mapped the range as "private".
>>>
>>>Of course, the volatility state will be terminated when the process is
>>>gone.
>>This case doesn't seem ideal to me, but is sort of how the current
>>code works to avoid the complexity of dealing with memory volatile
>>ranges that cross page types (file/anonymous). Although the current
>>code just doesn't purge file pages marked with mvrange().
>Personally, I don't think it's to avoid the complexity of implementation.
>I thought an explicit declaration of volatility on a range before using
>it would be clearer for the userspace programmer.
>Otherwise, he can encounter SIGBUS and get confused easily.
>
>Frankly speaking, I don't like volatility remaining permanently even
>after the relevant processes go away; it could make processes using the
>file much more error-prone and hard to debug.

So this is maybe a contentious point we'll have to work out.

Maybe you could describe some use cases you envision where someone
would want to mark pages volatile on a file that could be
accidentally shared? Or how you think the per-mm sense of volatility
would be beneficial in those use-cases?

The use cases I envision where volatility would be used are when any
sharing would be coordinated between processes.
Again, that producer/consumer example from before where the empty
portion of a very large circular buffer could be made volatile,
scaling the actual memory usage to the actual need.

And really the same concern would likely apply in the common case
when multiple applications mmap (shared) a file, but use fvrange()
to mark the data as volatile. This is exactly the use case the
Android ashmem interface works for. In that case, once the data is
marked volatile, it should remain volatile until someone who has the
file open marks it as non-volatile. The only time we clear the
volatility is when the file is closed by all users.

I think the concern about surprising an application that isn't
expecting volatility is odd, since if an application jumped in and
punched a hole in the data, that could surprise other applications
as well. If you're going to use a file that can be shared,
applications have to deal with potential changes to that file by
others.

To me, the value in using volatile ranges on the file data is
exactly because the file data can be shared. So it makes sense to me
to have the volatility state be like the data in the file. I guess
the only exception in my case is that if all the references to a
file are closed, we can clear the volatility (since we don't have a
sane way for the volatility to persist past that point).

One question that might help resolve this: Would having some sort of
volatility checking interface be helpful in easing your concern
about applications being surprised by volatility?

>Anyway, do you agree with my suggestion that "we should not purge any
>page if a process is using it now as non-shared (ie, private)"?

Yes, or if we do purge any pages, they should not affect the private
mapped pages (in other words, the COW link should be broken - as the
backing page has in-effect been written to by purging).

>>I'd much prefer file-mvrange calls to behave identically to fvrange calls.
>>
>>The important point here is that the kernel doesn't *have* to purge

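[Editorial sketch] The lifetime rule stated in this message (volatility stays with the file while anyone holds it open, and is cleared once all users have closed it) can be modeled in a few lines of userspace C. This is only an illustration of the rule, not code from the patch set; `struct shared_file` and the three helpers are hypothetical names, and a single flag stands in for whatever range tracking a real implementation keeps.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a file shared by several openers. */
struct shared_file {
    int  open_count;          /* how many users currently hold it open */
    bool has_volatile_range;  /* one flag stands in for the range list */
};

void file_open(struct shared_file *f)
{
    f->open_count++;
}

void file_mark_volatile(struct shared_file *f)
{
    f->has_volatile_range = true;
}

/* Volatility survives individual closes and is dropped on the last one. */
void file_close(struct shared_file *f)
{
    if (f->open_count > 0 && --f->open_count == 0)
        f->has_volatile_range = false;
}
```

In this model, if process A marks a range and exits while B still has the file open, the range stays volatile; only after B also closes does a later opener see a file with no volatile ranges.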
Re: [RFC PATCH 0/4] Support vranges on files
On Mon, Apr 08, 2013 at 05:36:42PM -0700, John Stultz wrote:
> On 04/07/2013 05:46 PM, Minchan Kim wrote:
> >Hello John,
> >
> >As you know, userland people wanted to handle vrange with an mmaped
> >pointer rather than fd-based, and to see the SIGBUS, so I thought more
> >about the semantics of vrange and want to make them very clear and easy.
> >So I suggest the semantics below (of course, they're not rock solid).
> >
> >        mvrange(start_addr, length, mode, behavior)
> >
> >It's the same as what I suggested lately but with a different name,
> >just adding the prefix "m". It's a per-process model (ie, mm_struct
> >vrange), so if the process exits, "volatility" isn't valid any more.
> >It isn't a problem for anonymous pages but could be for file-vrange,
> >so let's introduce fvrange to cover the problem.
> >
> >        fvrange(int fd, start_offset, length, mode, behavior)
> >
> >First of all, let's look at mvrange from the anonymous and file page
> >POV.
> >
> >1) anon-mvrange
> >
> >A page in a volatile range will be purged only if all of the processes
> >marked the range as volatile.
> >
> >If process A calls mvrange and then forks, the vrange could be copied
> >from parent to child, so not-yet-COWed pages could be purged unless
> >either of the two processes marks NO_VOLATILE explicitly.
> >
> >Of course, a COWed page could be purged easily because there is no
> >link any more.
>
> Ack. This seems reasonable.
>
>
> >2) file-mvrange
> >
> >A page in a volatile range will be purged only if all of the processes
> >that mapped the page marked it as volatile AND no process has mapped
> >the page as "private". IOW, every process that mapped the page should
> >map it as "shared" for purging.
> >
> >So all of the processes should mark each address range in their own
> >process context if they want to collaborate on a shared mapped file,
> >and guarantee that no process has mapped the range as "private".
> >
> >Of course, the volatility state will be terminated when the process is
> >gone.
>
> This case doesn't seem ideal to me, but is sort of how the current
> code works to avoid the complexity of dealing with memory volatile
> ranges that cross page types (file/anonymous). Although the current
> code just doesn't purge file pages marked with mvrange().

Personally, I don't think it's to avoid the complexity of implementation.
I thought an explicit declaration of volatility on a range before using
it would be clearer for the userspace programmer.
Otherwise, he can encounter SIGBUS and get confused easily.

Frankly speaking, I don't like volatility remaining permanently even
after the relevant processes go away; it could make processes using the
file much more error-prone and hard to debug.

Anyway, do you agree with my suggestion that "we should not purge any
page if a process is using it now as non-shared (ie, private)"?

>
> I'd much prefer file-mvrange calls to behave identically to fvrange calls.
>
> The important point here is that the kernel doesn't *have* to purge
> anything ever. It's the kernel's discretion as to which volatile
> pages to purge when. So it's easier for now to simply not purge file

Right.

> pages marked volatile via mvolatile.

NP, but we should at least write down a vague description. Otherwise a
user tries to use it on file-backed pages, gets disappointed, and is then
reluctant to use it any more. :)

I'm not saying we should write down an implementation-specific
description, but we should at least say, from the beginning, whether the
new system call can affect anonymous pages, file pages, or both.
Just a hope.

>
> There however is the inconsistency that file pages marked volatile
> via fvrange, then are marked non-volatile via mvrange() might still
> be purged. That is broken in my mind, and still needs to be
> addressed. The easiest out is probably just to return an error if
> any of the mvrange calls cover file pages. But I'd really like a

It needs vma enumeration and the mmap_sem read-lock.
It could hurt anon-vrange performance severely.

> better fix.

Another idea is that we can move the per-mm vrange element to the
address_space when the process goes away, if the element covers a
file-backed vma. But I'm still not at all sure whether we should keep it
persistent.

>
>
> >3) fvrange
> >
> >It's the same as 2), but the volatility state could be persistent in
> >the address_space until someone calls fvrange(NO_VOLATILE).
> >So it could remove the weakness of 2).
> >
> >What do you think about the above semantics?
>
>
> I'd still like mvrange() calls on shared mapped files to be stored
> on the address_space.
>
>
> >If you don't have any problem, we could implement it. I think 1) and 2)
> >could be handled with my base code for anon-vrange handling with some
> >tweaking for file-vrange, and we need your new patches in address_space
> >for handling 3).
>
> I think we can get it sorted out. It might just take a few iterations.

Sure!

-- 
Kind regards,
Minchan Kim

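[Editorial sketch] The file-mvrange purge rule quoted in this message (a page may be purged only if every process mapping it has marked it volatile AND nobody maps it private) is a simple predicate over the page's mappers. The sketch below states it in C. `struct page_mapper` and `file_page_purgeable()` are hypothetical names, and treating a page with no mappers as not purgeable is an assumption of this sketch: under the per-process model described here, no volatility state exists once every marking process is gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process view of one file page. */
struct page_mapper {
    bool shared;           /* true: mapped shared, false: mapped private */
    bool marked_volatile;  /* this process marked the page via mvrange */
};

/*
 * A page is purgeable only if every mapper is shared AND has marked
 * it volatile. A single private or unmarked mapper vetoes the purge.
 */
bool file_page_purgeable(const struct page_mapper *mappers, size_t n)
{
    if (n == 0)
        return false;  /* assumption: no mappers means no per-mm state */

    for (size_t i = 0; i < n; i++) {
        if (!mappers[i].shared || !mappers[i].marked_volatile)
            return false;
    }
    return true;
}
```

Note that eligibility is only a precondition: as John points out, the kernel never has to purge anything, so a true result means "may be purged", not "will be purged".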
Re: [RFC PATCH 0/4] Support vranges on files
On 04/07/2013 05:46 PM, Minchan Kim wrote:
> Hello John,
>
> As you know, userland people wanted to handle vrange with an mmaped
> pointer rather than fd-based and see the SIGBUS, so I thought more
> about the semantics of vrange and want to make them very clear and easy.
> So I suggest the semantics below (of course, it's not rock solid).
>
> mvrange(start_addr, length, mode, behavior)
>
> It's the same as what I suggested lately but with a different name,
> just adding the prefix "m". It's a per-process model (i.e., mm_struct
> vrange), so if the process exits, "volatility" isn't valid any more.
> It isn't a problem for anonymous pages but could be for file-vrange,
> so let's introduce fvrange to cover the problem.
>
> fvrange(int fd, start_offset, length, mode, behavior)
>
> First of all, let's look at mvrange from the anonymous and file page POV.
>
> 1) anon-mvrange
>
> A page in a volatile range will be purged only if all processes
> marked the range as volatile.
>
> If a process calls mvrange and then forks, the vrange could be copied
> from parent to child, so not-yet-COWed pages could be purged unless
> either of the two processes marks NO_VOLATILE explicitly.
>
> Of course, a COWed page could be purged easily because there is no
> link any more.

Ack. This seems reasonable.

> 2) file-mvrange
>
> A page in a volatile range will be purged only if all processes that
> mapped the page marked it as volatile AND no process mapped the page
> as "private". IOW, all processes that mapped the page should map it
> "shared" for purging.
>
> So all processes should mark each address range in their own process
> context if they want to collaborate on a shared mapped file, and
> guarantee that no process mapped the range "private".
>
> Of course, the volatility state will be terminated when the process is gone.

This case doesn't seem ideal to me, but is sort of how the current code
works to avoid the complexity of dealing with memory volatile ranges
that cross page types (file/anonymous). Although the current code just
doesn't purge file pages marked with mvrange().

I'd much prefer file-mvrange calls to behave identically to fvrange calls.

The important point here is that the kernel doesn't *have* to purge
anything ever. It's the kernel's discretion as to which volatile pages
to purge when. So it's easier for now to simply not purge file pages
marked volatile via mvolatile.

There however is the inconsistency that file pages marked volatile via
fvrange, then marked non-volatile via mvrange(), might still be purged.
That is broken in my mind, and still needs to be addressed. The easiest
out is probably just to return an error if any of the mvrange calls
cover file pages. But I'd really like a better fix.

> 3) fvrange
>
> It's the same as 2), but the volatility state could be persistent in
> the address_space until someone calls fvrange(NO_VOLATILE).
> So it could remove the weakness of 2).
>
> What do you think about the above semantics?

I'd still like mvrange() calls on shared mapped files to be stored on
the address_space.

> If you don't have any problem, we could implement it.
> I think 1) and 2) could be handled with my base code for anon-vrange
> handling, with tweaks for file-vrange, and need your new patches in
> address_space for handling 3).

I think we can get it sorted out. It might just take a few iterations.

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
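The file-mvrange rule proposed in 2) can be written down as a small user-space model. This is only a sketch of the proposal as worded in the thread, not kernel code: `struct mapping` and `page_may_be_purged()` are hypothetical names, and treating a page with no mappers as non-purgeable is an assumption drawn from the per-process model (volatility ends when the process goes away).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one process's mapping of a file page. */
struct mapping {
    bool shared;    /* true for a shared mapping, false for a private one */
    bool volatile_; /* true if this process marked the range volatile */
};

/*
 * Proposed file-mvrange rule: a page may be purged only if every process
 * mapping it maps it shared AND has marked it volatile.
 */
static bool page_may_be_purged(const struct mapping *maps, size_t n)
{
    if (n == 0)
        return false; /* assumption: no mappers means no per-mm volatility */
    for (size_t i = 0; i < n; i++) {
        if (!maps[i].shared || !maps[i].volatile_)
            return false;
    }
    return true;
}
```

A single private mapper, or a single mapper that has not marked the range, is enough to veto the purge under this reading.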
Re: [RFC PATCH 0/4] Support vranges on files
On Mon, Apr 08, 2013 at 05:36:42PM -0700, John Stultz wrote:
> On 04/07/2013 05:46 PM, Minchan Kim wrote:
>> Hello John,
>>
>> As you know, userland people wanted to handle vrange with an mmaped
>> pointer rather than fd-based and see the SIGBUS, so I thought more
>> about the semantics of vrange and want to make them very clear and easy.
>> So I suggest the semantics below (of course, it's not rock solid).
>>
>> mvrange(start_addr, length, mode, behavior)
>>
>> It's the same as what I suggested lately but with a different name,
>> just adding the prefix "m". It's a per-process model (i.e., mm_struct
>> vrange), so if the process exits, "volatility" isn't valid any more.
>> It isn't a problem for anonymous pages but could be for file-vrange,
>> so let's introduce fvrange to cover the problem.
>>
>> fvrange(int fd, start_offset, length, mode, behavior)
>>
>> First of all, let's look at mvrange from the anonymous and file page POV.
>>
>> 1) anon-mvrange
>>
>> A page in a volatile range will be purged only if all processes
>> marked the range as volatile.
>>
>> If a process calls mvrange and then forks, the vrange could be copied
>> from parent to child, so not-yet-COWed pages could be purged unless
>> either of the two processes marks NO_VOLATILE explicitly.
>>
>> Of course, a COWed page could be purged easily because there is no
>> link any more.
>
> Ack. This seems reasonable.
>
>> 2) file-mvrange
>>
>> A page in a volatile range will be purged only if all processes that
>> mapped the page marked it as volatile AND no process mapped the page
>> as "private". IOW, all processes that mapped the page should map it
>> "shared" for purging.
>>
>> So all processes should mark each address range in their own process
>> context if they want to collaborate on a shared mapped file, and
>> guarantee that no process mapped the range "private".
>>
>> Of course, the volatility state will be terminated when the process is gone.
>
> This case doesn't seem ideal to me, but is sort of how the current code
> works to avoid the complexity of dealing with memory volatile ranges
> that cross page types (file/anonymous). Although the current code just
> doesn't purge file pages marked with mvrange().

Personally, I don't think it's to avoid the complexity of implementation.
I thought explicitly declaring volatility on a range before using it would
be clearer for the userspace programmer. Otherwise, he can encounter
SIGBUS and get confused easily.

Frankly speaking, I don't like volatility remaining permanently even
though the relevant processes have gone away; it could make processes
using the file much more error-prone and hard to debug.

Anyway, do you agree with my suggestion that we should not purge any page
if a process is using it now with a non-shared (i.e., private) mapping?

> I'd much prefer file-mvrange calls to behave identically to fvrange calls.
>
> The important point here is that the kernel doesn't *have* to purge
> anything ever. It's the kernel's discretion as to which volatile pages
> to purge when. So it's easier for now to simply not purge file

Right.

> pages marked volatile via mvolatile.

NP, but we should write down a vague description. A user tries to use it
on file-backed pages and gets disappointed, then is reluctant to use it
any more. :)
I'm not saying let's write down an implementation-specific description,
but I want to tell them at least whether the new system call can affect
anonymous or file or both, at least from the beginning. Just a hope.

> There however is the inconsistency that file pages marked volatile via
> fvrange, then marked non-volatile via mvrange(), might still be purged.
> That is broken in my mind, and still needs to be addressed. The easiest
> out is probably just to return an error if any of the mvrange calls
> cover file pages. But I'd really like a

It needs vma enumeration and the mmap_sem read-lock.
It could hurt anon-vrange performance severely.

> better fix.

Another idea is that we can move the per-mm vrange element to the
address_space when the process goes away, if the element covers a
file-backed vma. But I'm still not at all sure whether we should keep it
persistent.

>> 3) fvrange
>>
>> It's the same as 2), but the volatility state could be persistent in
>> the address_space until someone calls fvrange(NO_VOLATILE).
>> So it could remove the weakness of 2).
>>
>> What do you think about the above semantics?
>
> I'd still like mvrange() calls on shared mapped files to be stored on
> the address_space.
>
>> If you don't have any problem, we could implement it.
>> I think 1) and 2) could be handled with my base code for anon-vrange
>> handling, with tweaks for file-vrange, and need your new patches in
>> address_space for handling 3).
>
> I think we can get it sorted out. It might just take a few iterations.

Sure!

--
Kind regards,
Minchan Kim
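To make the cost argument concrete, here is a user-space sketch of the "return an error if the mvrange call covers file pages" idea. It is a toy model under stated assumptions, not the kernel's VMA code: `struct region` and `mvrange_check_anon_only()` are hypothetical, the lookup is a plain linear walk, and `start + len` is assumed not to overflow. In the kernel the equivalent walk would have to run under the mmap_sem read-lock, which is the overhead raised above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: [start, end) plus its backing type. */
struct region {
    unsigned long start;
    unsigned long end;
    bool file_backed;
};

/*
 * Return -EINVAL if [start, start + len) overlaps any file-backed region,
 * otherwise 0. Every call has to enumerate the regions.
 */
static int mvrange_check_anon_only(const struct region *r, size_t n,
                                   unsigned long start, unsigned long len)
{
    unsigned long end = start + len; /* assumption: no overflow */

    for (size_t i = 0; i < n; i++) {
        bool overlaps = r[i].start < end && start < r[i].end;
        if (overlaps && r[i].file_backed)
            return -EINVAL;
    }
    return 0;
}
```

The check is simple, but it sits on the path of every mvrange call, including the purely anonymous ones.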
Re: [RFC PATCH 0/4] Support vranges on files
On 04/08/2013 07:18 PM, Minchan Kim wrote:
> On Mon, Apr 08, 2013 at 05:36:42PM -0700, John Stultz wrote:
>> On 04/07/2013 05:46 PM, Minchan Kim wrote:
>>> Hello John,
>>>
>>> As you know, userland people wanted to handle vrange with an mmaped
>>> pointer rather than fd-based and see the SIGBUS, so I thought more
>>> about the semantics of vrange and want to make them very clear and easy.
>>> So I suggest the semantics below (of course, it's not rock solid).
>>>
>>> mvrange(start_addr, length, mode, behavior)
>>>
>>> It's the same as what I suggested lately but with a different name,
>>> just adding the prefix "m". It's a per-process model (i.e., mm_struct
>>> vrange), so if the process exits, "volatility" isn't valid any more.
>>> It isn't a problem for anonymous pages but could be for file-vrange,
>>> so let's introduce fvrange to cover the problem.
>>>
>>> fvrange(int fd, start_offset, length, mode, behavior)
>>>
>>> First of all, let's look at mvrange from the anonymous and file page POV.
>>>
>>> 1) anon-mvrange
>>>
>>> A page in a volatile range will be purged only if all processes
>>> marked the range as volatile.
>>>
>>> If a process calls mvrange and then forks, the vrange could be copied
>>> from parent to child, so not-yet-COWed pages could be purged unless
>>> either of the two processes marks NO_VOLATILE explicitly.
>>>
>>> Of course, a COWed page could be purged easily because there is no
>>> link any more.
>>
>> Ack. This seems reasonable.
>>
>>> 2) file-mvrange
>>>
>>> A page in a volatile range will be purged only if all processes that
>>> mapped the page marked it as volatile AND no process mapped the page
>>> as "private". IOW, all processes that mapped the page should map it
>>> "shared" for purging.
>>>
>>> So all processes should mark each address range in their own process
>>> context if they want to collaborate on a shared mapped file, and
>>> guarantee that no process mapped the range "private".
>>>
>>> Of course, the volatility state will be terminated when the process is gone.
>>
>> This case doesn't seem ideal to me, but is sort of how the current code
>> works to avoid the complexity of dealing with memory volatile ranges
>> that cross page types (file/anonymous). Although the current code just
>> doesn't purge file pages marked with mvrange().
>
> Personally, I don't think it's to avoid the complexity of implementation.
> I thought explicitly declaring volatility on a range before using it would
> be clearer for the userspace programmer. Otherwise, he can encounter
> SIGBUS and get confused easily.
>
> Frankly speaking, I don't like volatility remaining permanently even
> though the relevant processes have gone away; it could make processes
> using the file much more error-prone and hard to debug.

So this maybe is a contentious point we'll have to work out. Could you
maybe describe some use cases you envision where someone would want to
mark pages volatile on a file that could be accidentally shared? Or how
you think the per-mm sense of volatility would be beneficial in those
use-cases?

The use cases I envision where volatility would be used are when any
sharing would be coordinated between processes. Again, that
producer/consumer example from before where the empty portion of a very
large circular buffer could be made volatile, scaling the actual memory
usage to the actual need.

And really the same concern would likely apply in the common case when
multiple applications mmap (shared) a file, but use fvrange() to mark the
data as volatile. This is exactly the use case the Android ashmem
interface works for. In that case, once the data is marked volatile, it
should remain volatile until someone who has the file open marks it as
non-volatile. The only time we clear the volatility is when the file is
closed by all users.

I think the concern about surprising an application that isn't expecting
volatility is odd, since if an application jumped in and punched a hole
in the data, that could surprise other applications as well. If you're
going to use a file that can be shared, applications have to deal with
potential changes to that file by others.

To me, the value in using volatile ranges on the file data is exactly
because the file data can be shared. So it makes sense to me to have the
volatility state be like the data in the file. I guess the only exception
in my case is that if all the references to a file are closed, we can
clear the volatility (since we don't have a sane way for the volatility
to persist past that point).

One question that might help resolve this: Would having some sort of
volatility checking interface be helpful in easing your concern about
applications being surprised by volatility?

> Anyway, do you agree with my suggestion that we should not purge any page
> if a process is using it now with a non-shared (i.e., private) mapping?

Yes, or if we do purge any pages, they should not affect the private
mapped pages (in other words, the COW link should be broken - as the
backing page has in-effect been written to by purging).

>> I'd much prefer file-mvrange calls to behave identically to fvrange calls.
>>
>> The important point here is that the kernel doesn't *have* to purge anything
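The producer/consumer circular buffer that John refers to can be modeled in a few lines. This is a sketch under assumptions rather than code from the patch set: slots stand in for byte ranges of the shared tmpfs file, the `is_volatile` array stands in for volatility state stored with the file, and `struct ring` and the `ring_*` helpers are hypothetical names. No real vrange call is made.

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_SLOTS 8 /* hypothetical buffer size, in fixed-size slots */

struct ring {
    size_t head;                  /* next slot the consumer reads */
    size_t tail;                  /* next slot the producer writes */
    bool is_volatile[RING_SLOTS]; /* stand-in for per-file volatility state */
};

static void ring_init(struct ring *r)
{
    r->head = 0;
    r->tail = 0;
    for (size_t i = 0; i < RING_SLOTS; i++)
        r->is_volatile[i] = true; /* an empty buffer is entirely volatile */
}

/* Producer: mark the slot non-volatile first, then write and advance tail. */
static bool ring_produce(struct ring *r)
{
    if (r->tail - r->head == RING_SLOTS)
        return false; /* full */
    r->is_volatile[r->tail % RING_SLOTS] = false;
    r->tail++;
    return true;
}

/* Consumer: read the slot, mark it volatile again, then advance head. */
static bool ring_consume(struct ring *r)
{
    if (r->head == r->tail)
        return false; /* empty */
    r->is_volatile[r->head % RING_SLOTS] = true;
    r->head++;
    return true;
}

/* Number of slots the kernel would be free to purge under memory pressure. */
static size_t ring_volatile_slots(const struct ring *r)
{
    size_t n = 0;
    for (size_t i = 0; i < RING_SLOTS; i++)
        n += r->is_volatile[i] ? 1 : 0;
    return n;
}
```

Only the producer clears volatility and only the consumer sets it, which is why the state has to live with the shared file rather than with either process.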
Re: [RFC PATCH 0/4] Support vranges on files
On Mon, Apr 08, 2013 at 08:27:50PM -0700, John Stultz wrote:
> On 04/08/2013 07:18 PM, Minchan Kim wrote:
>> On Mon, Apr 08, 2013 at 05:36:42PM -0700, John Stultz wrote:
>>> On 04/07/2013 05:46 PM, Minchan Kim wrote:
>>>> Hello John,
>>>>
>>>> As you know, userland people wanted to handle vrange with an mmaped
>>>> pointer rather than fd-based and see the SIGBUS, so I thought more
>>>> about the semantics of vrange and want to make them very clear and easy.
>>>> So I suggest the semantics below (of course, it's not rock solid).
>>>>
>>>> mvrange(start_addr, length, mode, behavior)
>>>>
>>>> It's the same as what I suggested lately but with a different name,
>>>> just adding the prefix "m". It's a per-process model (i.e., mm_struct
>>>> vrange), so if the process exits, "volatility" isn't valid any more.
>>>> It isn't a problem for anonymous pages but could be for file-vrange,
>>>> so let's introduce fvrange to cover the problem.
>>>>
>>>> fvrange(int fd, start_offset, length, mode, behavior)
>>>>
>>>> First of all, let's look at mvrange from the anonymous and file page POV.
>>>>
>>>> 1) anon-mvrange
>>>>
>>>> A page in a volatile range will be purged only if all processes
>>>> marked the range as volatile.
>>>>
>>>> If a process calls mvrange and then forks, the vrange could be copied
>>>> from parent to child, so not-yet-COWed pages could be purged unless
>>>> either of the two processes marks NO_VOLATILE explicitly.
>>>>
>>>> Of course, a COWed page could be purged easily because there is no
>>>> link any more.
>>>
>>> Ack. This seems reasonable.
>>>
>>>> 2) file-mvrange
>>>>
>>>> A page in a volatile range will be purged only if all processes that
>>>> mapped the page marked it as volatile AND no process mapped the page
>>>> as "private". IOW, all processes that mapped the page should map it
>>>> "shared" for purging.
>>>>
>>>> So all processes should mark each address range in their own process
>>>> context if they want to collaborate on a shared mapped file, and
>>>> guarantee that no process mapped the range "private".
>>>>
>>>> Of course, the volatility state will be terminated when the process is gone.
>>>
>>> This case doesn't seem ideal to me, but is sort of how the current code
>>> works to avoid the complexity of dealing with memory volatile ranges
>>> that cross page types (file/anonymous). Although the current code just
>>> doesn't purge file pages marked with mvrange().
>>
>> Personally, I don't think it's to avoid the complexity of implementation.
>> I thought explicitly declaring volatility on a range before using it would
>> be clearer for the userspace programmer. Otherwise, he can encounter
>> SIGBUS and get confused easily.
>>
>> Frankly speaking, I don't like volatility remaining permanently even
>> though the relevant processes have gone away; it could make processes
>> using the file much more error-prone and hard to debug.
>
> So this maybe is a contentious point we'll have to work out. Could you
> maybe describe some use cases you envision where someone would want to
> mark pages volatile on a file that could be accidentally shared? Or how
> you think the per-mm sense of volatility would be beneficial in those
> use-cases?

My concern is the following:

1. Process A calls mvrange for file F.
2. Process A is killed by someone or by its own BUG.
3. Process B maps F shared into his address space.
4. Memory pressure happens.
5. Process B is killed by SIGBUS, but process B really can't know why
   he was killed, because he can't know of anyone who opened F except
   himself.

> The use cases I envision where volatility would be used are when any
> sharing would be coordinated between processes. Again, that
> producer/consumer example from before where the empty portion of a very
> large circular buffer could be made volatile, scaling the actual memory
> usage to the actual need.
>
> And really the same concern would likely apply in the common case when
> multiple applications mmap (shared) a file, but use fvrange() to mark the
> data as volatile. This is exactly the use case the Android ashmem
> interface works for. In that case, once the data is

I don't know the Android ashmem interface well, but if it works as I
mentioned earlier, I think it's not a good interface.

> marked volatile, it should remain volatile until someone who has the
> file open marks it as non-volatile. The only time we clear the
> volatility is when the file is closed by all users.

Yes. We need to clear volatile ranges when the file is closed by all
users. That's what we need, and it blows my concern away.

> I think the concern about surprising an application that isn't expecting
> volatility is odd, since if an application jumped in and punched a hole
> in the data, that could surprise other applications as well. If you're
> going to use a file that can be shared, applications have to deal with
> potential changes to that file by others.

True. My concern is delayed punching without any client of the fd, and
that there is no interface to detect whether some range of a file is in
the volatile state or not. It means anyone who mapped a file shared could
encounter SIGBUS even though he makes a best effort to check it with lsof
before using it.

> To me, the value in using volatile ranges on the file data is exactly because
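The point both sides settle on here, that file volatility lasts only while someone holds the file open, can be illustrated with a toy model. It is a sketch, not the implementation: `struct shared_file` and the `file_*` helpers are hypothetical, and a single flag stands in for a whole set of volatile ranges.

```c
#include <stdbool.h>

/* Hypothetical model of a shared file carrying one volatile range. */
struct shared_file {
    int open_count;      /* number of current holders of the file */
    bool range_volatile; /* volatility state stored with the file */
};

static void file_open(struct shared_file *f)
{
    f->open_count++;
}

static void file_set_volatile(struct shared_file *f, bool v)
{
    f->range_volatile = v;
}

/* Volatility is cleared only when the last holder closes the file. */
static void file_close(struct shared_file *f)
{
    if (f->open_count > 0 && --f->open_count == 0)
        f->range_volatile = false;
}
```

In the five-step scenario above, process A's death counts as its close: if A was the only holder, the range is no longer volatile by the time B maps the file, so B cannot be hit by a delayed purge it never asked for.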
Re: [RFC PATCH 0/4] Support vranges on files
Hello John,

As you know, userland people wanted to handle vrange with an mmaped
pointer rather than fd-based and see the SIGBUS, so I thought more
about the semantics of vrange and want to make them very clear and easy.
So I suggest the semantics below (of course, it's not rock solid).

mvrange(start_addr, length, mode, behavior)

It's the same as what I suggested lately but with a different name,
just adding the prefix "m". It's a per-process model (i.e., mm_struct
vrange), so if the process exits, "volatility" isn't valid any more.
It isn't a problem for anonymous pages but could be for file-vrange,
so let's introduce fvrange to cover the problem.

fvrange(int fd, start_offset, length, mode, behavior)

First of all, let's look at mvrange from the anonymous and file page POV.

1) anon-mvrange

A page in a volatile range will be purged only if all processes
marked the range as volatile.

If a process calls mvrange and then forks, the vrange could be copied
from parent to child, so not-yet-COWed pages could be purged unless
either of the two processes marks NO_VOLATILE explicitly.

Of course, a COWed page could be purged easily because there is no
link any more.

2) file-mvrange

A page in a volatile range will be purged only if all processes that
mapped the page marked it as volatile AND no process mapped the page
as "private". IOW, all processes that mapped the page should map it
"shared" for purging.

So all processes should mark each address range in their own process
context if they want to collaborate on a shared mapped file, and
guarantee that no process mapped the range "private".

Of course, the volatility state will be terminated when the process is gone.

3) fvrange

It's the same as 2), but the volatility state could be persistent in
the address_space until someone calls fvrange(NO_VOLATILE).
So it could remove the weakness of 2).

What do you think about the above semantics?

If you don't have any problem, we could implement it.
I think 1) and 2) could be handled with my base code for anon-vrange
handling, with tweaks for file-vrange, and need your new patches in
address_space for handling 3).

On Fri, Apr 05, 2013 at 04:55:04PM +0900, Minchan Kim wrote:
> Hi John,
>
> On Thu, Apr 04, 2013 at 10:37:52AM -0700, John Stultz wrote:
> > On 04/03/2013 11:55 PM, Minchan Kim wrote:
> > >On Wed, Apr 03, 2013 at 04:52:19PM -0700, John Stultz wrote:
> > >>Next we introduce a parallel fvrange() syscall for creating
> > >>volatile ranges directly against files.
> > >Okay. It seems you want to replace the ashmem interface with fvrange.
> > >I doubt we have to eat a slot for a system call. Can't we add "int fd"
> > >to the vrange system call without reinventing the wheel?
> >
> > Sure, that would be doable. I just added the new syscall to make the
> > differences in functionality clear.
> > Once the subtleties are understood, we can condense things down if
> > we think it's best.
>
> Fair enough.
>
> > >>And finally, we change the range purging logic to be able to
> > >>handle both anonymous and file volatile ranges.
> > >Okay. Then, what are the semantics of file-vrange?
> > >
> > >There is a file F. Process A mapped some part of the file into his
> > >address space. Then, Process B calls fvrange on the same part.
> > >As I looked over your code, it purges the range although process B
> > >is using it now. Right? Is it your intention? Maybe it isn't.
> >
> > Not sure if your example has a typo and you meant "process A is
> > using it"? If so, yes. The point is the volatility is shared and
> > consistent across all users of the file, in the same way the data in
> > the file is shared. If process B punched a hole in the file, process
> > A would see the effect immediately. With volatile ranges, the hole
> > punching is just delayed and possibly done later by the kernel, in
> > effect on behalf of process B, so the behavior is the same.
> >
> > Consider the case where we could have two processes mmap a tmpfs
> > file in order to create a circular buffer shared between them. You
> > could then have a producer/consumer relationship with two processes
> > where any data not between the head & tail offsets were marked
> > volatile. The producer would mark tail+size non-volatile, write the
> > data, and update the tail offset. The consumer would read data from
> > the head offset, mark the just-read range as volatile, and update
> > the offset.
> >
> > In this example, the producer would be the only process to mark data
> > non-volatile, while the consumer would be the only one marking
> > ranges volatile. Thus the state of volatility would need to be an
> > attribute of the file, not the process, in the same way the shared
> > data is.
> >
> > Is that clear?
>
> Yes, I got your point that you meant shared mapping.
> Let's enumerate more examples.
>
> 1. Process A mapped FILE A with MAP_SHARED
>    Process B mapped FILE A with MAP_SHARED
>    Process C calls fvrange
>    Discard all pages of process A and B -> Makes sense to me.
>
> 2. Process A mapped FILE A with
Re: [RFC PATCH 0/4] Support vranges on files
Hi John,

On Thu, Apr 04, 2013 at 10:37:52AM -0700, John Stultz wrote:
> On 04/03/2013 11:55 PM, Minchan Kim wrote:
> >On Wed, Apr 03, 2013 at 04:52:19PM -0700, John Stultz wrote:
> >>Next we introduce a parallel fvrange() syscall for creating
> >>volatile ranges directly against files.
> >Okay. It seems you want to replace the ashmem interface with fvrange.
> >I doubt we have to eat a slot for a system call. Can't we add "int fd"
> >to the vrange system call without reinventing the wheel?
>
> Sure, that would be doable. I just added the new syscall to make the
> differences in functionality clear.
> Once the subtleties are understood, we can condense things down if
> we think it's best.

Fair enough.

> >>And finally, we change the range purging logic to be able to
> >>handle both anonymous and file volatile ranges.
> >Okay. Then, what are the semantics of file-vrange?
> >
> >There is a file F. Process A mapped some part of the file into its
> >address space. Then, Process B calls fvrange on the same part.
> >As I looked over your code, it purges the range although process B
> >is using it now. Right? Is that your intention? Maybe not.
>
> Not sure if your example has a typo and you meant "process A is
> using it"? If so, yes. The point is the volatility is shared and
> consistent across all users of the file, in the same way the data in
> the file is shared. If process B punched a hole in the file, process
> A would see the effect immediately. With volatile ranges, the hole
> punching is just delayed and possibly done later by the kernel, in
> effect on behalf of process B, so the behavior is the same.
>
> Consider the case where we could have two processes mmap a tmpfs
> file in order to create a circular buffer shared between them. You
> could then have a producer/consumer relationship with two processes
> where any data not between the head & tail offsets were marked
> volatile. The producer would mark tail+size non-volatile, write the
> data, and update the tail offset. The consumer would read data from
> the head offset, mark the just-read range as volatile, and update the
> offset.
>
> In this example, the producer would be the only process to mark data
> non-volatile, while the consumer would be the only one marking
> ranges volatile. Thus the state of volatility would need to be an
> attribute of the file, not the process, in the same way the shared
> data is.
>
> Is that clear?

Yes, I got your point that you meant shared mappings.
Let's enumerate more examples.

1. Process A mapped FILE A with MAP_SHARED
   Process B mapped FILE A with MAP_SHARED
   Process C calls fvrange

   Discard all pages of process A and B -> Makes sense to me.

2. Process A mapped FILE A with MAP_PRIVATE and is using it read-only
   Process B mapped FILE A with MAP_PRIVATE and is using it write-only
   Process C calls fvrange

   What happens? I expect process A loses all its pages while process B
   keeps its COWed pages.

3. Process A mapped FILE A with MAP_PRIVATE and is using it read/write
   Process C calls fvrange

   The non-COWed pages in process A are lost while the COWed pages
   are kept. Mixed.

Is all of the above your intention? It would be much clearer if you
wrote down the semantics you intend for private mapped files and
shared mapped files. ;-)

> >Let's define fvrange's semantics to be the same as anon-vrange.
> >If there is a process using the range as non-volatile, at least,
> >we shouldn't purge at all.
>
> So this I'm not in agreement with.

I got your point.

> Anonymous pages are for the most part not shared, except via COW.
> And for the COW case, yes, I agree, we shouldn't purge those pages.
>
> Similarly (and I have yet to handle this in the code), for private
> mapped files, those pages shouldn't be purged either (or purging
> them shouldn't affect the private mapped pages - not sure which
> direction to go here).

Yep. It's questionable. It seems fallocate hole punching removes
non-COWed pages even though they are mapped privately, unless I missed
something while reading the code. If I am right, it looks very strange
to me: COWed pages remain in memory while NOT-YET-COWed pages are
discarded. :(
Ho, Hmm.

> But for shared mapped files, we need to keep the volatility state
> shared as well.
>
> >>Now there are some quirks still to be resolved with the approach
> >>used here. The biggest one being the vrange() call can't be used to
> >>create volatile ranges against mmapped files. Instead only the
> >Why?
>
> As explained above, the volatility is shared like the data. The
> current vrange() code creates per-mm volatile ranges, which aren't
> shared.

Strictly speaking, I think we could do it with only per-mm volatile
ranges. But the concern with that approach, as you mention below, is
that we would have to iterate over every process's mm_struct in system
call context. Of course, I don't like that; it would be a bad design.

> >>fvrange() can be used to create file backed volatile ranges.
>
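
The three MAP_SHARED/MAP_PRIVATE cases enumerated above can be written down as a tiny model. This is only a sketch of the outcome Minchan says he expects, not kernel code and not settled semantics (John notes the private-mapping behavior is still undecided). The names `map_kind`, `page_view` and `lost_on_purge` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: how one process sees one page of FILE A. */
enum map_kind { MAP_KIND_SHARED, MAP_KIND_PRIVATE };

struct page_view {
    enum map_kind kind; /* how this process mapped the file */
    bool cowed;         /* true once a private mapping has written the page */
};

/*
 * Returns true if this process loses the page contents when the file
 * range is purged. Shared pages and not-yet-COWed private pages are
 * backed by the file's page cache, so they go away with it. A COWed
 * page is a private copy and survives the purge.
 */
static bool lost_on_purge(const struct page_view *pv)
{
    assert(pv != NULL);
    if (pv->kind == MAP_KIND_SHARED)
        return true;
    return !pv->cowed;
}
```

Case 1 corresponds to `MAP_KIND_SHARED`, case 2 to one reader with `cowed == false` and one writer with `cowed == true`, and case 3 to a single process holding a mix of both kinds of page.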
Re: [RFC PATCH 0/4] Support vranges on files
On 04/03/2013 11:55 PM, Minchan Kim wrote:
>On Wed, Apr 03, 2013 at 04:52:19PM -0700, John Stultz wrote:
>>Next we introduce a parallel fvrange() syscall for creating
>>volatile ranges directly against files.
>Okay. It seems you want to replace the ashmem interface with fvrange.
>I doubt we have to eat a slot for a system call. Can't we add "int fd"
>to the vrange system call without reinventing the wheel?

Sure, that would be doable. I just added the new syscall to make the
differences in functionality clear.
Once the subtleties are understood, we can condense things down if
we think it's best.

>>And finally, we change the range purging logic to be able to
>>handle both anonymous and file volatile ranges.
>Okay. Then, what are the semantics of file-vrange?
>
>There is a file F. Process A mapped some part of the file into its
>address space. Then, Process B calls fvrange on the same part.
>As I looked over your code, it purges the range although process B
>is using it now. Right? Is that your intention? Maybe not.

Not sure if your example has a typo and you meant "process A is
using it"? If so, yes. The point is the volatility is shared and
consistent across all users of the file, in the same way the data in
the file is shared. If process B punched a hole in the file, process
A would see the effect immediately. With volatile ranges, the hole
punching is just delayed and possibly done later by the kernel, in
effect on behalf of process B, so the behavior is the same.

Consider the case where we could have two processes mmap a tmpfs
file in order to create a circular buffer shared between them. You
could then have a producer/consumer relationship with two processes
where any data not between the head & tail offsets were marked
volatile. The producer would mark tail+size non-volatile, write the
data, and update the tail offset. The consumer would read data from
the head offset, mark the just-read range as volatile, and update the
offset.

In this example, the producer would be the only process to mark data
non-volatile, while the consumer would be the only one marking
ranges volatile. Thus the state of volatility would need to be an
attribute of the file, not the process, in the same way the shared
data is.

Is that clear?

>Let's define fvrange's semantics to be the same as anon-vrange.
>If there is a process using the range as non-volatile, at least,
>we shouldn't purge at all.

So this I'm not in agreement with.

Anonymous pages are for the most part not shared, except via COW.
And for the COW case, yes, I agree, we shouldn't purge those pages.

Similarly (and I have yet to handle this in the code), for private
mapped files, those pages shouldn't be purged either (or purging
them shouldn't affect the private mapped pages - not sure which
direction to go here).

But for shared mapped files, we need to keep the volatility state
shared as well.

>>Now there are some quirks still to be resolved with the approach
>>used here. The biggest one being the vrange() call can't be used to
>>create volatile ranges against mmapped files. Instead only the
>Why?

As explained above, the volatility is shared like the data. The
current vrange() code creates per-mm volatile ranges, which aren't
shared.

>>fvrange() can be used to create file backed volatile ranges.
>I couldn't understand your point.
>It would be better to explain my thinking first; then you can point
>out what I am missing. Look below.
>
>>This could be overcome by iterating across all the process VMAs to
>>determine if they're anonymous or file based, and if file-based,
>>create a VMA sized volatile range on the mapping pointed to by the
>>VMA.
>It is needed only when we start to discard pages.
>Simply put, it is related to the reclaim path, NOT the system call
>path, so it's not a problem.

The reason we can't defer this to only the reclaim path is that if
volatile ranges on shared mappings are stored in the mm_struct, and
process A sets up a volatile range on a shared mapping but stores the
volatility in its own mm, then when process B wants to clear the
volatility on the range, process B would have to iterate over all
processes that have those file vmas mapped and change them.

Additionally if process A sets up a volatile range on a shared mapped
file, then quits, the volatility state dies with that process. Either
way, it's not just a simple matter of handling data on your own
mm_struct. That's fine for the process' own anonymous memory, but
doesn't work for shared file mappings.

>>But this would have downsides, as Minchan has been clear that he wants
>>to optimize the vrange() calls so that it is very cheap to create and
>>destroy volatile ranges. Having simple per-process ranges be created
>>means we don't have to iterate across the vmas in the range to
>>determine if they're anonymous or file backed. Instead the current
>>vrange() code just creates per process ranges (which may or may not
>>cover mmapped file data), but will only purge anonymous pages in
>>that range. This keeps the vrange()
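
The circular-buffer case John describes can be sketched as a small userspace model, to show why the volatility flag has to sit next to the shared data rather than in either process. Everything here is a hypothetical stand-in: `shared_file` plays the role of the tmpfs file, `produce()` and `consume()` stand in for the two processes, and `purge()` stands in for the kernel discarding volatile data under memory pressure. No real vrange()/fvrange() call is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NSLOTS 4

/* Hypothetical stand-in for the shared file: data and volatility live together. */
struct shared_file {
    int data[NSLOTS];
    bool is_volatile[NSLOTS]; /* per-file state, seen by every process */
    unsigned head, tail;      /* consumer reads at head, producer writes at tail */
};

static void file_init(struct shared_file *f)
{
    assert(f != NULL);
    memset(f, 0, sizeof(*f));
    for (unsigned i = 0; i < NSLOTS; i++)
        f->is_volatile[i] = true; /* nothing lies between head and tail yet */
}

/* Producer: mark the slot non-volatile, write it, then publish the new tail. */
static bool produce(struct shared_file *f, int value)
{
    if (f->tail - f->head == NSLOTS)
        return false; /* buffer full */
    unsigned slot = f->tail % NSLOTS;
    f->is_volatile[slot] = false;
    f->data[slot] = value;
    f->tail++;
    return true;
}

/* Consumer: read the slot, then mark the just-read slot volatile. */
static bool consume(struct shared_file *f, int *value)
{
    if (f->head == f->tail)
        return false; /* buffer empty */
    unsigned slot = f->head % NSLOTS;
    *value = f->data[slot];
    f->is_volatile[slot] = true;
    f->head++;
    return true;
}

/* Memory pressure: discard every volatile slot, return how many were purged. */
static unsigned purge(struct shared_file *f)
{
    unsigned purged = 0;
    for (unsigned i = 0; i < NSLOTS; i++) {
        if (f->is_volatile[i]) {
            f->data[i] = 0;
            purged++;
        }
    }
    return purged;
}
```

Only the producer ever clears `is_volatile` and only the consumer ever sets it, so the flag cannot live in either process alone. If it were per-process state, the consumer's marking would be invisible to the producer, and it would vanish if the consumer exited.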
Re: [RFC PATCH 0/4] Support vranges on files
Hey John,

First of all, I should confess I just glanced at your code and a few
questions popped up. If I miss something, please slap me.

On Wed, Apr 03, 2013 at 04:52:19PM -0700, John Stultz wrote:
> This patchset is against Minchan's vrange work here:
> https://lkml.org/lkml/2013/3/12/105
>
> Extending it to support volatile ranges on files. In effect
> providing the same functionality of my earlier file based
> volatile range patches on-top of Minchan's anonymous volatile
> range work.
>
> Volatile ranges on files are different than on anonymous memory,
> because the volatility state can be shared between multiple
> applications. This makes storing the volatile ranges exclusively
> in the mm_struct (or in vmas as in Minchan's earlier work)
> inappropriate.
>
> The patchset starts with some minor cleanup.
>
> Then we introduce the idea of a vrange_root, which provides an
> interval-tree root and a lock to protect the tree. This structure
> can then be stored in the mm_struct or in an address_space. Then the
> same infrastructure can be used to manage volatile ranges on both
> anonymous and file backed memory.

Thanks for the above two patches. It is a nice cleanup.

> Next we introduce a parallel fvrange() syscall for creating
> volatile ranges directly against files.

Okay. It seems you want to replace the ashmem interface with fvrange.
I doubt we have to eat a slot for a system call. Can't we add "int fd"
to the vrange system call without reinventing the wheel?

> And finally, we change the range purging logic to be able to
> handle both anonymous and file volatile ranges.

Okay. Then, what are the semantics of file-vrange?

There is a file F. Process A mapped some part of the file into its
address space. Then, Process B calls fvrange on the same part.
As I looked over your code, it purges the range although process B
is using it now. Right? Is that your intention? Maybe not.

Let's define fvrange's semantics to be the same as anon-vrange.
If there is a process using the range as non-volatile, at least,
we shouldn't purge at all.

So your [4/4] should atomically check all processes that mapped the
page. You could do it with i_mmap_mutex and vrange_lock and percolate
the logic into try_to_discard_vpage.

> Now there are some quirks still to be resolved with the approach
> used here. The biggest one being the vrange() call can't be used to
> create volatile ranges against mmapped files. Instead only the

Why?

> fvrange() can be used to create file backed volatile ranges.

I couldn't understand your point.
It would be better to explain my thinking first; then you can point
out what I am missing. Look below.

> This could be overcome by iterating across all the process VMAs to
> determine if they're anonymous or file based, and if file-based,
> create a VMA sized volatile range on the mapping pointed to by the
> VMA.

It is needed only when we start to discard pages.
Simply put, it is related to the reclaim path, NOT the system call
path, so it's not a problem.

> But this would have downsides, as Minchan has been clear that he wants
> to optimize the vrange() calls so that it is very cheap to create and
> destroy volatile ranges. Having simple per-process ranges be created
> means we don't have to iterate across the vmas in the range to
> determine if they're anonymous or file backed. Instead the current
> vrange() code just creates per process ranges (which may or may not
> cover mmapped file data), but will only purge anonymous pages in
> that range. This keeps the vrange() call cheap.

Right.

> Additionally, just creating or destroying a single range is very
> simple to do, and requires a fixed amount of memory known up front.
> Thus we can allocate needed data prior to making any modifications.
>
> But if we were to create a range that crosses anonymous and file
> backed pages, it must create or destroy multiple per-process or
> per-file ranges. This could require an unknown number of allocations,

This is the part where I fail to parse your point.

> opening the possibility of getting an ENOMEM half-way through the
> operation, leaving the volatile range partially created or destroyed.
>
> So to keep this simple for this first pass, for now we have two
> syscalls for two types of volatile ranges.

My idea is as follows:

vrange(fd, start, len, mode, behavior)

A) fd = 0

1) system call context - vrange system call registers new vrange
   in mm_struct.
2) Add new vrange into LRU
3) reclaim context - walk with rmap to confirm all processes mark
   the range volatile -> discard

B) fd = 1

1) system call context - vrange system call registers new vrange
   in address_space
2) Add new vrange into LRU
3) reclaim context - walk with rmap to confirm all processes mark
   the range volatile -> discard

What's the problem with this logic?

> Let me know if you have any thoughts or comments. I'm sure there's
> plenty of room for improvement here.
>
> In the meantime I'll be playing with some different approaches
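
Step 3 of the proposal above ("walk with rmap to confirm all processes mark the range volatile -> discard") amounts to an all-of check at reclaim time. The sketch below models only that rule. `mapper` and `can_discard` are hypothetical names, and the array stands in for whatever the rmap walk would visit. It is not how try_to_discard_vpage is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one process found while walking the rmap for a page. */
struct mapper {
    bool marked_volatile; /* has this process marked the range volatile? */
};

/*
 * Reclaim-time check in the spirit of step 3: the page may be discarded
 * only when every process mapping it has marked the range volatile.
 * With no mappers at all the check passes vacuously.
 */
static bool can_discard(const struct mapper *mappers, size_t n)
{
    assert(mappers != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!mappers[i].marked_volatile)
            return false; /* someone still treats the page as non-volatile */
    }
    return true;
}
```

Under this rule the cost of checking is paid in the reclaim path, which is the point Minchan makes about keeping the system call itself cheap.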
[RFC PATCH 0/4] Support vranges on files
This patchset is against Minchan's vrange work here:
	https://lkml.org/lkml/2013/3/12/105

Extending it to support volatile ranges on files. In effect
providing the same functionality of my earlier file based
volatile range patches on-top of Minchan's anonymous volatile
range work.

Volatile ranges on files are different than on anonymous memory,
because the volatility state can be shared between multiple
applications. This makes storing the volatile ranges exclusively
in the mm_struct (or in vmas as in Minchan's earlier work)
inappropriate.

The patchset starts with some minor cleanup.

Then we introduce the idea of a vrange_root, which provides an
interval-tree root and a lock to protect the tree. This structure
can then be stored in the mm_struct or in an address_space. Then the
same infrastructure can be used to manage volatile ranges on both
anonymous and file backed memory.

Next we introduce a parallel fvrange() syscall for creating
volatile ranges directly against files.

And finally, we change the range purging logic to be able to
handle both anonymous and file volatile ranges.

Now there are some quirks still to be resolved with the approach
used here. The biggest one being the vrange() call can't be used to
create volatile ranges against mmapped files. Instead only the
fvrange() can be used to create file backed volatile ranges.

This could be overcome by iterating across all the process VMAs to
determine if they're anonymous or file based, and if file-based,
create a VMA sized volatile range on the mapping pointed to by the
VMA.

But this would have downsides, as Minchan has been clear that he wants
to optimize the vrange() calls so that it is very cheap to create and
destroy volatile ranges. Having simple per-process ranges be created
means we don't have to iterate across the vmas in the range to
determine if they're anonymous or file backed. Instead the current
vrange() code just creates per process ranges (which may or may not
cover mmapped file data), but will only purge anonymous pages in
that range. This keeps the vrange() call cheap.

Additionally, just creating or destroying a single range is very
simple to do, and requires a fixed amount of memory known up front.
Thus we can allocate needed data prior to making any modifications.

But if we were to create a range that crosses anonymous and file
backed pages, it must create or destroy multiple per-process or
per-file ranges. This could require an unknown number of allocations,
opening the possibility of getting an ENOMEM half-way through the
operation, leaving the volatile range partially created or destroyed.

So to keep this simple for this first pass, for now we have two
syscalls for two types of volatile ranges.

Let me know if you have any thoughts or comments. I'm sure there's
plenty of room for improvement here.

In the meantime I'll be playing with some different approaches to
try to handle single volatile ranges that cross file and anonymous
vmas.

The entire queue, both Minchan's changes and mine can be found here:
	git://git.linaro.org/people/jstultz/android-dev.git dev/vrange-minchan

thanks
-john

Cc: linux...@kvack.org
Cc: Michael Kerrisk
Cc: Arun Sharma
Cc: Mel Gorman
Cc: Hugh Dickins
Cc: Dave Hansen
Cc: Rik van Riel
Cc: Neil Brown
Cc: Mike Hommey
Cc: Taras Glek
Cc: KOSAKI Motohiro
Cc: KAMEZAWA Hiroyuki
Cc: Jason Evans
Cc: san...@google.com
Cc: Paul Turner
Cc: Johannes Weiner
Cc: Michel Lespinasse
Cc: Andrew Morton
Cc: Minchan Kim

John Stultz (4):
  vrange: Make various vrange.c local functions static
  vrange: Introduce vrange_root to make vrange structures more flexible
  vrange: Support fvrange() syscall for file based volatile ranges
  vrange: Enable purging of file backed volatile ranges

 arch/x86/syscalls/syscall_64.tbl |    1 +
 fs/file_table.c                  |    5 +
 fs/inode.c                       |    2 +
 fs/proc/task_mmu.c               |   10 +-
 include/linux/fs.h               |    2 +
 include/linux/mm_types.h         |    4 +-
 include/linux/vrange.h           |   60 ---
 include/linux/vrange_types.h     |   22 +++
 kernel/fork.c                    |    2 +-
 mm/vrange.c                      |  334 ++
 10 files changed, 308 insertions(+), 134 deletions(-)
 create mode 100644 include/linux/vrange_types.h

-- 
1.7.10.4
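
Two ideas from the cover letter can be illustrated together: a single vrange_root type that can hang off either an mm_struct or an address_space, and allocating everything up front so that a single-range operation cannot fail half-way. The code below is a userspace sketch under loud assumptions. A linked list replaces the interval tree, a boolean replaces the lock, and `mm_like`, `address_space_like`, `vrange_add`, `vrange_count` and `vrange_free_all` are hypothetical names that do not come from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins: the real series uses an interval tree and a kernel lock. */
struct vrange {
    unsigned long start, end; /* inclusive byte range */
    struct vrange *next;
};

struct vrange_root {
    struct vrange *ranges; /* stand-in for the interval-tree root */
    bool locked;           /* stand-in for the lock protecting the tree */
};

/* The same root type can be embedded in either kind of owner. */
struct mm_like            { struct vrange_root vroot; };
struct address_space_like { struct vrange_root vroot; };

/*
 * Add one range. The only allocation happens before the root is touched,
 * so on failure the existing ranges are left exactly as they were.
 */
static int vrange_add(struct vrange_root *root,
                      unsigned long start, unsigned long end)
{
    struct vrange *r = malloc(sizeof(*r));
    if (!r)
        return -1; /* nothing has been modified yet */
    r->start = start;
    r->end = end;

    assert(!root->locked);
    root->locked = true;  /* "take the lock" */
    r->next = root->ranges;
    root->ranges = r;
    root->locked = false; /* "drop the lock" */
    return 0;
}

static size_t vrange_count(const struct vrange_root *root)
{
    size_t n = 0;
    for (const struct vrange *r = root->ranges; r; r = r->next)
        n++;
    return n;
}

static void vrange_free_all(struct vrange_root *root)
{
    while (root->ranges) {
        struct vrange *next = root->ranges->next;
        free(root->ranges);
        root->ranges = next;
    }
}
```

A range spanning both anonymous and file-backed memory would need several such insertions into different roots, which is where the partially-applied ENOMEM case described above comes from.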
[RFC PATCH 0/4] Support vranges on files
This patchset is against Minchan's vrange work here: https://lkml.org/lkml/2013/3/12/105 Extending it to support volatile ranges on files. In effect providing the same functionality of my earlier file based volatile range patches on-top of Minchan's anonymous volatile range work. Volatile ranges on files are different then on anonymous memory, because the volatility state can be shared between multiple applications. This makes storing the volatile ranges exclusively in the mm_struct (or in vmas as in Minchan's earlier work) inappropriate. The patchset starts with some minor cleanup. Then we introduce the idea of a vrange_root, which provides a interval-tree root and a lock to protect the tree. This structure can then be stored in the mm_struct or in an addres_space. Then the same infrastructure can be used to manage volatile ranges on both anonymous and file backed memory. Next we introduce a parallel fvrange() syscall for creating volatile ranges directly against files. And finally, we change the range pruging logic to be able to handle both anonymous and file volatile ranges. Now there are some quirks still to be resolved with the approach used here. The biggest one being the vrange() call can't be used to create volatile ranges against mmapped files. Instead only the fvrange() can be used to create file backed volatile ranges. This could be overcome by iterating across all the process VMAs to determine if they're anonymous or file based, and if file-based, create a VMA sized volatile range on the mapping pointed to by the VMA. But this would have downsides, as Minchan has been clear that he wants to optmize the vrange() calls so that it is very cheap to create and destroy volatile ranges. Having simple per-process ranges be created means we don't have to iterate across the vmas in the range to determine if they're anonymous or file backed. 
Instead the current vrange() code just creates per-process ranges (which may or may not cover mmapped file data), but will only purge anonymous pages in that range. This keeps the vrange() call cheap.

Additionally, just creating or destroying a single range is very simple to do, and requires a fixed amount of memory known up front. Thus we can allocate needed data prior to making any modifications.

But if we were to create a range that crosses anonymous and file backed pages, we would have to create or destroy multiple per-process or per-file ranges. This could require an unknown number of allocations, opening the possibility of getting an ENOMEM half-way through the operation, leaving the volatile range partially created or destroyed.

So to keep this simple for this first pass, we have two syscalls for the two types of volatile ranges.

Let me know if you have any thoughts or comments. I'm sure there's plenty of room for improvement here.

In the meantime I'll be playing with some different approaches to try to handle single volatile ranges that cross file and anonymous vmas.
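
To make the vrange_root idea and the allocate-before-modify point above more concrete, here is a rough userspace model in C. It is only a sketch under stated assumptions, not code from the patches: the struct layouts, the mock_mm and mock_address_space types, and the vrange_add(), vrange_contains() and vrange_clear() helpers are all hypothetical, a sorted list stands in for the interval tree, a pthread mutex stands in for the kernel lock, and overlapping ranges are not merged.

```c
/* Userspace model only: names echo the cover letter, layouts are assumed. */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* One volatile range [start, end). A sorted list stands in for the interval tree. */
struct vrange {
    unsigned long start, end;
    struct vrange *next;
};

/* Root of a set of ranges plus the lock that protects it. */
struct vrange_root {
    pthread_mutex_t lock;
    struct vrange *head;
};

/* Hypothetical stand-ins for the two objects that would embed a root. */
struct mock_mm { struct vrange_root vroot; };            /* per-process */
struct mock_address_space { struct vrange_root vroot; }; /* per-file */

void vrange_root_init(struct vrange_root *root)
{
    assert(root != NULL);
    pthread_mutex_init(&root->lock, NULL);
    root->head = NULL;
}

/* Add one range. The single allocation happens before the root is touched,
 * so a failure returns -ENOMEM and leaves the root unchanged. */
int vrange_add(struct vrange_root *root, unsigned long start, unsigned long end)
{
    struct vrange *new, **pos;

    if (start >= end)
        return -EINVAL;
    new = malloc(sizeof(*new));
    if (!new)
        return -ENOMEM;
    new->start = start;
    new->end = end;

    pthread_mutex_lock(&root->lock);
    /* Keep the list sorted by start address. */
    for (pos = &root->head; *pos && (*pos)->start < start; pos = &(*pos)->next)
        ;
    new->next = *pos;
    *pos = new;
    pthread_mutex_unlock(&root->lock);
    return 0;
}

/* Return 1 if addr falls inside any range held by this root, else 0. */
int vrange_contains(struct vrange_root *root, unsigned long addr)
{
    struct vrange *r;
    int hit = 0;

    pthread_mutex_lock(&root->lock);
    for (r = root->head; r; r = r->next) {
        if (addr >= r->start && addr < r->end) {
            hit = 1;
            break;
        }
    }
    pthread_mutex_unlock(&root->lock);
    return hit;
}

/* Drop every range, e.g. when the last user of a file goes away. */
void vrange_clear(struct vrange_root *root)
{
    pthread_mutex_lock(&root->lock);
    while (root->head) {
        struct vrange *r = root->head;
        root->head = r->next;
        free(r);
    }
    pthread_mutex_unlock(&root->lock);
}
```

The same three helpers work whether the root lives in the per-process object or the per-file one, which is the point of splitting the root out. A range that spans both kinds of backing would need one vrange_add() per root, and that is where the multiple-allocation problem described above comes from.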
The entire queue, both Minchan's changes and mine can be found here:
git://git.linaro.org/people/jstultz/android-dev.git dev/vrange-minchan

thanks
-john

Cc: linux...@kvack.org
Cc: Michael Kerrisk mtk.manpa...@gmail.com
Cc: Arun Sharma asha...@fb.com
Cc: Mel Gorman m...@csn.ul.ie
Cc: Hugh Dickins hu...@google.com
Cc: Dave Hansen d...@sr71.net
Cc: Rik van Riel r...@redhat.com
Cc: Neil Brown ne...@suse.de
Cc: Mike Hommey m...@glandium.org
Cc: Taras Glek tg...@mozilla.com
Cc: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
Cc: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com
Cc: Jason Evans j...@fb.com
Cc: san...@google.com
Cc: Paul Turner p...@google.com
Cc: Johannes Weiner han...@cmpxchg.org
Cc: Michel Lespinasse wal...@google.com
Cc: Andrew Morton a...@linux-foundation.org
Cc: Minchan Kim minc...@kernel.org

John Stultz (4):
  vrange: Make various vrange.c local functions static
  vrange: Introduce vrange_root to make vrange structures more flexible
  vrange: Support fvrange() syscall for file based volatile ranges
  vrange: Enable purging of file backed volatile ranges

 arch/x86/syscalls/syscall_64.tbl |    1 +
 fs/file_table.c                  |    5 +
 fs/inode.c                       |    2 +
 fs/proc/task_mmu.c               |   10 +-
 include/linux/fs.h               |    2 +
 include/linux/mm_types.h         |    4 +-
 include/linux/vrange.h           |   60 ---
 include/linux/vrange_types.h     |   22 +++
 kernel/fork.c                    |    2 +-
 mm/vrange.c                      |  334 ++
 10 files changed, 308 insertions(+), 134 deletions(-)
 create mode 100644 include/linux/vrange_types.h

--
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/