Re: [RFC 0/3] Pin page control subsystem
On Thu, 15 Aug 2013, Minchan Kim wrote:

> Now mlock pages could be migrated in case of CMA so I think it's not a
> big problem to migrate it for other cases.
> I remember you and Peter argued about the mlock semantics from the pin
> POV and, as I remember correctly, Peter said mlock doesn't mean pin so
> we could migrate it, but you didn't agree. Right?

mlock means it can be migrated. Pinning is currently done by increasing
the page count. Migration will be attempted but it will fail since the
references cannot all be removed.

Peter proposed that mlock would work like pinning so that a migration of
the page would not be attempted.

My concern is not only about migration but about a general way of pinning
pages. Having mlock and pinning with different semantics is already an
issue, as the conversation with Peter brought out. Now we are adding yet
another way that pinning is used.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
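The failure mode described in this message (a pin is an extra page reference, so migration is attempted and then fails) can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct toy_page and the toy_* helpers are hypothetical names invented for illustration and are not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the counters matter here. */
struct toy_page {
    int refcount;   /* one reference per mapping plus one per pin */
    int mapcount;   /* references the migration code can remove itself */
};

/* Pin by taking an extra reference, as the message describes. */
static void toy_pin(struct toy_page *p)
{
    p->refcount++;
}

static void toy_unpin(struct toy_page *p)
{
    assert(p->refcount > p->mapcount);  /* there must be a pin to drop */
    p->refcount--;
}

/*
 * Migration can account for the references held by mappings, but not for
 * those held by pinners. Any leftover reference makes the attempt fail.
 */
static bool toy_try_migrate(const struct toy_page *p)
{
    return p->refcount == p->mapcount;
}
```

In this model an mlocked page has no extra reference and still migrates, while a pinned page is tried and rejected, which is the difference in semantics the message points out.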
Re: [RFC 0/3] Pin page control subsystem
Hey Christoph,

On Wed, Aug 14, 2013 at 04:58:36PM +, Christoph Lameter wrote:
> On Thu, 15 Aug 2013, Minchan Kim wrote:
>
> > When I look at the API of mmu_notifier, it has mm_struct so I guess it
> > works only for user processes. Right?
>
> Correct. A process must have mapped the pages. If you can get a
> kernel "process" to work then that process could map the pages.
>
> > If so, I need to register it without user context because zram, zswap
> > and zcache work only on the kernel side.
>
> Hmmm... Ok but that now gets the complexity of page pinning up to a very
> weird level. Is there some way we can have a common way to deal with the
> various ways that pinning is needed? Just off the top of my head (I may
> miss some use cases) we have
>
> 1. mlock from user space

Now mlock pages could be migrated in case of CMA so I think it's not a
big problem to migrate it for other cases.
I remember you and Peter argued about the mlock semantics from the pin
POV and, as I remember correctly, Peter said mlock doesn't mean pin so
we could migrate it, but you didn't agree. Right?
Anyway, it's off-topic, but technically it's not a problem.

> 2. page pinning for reclaim

Reclaim pins a page for a while. Of course, "for a while" is rather vague,
so it could be really long for someone but really short for others.
But at least a reclaim pin should be short, and we should try to make it
so if that's not true.

> 3. Page pinning for I/O from device drivers (like f.e. the RDMA subsystem)

It's one of my big concerns. Even several drivers might pin a page at the
same time. But normally a driver knows whether it will pin a page for a
long or a short time, so if it wants to pin a page for a long time, like
aio or some GPU driver doing zero-copy, it should use the pinpage control
subsystem to release the pinned pages when the VM asks.

> 4. Page pinning for low latency operations

I have no idea, but I guess most of them pin a page for a short time?
Otherwise, they should use the pinpage control subsystem, too.

> 5. Page pinning for migration

It's like 2. A migration pin should be short.

> 6. Page pinning for the perf buffers.

I'm not familiar with that, but my gut feeling is it will pin pages for
a long time so it should use the pinpage control subsystem.

> 7. Page pinning for cross system access (XPMEM, GRU SGI)

If it's a really long pin, it should use the pinpage control subsystem.

>
> Now we have another subsystem wanting different semantics of pinning. Is
> there any way we can come up with a pinning mechanism that fits all use
> cases, that is easily understandable and maintainable?

I agree it's not easy, but we should go that way rather than adding ad-hoc
subsystem-specific implementations. If we allow a subsystem-specific way,
maybe everybody will want to touch migrate.c, so it would become very
complicated and bloated, even unmaintainable in the future.
If it goes another way, like a_ops->migratepages, it couldn't handle the
complex nested pin case, so it couldn't guarantee pinpage migration.

The hardest part is what "for a while" means. It depends on system
workloads, so on some systems it means 3ms while on others it means 3s. :(
Sigh, for now I have no idea how to handle it in a general way.

Thanks for the comment, Christoph!

--
Kind regards,
Minchan Kim
Re: [RFC 0/3] Pin page control subsystem
On Thu, 15 Aug 2013, Minchan Kim wrote:

> When I look at the API of mmu_notifier, it has mm_struct so I guess it
> works only for user processes. Right?

Correct. A process must have mapped the pages. If you can get a
kernel "process" to work then that process could map the pages.

> If so, I need to register it without user context because zram, zswap
> and zcache work only on the kernel side.

Hmmm... Ok but that now gets the complexity of page pinning up to a very
weird level. Is there some way we can have a common way to deal with the
various ways that pinning is needed? Just off the top of my head (I may
miss some use cases) we have

1. mlock from user space
2. Page pinning for reclaim
3. Page pinning for I/O from device drivers (e.g. the RDMA subsystem)
4. Page pinning for low latency operations
5. Page pinning for migration
6. Page pinning for the perf buffers
7. Page pinning for cross system access (XPMEM, GRU SGI)

Now we have another subsystem wanting different semantics of pinning. Is
there any way we can come up with a pinning mechanism that fits all use
cases, that is easily understandable and maintainable?
Re: [RFC 0/3] Pin page control subsystem
On Wed, 14 Aug 2013, Minchan Kim wrote:

> On Tue, Aug 13, 2013 at 04:21:30PM +, Christoph Lameter wrote:
> > On Tue, 13 Aug 2013, Minchan Kim wrote:
> >
> > > The VM sometimes wants to migrate and/or reclaim pages for CMA,
> > > memory-hotplug, THP and so on, but at the moment it can handle only
> > > userspace pages, so if one of the example subsystems above has
> > > pinned a page in a range the VM wants to migrate, migration fails
> > > and the examples above don't work well.
> >
> > Don't we have the mmu_notifiers that could help in that case? You
> > could get a callback which could prepare the pages for migration?
>
> I'm not familiar with mmu_notifier, so could you please elaborate a bit
> so I can dive into it?

Add a notifier callback for unpinning pages to the mmu notifier subsystem
and then your drivers could register with the subsystem to get
notifications when migration needs to occur etc.
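The suggestion in this message (drivers register a callback and are asked to unpin before migration) can be modelled in userspace roughly as follows. This is a hedged sketch, not the mmu_notifier API: every ntf_* name, the fixed-size chain and the pincount field are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NTF_MAX 8

/* Hypothetical page model: counts how many pins are outstanding. */
struct ntf_page {
    int pincount;
};

/* Hypothetical callback: the owner is asked to drop its pin on the page. */
typedef void (*ntf_unpin_fn)(struct ntf_page *page, void *owner);

struct ntf_entry {
    ntf_unpin_fn unpin;
    void *owner;
};

static struct ntf_entry ntf_chain[NTF_MAX];
static size_t ntf_chain_len;

/* A driver registers once so it can be told when migration is needed. */
static int ntf_register(ntf_unpin_fn fn, void *owner)
{
    if (ntf_chain_len == NTF_MAX)
        return -1;
    ntf_chain[ntf_chain_len].unpin = fn;
    ntf_chain[ntf_chain_len].owner = owner;
    ntf_chain_len++;
    return 0;
}

/* Called before migrating: returns 0 only if no pin is left afterwards. */
static int ntf_prepare_migration(struct ntf_page *page)
{
    size_t i;

    for (i = 0; i < ntf_chain_len; i++)
        ntf_chain[i].unpin(page, ntf_chain[i].owner);
    return page->pincount == 0 ? 0 : -1;
}

/* Example driver that holds at most one pinned page. */
struct ntf_driver {
    struct ntf_page *pinned;
};

static void ntf_driver_pin(struct ntf_driver *drv, struct ntf_page *page)
{
    drv->pinned = page;
    page->pincount++;
}

static void ntf_driver_unpin(struct ntf_page *page, void *owner)
{
    struct ntf_driver *drv = owner;

    if (drv->pinned == page) {
        drv->pinned = NULL;
        page->pincount--;
    }
}
```

Note that registration here needs no mm_struct, which is the property a kernel-only user such as zram would need; whether the real notifier subsystem could offer that is exactly what the thread goes on to discuss.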
Re: [RFC 0/3] Pin page control subsystem
Hi Christoph,

On Wed, Aug 14, 2013 at 04:36:44PM +, Christoph Lameter wrote:
> On Wed, 14 Aug 2013, Minchan Kim wrote:
>
> > On Tue, Aug 13, 2013 at 04:21:30PM +, Christoph Lameter wrote:
> > > On Tue, 13 Aug 2013, Minchan Kim wrote:
> > >
> > > > The VM sometimes wants to migrate and/or reclaim pages for CMA,
> > > > memory-hotplug, THP and so on, but at the moment it can handle
> > > > only userspace pages, so if one of the example subsystems above
> > > > has pinned a page in a range the VM wants to migrate, migration
> > > > fails and the examples above don't work well.
> > >
> > > Don't we have the mmu_notifiers that could help in that case? You
> > > could get a callback which could prepare the pages for migration?
> >
> > I'm not familiar with mmu_notifier, so could you please elaborate a
> > bit so I can dive into it?
>
> Add a notifier callback for unpinning pages to the mmu notifier subsystem
> and then your drivers could register with the subsystem to get
> notifications when migration needs to occur etc.
>

When I look at the API of mmu_notifier, it has mm_struct so I guess it
works only for user processes. Right?
If so, I need to register it without user context because zram, zswap
and zcache work only on the kernel side.

--
Kind regards,
Minchan Kim
Re: [RFC 0/3] Pin page control subsystem
Hello Christoph,

On Tue, Aug 13, 2013 at 04:21:30PM +, Christoph Lameter wrote:
> On Tue, 13 Aug 2013, Minchan Kim wrote:
>
> > The VM sometimes wants to migrate and/or reclaim pages for CMA,
> > memory-hotplug, THP and so on, but at the moment it can handle only
> > userspace pages, so if one of the example subsystems above has
> > pinned a page in a range the VM wants to migrate, migration fails
> > and the examples above don't work well.
>
> Don't we have the mmu_notifiers that could help in that case? You could
> get a callback which could prepare the pages for migration?

I'm not familiar with mmu_notifier, so could you please elaborate a bit
so I can dive into it?

Thanks!

>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"d...@kvack.org"> em...@kvack.org </a>

--
Kind regards,
Minchan Kim
Re: [RFC 0/3] Pin page control subsystem
Hello Benjamin,

On Tue, Aug 13, 2013 at 10:23:38AM -0400, Benjamin LaHaise wrote:
> On Tue, Aug 13, 2013 at 11:46:42AM +0200, Krzysztof Kozlowski wrote:
> > Hi Minchan,
> >
> > On Tue, 2013-08-13 at 16:04 +0900, Minchan Kim wrote:
> > > patch 2 introduces the pinpage control subsystem. Subsystems that
> > > want to control pinned pages should implement their own pinpage_xxx
> > > functions, because each subsystem has its own characteristics, so
> > > the data structure for managing pinpage information depends on
> > > them. Otherwise, they can use the general functions defined in the
> > > pinpage subsystem. patch 3 hacks migration.c so that migration is
> > > aware of pinpage and migrates such pages with the pinpage subsystem.
> >
> > I wonder why don't we use page->mapping and a_ops? Is there any
> > disadvantage of such mapping/a_ops?
>
> That's what the pending aio patches do, and I think this is a better
> approach for those use-cases that the technique works for.

I looked roughly at your implementation and I think it's not a generic
solution. How could it handle the example mentioned in my reply to
Krzysztof?

>
> The biggest problem I see with the pinpage approach is that it's based on a
> single page at a time. I'd venture a guess that many pinned pages are done
> in groups of pages, not single ones.

In the case of the z* family, most allocations are single pages, but I
agree many GUP users would allocate groups of pages. We can cover that by
expanding the API like this:

int set_pinpage(struct pinpage_system *psys, struct page **pages,
                unsigned long nr_pages, void **privates);

so we can handle it in batches, and the subsystem can manage pinpage_info
with an interval tree rather than the radix tree, which is the default.

That's why the pinpage control subsystem has room for subsystem-specific
metadata handling.

>
> -ben
>
> > Best regards,
> > Krzysztof
>
> --
> "Thought is the essence of where you are now."

--
Kind regards,
Minchan Kim
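To make the batched interface proposed in this message concrete, here is a minimal userspace sketch. Only the set_pinpage() prototype comes from the message; the struct layouts, the fixed-size array (standing in for the radix or interval tree), the TOY_MAX_PINS limit and the all-or-nothing behaviour are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PINS 16     /* hypothetical capacity of the toy store */

/* Toy stand-in; not the kernel's struct page. */
struct page {
    int id;
};

/* Hypothetical per-pin record: the page plus subsystem-private data. */
struct pinpage_info {
    struct page *page;
    void *private;
};

/* Hypothetical layout; the message only names the type. */
struct pinpage_system {
    struct pinpage_info info[TOY_MAX_PINS];
    unsigned long nr;
};

/*
 * Record nr_pages pins in one call. Either the whole batch is recorded
 * or nothing is (an assumption of this sketch, not stated in the thread).
 */
int set_pinpage(struct pinpage_system *psys, struct page **pages,
                unsigned long nr_pages, void **privates)
{
    unsigned long i;

    assert(psys->nr <= TOY_MAX_PINS);
    if (nr_pages > TOY_MAX_PINS - psys->nr)
        return -1;

    for (i = 0; i < nr_pages; i++) {
        psys->info[psys->nr].page = pages[i];
        psys->info[psys->nr].private = privates ? privates[i] : NULL;
        psys->nr++;
    }
    return 0;
}
```

A subsystem that pins contiguous ranges could replace the array with an interval tree keyed by page range, which is the flexibility the message argues for.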
Re: [RFC 0/3] Pin page control subsystem
Hello Krzysztof,

On Tue, Aug 13, 2013 at 11:46:42AM +0200, Krzysztof Kozlowski wrote:
> Hi Minchan,
>
> On Tue, 2013-08-13 at 16:04 +0900, Minchan Kim wrote:
> > patch 2 introduces the pinpage control subsystem. Subsystems that
> > want to control pinned pages should implement their own pinpage_xxx
> > functions, because each subsystem has its own characteristics, so
> > the data structure for managing pinpage information depends on
> > them. Otherwise, they can use the general functions defined in the
> > pinpage subsystem. patch 3 hacks migration.c so that migration is
> > aware of pinpage and migrates such pages with the pinpage subsystem.
>
> I wonder why don't we use page->mapping and a_ops? Is there any
> disadvantage of such mapping/a_ops?

The main concern with that approach is how to handle the nested pin case.
For example, driver A and driver B pin the same file-backed page at the
same time via get_user_pages.
For the migration, we need the following operations:

1. [buffer]'s migrate_page for the file-backed page
2. [driver A]'s migrate_page
3. [driver B]'s migrate_page

But the page has only one mapping. How can we handle it?

If we give up on a pinpage subsystem that unifies userspace pages (e.g. GUP)
and kernel space pages (e.g. zswap, zram and zcache), we can go with
address_space's migratepages, but we would lose the abstraction, so every
user would have to implement its own pinpage manager. It's not hard, I
guess, but it's more error-prone and not maintainable in the future.

>
> Best regards,
> Krzysztof

--
Kind regards,
Minchan Kim
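The nested pin problem raised in this message (one page, one mapping, but several owners that each need a migrate step) can be sketched with a per-page list of pin records. This is an illustrative userspace model only; all nest_* names and the fixed-size list are hypothetical and do not reflect the RFC patches.

```c
#include <assert.h>
#include <stddef.h>

#define NEST_MAX_PINS 4

struct nest_page;

/* Hypothetical per-owner hook: move the owner's state from one page to another. */
typedef int (*nest_migrate_fn)(struct nest_page *from, struct nest_page *to,
                               void *owner);

struct nest_pin {
    nest_migrate_fn migrate;
    void *owner;
};

/* Unlike a single page->mapping, the page carries one record per pinner. */
struct nest_page {
    struct nest_pin pins[NEST_MAX_PINS];
    size_t nr_pins;
};

static int nest_pin_page(struct nest_page *page, nest_migrate_fn fn,
                         void *owner)
{
    if (page->nr_pins == NEST_MAX_PINS)
        return -1;
    page->pins[page->nr_pins].migrate = fn;
    page->pins[page->nr_pins].owner = owner;
    page->nr_pins++;
    return 0;
}

/*
 * Run every owner's migrate hook in pin order, then hand the pin records
 * over to the new page. No rollback is attempted if a hook fails; a real
 * design would need one.
 */
static int nest_migrate(struct nest_page *from, struct nest_page *to)
{
    size_t i;

    for (i = 0; i < from->nr_pins; i++) {
        int err = from->pins[i].migrate(from, to, from->pins[i].owner);

        if (err)
            return err;
    }
    for (i = 0; i < from->nr_pins; i++)
        to->pins[i] = from->pins[i];
    to->nr_pins = from->nr_pins;
    from->nr_pins = 0;
    return 0;
}

/* Example owner: a driver that remembers which page it has pinned. */
struct nest_driver {
    struct nest_page *page;
};

static int nest_driver_migrate(struct nest_page *from, struct nest_page *to,
                               void *owner)
{
    struct nest_driver *drv = owner;

    assert(drv->page == from);
    drv->page = to;
    return 0;
}
```

With driver A and driver B both pinning the same page, nest_migrate() calls both hooks, which a single a_ops pointer per mapping cannot express.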
Re: [RFC 0/3] Pin page control subsystem
On Tue, 13 Aug 2013, Minchan Kim wrote:

> The VM sometimes wants to migrate and/or reclaim pages for CMA,
> memory-hotplug, THP and so on, but at the moment it can handle only
> userspace pages, so if one of the example subsystems above has
> pinned a page in a range the VM wants to migrate, migration fails
> and the examples above don't work well.

Don't we have the mmu_notifiers that could help in that case? You could
get a callback which could prepare the pages for migration?
Re: [RFC 0/3] Pin page control subsystem
On Tue, Aug 13, 2013 at 11:46:42AM +0200, Krzysztof Kozlowski wrote:
> Hi Minchan,
>
> On Tue, 2013-08-13 at 16:04 +0900, Minchan Kim wrote:
> > patch 2 introduces the pinpage control subsystem. Subsystems that
> > want to control pinned pages should implement their own pinpage_xxx
> > functions, because each subsystem has its own characteristics, so
> > the data structure for managing pinpage information depends on
> > them. Otherwise, they can use the general functions defined in the
> > pinpage subsystem. patch 3 hacks migration.c so that migration is
> > aware of pinpage and migrates such pages with the pinpage subsystem.
>
> I wonder why don't we use page->mapping and a_ops? Is there any
> disadvantage of such mapping/a_ops?

That's what the pending aio patches do, and I think this is a better
approach for those use-cases that the technique works for.

The biggest problem I see with the pinpage approach is that it's based on a
single page at a time. I'd venture a guess that many pinned pages are done
in groups of pages, not single ones.

		-ben

> Best regards,
> Krzysztof

--
"Thought is the essence of where you are now."
Re: [RFC 0/3] Pin page control subsystem
Hi Minchan,

On Tue, 2013-08-13 at 16:04 +0900, Minchan Kim wrote:
> patch 2 introduces the pinpage control subsystem. Subsystems that
> want to control pinned pages should implement their own pinpage_xxx
> functions, because each subsystem has its own characteristics, so
> the data structure for managing pinpage information depends on
> them. Otherwise, they can use the general functions defined in the
> pinpage subsystem. patch 3 hacks migration.c so that migration is
> aware of pinpage and migrates such pages with the pinpage subsystem.

I wonder why don't we use page->mapping and a_ops? Is there any
disadvantage of such mapping/a_ops?

Best regards,
Krzysztof