Re: [RFC 0/3] Pin page control subsystem

2013-08-15 Thread Christoph Lameter
On Thu, 15 Aug 2013, Minchan Kim wrote:

> mlocked pages can now be migrated in the CMA case, so I think it's not a
> big problem to migrate them in other cases.
> I remember you and Peter argued about what mlock means from the pinning POV,
> and if I remember correctly, Peter said mlock doesn't mean pin, so
> we could migrate it, but you didn't agree. Right?

mlock means the page can still be migrated. Pinning is currently done by
increasing the page count. Migration will be attempted but it will fail since
not all of the references can be removed. Peter proposed that mlock would work
like pinning so that a migration of the page would not be attempted.
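
To make the distinction concrete, here is a minimal userspace sketch in C of the behaviour described above. It is a toy model, not kernel code: `struct toy_page`, `toy_pin()`, `toy_unpin()` and `toy_try_migrate()` are hypothetical names invented for illustration, and the kernel's real migration check is more involved than a single comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the fields this sketch needs. */
struct toy_page {
	int refcount;	/* all references currently held on the page */
	int mapcount;	/* references that come from page-table mappings */
	bool mlocked;	/* set by mlock(); takes no extra reference here */
};

/* Pinning as described above: simply take one more reference. */
static void toy_pin(struct toy_page *p)
{
	p->refcount++;
}

static void toy_unpin(struct toy_page *p)
{
	assert(p->refcount > p->mapcount);	/* there must be a pin to drop */
	p->refcount--;
}

/*
 * Migration is attempted whether or not the page is mlocked. In this
 * simplified model it succeeds only when every reference belongs to a
 * mapping, because mappings are the references migration can remove.
 */
static bool toy_try_migrate(const struct toy_page *p)
{
	return p->refcount == p->mapcount;
}
```

In this model an mlocked page with one mapping and no pin migrates, while the same page fails to migrate once `toy_pin()` has been called on it, and migrates again after `toy_unpin()`.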

My concern is not only about migration but about a general way of pinning
pages. Having mlock and pinning with different semantics is already an
issue as the conversation with Peter brought out. Now we are
adding yet another way that pinning is used.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC 0/3] Pin page control subsystem

2013-08-14 Thread Minchan Kim
Hey Christoph,

On Wed, Aug 14, 2013 at 04:58:36PM +, Christoph Lameter wrote:
> On Thu, 15 Aug 2013, Minchan Kim wrote:
> 
> > When I look at the mmu_notifier API, it takes an mm_struct, so I guess it
> > works only for user processes. Right?
> 
> Correct. A process must have mapped the pages. If you can get a
> kernel "process" to work then that process could map the pages.
> 
> > If so, I need to register it without a user context, because zram, zswap
> > and zcache work only on the kernel side.
> 
> Hmmm... Ok but that now gets the complexity of page pinning up to a very
> weird level. Is there some way we can have a common way to deal with the
> various ways that pinning is needed? Just off the top of my head (I may
> miss some use cases) we have
> 
> 1. mlock from user space

mlocked pages can now be migrated in the CMA case, so I think it's not a
big problem to migrate them in other cases.
I remember you and Peter argued about what mlock means from the pinning POV,
and if I remember correctly, Peter said mlock doesn't mean pin, so
we could migrate it, but you didn't agree. Right?
Anyway, it's off-topic, but technically it's not a problem.

> 2. page pinning for reclaim

Reclaim pins a page for a while. Of course, "for a while" is
rather vague: it could be really long for some users but really
short for others. But at least a reclaim pin should be short, and
we should try to make it so if that's not true.

> 3. Page pinning for I/O from device drivers (like f.e. the RDMA subsystem)

This is one of my big concerns. Several drivers might even pin
a page at the same time. But normally a driver knows whether it will pin a page
for a long or a short time, so if it wants to pin a page for a long time, like aio
or some GPU driver doing zero-copy, it should use the pinpage control subsystem
to release pinned pages when the VM asks.

> 4. Page pinning for low latency operations

I have no idea, but I guess most of them pin a page for a short time?
Otherwise, they should use the pinpage control subsystem, too.

> 5. Page pinning for migration

It's like 2: a migration pin should be short.

> 6. Page pinning for the perf buffers.

I'm not familiar with that, but my gut feeling is that it will pin pages
for a long time, so it should use the pinpage control subsystem.

> 7. Page pinning for cross system access (XPMEM, GRU SGI)

If it's a really long pin, it should use the pinpage control subsystem.

> 
> Now we have another subsystem wanting different semantics of pinning. Is
> there any way we can come up with a pinning mechanism that fits all use
> cases, that is easily understandable and maintainable?

I agree it's not easy, but we should go that way rather than adding ad-hoc,
subsystem-specific implementations. If we allow subsystem-specific ways,
everybody may want to touch migrate.c, so it would become very complicated
and bloated, and not maintainable in the future. If we go another way,
like a_ops->migratepages, it can't handle complex nested pinning of pages,
so it can't guarantee migration of pinned pages.

The hardest part is defining "for a while". It depends on the system's workload:
on some systems it means 3ms while on others it means 3s. :(
Sigh, right now I have no idea how to handle it in a general way.

Thanks for the comment, Christoph!

> 

-- 
Kind regards,
Minchan Kim


Re: [RFC 0/3] Pin page control subsystem

2013-08-14 Thread Christoph Lameter
On Thu, 15 Aug 2013, Minchan Kim wrote:

> When I look at the mmu_notifier API, it takes an mm_struct, so I guess it
> works only for user processes. Right?

Correct. A process must have mapped the pages. If you can get a
kernel "process" to work then that process could map the pages.

> If so, I need to register it without a user context, because zram, zswap
> and zcache work only on the kernel side.

Hmmm... Ok but that now gets the complexity of page pinning up to a very
weird level. Is there some way we can have a common way to deal with the
various ways that pinning is needed? Just off the top of my head (I may
miss some use cases) we have

1. mlock from user space
2. page pinning for reclaim
3. Page pinning for I/O from device drivers (like f.e. the RDMA subsystem)
4. Page pinning for low latency operations
5. Page pinning for migration
6. Page pinning for the perf buffers.
7. Page pinning for cross system access (XPMEM, GRU SGI)

Now we have another subsystem wanting different semantics of pinning. Is
there any way we can come up with a pinning mechanism that fits all use
cases, that is easily understandable and maintainable?



Re: [RFC 0/3] Pin page control subsystem

2013-08-14 Thread Christoph Lameter
On Wed, 14 Aug 2013, Minchan Kim wrote:

> On Tue, Aug 13, 2013 at 04:21:30PM +, Christoph Lameter wrote:
> > On Tue, 13 Aug 2013, Minchan Kim wrote:
> >
> > > The VM sometimes wants to migrate and/or reclaim pages for CMA, memory-hotplug,
> > > THP and so on, but at the moment it can handle only userspace pages,
> > > so if one of the above example subsystems has pinned a page in a range the VM
> > > wants to migrate, migration fails and the above examples can't work well.
> >
> > Don't we have the mmu_notifiers that could help in that case? You could get
> > a callback which could prepare the pages for migration?
>
> I'm not familiar with mmu_notifier yet, so could you please elaborate on it
> a bit so I can dive into that?

Add a notifier callback for unpinning pages to the mmu notifier subsystem
and then your drivers could register with the subsystem to get
notifications when migration needs to occur etc.
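
As a rough illustration of that idea, the following userspace C sketch models a registration list with an "unpin" callback. Everything here is hypothetical: the mmu notifier subsystem has no unpin callback at the time of this thread, and `toy_register_unpin_notifier()`, `toy_notify_unpin()` and the `toy_driver` helpers are names made up for this example only.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NOTIFIERS 8

struct toy_page {
	int refcount;
};

/* Callback a driver supplies; asked to drop its pin on @page. */
typedef void (*toy_unpin_fn)(struct toy_page *page, void *priv);

struct toy_notifier {
	toy_unpin_fn unpin;
	void *priv;
};

static struct toy_notifier toy_chain[TOY_MAX_NOTIFIERS];
static size_t toy_nr_notifiers;

/* A driver registers once; returns 0 on success, -1 if the list is full. */
static int toy_register_unpin_notifier(toy_unpin_fn fn, void *priv)
{
	if (toy_nr_notifiers == TOY_MAX_NOTIFIERS)
		return -1;
	toy_chain[toy_nr_notifiers].unpin = fn;
	toy_chain[toy_nr_notifiers].priv = priv;
	toy_nr_notifiers++;
	return 0;
}

/* Called by the (toy) VM before it tries to migrate @page. */
static void toy_notify_unpin(struct toy_page *page)
{
	for (size_t i = 0; i < toy_nr_notifiers; i++)
		toy_chain[i].unpin(page, toy_chain[i].priv);
}

/* Example driver that keeps at most one page pinned. */
struct toy_driver {
	struct toy_page *pinned;
};

static void toy_driver_pin(struct toy_driver *d, struct toy_page *page)
{
	page->refcount++;
	d->pinned = page;
}

static void toy_driver_unpin_cb(struct toy_page *page, void *priv)
{
	struct toy_driver *d = priv;

	if (d->pinned != page)
		return;		/* not our page, nothing to release */
	assert(page->refcount > 0);
	page->refcount--;
	d->pinned = NULL;
}
```

A driver would call `toy_register_unpin_notifier(toy_driver_unpin_cb, &drv)` once, and the migration path would call `toy_notify_unpin(page)` before checking the page count. The sketch leaves out locking and any way for a driver to refuse or delay the request.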



Re: [RFC 0/3] Pin page control subsystem

2013-08-14 Thread Minchan Kim
Hi Christoph,

On Wed, Aug 14, 2013 at 04:36:44PM +, Christoph Lameter wrote:
> On Wed, 14 Aug 2013, Minchan Kim wrote:
> 
> > On Tue, Aug 13, 2013 at 04:21:30PM +, Christoph Lameter wrote:
> > > On Tue, 13 Aug 2013, Minchan Kim wrote:
> > >
> > > > The VM sometimes wants to migrate and/or reclaim pages for CMA, memory-hotplug,
> > > > THP and so on, but at the moment it can handle only userspace pages,
> > > > so if one of the above example subsystems has pinned a page in a range the VM
> > > > wants to migrate, migration fails and the above examples can't work well.
> > >
> > > Don't we have the mmu_notifiers that could help in that case? You could get
> > > a callback which could prepare the pages for migration?
> >
> > I'm not familiar with mmu_notifier yet, so could you please elaborate on it
> > a bit so I can dive into that?
> 
> Add a notifier callback for unpinning pages to the mmu notifier subsystem
> and then your drivers could register with the subsystem to get
> notifications when migration needs to occur etc.
> 

When I look at the mmu_notifier API, it takes an mm_struct, so I guess it
works only for user processes. Right?
If so, I need to register it without a user context, because zram, zswap
and zcache work only on the kernel side.

-- 
Kind regards,
Minchan Kim


Re: [RFC 0/3] Pin page control subsystem

2013-08-13 Thread Minchan Kim
Hello Christoph,

On Tue, Aug 13, 2013 at 04:21:30PM +, Christoph Lameter wrote:
> On Tue, 13 Aug 2013, Minchan Kim wrote:
> 
> > The VM sometimes wants to migrate and/or reclaim pages for CMA, memory-hotplug,
> > THP and so on, but at the moment it can handle only userspace pages,
> > so if one of the above example subsystems has pinned a page in a range the VM
> > wants to migrate, migration fails and the above examples can't work well.
> 
> Don't we have the mmu_notifiers that could help in that case? You could get
> a callback which could prepare the pages for migration?

I'm not familiar with mmu_notifier yet, so could you please elaborate on it
a bit so I can dive into that?

Thanks!

> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"d...@kvack.org"> em...@kvack.org </a>

-- 
Kind regards,
Minchan Kim


Re: [RFC 0/3] Pin page control subsystem

2013-08-13 Thread Minchan Kim
Hello Benjamin,

On Tue, Aug 13, 2013 at 10:23:38AM -0400, Benjamin LaHaise wrote:
> On Tue, Aug 13, 2013 at 11:46:42AM +0200, Krzysztof Kozlowski wrote:
> > Hi Minchan,
> > 
> > On wto, 2013-08-13 at 16:04 +0900, Minchan Kim wrote:
> > > patch 2 introduces the pinpage control
> > > subsystem. Subsystems that want to control pinned pages should implement their
> > > own pinpage_xxx functions, because each subsystem has its own characteristics,
> > > so the kind of data structure used for managing pinpage information depends
> > > on the subsystem. Otherwise, they can use the general functions defined in the
> > > pinpage subsystem. patch 3 hacks migration.c so that migration is
> > > aware of pinpage now and migrates such pages with the pinpage subsystem.
> > 
> > I wonder why we don't use page->mapping and a_ops? Is there any
> > disadvantage of such mapping/a_ops?
> 
> That's what the pending aio patches do, and I think this is a better 
> approach for those use-cases that the technique works for.

I took a rough look at your implementation and I don't think it's a generic solution.
How would it handle the example I mentioned in my reply to Krzysztof?

> 
> The biggest problem I see with the pinpage approach is that it's based on a
> single page at a time.  I'd venture a guess that many pinned pages are done 
> in groups of pages, not single ones.

In the z* family, most allocations are single pages, but I agree many GUP users
would allocate groups of pages. We can cover that by expanding the API
like this:

int set_pinpage(struct pinpage_system *psys, struct page **pages,
                unsigned long nr_pages, void **privates);

That way we can handle pages in batches, and the subsystem can manage pinpage_info
with an interval tree rather than the radix tree, which is the default.
That's why the pinpage control subsystem has room for subsystem-specific metadata
handling.
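
A minimal userspace sketch of what such a batched call could look like follows. Only the `set_pinpage()` prototype comes from the message above; the body, the field layout of `struct pinpage_system` and `struct pinpage_info`, the flat table used in place of a radix or interval tree, and the all-or-nothing error handling are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PINS 64

/* Toy stand-in, not the kernel's struct page. */
struct page {
	int refcount;
};

/* One record per pinned page; @private is subsystem-specific metadata. */
struct pinpage_info {
	struct page *page;
	void *private;
};

/* Assumed layout: a flat table instead of a radix or interval tree. */
struct pinpage_system {
	struct pinpage_info pins[TOY_MAX_PINS];
	unsigned long nr_pins;
};

/*
 * Pin @nr_pages pages in one call. The batch is all-or-nothing:
 * returns 0 on success, -1 if the table cannot hold the whole batch.
 * @privates may be NULL when the caller has no per-page metadata.
 */
int set_pinpage(struct pinpage_system *psys, struct page **pages,
		unsigned long nr_pages, void **privates)
{
	unsigned long i;

	assert(psys != NULL && pages != NULL);

	if (nr_pages > TOY_MAX_PINS - psys->nr_pins)
		return -1;

	for (i = 0; i < nr_pages; i++) {
		struct pinpage_info *info = &psys->pins[psys->nr_pins + i];

		info->page = pages[i];
		info->private = privates ? privates[i] : NULL;
		pages[i]->refcount++;	/* the pin itself */
	}
	psys->nr_pins += nr_pages;
	return 0;
}
```

A caller that has already gathered an array of pages, for example from a GUP-style lookup, could then record the whole group with a single call instead of one call per page.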

> 
>   -ben
> 
> > Best regards,
> > Krzysztof
> 
> -- 
> "Thought is the essence of where you are now."
> 

-- 
Kind regards,
Minchan Kim


Re: [RFC 0/3] Pin page control subsystem

2013-08-13 Thread Minchan Kim
Hello Krzysztof,

On Tue, Aug 13, 2013 at 11:46:42AM +0200, Krzysztof Kozlowski wrote:
> Hi Minchan,
> 
> On wto, 2013-08-13 at 16:04 +0900, Minchan Kim wrote:
> > patch 2 introduces the pinpage control
> > subsystem. Subsystems that want to control pinned pages should implement their
> > own pinpage_xxx functions, because each subsystem has its own characteristics,
> > so the kind of data structure used for managing pinpage information depends
> > on the subsystem. Otherwise, they can use the general functions defined in the
> > pinpage subsystem. patch 3 hacks migration.c so that migration is
> > aware of pinpage now and migrates such pages with the pinpage subsystem.
> 
> I wonder why we don't use page->mapping and a_ops? Is there any
> disadvantage of such mapping/a_ops?

My main concern with that approach is how to handle the nested pin case.
For example, driver A and driver B pin the same file-backed page
at the same time via get_user_pages.
For the migration, we need the following operations:

1. [buffer]'s migrate_page for the file-backed page
2. [driver A]'s migrate_page
3. [driver B]'s migrate_page

But the page has only one mapping. How can we handle it?
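
The nesting problem can be sketched in userspace C as follows. This is a toy model under stated assumptions, not the RFC's code: instead of the single `page->mapping` pointer, the hypothetical `struct toy_page` carries a small list of owners, each with its own migrate callback, and `toy_migrate()` calls them in registration order.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_OWNERS 4

struct toy_page;

/* Per-owner migrate callback; returns 0 on success. */
typedef int (*toy_migrate_fn)(struct toy_page *newpage,
			      struct toy_page *oldpage, void *priv);

struct toy_owner {
	toy_migrate_fn migrate;
	void *priv;
};

/* Unlike a single mapping pointer, a page here can have several owners. */
struct toy_page {
	struct toy_owner owners[TOY_MAX_OWNERS];
	size_t nr_owners;
};

static int toy_add_owner(struct toy_page *page, toy_migrate_fn fn, void *priv)
{
	assert(fn != NULL);
	if (page->nr_owners == TOY_MAX_OWNERS)
		return -1;
	page->owners[page->nr_owners].migrate = fn;
	page->owners[page->nr_owners].priv = priv;
	page->nr_owners++;
	return 0;
}

/*
 * Ask every owner, in registration order, to move its state to @newpage.
 * Stops at the first failure. A real implementation would also have to
 * roll back the owners that already succeeded; this sketch does not.
 */
static int toy_migrate(struct toy_page *newpage, struct toy_page *oldpage)
{
	for (size_t i = 0; i < oldpage->nr_owners; i++) {
		struct toy_owner *o = &oldpage->owners[i];
		int err = o->migrate(newpage, oldpage, o->priv);

		if (err)
			return err;
	}
	*newpage = *oldpage;		/* owner records follow the page */
	oldpage->nr_owners = 0;
	return 0;
}

/* Example callback: just counts how often it was invoked. */
static int toy_count_migrate(struct toy_page *newpage,
			     struct toy_page *oldpage, void *priv)
{
	(void)newpage;
	(void)oldpage;
	(*(int *)priv)++;
	return 0;
}
```

With three owners registered (the buffer layer, driver A and driver B), one `toy_migrate()` call reaches all three callbacks, which a single per-page callback slot cannot express.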

If we give up on a pinpage subsystem that unifies userspace pages (e.g. GUP)
and kernel-space pages (e.g. zswap, zram and zcache), we can go with
address_space's migratepages, but we would lose the abstraction, so
all users would have to implement their own pinpage manager. That's not hard,
I guess, but it's more error-prone and less maintainable in the future.

> 
> Best regards,
> Krzysztof
> 

-- 
Kind regards,
Minchan Kim


Re: [RFC 0/3] Pin page control subsystem

2013-08-13 Thread Christoph Lameter
On Tue, 13 Aug 2013, Minchan Kim wrote:

> The VM sometimes wants to migrate and/or reclaim pages for CMA, memory-hotplug,
> THP and so on, but at the moment it can handle only userspace pages,
> so if one of the above example subsystems has pinned a page in a range the VM
> wants to migrate, migration fails and the above examples can't work well.

Don't we have the mmu_notifiers that could help in that case? You could get
a callback which could prepare the pages for migration?


Re: [RFC 0/3] Pin page control subsystem

2013-08-13 Thread Benjamin LaHaise
On Tue, Aug 13, 2013 at 11:46:42AM +0200, Krzysztof Kozlowski wrote:
> Hi Minchan,
> 
> On wto, 2013-08-13 at 16:04 +0900, Minchan Kim wrote:
> > patch 2 introduces the pinpage control
> > subsystem. Subsystems that want to control pinned pages should implement their
> > own pinpage_xxx functions, because each subsystem has its own characteristics,
> > so the kind of data structure used for managing pinpage information depends
> > on the subsystem. Otherwise, they can use the general functions defined in the
> > pinpage subsystem. patch 3 hacks migration.c so that migration is
> > aware of pinpage now and migrates such pages with the pinpage subsystem.
> 
> I wonder why we don't use page->mapping and a_ops? Is there any
> disadvantage of such mapping/a_ops?

That's what the pending aio patches do, and I think this is a better 
approach for those use-cases that the technique works for.

The biggest problem I see with the pinpage approach is that it's based on a
single page at a time.  I'd venture a guess that many pinned pages are done 
in groups of pages, not single ones.

-ben

> Best regards,
> Krzysztof

-- 
"Thought is the essence of where you are now."


Re: [RFC 0/3] Pin page control subsystem

2013-08-13 Thread Krzysztof Kozlowski
Hi Minchan,

On wto, 2013-08-13 at 16:04 +0900, Minchan Kim wrote:
> patch 2 introduces the pinpage control
> subsystem. Subsystems that want to control pinned pages should implement their
> own pinpage_xxx functions, because each subsystem has its own characteristics,
> so the kind of data structure used for managing pinpage information depends
> on the subsystem. Otherwise, they can use the general functions defined in the
> pinpage subsystem. patch 3 hacks migration.c so that migration is
> aware of pinpage now and migrates such pages with the pinpage subsystem.

I wonder why we don't use page->mapping and a_ops? Is there any
disadvantage of such mapping/a_ops?

Best regards,
Krzysztof


