Re: [PATCH v2 0/3] Directed kmem charging

2018-02-22 Thread Christopher Lameter
On Wed, 21 Feb 2018, Andrew Morton wrote:

> What do others think?

I think the changes to the hotpaths of the slab allocators increasing
register pressure in some of the hotttest paths of the kernel are
problematic.

Its better to do the allocation properly in the task context to which it
is finally charged. There may be other restrictions that emerge from other
fields in the task_struct that also may influence allocation and reclaim
behavior. It is cleanest to do this in the proper task context instead of
taking a piece (like the cgroup) out of context.



Re: [PATCH v2 0/3] Directed kmem charging

2018-02-22 Thread Christopher Lameter
On Wed, 21 Feb 2018, Andrew Morton wrote:

> What do others think?

I think the changes to the hotpaths of the slab allocators increasing
register pressure in some of the hotttest paths of the kernel are
problematic.

Its better to do the allocation properly in the task context to which it
is finally charged. There may be other restrictions that emerge from other
fields in the task_struct that also may influence allocation and reclaim
behavior. It is cleanest to do this in the proper task context instead of
taking a piece (like the cgroup) out of context.



Re: [PATCH v2 0/3] Directed kmem charging

2018-02-22 Thread Christopher Lameter
On Thu, 22 Feb 2018, Jan Kara wrote:

> I don't see how task work can be used here. Firstly I don't know of a case
> where task work would be used for something else than the current task -
> and that is substantial because otherwise you have to deal with lots of
> problems like races with task exit, when work gets executed (normally it
> gets executed once task exits to userspace) etc. Or do you mean that you'd
> queue task work for current task and then somehow magically switch memcg
> there? In that case this magic switching isn't clear to me...

Thats surprising since one can specify the task. If its only for current
then why do you need a parameter?

I think a capability of executing a function in the context of another
running task could simplify a lot. In particular if something triggers
behavior that is related to another task from the kernel like whats
happening here.

It should not be that difficult to do proper synchronization on the list
of work and then set some flag (maybe SIGPENDING) to make the task execute
whats on the tasklist. Signal delivery is after all similar.




Re: [PATCH v2 0/3] Directed kmem charging

2018-02-22 Thread Christopher Lameter
On Thu, 22 Feb 2018, Jan Kara wrote:

> I don't see how task work can be used here. Firstly I don't know of a case
> where task work would be used for something else than the current task -
> and that is substantial because otherwise you have to deal with lots of
> problems like races with task exit, when work gets executed (normally it
> gets executed once task exits to userspace) etc. Or do you mean that you'd
> queue task work for current task and then somehow magically switch memcg
> there? In that case this magic switching isn't clear to me...

Thats surprising since one can specify the task. If its only for current
then why do you need a parameter?

I think a capability of executing a function in the context of another
running task could simplify a lot. In particular if something triggers
behavior that is related to another task from the kernel like whats
happening here.

It should not be that difficult to do proper synchronization on the list
of work and then set some flag (maybe SIGPENDING) to make the task execute
whats on the tasklist. Signal delivery is after all similar.




Re: [PATCH v2 0/3] Directed kmem charging

2018-02-22 Thread Jan Kara
On Wed 21-02-18 11:57:47, Christopher Lameter wrote:
> On Wed, 21 Feb 2018, Shakeel Butt wrote:
> 
> > On Wed, Feb 21, 2018 at 8:09 AM, Christopher Lameter  wrote:
> > > Another way to solve this is to switch the user context right?
> > >
> > > Isnt it possible to avoid these patches if do the allocation in another
> > > task context instead?
> > >
> >
> > Sorry, can you please explain what you mean by 'switch the user
> > context'. Is there any example in kernel which does something similar?
> 
> See include/linux/task_work.h. One use case is in mntput_no_expire() in
> linux/fs/namespace.c
> 
> > > Are there really any other use cases beyond fsnotify?
> > >
> >
> > Another use case I have in mind and plan to upstream is to bind a
> > filesystem mount with a memcg. So, all the file pages (or anon pages
> > for shmem) and kmem (like inodes and dentry) will be charged to that
> > memcg.
> 
> The mount logic already uses task_work.h. That may be the approach to
> expand there.

I don't see how task work can be used here. Firstly I don't know of a case
where task work would be used for something else than the current task -
and that is substantial because otherwise you have to deal with lots of
problems like races with task exit, when work gets executed (normally it
gets executed once task exits to userspace) etc. Or do you mean that you'd
queue task work for current task and then somehow magically switch memcg
there? In that case this magic switching isn't clear to me...

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH v2 0/3] Directed kmem charging

2018-02-22 Thread Jan Kara
On Wed 21-02-18 11:57:47, Christopher Lameter wrote:
> On Wed, 21 Feb 2018, Shakeel Butt wrote:
> 
> > On Wed, Feb 21, 2018 at 8:09 AM, Christopher Lameter  wrote:
> > > Another way to solve this is to switch the user context right?
> > >
> > > Isnt it possible to avoid these patches if do the allocation in another
> > > task context instead?
> > >
> >
> > Sorry, can you please explain what you mean by 'switch the user
> > context'. Is there any example in kernel which does something similar?
> 
> See include/linux/task_work.h. One use case is in mntput_no_expire() in
> linux/fs/namespace.c
> 
> > > Are there really any other use cases beyond fsnotify?
> > >
> >
> > Another use case I have in mind and plan to upstream is to bind a
> > filesystem mount with a memcg. So, all the file pages (or anon pages
> > for shmem) and kmem (like inodes and dentry) will be charged to that
> > memcg.
> 
> The mount logic already uses task_work.h. That may be the approach to
> expand there.

I don't see how task work can be used here. Firstly I don't know of a case
where task work would be used for something else than the current task -
and that is substantial because otherwise you have to deal with lots of
problems like races with task exit, when work gets executed (normally it
gets executed once task exits to userspace) etc. Or do you mean that you'd
queue task work for current task and then somehow magically switch memcg
there? In that case this magic switching isn't clear to me...

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH v2 0/3] Directed kmem charging

2018-02-22 Thread Jan Kara
On Wed 21-02-18 12:54:26, Andrew Morton wrote:
> On Wed, 21 Feb 2018 09:18:35 -0800 Shakeel Butt  wrote:
> 
> > On Wed, Feb 21, 2018 at 8:09 AM, Christopher Lameter  wrote:
> > > Another way to solve this is to switch the user context right?
> > >
> > > Isnt it possible to avoid these patches if do the allocation in another
> > > task context instead?
> > >
> > 
> > Sorry, can you please explain what you mean by 'switch the user
> > context'. Is there any example in kernel which does something similar?
> > 
> > Another way is by adding a field 'remote_memcg_to_charge' in
> > task_struct and set it before the allocation and in memcontrol.c,
> > first check if current->remote_memcg_to_charge is set otherwise use
> > the memcg of current. Also if we provide a wrapper to do that for the
> > user, there will be a lot less plumbing.
> > 
> > Please let me know if you prefer this approach.
> 
> That would be a lot simpler.  Passing function arguments via
> task_struct is a bit dirty but is sometimes sooo effective.  You
> should've seen how much mess task_struct.journal_info avoided!  And
> reclaim_state.

Agreed, although from time to time people try to be too creative e.g. with
journal_info and surprising bugs come out of that :).

> And one always wonders whether we should do a local save/restore before
> modifying the task_struct field, so it nests.
> 
> What do others think?

Sounds nice to me.

> Maybe we can rename task_struct.reclaim_state to `struct task_mm_state
> *task_mm_state", add remote_memcg_to_charge to struct task_mm_state and
> avoid bloating the task_struct?

Yeah, even better, but then we really need to make sure these things stack
properly.

Honza

-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH v2 0/3] Directed kmem charging

2018-02-22 Thread Jan Kara
On Wed 21-02-18 12:54:26, Andrew Morton wrote:
> On Wed, 21 Feb 2018 09:18:35 -0800 Shakeel Butt  wrote:
> 
> > On Wed, Feb 21, 2018 at 8:09 AM, Christopher Lameter  wrote:
> > > Another way to solve this is to switch the user context right?
> > >
> > > Isnt it possible to avoid these patches if do the allocation in another
> > > task context instead?
> > >
> > 
> > Sorry, can you please explain what you mean by 'switch the user
> > context'. Is there any example in kernel which does something similar?
> > 
> > Another way is by adding a field 'remote_memcg_to_charge' in
> > task_struct and set it before the allocation and in memcontrol.c,
> > first check if current->remote_memcg_to_charge is set otherwise use
> > the memcg of current. Also if we provide a wrapper to do that for the
> > user, there will be a lot less plumbing.
> > 
> > Please let me know if you prefer this approach.
> 
> That would be a lot simpler.  Passing function arguments via
> task_struct is a bit dirty but is sometimes sooo effective.  You
> should've seen how much mess task_struct.journal_info avoided!  And
> reclaim_state.

Agreed, although from time to time people try to be too creative e.g. with
journal_info and surprising bugs come out of that :).

> And one always wonders whether we should do a local save/restore before
> modifying the task_struct field, so it nests.
> 
> What do others think?

Sounds nice to me.

> Maybe we can rename task_struct.reclaim_state to `struct task_mm_state
> *task_mm_state", add remote_memcg_to_charge to struct task_mm_state and
> avoid bloating the task_struct?

Yeah, even better, but then we really need to make sure these things stack
properly.

Honza

-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH v2 0/3] Directed kmem charging

2018-02-21 Thread Andrew Morton
On Wed, 21 Feb 2018 09:18:35 -0800 Shakeel Butt  wrote:

> On Wed, Feb 21, 2018 at 8:09 AM, Christopher Lameter  wrote:
> > Another way to solve this is to switch the user context right?
> >
> > Isnt it possible to avoid these patches if do the allocation in another
> > task context instead?
> >
> 
> Sorry, can you please explain what you mean by 'switch the user
> context'. Is there any example in kernel which does something similar?
> 
> Another way is by adding a field 'remote_memcg_to_charge' in
> task_struct and set it before the allocation and in memcontrol.c,
> first check if current->remote_memcg_to_charge is set otherwise use
> the memcg of current. Also if we provide a wrapper to do that for the
> user, there will be a lot less plumbing.
> 
> Please let me know if you prefer this approach.

That would be a lot simpler.  Passing function arguments via
task_struct is a bit dirty but is sometimes sooo effective.  You
should've seen how much mess task_struct.journal_info avoided!  And
reclaim_state.

And one always wonders whether we should do a local save/restore before
modifying the task_struct field, so it nests.

What do others think?


Maybe we can rename task_struct.reclaim_state to `struct task_mm_state
*task_mm_state", add remote_memcg_to_charge to struct task_mm_state and
avoid bloating the task_struct?



Re: [PATCH v2 0/3] Directed kmem charging

2018-02-21 Thread Andrew Morton
On Wed, 21 Feb 2018 09:18:35 -0800 Shakeel Butt  wrote:

> On Wed, Feb 21, 2018 at 8:09 AM, Christopher Lameter  wrote:
> > Another way to solve this is to switch the user context right?
> >
> > Isnt it possible to avoid these patches if do the allocation in another
> > task context instead?
> >
> 
> Sorry, can you please explain what you mean by 'switch the user
> context'. Is there any example in kernel which does something similar?
> 
> Another way is by adding a field 'remote_memcg_to_charge' in
> task_struct and set it before the allocation and in memcontrol.c,
> first check if current->remote_memcg_to_charge is set otherwise use
> the memcg of current. Also if we provide a wrapper to do that for the
> user, there will be a lot less plumbing.
> 
> Please let me know if you prefer this approach.

That would be a lot simpler.  Passing function arguments via
task_struct is a bit dirty but is sometimes sooo effective.  You
should've seen how much mess task_struct.journal_info avoided!  And
reclaim_state.

And one always wonders whether we should do a local save/restore before
modifying the task_struct field, so it nests.

What do others think?


Maybe we can rename task_struct.reclaim_state to `struct task_mm_state
*task_mm_state", add remote_memcg_to_charge to struct task_mm_state and
avoid bloating the task_struct?



Re: [PATCH v2 0/3] Directed kmem charging

2018-02-21 Thread Shakeel Butt
On Wed, Feb 21, 2018 at 9:57 AM, Christopher Lameter  wrote:
> On Wed, 21 Feb 2018, Shakeel Butt wrote:
>
>> On Wed, Feb 21, 2018 at 8:09 AM, Christopher Lameter  wrote:
>> > Another way to solve this is to switch the user context right?
>> >
>> > Isnt it possible to avoid these patches if do the allocation in another
>> > task context instead?
>> >
>>
>> Sorry, can you please explain what you mean by 'switch the user
>> context'. Is there any example in kernel which does something similar?
>
> See include/linux/task_work.h. One use case is in mntput_no_expire() in
> linux/fs/namespace.c
>

>From what I understand, using task_work will require fanotify/inotify
event handler to allocate memory asynchronously. IMHO the code will be
much more complex if we go through that route.

> Another way is by adding a field 'remote_memcg_to_charge' in
> task_struct and set it before the allocation and in memcontrol.c,
> first check if current->remote_memcg_to_charge is set otherwise use
> the memcg of current. Also if we provide a wrapper to do that for the
> user, there will be a lot less plumbing.
>
> Please let me know if you prefer this approach.
>

What do you think of the above approach. I think the amount and
complexity of code will be much less.

>> > Are there really any other use cases beyond fsnotify?
>> >
>>
>> Another use case I have in mind and plan to upstream is to bind a
>> filesystem mount with a memcg. So, all the file pages (or anon pages
>> for shmem) and kmem (like inodes and dentry) will be charged to that
>> memcg.
>
> The mount logic already uses task_work.h. That may be the approach to
> expand there.

The task_work approach will require that the job is already running at
the time of mount operation. Usually the mount operations are done by
either admin or the control task starting the job and is a part of
setting up the environment. So, there might not be any process running
at the time of mount operation.


Re: [PATCH v2 0/3] Directed kmem charging

2018-02-21 Thread Shakeel Butt
On Wed, Feb 21, 2018 at 9:57 AM, Christopher Lameter  wrote:
> On Wed, 21 Feb 2018, Shakeel Butt wrote:
>
>> On Wed, Feb 21, 2018 at 8:09 AM, Christopher Lameter  wrote:
>> > Another way to solve this is to switch the user context right?
>> >
>> > Isnt it possible to avoid these patches if do the allocation in another
>> > task context instead?
>> >
>>
>> Sorry, can you please explain what you mean by 'switch the user
>> context'. Is there any example in kernel which does something similar?
>
> See include/linux/task_work.h. One use case is in mntput_no_expire() in
> linux/fs/namespace.c
>

>From what I understand, using task_work will require fanotify/inotify
event handler to allocate memory asynchronously. IMHO the code will be
much more complex if we go through that route.

> Another way is by adding a field 'remote_memcg_to_charge' in
> task_struct and set it before the allocation and in memcontrol.c,
> first check if current->remote_memcg_to_charge is set otherwise use
> the memcg of current. Also if we provide a wrapper to do that for the
> user, there will be a lot less plumbing.
>
> Please let me know if you prefer this approach.
>

What do you think of the above approach. I think the amount and
complexity of code will be much less.

>> > Are there really any other use cases beyond fsnotify?
>> >
>>
>> Another use case I have in mind and plan to upstream is to bind a
>> filesystem mount with a memcg. So, all the file pages (or anon pages
>> for shmem) and kmem (like inodes and dentry) will be charged to that
>> memcg.
>
> The mount logic already uses task_work.h. That may be the approach to
> expand there.

The task_work approach will require that the job is already running at
the time of mount operation. Usually the mount operations are done by
either admin or the control task starting the job and is a part of
setting up the environment. So, there might not be any process running
at the time of mount operation.


Re: [PATCH v2 0/3] Directed kmem charging

2018-02-21 Thread Christopher Lameter
On Wed, 21 Feb 2018, Shakeel Butt wrote:

> On Wed, Feb 21, 2018 at 8:09 AM, Christopher Lameter  wrote:
> > Another way to solve this is to switch the user context right?
> >
> > Isnt it possible to avoid these patches if do the allocation in another
> > task context instead?
> >
>
> Sorry, can you please explain what you mean by 'switch the user
> context'. Is there any example in kernel which does something similar?

See include/linux/task_work.h. One use case is in mntput_no_expire() in
linux/fs/namespace.c

> > Are there really any other use cases beyond fsnotify?
> >
>
> Another use case I have in mind and plan to upstream is to bind a
> filesystem mount with a memcg. So, all the file pages (or anon pages
> for shmem) and kmem (like inodes and dentry) will be charged to that
> memcg.

The mount logic already uses task_work.h. That may be the approach to
expand there.


Re: [PATCH v2 0/3] Directed kmem charging

2018-02-21 Thread Christopher Lameter
On Wed, 21 Feb 2018, Shakeel Butt wrote:

> On Wed, Feb 21, 2018 at 8:09 AM, Christopher Lameter  wrote:
> > Another way to solve this is to switch the user context right?
> >
> > Isnt it possible to avoid these patches if do the allocation in another
> > task context instead?
> >
>
> Sorry, can you please explain what you mean by 'switch the user
> context'. Is there any example in kernel which does something similar?

See include/linux/task_work.h. One use case is in mntput_no_expire() in
linux/fs/namespace.c

> > Are there really any other use cases beyond fsnotify?
> >
>
> Another use case I have in mind and plan to upstream is to bind a
> filesystem mount with a memcg. So, all the file pages (or anon pages
> for shmem) and kmem (like inodes and dentry) will be charged to that
> memcg.

The mount logic already uses task_work.h. That may be the approach to
expand there.


Re: [PATCH v2 0/3] Directed kmem charging

2018-02-21 Thread Shakeel Butt
On Wed, Feb 21, 2018 at 8:09 AM, Christopher Lameter  wrote:
> Another way to solve this is to switch the user context right?
>
> Isnt it possible to avoid these patches if do the allocation in another
> task context instead?
>

Sorry, can you please explain what you mean by 'switch the user
context'. Is there any example in kernel which does something similar?

Another way is by adding a field 'remote_memcg_to_charge' in
task_struct and set it before the allocation and in memcontrol.c,
first check if current->remote_memcg_to_charge is set otherwise use
the memcg of current. Also if we provide a wrapper to do that for the
user, there will be a lot less plumbing.

Please let me know if you prefer this approach.


> Are there really any other use cases beyond fsnotify?
>

Another use case I have in mind and plan to upstream is to bind a
filesystem mount with a memcg. So, all the file pages (or anon pages
for shmem) and kmem (like inodes and dentry) will be charged to that
memcg.

>
> The charging of the memory works on a per page level but the allocation
> occur from the same page for multiple tasks that may be running on a
> system. So how relevant is this for other small objects?
>
> Seems that if you do a large amount of allocations for the same purpose
> your chance of accounting it to the right memcg increases. But this is a
> game of chance.
>
>
>


Re: [PATCH v2 0/3] Directed kmem charging

2018-02-21 Thread Shakeel Butt
On Wed, Feb 21, 2018 at 8:09 AM, Christopher Lameter  wrote:
> Another way to solve this is to switch the user context right?
>
> Isnt it possible to avoid these patches if do the allocation in another
> task context instead?
>

Sorry, can you please explain what you mean by 'switch the user
context'. Is there any example in kernel which does something similar?

Another way is by adding a field 'remote_memcg_to_charge' in
task_struct and set it before the allocation and in memcontrol.c,
first check if current->remote_memcg_to_charge is set otherwise use
the memcg of current. Also if we provide a wrapper to do that for the
user, there will be a lot less plumbing.

Please let me know if you prefer this approach.


> Are there really any other use cases beyond fsnotify?
>

Another use case I have in mind and plan to upstream is to bind a
filesystem mount with a memcg. So, all the file pages (or anon pages
for shmem) and kmem (like inodes and dentry) will be charged to that
memcg.

>
> The charging of the memory works on a per page level but the allocation
> occur from the same page for multiple tasks that may be running on a
> system. So how relevant is this for other small objects?
>
> Seems that if you do a large amount of allocations for the same purpose
> your chance of accounting it to the right memcg increases. But this is a
> game of chance.
>
>
>


Re: [PATCH v2 0/3] Directed kmem charging

2018-02-21 Thread Christopher Lameter
Another way to solve this is to switch the user context right?

Isnt it possible to avoid these patches if do the allocation in another
task context instead?

Are there really any other use cases beyond fsnotify?


The charging of the memory works on a per page level but the allocation
occur from the same page for multiple tasks that may be running on a
system. So how relevant is this for other small objects?

Seems that if you do a large amount of allocations for the same purpose
your chance of accounting it to the right memcg increases. But this is a
game of chance.





Re: [PATCH v2 0/3] Directed kmem charging

2018-02-21 Thread Christopher Lameter
Another way to solve this is to switch the user context right?

Isnt it possible to avoid these patches if do the allocation in another
task context instead?

Are there really any other use cases beyond fsnotify?


The charging of the memory works on a per page level but the allocation
occur from the same page for multiple tasks that may be running on a
system. So how relevant is this for other small objects?

Seems that if you do a large amount of allocations for the same purpose
your chance of accounting it to the right memcg increases. But this is a
game of chance.