Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm->hiwater_rss (peak RSS)

2015-02-03 Thread David Rientjes
On Wed, 4 Feb 2015, Minchan Kim wrote:

> > > This is a result of allowing something external (process B) be able to
> > > clear hwm so that you never knew the value went to 100MB.  That's the
> > > definition of a race, I don't know how to explain it any better and making
> > > any connection between clearing PG_referenced and mm->hiwater_rss is a
> > > stretch.  This approach just makes mm->hiwater_rss meaningless.
> > 
> > I understand your concern, but I hope you agree that the functionality we
> > are proposing would be very useful for profiling. Therefore, I suggest
> > adding an extra resettable field to /proc/pid/status (e.g.
> > resettable_hiwater_rss) instead. What is your view on this approach?
> 
> The idea would be very useful for measuring working set size for
> efficient memory management in userside, which becomes very popular
> with many platforms for embedded world with tight memory.
> 

The problem is the same as the aforementioned if you're only going to be 
adding one field.  If another process happens to clear the 
resettable_hiwater_rss before you can read it, you don't see potentially 
large spikes in size.

I understand the need for measuring working set size, and we have an 
in-house solution for that, but I don't think we should be introducing new 
fields that require only one root process on the system to be touching it 
for it to be effective.  Let me talk with some people about how difficult 
it would be to propose our in-house solution.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm->hiwater_rss (peak RSS)

2015-02-03 Thread Minchan Kim
Hello,

On Tue, Feb 03, 2015 at 03:26:28AM +, Petr Cermak wrote:
> On Mon, Jan 26, 2015 at 04:00:20PM -0800, David Rientjes wrote:
> > [...]
> > This is a result of allowing something external (process B) be able to
> > clear hwm so that you never knew the value went to 100MB.  That's the
> > definition of a race, I don't know how to explain it any better and making
> > any connection between clearing PG_referenced and mm->hiwater_rss is a
> > stretch.  This approach just makes mm->hiwater_rss meaningless.
> 
> I understand your concern, but I hope you agree that the functionality we
> are proposing would be very useful for profiling. Therefore, I suggest
> adding an extra resettable field to /proc/pid/status (e.g.
> resettable_hiwater_rss) instead. What is your view on this approach?

The idea would be very useful for measuring working set size for
efficient memory management in userside, which becomes very popular
with many platforms for embedded world with tight memory.

> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: mailto:"d...@kvack.org;> em...@kvack.org 

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm-hiwater_rss (peak RSS)

2015-02-03 Thread Minchan Kim
Hello,

On Tue, Feb 03, 2015 at 03:26:28AM +, Petr Cermak wrote:
 On Mon, Jan 26, 2015 at 04:00:20PM -0800, David Rientjes wrote:
  [...]
  This is a result of allowing something external (process B) be able to
  clear hwm so that you never knew the value went to 100MB.  That's the
  definition of a race, I don't know how to explain it any better and making
  any connection between clearing PG_referenced and mm-hiwater_rss is a
  stretch.  This approach just makes mm-hiwater_rss meaningless.
 
 I understand your concern, but I hope you agree that the functionality we
 are proposing would be very useful for profiling. Therefore, I suggest
 adding an extra resettable field to /proc/pid/status (e.g.
 resettable_hiwater_rss) instead. What is your view on this approach?

The idea would be very useful for measuring working set size for
efficient memory management in userside, which becomes very popular
with many platforms for embedded world with tight memory.

 
 --
 To unsubscribe, send a message with 'unsubscribe linux-mm' in
 the body to majord...@kvack.org.  For more info on Linux MM,
 see: http://www.linux-mm.org/ .
 Don't email: a href=mailto:d...@kvack.org; em...@kvack.org /a

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm-hiwater_rss (peak RSS)

2015-02-03 Thread David Rientjes
On Wed, 4 Feb 2015, Minchan Kim wrote:

   This is a result of allowing something external (process B) be able to
   clear hwm so that you never knew the value went to 100MB.  That's the
   definition of a race, I don't know how to explain it any better and making
   any connection between clearing PG_referenced and mm-hiwater_rss is a
   stretch.  This approach just makes mm-hiwater_rss meaningless.
  
  I understand your concern, but I hope you agree that the functionality we
  are proposing would be very useful for profiling. Therefore, I suggest
  adding an extra resettable field to /proc/pid/status (e.g.
  resettable_hiwater_rss) instead. What is your view on this approach?
 
 The idea would be very useful for measuring working set size for
 efficient memory management in userside, which becomes very popular
 with many platforms for embedded world with tight memory.
 

The problem is the same as the aforementioned if you're only going to be 
adding one field.  If another process happens to clear the 
resettable_hiwater_rss before you can read it, you don't see potentially 
large spikes in size.

I understand the need for measuring working set size, and we have an 
in-house solution for that, but I don't think we should be introducing new 
fields that require only one root process on the system to be touching it 
for it to be effective.  Let me talk with some people about how difficult 
it would be to propose our in-house solution.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm->hiwater_rss (peak RSS)

2015-02-02 Thread Petr Cermak
On Mon, Jan 26, 2015 at 04:00:20PM -0800, David Rientjes wrote:
> [...]
> This is a result of allowing something external (process B) be able to
> clear hwm so that you never knew the value went to 100MB.  That's the
> definition of a race, I don't know how to explain it any better and making
> any connection between clearing PG_referenced and mm->hiwater_rss is a
> stretch.  This approach just makes mm->hiwater_rss meaningless.

I understand your concern, but I hope you agree that the functionality we
are proposing would be very useful for profiling. Therefore, I suggest
adding an extra resettable field to /proc/pid/status (e.g.
resettable_hiwater_rss) instead. What is your view on this approach?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm-hiwater_rss (peak RSS)

2015-02-02 Thread Petr Cermak
On Mon, Jan 26, 2015 at 04:00:20PM -0800, David Rientjes wrote:
 [...]
 This is a result of allowing something external (process B) be able to
 clear hwm so that you never knew the value went to 100MB.  That's the
 definition of a race, I don't know how to explain it any better and making
 any connection between clearing PG_referenced and mm-hiwater_rss is a
 stretch.  This approach just makes mm-hiwater_rss meaningless.

I understand your concern, but I hope you agree that the functionality we
are proposing would be very useful for profiling. Therefore, I suggest
adding an extra resettable field to /proc/pid/status (e.g.
resettable_hiwater_rss) instead. What is your view on this approach?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm->hiwater_rss (peak RSS)

2015-01-26 Thread David Rientjes
On Fri, 23 Jan 2015, Primiano Tucci wrote:

> > If you reset the hwm for a process, rss grows to 100MB, another process
> > resets the hwm, and you see a hwm of 2MB, that invalidates the hwm
> > entirely.
> 
> Not sure I follow this scenario. Where does the 2MB come from?

It's a random number that the hwm gets reset to after the other process 
clears it.

> How can
> you see a hwm of 2MB, under which conditions? HVM can never be < RSS.
> Again, what you are talking about is the case of two profilers racing
> for using the same interface (hwm).
> This is the same case today of the PG_referenced bit.
> 

PG_referenced bit is not tracking the highest rss a process has ever 
attained.  PG_referenced is understood to be clearable at any time and the 
only guarantee is that it was at least cleared before returning from the 
write.  It could be set again before the write returns as well, but we can 
be sure that it was at least cleared.

With your approach, which completely invalidates the entire purpose of 
hwm, the following is possible:

process A   process B
-   -
read hwm = 50MB read hwm = 50MB
write to clear hwm
rss goes to 100MB
write to clear hwm
rss goes to 2MB
read hwm = 2MB  read hwm = 2MB

This is a result of allowing something external (process B) be able to 
clear hwm so that you never knew the value went to 100MB.  That's the 
definition of a race, I don't know how to explain it any better and making 
any connection between clearing PG_referenced and mm->hiwater_rss is a 
stretch.  This approach just makes mm->hiwater_rss meaningless.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm-hiwater_rss (peak RSS)

2015-01-26 Thread David Rientjes
On Fri, 23 Jan 2015, Primiano Tucci wrote:

  If you reset the hwm for a process, rss grows to 100MB, another process
  resets the hwm, and you see a hwm of 2MB, that invalidates the hwm
  entirely.
 
 Not sure I follow this scenario. Where does the 2MB come from?

It's a random number that the hwm gets reset to after the other process 
clears it.

 How can
 you see a hwm of 2MB, under which conditions? HVM can never be  RSS.
 Again, what you are talking about is the case of two profilers racing
 for using the same interface (hwm).
 This is the same case today of the PG_referenced bit.
 

PG_referenced bit is not tracking the highest rss a process has ever 
attained.  PG_referenced is understood to be clearable at any time and the 
only guarantee is that it was at least cleared before returning from the 
write.  It could be set again before the write returns as well, but we can 
be sure that it was at least cleared.

With your approach, which completely invalidates the entire purpose of 
hwm, the following is possible:

process A   process B
-   -
read hwm = 50MB read hwm = 50MB
write to clear hwm
rss goes to 100MB
write to clear hwm
rss goes to 2MB
read hwm = 2MB  read hwm = 2MB

This is a result of allowing something external (process B) be able to 
clear hwm so that you never knew the value went to 100MB.  That's the 
definition of a race, I don't know how to explain it any better and making 
any connection between clearing PG_referenced and mm-hiwater_rss is a 
stretch.  This approach just makes mm-hiwater_rss meaningless.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm->hiwater_rss (peak RSS)

2015-01-22 Thread Primiano Tucci
On Thu, Jan 22, 2015 at 11:27 PM, David Rientjes  wrote:
> If you reset the hwm for a process, rss grows to 100MB, another process
> resets the hwm, and you see a hwm of 2MB, that invalidates the hwm
> entirely.

Not sure I follow this scenario. Where does the 2MB come from? How can
you see a hwm of 2MB, under which conditions? HVM can never be < RSS.
Again, what you are talking about is the case of two profilers racing
for using the same interface (hwm).
This is the same case today of the PG_referenced bit.

> The hwm is already defined as the
> highest rss the process has attained, resetting it and trying to make any
> inference from the result is racy and invalidates the actual value which
> is useful.
The counter arugment is: once you have one very high peak, the hvm
becomes essentially useless for the rest of the lifetime of the
process (until a higher peak comes). This makes very hard to
understand what is going on in the meanwhile (from userspace).

Anyways, are you proposing to pursue a different approach? Is the
approach 2. that petrcermark@ proposed in the beginning of the thread
going to address this concern?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm->hiwater_rss (peak RSS)

2015-01-22 Thread David Rientjes
On Thu, 22 Jan 2015, Primiano Tucci wrote:

> > I think the bigger concern would be that this, and any new line such as
> > resettable_hiwater_rss, invalidates itself entirely.  Any process that
> > checks the hwm will not know of other processes that reset it, so the
> > value itself has no significance anymore.
> >  It would just be the mark since the last clear at an unknown time.
> 
> How is that different from the current logic of clear_refs and the
> corresponding PG_Referenced bit?
> 

If you reset the hwm for a process, rss grows to 100MB, another process 
resets the hwm, and you see a hwm of 2MB, that invalidates the hwm 
entirely.  That's especially true if there's an oom condition that kills a 
process when the rss grew to 100MB but you see a hwm of 2MB and don't 
believe it was possibly the culprit.  The hwm is already defined as the 
highest rss the process has attained, resetting it and trying to make any 
inference from the result is racy and invalidates the actual value which 
is useful.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm-hiwater_rss (peak RSS)

2015-01-22 Thread David Rientjes
On Thu, 22 Jan 2015, Primiano Tucci wrote:

  I think the bigger concern would be that this, and any new line such as
  resettable_hiwater_rss, invalidates itself entirely.  Any process that
  checks the hwm will not know of other processes that reset it, so the
  value itself has no significance anymore.
   It would just be the mark since the last clear at an unknown time.
 
 How is that different from the current logic of clear_refs and the
 corresponding PG_Referenced bit?
 

If you reset the hwm for a process, rss grows to 100MB, another process 
resets the hwm, and you see a hwm of 2MB, that invalidates the hwm 
entirely.  That's especially true if there's an oom condition that kills a 
process when the rss grew to 100MB but you see a hwm of 2MB and don't 
believe it was possibly the culprit.  The hwm is already defined as the 
highest rss the process has attained, resetting it and trying to make any 
inference from the result is racy and invalidates the actual value which 
is useful.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm-hiwater_rss (peak RSS)

2015-01-22 Thread Primiano Tucci
On Thu, Jan 22, 2015 at 11:27 PM, David Rientjes rient...@google.com wrote:
 If you reset the hwm for a process, rss grows to 100MB, another process
 resets the hwm, and you see a hwm of 2MB, that invalidates the hwm
 entirely.

Not sure I follow this scenario. Where does the 2MB come from? How can
you see a hwm of 2MB, under which conditions? HVM can never be  RSS.
Again, what you are talking about is the case of two profilers racing
for using the same interface (hwm).
This is the same case today of the PG_referenced bit.

 The hwm is already defined as the
 highest rss the process has attained, resetting it and trying to make any
 inference from the result is racy and invalidates the actual value which
 is useful.
The counter arugment is: once you have one very high peak, the hvm
becomes essentially useless for the rest of the lifetime of the
process (until a higher peak comes). This makes very hard to
understand what is going on in the meanwhile (from userspace).

Anyways, are you proposing to pursue a different approach? Is the
approach 2. that petrcermark@ proposed in the beginning of the thread
going to address this concern?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm->hiwater_rss (peak RSS)

2015-01-21 Thread Primiano Tucci
On Wed, Jan 21, 2015 at 10:58 PM, David Rientjes  wrote:
> I think the bigger concern would be that this, and any new line such as
> resettable_hiwater_rss, invalidates itself entirely.  Any process that
> checks the hwm will not know of other processes that reset it, so the
> value itself has no significance anymore.
>  It would just be the mark since the last clear at an unknown time.

How is that different from the current logic of clear_refs and the
corresponding PG_Referenced bit?

> Userspace can monitor the rss of a
> process by reading /proc/pid/stat, there's no need for the kernel to do
> something that userspace can do.

I disagree here. The driving motivation of this patch is precisely the
opposite. There are peak events that last for very short time (order:
10-100 ms) and are practically invisible from user-space (even doing
something awkward like polling in a tight loop). Concrete examples
are: GPU memory transfers, image decoding, compression /
decompression.
These kinds of tasks, which use scratch buffers for few ms, can create
significant (yet short lasting) memory pressure which is desirable to
monitor.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm->hiwater_rss (peak RSS)

2015-01-21 Thread David Rientjes
On Thu, 15 Jan 2015, Kirill A. Shutemov wrote:

> I'm not sure if it should be considered ABI break or not. Just asking.
> 
> I would like to hear opinion from other people.
>  

I think the bigger concern would be that this, and any new line such as 
resettable_hiwater_rss, invalidates itself entirely.  Any process that 
checks the hwm will not know of other processes that reset it, so the 
value itself has no significance anymore.  It would just be the mark since 
the last clear at an unknown time.  Userspace can monitor the rss of a 
process by reading /proc/pid/stat, there's no need for the kernel to do 
something that userspace can do.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm-hiwater_rss (peak RSS)

2015-01-21 Thread Primiano Tucci
On Wed, Jan 21, 2015 at 10:58 PM, David Rientjes rient...@google.com wrote:
 I think the bigger concern would be that this, and any new line such as
 resettable_hiwater_rss, invalidates itself entirely.  Any process that
 checks the hwm will not know of other processes that reset it, so the
 value itself has no significance anymore.
  It would just be the mark since the last clear at an unknown time.

How is that different from the current logic of clear_refs and the
corresponding PG_Referenced bit?

 Userspace can monitor the rss of a
 process by reading /proc/pid/stat, there's no need for the kernel to do
 something that userspace can do.

I disagree here. The driving motivation of this patch is precisely the
opposite. There are peak events that last for very short time (order:
10-100 ms) and are practically invisible from user-space (even doing
something awkward like polling in a tight loop). Concrete examples
are: GPU memory transfers, image decoding, compression /
decompression.
These kinds of tasks, which use scratch buffers for few ms, can create
significant (yet short lasting) memory pressure which is desirable to
monitor.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm-hiwater_rss (peak RSS)

2015-01-21 Thread David Rientjes
On Thu, 15 Jan 2015, Kirill A. Shutemov wrote:

 I'm not sure if it should be considered ABI break or not. Just asking.
 
 I would like to hear opinion from other people.
  

I think the bigger concern would be that this, and any new line such as 
resettable_hiwater_rss, invalidates itself entirely.  Any process that 
checks the hwm will not know of other processes that reset it, so the 
value itself has no significance anymore.  It would just be the mark since 
the last clear at an unknown time.  Userspace can monitor the rss of a 
process by reading /proc/pid/stat, there's no need for the kernel to do 
something that userspace can do.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm->hiwater_rss (peak RSS)

2015-01-15 Thread Petr Cermak
On Thu, Jan 15, 2015 at 01:36:30AM +0200, Kirill A. Shutemov wrote:
> On Wed, Jan 14, 2015 at 03:22:25PM +, Petr Cermak wrote:
> > On Wed, Jan 07, 2015 at 07:24:52PM +0200, Kirill A. Shutemov wrote:
> > > And how it's not an ABI break?
> > I don't think this is an ABI break because the current behaviour is not
> > changed unless you write "5" to /proc/pid/clear_refs. If you do, you are
> > explicitly requesting the new functionality.
> 
> I'm not sure if it should be considered ABI break or not. Just asking.
> 
> I would like to hear opinion from other people.

We would like to get more feedback as well.

> > > We have never-lowering VmHWM for 9+ years. How can you know that nobody
> > > expects this behaviour?
> > This is why we sent an RFC [1] several weeks ago. We expect this to be
> > used mainly by performance-related tools (e.g. profilers) and from the
> > comments in the code [2] VmHWM seems to be a best-effort counter. If this
> > is strictly a no-go, I can only think of the following two alternatives:
> > 
> >   1. Add an extra resettable field to /proc/pid/status (e.g.
> >  resettable_hiwater_rss). While this doesn't violate the current
> >  definition of VmHWM, it adds an extra line to /proc/pid/status,
> >  which I think is a much bigger issue.
> 
> I don't think extra line is bigger issue. Sane applications would look for
> a key, not line number. We do add lines there. I've posted patch which
> adds one more just today ;)

In that case, should we add an extra field to /proc/pid/status?

> >   2. Introduce a new proc fs file to task_mmu (e.g.
> >  /proc/pid/profiler_stats), but this feels like overengineering.
> 
> BTW, we have memory.max_usage_in_byte in memory cgroup. And it's resetable.
> Wouldn't it be enough for your profiling use-case?

This is a very interesting point, but it doesn't cover all use cases.
Our specific use case is memory profiling of the Chromium browser, but I
think that the same considerations below apply to any other use case
which involves child sub-processes:

  1. The process must be added to the control group explicitly by root.
 Hence, the Chromium process cannot do this itself.
  2. All forked children are implicitly grouped in the same control
 group. This is a problem because we want to be able to measure
 memory usage of the child processes separately.

The advantage of using clear_refs instead is that Chomium would be able
to profile its memory usage without any external intervention (as it's
already the case with performance profiling).

> > > And why do you reset hiwater_rss, but not hiwater_vm?
> > This is a good point. Should we reset both using the same flag, or
> > introduce a new one ("6")?
> > 
> > [1] lkml.iu.edu/hypermail/linux/kernel/1412.1/01877.html
> > [2] task_mmu.c:32: "... such snapshots can always be inconsistent."

Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm-hiwater_rss (peak RSS)

2015-01-15 Thread Petr Cermak
On Thu, Jan 15, 2015 at 01:36:30AM +0200, Kirill A. Shutemov wrote:
 On Wed, Jan 14, 2015 at 03:22:25PM +, Petr Cermak wrote:
  On Wed, Jan 07, 2015 at 07:24:52PM +0200, Kirill A. Shutemov wrote:
   And how it's not an ABI break?
  I don't think this is an ABI break because the current behaviour is not
  changed unless you write 5 to /proc/pid/clear_refs. If you do, you are
  explicitly requesting the new functionality.
 
 I'm not sure if it should be considered ABI break or not. Just asking.
 
 I would like to hear opinion from other people.

We would like to get more feedback as well.

   We have never-lowering VmHWM for 9+ years. How can you know that nobody
   expects this behaviour?
  This is why we sent an RFC [1] several weeks ago. We expect this to be
  used mainly by performance-related tools (e.g. profilers) and from the
  comments in the code [2] VmHWM seems to be a best-effort counter. If this
  is strictly a no-go, I can only think of the following two alternatives:
  
1. Add an extra resettable field to /proc/pid/status (e.g.
   resettable_hiwater_rss). While this doesn't violate the current
   definition of VmHWM, it adds an extra line to /proc/pid/status,
   which I think is a much bigger issue.
 
 I don't think extra line is bigger issue. Sane applications would look for
 a key, not line number. We do add lines there. I've posted patch which
 adds one more just today ;)

In that case, should we add an extra field to /proc/pid/status?

2. Introduce a new proc fs file to task_mmu (e.g.
   /proc/pid/profiler_stats), but this feels like overengineering.
 
 BTW, we have memory.max_usage_in_byte in memory cgroup. And it's resetable.
 Wouldn't it be enough for your profiling use-case?

This is a very interesting point, but it doesn't cover all use cases.
Our specific use case is memory profiling of the Chromium browser, but I
think that the same considerations below apply to any other use case
which involves child sub-processes:

  1. The process must be added to the control group explicitly by root.
 Hence, the Chromium process cannot do this itself.
  2. All forked children are implicitly grouped in the same control
 group. This is a problem because we want to be able to measure
 memory usage of the child processes separately.

The advantage of using clear_refs instead is that Chomium would be able
to profile its memory usage without any external intervention (as it's
already the case with performance profiling).

   And why do you reset hiwater_rss, but not hiwater_vm?
  This is a good point. Should we reset both using the same flag, or
  introduce a new one (6)?
  
  [1] lkml.iu.edu/hypermail/linux/kernel/1412.1/01877.html
  [2] task_mmu.c:32: ... such snapshots can always be inconsistent.

Petr
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm->hiwater_rss (peak RSS)

2015-01-14 Thread Kirill A. Shutemov
On Wed, Jan 14, 2015 at 03:22:25PM +, Petr Cermak wrote:
> On Wed, Jan 07, 2015 at 07:24:52PM +0200, Kirill A. Shutemov wrote:
> > And how it's not an ABI break?
> I don't think this is an ABI break because the current behaviour is not
> changed unless you write "5" to /proc/pid/clear_refs. If you do, you are
> explicitly requesting the new functionality.
> 
> > We have never-lowering VmHWM for 9+ years. How can you know that nobody
> > expects this behaviour?
> This is why we sent an RFC [1] several weeks ago. We expect this to be
> used mainly by performance-related tools (e.g. profilers) and from the
> comments in the code [2] VmHWM seems to be a best-effort counter. If this
> is strictly a no-go, I can only think of the following two alternatives:
> 
>   1. Add an extra resettable field to /proc/pid/status (e.g.
>  resettable_hiwater_rss). While this doesn't violate the current
>  definition of VmHWM, it adds an extra line to /proc/pid/status,
>  which I think is a much bigger issue.
>   2. Introduce a new proc fs file to task_mmu (e.g.
>  /proc/pid/profiler_stats), but this feels like overengineering.

BTW, we have memory.max_usage_in_byte in memory cgroup. And it's resetable.
Wouldn't it be enough for your profiling use-case?

-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm->hiwater_rss (peak RSS)

2015-01-14 Thread Kirill A. Shutemov
On Wed, Jan 14, 2015 at 03:22:25PM +, Petr Cermak wrote:
> On Wed, Jan 07, 2015 at 07:24:52PM +0200, Kirill A. Shutemov wrote:
> > And how it's not an ABI break?
> I don't think this is an ABI break because the current behaviour is not
> changed unless you write "5" to /proc/pid/clear_refs. If you do, you are
> explicitly requesting the new functionality.

I'm not sure if it should be considered ABI break or not. Just asking.

I would like to hear opinion from other people.
 
> > We have never-lowering VmHWM for 9+ years. How can you know that nobody
> > expects this behaviour?
> This is why we sent an RFC [1] several weeks ago. We expect this to be
> used mainly by performance-related tools (e.g. profilers) and from the
> comments in the code [2] VmHWM seems to be a best-effort counter. If this
> is strictly a no-go, I can only think of the following two alternatives:
> 
>   1. Add an extra resettable field to /proc/pid/status (e.g.
>  resettable_hiwater_rss). While this doesn't violate the current
>  definition of VmHWM, it adds an extra line to /proc/pid/status,
>  which I think is a much bigger issue.

I don't think extra line is bigger issue. Sane applications would look for
a key, not line number. We do add lines there. I've posted patch which
adds one more just today ;)

>   2. Introduce a new proc fs file to task_mmu (e.g.
>  /proc/pid/profiler_stats), but this feels like overengineering.
> 
> > And why do you reset hiwater_rss, but not hiwater_vm?
> This is a good point. Should we reset both using the same flag, or
> introduce a new one ("6")?
> 
> [1] lkml.iu.edu/hypermail/linux/kernel/1412.1/01877.html
> [2] task_mmu.c:32: "... such snapshots can always be inconsistent."
-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm->hiwater_rss (peak RSS)

2015-01-14 Thread Petr Cermak
On Wed, Jan 07, 2015 at 07:24:52PM +0200, Kirill A. Shutemov wrote:
> And how it's not an ABI break?
I don't think this is an ABI break because the current behaviour is not
changed unless you write "5" to /proc/pid/clear_refs. If you do, you are
explicitly requesting the new functionality.

> We have never-lowering VmHWM for 9+ years. How can you know that nobody
> expects this behaviour?
This is why we sent an RFC [1] several weeks ago. We expect this to be
used mainly by performance-related tools (e.g. profilers) and from the
comments in the code [2] VmHWM seems to be a best-effort counter. If this
is strictly a no-go, I can only think of the following two alternatives:

  1. Add an extra resettable field to /proc/pid/status (e.g.
 resettable_hiwater_rss). While this doesn't violate the current
 definition of VmHWM, it adds an extra line to /proc/pid/status,
 which I think is a much bigger issue.
  2. Introduce a new proc fs file to task_mmu (e.g.
 /proc/pid/profiler_stats), but this feels like overengineering.

> And why do you reset hiwater_rss, but not hiwater_vm?
This is a good point. Should we reset both using the same flag, or
introduce a new one ("6")?

[1] lkml.iu.edu/hypermail/linux/kernel/1412.1/01877.html
[2] task_mmu.c:32: "... such snapshots can always be inconsistent."

Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm-hiwater_rss (peak RSS)

2015-01-14 Thread Kirill A. Shutemov
On Wed, Jan 14, 2015 at 03:22:25PM +, Petr Cermak wrote:
 On Wed, Jan 07, 2015 at 07:24:52PM +0200, Kirill A. Shutemov wrote:
  And how it's not an ABI break?
 I don't think this is an ABI break because the current behaviour is not
 changed unless you write 5 to /proc/pid/clear_refs. If you do, you are
 explicitly requesting the new functionality.

I'm not sure if it should be considered ABI break or not. Just asking.

I would like to hear opinion from other people.
 
  We have never-lowering VmHWM for 9+ years. How can you know that nobody
  expects this behaviour?
 This is why we sent an RFC [1] several weeks ago. We expect this to be
 used mainly by performance-related tools (e.g. profilers) and from the
 comments in the code [2] VmHWM seems to be a best-effort counter. If this
 is strictly a no-go, I can only think of the following two alternatives:
 
   1. Add an extra resettable field to /proc/pid/status (e.g.
  resettable_hiwater_rss). While this doesn't violate the current
  definition of VmHWM, it adds an extra line to /proc/pid/status,
  which I think is a much bigger issue.

I don't think extra line is bigger issue. Sane applications would look for
a key, not line number. We do add lines there. I've posted patch which
adds one more just today ;)

   2. Introduce a new proc fs file to task_mmu (e.g.
  /proc/pid/profiler_stats), but this feels like overengineering.
 
  And why do you reset hiwater_rss, but not hiwater_vm?
 This is a good point. Should we reset both using the same flag, or
 introduce a new one (6)?
 
 [1] lkml.iu.edu/hypermail/linux/kernel/1412.1/01877.html
 [2] task_mmu.c:32: ... such snapshots can always be inconsistent.
-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm-hiwater_rss (peak RSS)

2015-01-14 Thread Kirill A. Shutemov
On Wed, Jan 14, 2015 at 03:22:25PM +, Petr Cermak wrote:
 On Wed, Jan 07, 2015 at 07:24:52PM +0200, Kirill A. Shutemov wrote:
  And how it's not an ABI break?
 I don't think this is an ABI break because the current behaviour is not
 changed unless you write 5 to /proc/pid/clear_refs. If you do, you are
 explicitly requesting the new functionality.
 
  We have never-lowering VmHWM for 9+ years. How can you know that nobody
  expects this behaviour?
 This is why we sent an RFC [1] several weeks ago. We expect this to be
 used mainly by performance-related tools (e.g. profilers) and from the
 comments in the code [2] VmHWM seems to be a best-effort counter. If this
 is strictly a no-go, I can only think of the following two alternatives:
 
   1. Add an extra resettable field to /proc/pid/status (e.g.
  resettable_hiwater_rss). While this doesn't violate the current
  definition of VmHWM, it adds an extra line to /proc/pid/status,
  which I think is a much bigger issue.
   2. Introduce a new proc fs file to task_mmu (e.g.
  /proc/pid/profiler_stats), but this feels like overengineering.

BTW, we have memory.max_usage_in_byte in memory cgroup. And it's resetable.
Wouldn't it be enough for your profiling use-case?

-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm-hiwater_rss (peak RSS)

2015-01-14 Thread Petr Cermak
On Wed, Jan 07, 2015 at 07:24:52PM +0200, Kirill A. Shutemov wrote:
 And how it's not an ABI break?
I don't think this is an ABI break because the current behaviour is not
changed unless you write 5 to /proc/pid/clear_refs. If you do, you are
explicitly requesting the new functionality.

 We have never-lowering VmHWM for 9+ years. How can you know that nobody
 expects this behaviour?
This is why we sent an RFC [1] several weeks ago. We expect this to be
used mainly by performance-related tools (e.g. profilers) and from the
comments in the code [2] VmHWM seems to be a best-effort counter. If this
is strictly a no-go, I can only think of the following two alternatives:

  1. Add an extra resettable field to /proc/pid/status (e.g.
 resettable_hiwater_rss). While this doesn't violate the current
 definition of VmHWM, it adds an extra line to /proc/pid/status,
 which I think is a much bigger issue.
  2. Introduce a new proc fs file to task_mmu (e.g.
 /proc/pid/profiler_stats), but this feels like overengineering.

 And why do you reset hiwater_rss, but not hiwater_vm?
This is a good point. Should we reset both using the same flag, or
introduce a new one (6)?

[1] lkml.iu.edu/hypermail/linux/kernel/1412.1/01877.html
[2] task_mmu.c:32: ... such snapshots can always be inconsistent.

Petr
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm->hiwater_rss (peak RSS)

2015-01-07 Thread Kirill A. Shutemov
On Wed, Jan 07, 2015 at 05:06:54PM +, Petr Cermak wrote:
> Peak resident size of a process can be reset by writing "5" to
> /proc/pid/clear_refs. The driving use-case for this would be getting the
> peak RSS value, which can be retrieved from the VmHWM field in
> /proc/pid/status, per benchmark iteration or test scenario.

And how it's not an ABI break?

We have never-lowering VmHWM for 9+ years. How can you know that nobody
expects this behaviour?

And why do you reset hiwater_rss, but not hiwater_vm?

-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] task_mmu: Add user-space support for resetting mm-hiwater_rss (peak RSS)

2015-01-07 Thread Kirill A. Shutemov
On Wed, Jan 07, 2015 at 05:06:54PM +, Petr Cermak wrote:
 Peak resident size of a process can be reset by writing 5 to
 /proc/pid/clear_refs. The driving use-case for this would be getting the
 peak RSS value, which can be retrieved from the VmHWM field in
 /proc/pid/status, per benchmark iteration or test scenario.

And how it's not an ABI break?

We have never-lowering VmHWM for 9+ years. How can you know that nobody
expects this behaviour?

And why do you reset hiwater_rss, but not hiwater_vm?

-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/