Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory

2019-01-03 Thread Steven Rostedt
On Thu, 3 Jan 2019 23:02:07 -0500
Steven Rostedt  wrote:

> On Thu, 27 Apr 2017 23:03:24 +0530
> Pratyush Anand  wrote:
> 
> Hi Pratyush,
> 
> I was digging through old email, and came across this. Our thread sorta
> just died. Are you still interested in implementing "trace-cmd top"?
> 
> We've changed the code structure quite a bit since this came out, so it
> would require a rewrite. But it would actually fit better now.
> 

And it appears you don't work for Red Hat anymore!

-- Steve

> 
> 
> > On Thursday 27 April 2017 10:19 PM, Steven Rostedt wrote:  
> > > On Thu, 27 Apr 2017 19:32:43 +0530
> > > Pratyush Anand  wrote:
> > >
> > >> I will implement your review comments and will send the next revision.
> > >> However, I had a couple of observations that I was unable to justify:
> > >>
> > >> # ./trace-cmd top -s /tmp/test
> > >> # ./trace-cmd top -p /tmp/test | grep trace-cmd
> > >>15292   trace-cmd   22144   15808
> > >
> > > What does it give for your /tmp/test ?
> > 
> > /tmp/test is the name of the socket through which the trace aggregator
> > (-s) and printer (-p) processes talk to each other.
> >   
> > >
> > > Note, tracing doesn't start till after trace-cmd is loaded. Everything
> > > before is not going to be seen.
> > 
> > Yes. So with `./trace-cmd top -s /tmp/test` tracing starts, and it
> > aggregates stats for all processes, including the trace-cmd process
> > itself. After some time I execute `./trace-cmd top -p /tmp/test`, which
> > prints the stats for all processes, including itself. In the above
> > example, I see that the peak memory calculated for the aggregator process
> > (`./trace-cmd top -s /tmp/test`) was 22144 KB.
> >   
> > >
> > >> Here, 15292 is the pid of the trace-cmd task
> > >> 22144 KB is the peak memory usage
> > >> 15808 KB is the current memory usage
> > >>
> > >> Now check rss component from statm
> > >> # cat /proc/15292/statm
> > >>50 35 23 7 0 12 0 36
> > >>
> > >> This is a result on ARM64 with a 64KB page size. Therefore, as per
> > >> statm, rss is 35 pages = 35*64 = 2240 KB.
> > >> I patched my kernel [2] for test purposes so that statm reports peak
> > >> memory as well. Here, the last extra entry in the statm output is the
> > >> peak memory, and it is 36 pages = 2304 KB.
> > >> So, there is a huge difference between what is reported by statm and
> > >> what we get from trace-cmd.
> > >> I understand that `trace-cmd top` output would only be approximate,
> > >> because much of the memory could be allocated by the task and freed in
> > >> interrupt context, so the numbers can often differ. But is such a huge
> > >> difference justified? If yes, can we count on the output of this
> > >> utility to find early-boot-time OOM issues?
> > >
> > > Doesn't it only just trace the memory usage of when the tracing starts?   
> > >  
> > 
> > So, it should be less than what is reported by statm, right? But here we
> > see that the value reported by trace-cmd is far more than the rss value
> > from statm.
> > 
> > Well, there is one known bug in the code, for which I am still trying to
> > find a solution. But that should not cause such a huge difference.
> > 
> > When we receive a trace entry for any new process, we read the rss value
> > from statm. We assume that this is the memory consumed by that process
> > before tracing started. So we initialize the peak and current memory
> > values for that process with the rss from statm, and then we increase
> > current when there is an entry for mm_page_alloc() and decrease it when
> > there is an entry for mm_page_free(). The maximum value of current at any
> > point is reported as the peak memory.
> > 
> > Now, let's consider a process p1 which existed before tracing started and
> > whose rss value was r1 KB. We receive the first trace entry for p1, which
> > says 8KB was allocated for p1:
> > 
> > p1->cur = r1 + 8;
> > p1->peak = r1 + 8;
> > 
> > We receive another entry with a 4KB de-allocation:
> > 
> > p1->cur = r1 + 4;
> > p1->peak = r1 + 8;
> > 
> > Some of these traced entries may already be accounted for in statm/rss
> > (r1 here), and we do not know how many. Currently, these n (unknown)
> > entries [generated between (a) the 1st trace entry and (b) when trace-cmd
> > top reads statm/rss] are counted twice, which is a bug.
> > 
> > ~Pratyush
> >   
> > >>
> > >> [1]
> > >> https://github.com/pratyushanand/trace-cmd/commit/602c2cd96aa613633ad20c6d382e41f7db37a2a4
> > >> [2]
> > >> https://github.com/pratyushanand/linux/commit/197e2045361b6b70085c46c78ea6607d8afce517
> > >> 
> > >
> 



Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory

2019-01-03 Thread Steven Rostedt
On Thu, 27 Apr 2017 23:03:24 +0530
Pratyush Anand  wrote:

Hi Pratyush,

I was digging through old email, and came across this. Our thread sorta
just died. Are you still interested in implementing "trace-cmd top"?

We've changed the code structure quite a bit since this came out, so it
would require a rewrite. But it would actually fit better now.

-- Steve


> On Thursday 27 April 2017 10:19 PM, Steven Rostedt wrote:
> > On Thu, 27 Apr 2017 19:32:43 +0530
> > Pratyush Anand  wrote:
> >  
> >> I will implement your review comments and will send the next revision.
> >> However, I had a couple of observations that I was unable to justify:
> >>
> >> # ./trace-cmd top -s /tmp/test
> >> # ./trace-cmd top -p /tmp/test | grep trace-cmd
> >>15292   trace-cmd   22144   15808  
> >
> > What does it give for your /tmp/test ?  
> 
> /tmp/test is the name of the socket through which the trace aggregator (-s)
> and printer (-p) processes talk to each other.
> 
> >
> > Note, tracing doesn't start till after trace-cmd is loaded. Everything
> > before is not going to be seen.  
> 
> Yes. So with `./trace-cmd top -s /tmp/test` tracing starts, and it
> aggregates stats for all processes, including the trace-cmd process itself.
> After some time I execute `./trace-cmd top -p /tmp/test`, which prints the
> stats for all processes, including itself. In the above example, I see that
> the peak memory calculated for the aggregator process
> (`./trace-cmd top -s /tmp/test`) was 22144 KB.
> 
> >  
> >> Here, 15292 is the pid of the trace-cmd task
> >> 22144 KB is the peak memory usage
> >> 15808 KB is the current memory usage
> >>
> >> Now check rss component from statm
> >> # cat /proc/15292/statm
> >>50 35 23 7 0 12 0 36
> >>
> >> This is a result on ARM64 with a 64KB page size. Therefore, as per
> >> statm, rss is 35 pages = 35*64 = 2240 KB.
> >> I patched my kernel [2] for test purposes so that statm reports peak
> >> memory as well. Here, the last extra entry in the statm output is the
> >> peak memory, and it is 36 pages = 2304 KB.
> >> So, there is a huge difference between what is reported by statm and
> >> what we get from trace-cmd.
> >> I understand that `trace-cmd top` output would only be approximate,
> >> because much of the memory could be allocated by the task and freed in
> >> interrupt context, so the numbers can often differ. But is such a huge
> >> difference justified? If yes, can we count on the output of this utility
> >> to find early-boot-time OOM issues?
> >
> > Doesn't it only just trace the memory usage of when the tracing starts?  
> 
> So, it should be less than what is reported by statm, right? But here we
> see that the value reported by trace-cmd is far more than the rss value
> from statm.
> 
> Well, there is one known bug in the code, for which I am still trying to
> find a solution. But that should not cause such a huge difference.
> 
> When we receive a trace entry for any new process, we read the rss value
> from statm. We assume that this is the memory consumed by that process
> before tracing started. So we initialize the peak and current memory values
> for that process with the rss from statm, and then we increase current when
> there is an entry for mm_page_alloc() and decrease it when there is an
> entry for mm_page_free(). The maximum value of current at any point is
> reported as the peak memory.
> 
> Now, let's consider a process p1 which existed before tracing started and
> whose rss value was r1 KB. We receive the first trace entry for p1, which
> says 8KB was allocated for p1:
> 
> p1->cur = r1 + 8;
> p1->peak = r1 + 8;
> 
> We receive another entry with a 4KB de-allocation:
> 
> p1->cur = r1 + 4;
> p1->peak = r1 + 8;
> 
> Some of these traced entries may already be accounted for in statm/rss (r1
> here), and we do not know how many. Currently, these n (unknown) entries
> [generated between (a) the 1st trace entry and (b) when trace-cmd top reads
> statm/rss] are counted twice, which is a bug.
> 
> ~Pratyush
> 
> >>
> >> [1]
> >> https://github.com/pratyushanand/trace-cmd/commit/602c2cd96aa613633ad20c6d382e41f7db37a2a4
> >> [2]
> >> https://github.com/pratyushanand/linux/commit/197e2045361b6b70085c46c78ea6607d8afce517
> >>   
> >  



Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory

2017-04-27 Thread Steven Rostedt
On Thu, 27 Apr 2017 23:03:24 +0530
Pratyush Anand  wrote:

> On Thursday 27 April 2017 10:19 PM, Steven Rostedt wrote:
> > On Thu, 27 Apr 2017 19:32:43 +0530
> > Pratyush Anand  wrote:
> >  
> >> I will implement your review comments and will send the next revision.
> >> However, I had a couple of observations that I was unable to justify:
> >>
> >> # ./trace-cmd top -s /tmp/test
> >> # ./trace-cmd top -p /tmp/test | grep trace-cmd
> >>15292   trace-cmd   22144   15808  
> >
> > What does it give for your /tmp/test ?  
> 
> /tmp/test is the name of the socket through which the trace aggregator (-s)
> and printer (-p) processes talk to each other.

Ah, OK, I see what you are doing. I'd rather not require a socket to be
specified. Perhaps make it work just like normal "top".


> 
> >
> > Note, tracing doesn't start till after trace-cmd is loaded. Everything
> > before is not going to be seen.  
> 
> Yes. So with `./trace-cmd top -s /tmp/test` tracing starts, and it
> aggregates stats for all processes, including the trace-cmd process itself.
> After some time I execute `./trace-cmd top -p /tmp/test`, which prints the
> stats for all processes, including itself. In the above example, I see that
> the peak memory calculated for the aggregator process
> (`./trace-cmd top -s /tmp/test`) was 22144 KB.
> 
> >  
> >> Here, 15292 is the pid of the trace-cmd task
> >> 22144 KB is the peak memory usage
> >> 15808 KB is the current memory usage
> >>
> >> Now check rss component from statm
> >> # cat /proc/15292/statm
> >>50 35 23 7 0 12 0 36
> >>
> >> This is a result on ARM64 with a 64KB page size. Therefore, as per
> >> statm, rss is 35 pages = 35*64 = 2240 KB.
> >> I patched my kernel [2] for test purposes so that statm reports peak
> >> memory as well. Here, the last extra entry in the statm output is the
> >> peak memory, and it is 36 pages = 2304 KB.
> >> So, there is a huge difference between what is reported by statm and
> >> what we get from trace-cmd.
> >> I understand that `trace-cmd top` output would only be approximate,
> >> because much of the memory could be allocated by the task and freed in
> >> interrupt context, so the numbers can often differ. But is such a huge
> >> difference justified? If yes, can we count on the output of this utility
> >> to find early-boot-time OOM issues?
> >
> > Doesn't it only just trace the memory usage of when the tracing starts?  
> 
> So, it should be less than what is reported by statm, right? But here we
> see that the value reported by trace-cmd is far more than the rss value
> from statm.
> 
> Well, there is one known bug in the code, for which I am still trying to
> find a solution. But that should not cause such a huge difference.
> 
> When we receive a trace entry for any new process, we read the rss value
> from statm. We assume that this is the memory consumed by that process
> before tracing started. So we initialize the peak and current memory values
> for that process with the rss from statm, and then we increase current when
> there is an entry for mm_page_alloc() and decrease it when there is an
> entry for mm_page_free(). The maximum value of current at any point is
> reported as the peak memory.
> 
> Now, let's consider a process p1 which existed before tracing started and
> whose rss value was r1 KB. We receive the first trace entry for p1, which
> says 8KB was allocated for p1:
> 
> p1->cur = r1 + 8;
> p1->peak = r1 + 8;
> 
> We receive another entry with a 4KB de-allocation:
> 
> p1->cur = r1 + 4;
> p1->peak = r1 + 8;
> 
> Some of these traced entries may already be accounted for in statm/rss (r1
> here), and we do not know how many. Currently, these n (unknown) entries
> [generated between (a) the 1st trace entry and (b) when trace-cmd top reads
> statm/rss] are counted twice, which is a bug.
> 

Note, it appears you are tracing all the page allocs and frees. This can be
very deceiving. There are a lot of cases where pages are allocated by the
application and then freed via RCU. That is very common. Pages that are
freed via RCU will not be accounted for.

-- Steve


Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory

2017-04-27 Thread Pratyush Anand



On Thursday 27 April 2017 10:19 PM, Steven Rostedt wrote:

On Thu, 27 Apr 2017 19:32:43 +0530
Pratyush Anand  wrote:


I will implement your review comments and will send the next revision.
However, I had a couple of observations that I was unable to justify:

# ./trace-cmd top -s /tmp/test
# ./trace-cmd top -p /tmp/test | grep trace-cmd
   15292   trace-cmd   22144   15808


What does it give for your /tmp/test ?


/tmp/test is the name of the socket through which the trace aggregator (-s)
and printer (-p) processes talk to each other.




Note, tracing doesn't start till after trace-cmd is loaded. Everything
before is not going to be seen.


Yes. So with `./trace-cmd top -s /tmp/test` tracing starts, and it aggregates
stats for all processes, including the trace-cmd process itself. After some
time I execute `./trace-cmd top -p /tmp/test`, which prints the stats for all
processes, including itself. In the above example, I see that the peak memory
calculated for the aggregator process (`./trace-cmd top -s /tmp/test`) was
22144 KB.





Here, 15292 is the pid of the trace-cmd task
22144 KB is the peak memory usage
15808 KB is the current memory usage

Now check rss component from statm
# cat /proc/15292/statm
   50 35 23 7 0 12 0 36

This is a result on ARM64 with a 64KB page size. Therefore, as per statm, rss
is 35 pages = 35*64 = 2240 KB.
I patched my kernel [2] for test purposes so that statm reports peak memory
as well. Here, the last extra entry in the statm output is the peak memory,
and it is 36 pages = 2304 KB.
So, there is a huge difference between what is reported by statm and what we
get from trace-cmd.
I understand that `trace-cmd top` output would only be approximate, because
much of the memory could be allocated by the task and freed in interrupt
context, so the numbers can often differ. But is such a huge difference
justified? If yes, can we count on the output of this utility to find
early-boot-time OOM issues?
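The page-size arithmetic above can be double-checked with a small standalone helper (a sketch, not part of trace-cmd; the function names are invented for illustration):

```c
#include <stdio.h>
#include <unistd.h>

/* Convert a page count, as reported in /proc/<pid>/statm, to KB. */
long pages_to_kb(long pages, long page_size)
{
	return pages * (page_size / 1024);
}

/* Read the rss field (second column) of the calling process's own
 * statm and return it in KB, or -1 on error. */
long self_rss_kb(void)
{
	long size, rss;
	FILE *f = fopen("/proc/self/statm", "r");

	if (!f)
		return -1;
	if (fscanf(f, "%ld %ld", &size, &rss) != 2) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return pages_to_kb(rss, sysconf(_SC_PAGESIZE));
}
```

With a 64KB page size, pages_to_kb(35, 65536) gives 2240 KB, matching the rss computed above; on a 4KB-page kernel the same 35 pages would be only 140 KB.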


Doesn't it only just trace the memory usage of when the tracing starts?


So, it should be less than what is reported by statm, right? But here we see
that the value reported by trace-cmd is far more than the rss value from
statm.

Well, there is one known bug in the code, for which I am still trying to
find a solution. But that should not cause such a huge difference.


When we receive a trace entry for any new process, we read the rss value from
statm. We assume that this is the memory consumed by that process before
tracing started. So we initialize the peak and current memory values for that
process with the rss from statm, and then we increase current when there is
an entry for mm_page_alloc() and decrease it when there is an entry for
mm_page_free(). The maximum value of current at any point is reported as the
peak memory.

Now, let's consider a process p1 which existed before tracing started and
whose rss value was r1 KB. We receive the first trace entry for p1, which
says 8KB was allocated for p1:

p1->cur = r1 + 8;
p1->peak = r1 + 8;

We receive another entry with a 4KB de-allocation:

p1->cur = r1 + 4;
p1->peak = r1 + 8;

Some of these traced entries may already be accounted for in statm/rss (r1
here), and we do not know how many. Currently, these n (unknown) entries
[generated between (a) the 1st trace entry and (b) when trace-cmd top reads
statm/rss] are counted twice, which is a bug.
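The accounting described above, including the double-counting problem, can be sketched as follows (the struct and function names are invented for illustration; this is not the actual trace-cmd code):

```c
/* Per-task counters, in KB. */
struct task_mem {
	long cur;
	long peak;
};

/* First trace entry seen for a task: seed both counters with the rss
 * value read from /proc/<pid>/statm. */
void task_init(struct task_mem *t, long rss_kb)
{
	t->cur = rss_kb;
	t->peak = rss_kb;
}

/* Apply one trace entry: a positive delta for mm_page_alloc(), a
 * negative one for mm_page_free(). */
void task_account(struct task_mem *t, long delta_kb)
{
	t->cur += delta_kb;
	if (t->cur > t->peak)
		t->peak = t->cur;
}
```

Seeding with r1 and then replaying an 8KB allocation and a 4KB free yields cur = r1 + 4 and peak = r1 + 8; if the 8KB allocation happened before statm was read, it is already inside r1 and has now been counted twice.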


~Pratyush



[1]
https://github.com/pratyushanand/trace-cmd/commit/602c2cd96aa613633ad20c6d382e41f7db37a2a4
[2]
https://github.com/pratyushanand/linux/commit/197e2045361b6b70085c46c78ea6607d8afce517




Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory

2017-04-27 Thread Steven Rostedt
On Thu, 27 Apr 2017 19:32:43 +0530
Pratyush Anand  wrote:

> I will implement your review comments and will send the next revision.
> However, I had a couple of observations that I was unable to justify:
> 
> # ./trace-cmd top -s /tmp/test
> # ./trace-cmd top -p /tmp/test | grep trace-cmd
>15292   trace-cmd   22144   15808

What does it give for your /tmp/test ?

Note, tracing doesn't start till after trace-cmd is loaded. Everything
before is not going to be seen.

> Here, 15292 is the pid of the trace-cmd task
> 22144 KB is the peak memory usage
> 15808 KB is the current memory usage
> 
> Now check rss component from statm
> # cat /proc/15292/statm
>50 35 23 7 0 12 0 36
> 
> This is a result on ARM64 with a 64KB page size. Therefore, as per statm,
> rss is 35 pages = 35*64 = 2240 KB.
> I patched my kernel [2] for test purposes so that statm reports peak memory
> as well. Here, the last extra entry in the statm output is the peak memory,
> and it is 36 pages = 2304 KB.
> So, there is a huge difference between what is reported by statm and what
> we get from trace-cmd.
> I understand that `trace-cmd top` output would only be approximate, because
> much of the memory could be allocated by the task and freed in interrupt
> context, so the numbers can often differ. But is such a huge difference
> justified? If yes, can we count on the output of this utility to find
> early-boot-time OOM issues?

Doesn't it only just trace the memory usage of when the tracing starts?

-- Steve

> 
> 
> [1] 
> https://github.com/pratyushanand/trace-cmd/commit/602c2cd96aa613633ad20c6d382e41f7db37a2a4
> [2] 
> https://github.com/pratyushanand/linux/commit/197e2045361b6b70085c46c78ea6607d8afce517



Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory

2017-04-27 Thread Pratyush Anand

Hi Steve,

On Wednesday 26 April 2017 07:31 PM, Steven Rostedt wrote:


Sorry for the late reply. I finally have time to start looking at
trace-cmd again.


Thanks a lot for your review :-)



On Wed,  1 Mar 2017 20:32:37 +0530
Pratyush Anand  wrote:



[...]


This is a nice idea, but it needs some cleanups. I tried it out just
to play with it.

First, I just did:

 trace-cmd top

And nothing happened. Either a help message needs to be printed, or it
starts something. Maybe have it be interactive, that is, print out data
as it comes in?



[...]



But then for some reason -p stopped doing anything, and -e didn't stop it
either. After killing it manually and removing the pid file, it appeared to
work normally again. I can't seem to repeat it, so let's not worry about
that now.



Yes, this version shared info between the two processes using files. I have a
new version [1] which shares it through a socket, and that works better. The
help issue is also fixed in that version.



More below.



[...]


+void top_disable_events(void)
+{
+   char *path;
+   char c;
+   int fd;
+   int ret;
+
+   /*
+* WARNING: We want only a few events related to memory allocation to
+* be enabled. Disable all events here. Later, selective events
+* will be enabled


trace-cmd reset does this. You may want to call that instead of doing
it manually here. Someone else wanted to pull out trace-cmd reset into
its own file. I'll have to get those patches. But then we can call that
and enable tracing again.

Or better yet, if we are only tracing memory events, just create your
own instance and use that. Why bother with normal tracing?

 mkdir /sys/kernel/debug/tracing/instances/trace-cmd-memory


Yes, it seems a good idea. Will do.
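A minimal sketch of that approach, assuming tracefs is mounted at /sys/kernel/debug/tracing (the instance name and the write_str/setup_memory_instance helpers are illustrative, not trace-cmd's actual API):

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

#define INST "/sys/kernel/debug/tracing/instances/trace-cmd-memory"

/* Write a short string to a tracefs control file. */
int write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs(val, f) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f);
}

/* Creating a directory under instances/ makes the kernel set up a
 * private ring buffer with its own event switches, so the global
 * tracing state is left untouched. Requires root and tracefs. */
int setup_memory_instance(void)
{
	if (mkdir(INST, 0750) < 0 && errno != EEXIST)
		return -1;
	if (write_str(INST "/events/kmem/mm_page_alloc/enable", "1"))
		return -1;
	if (write_str(INST "/events/kmem/mm_page_free/enable", "1"))
		return -1;
	return write_str(INST "/tracing_on", "1");
}
```

With the instance in place, only the two kmem events fire into its buffer, and the top_disable_events() dance over the global events/enable file becomes unnecessary.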




+*/
+   c = '0';
+   path = get_instance_file(_instance, "events/enable");
+   fd = open(path, O_WRONLY);
+   if (fd < 0)
+   die("failed to open '%s'", path);
+   ret = write(fd, , 1);
+   close(fd);
+   tracecmd_put_tracing_file(path);
+   if (ret < 1)
+   die("failed to write 0 to events/enable");
+
+   path = get_instance_file(_instance, "tracing_on");
+   fd = open(path, O_WRONLY);
+   if (fd < 0)
+   die("failed to open '%s'", path);
+   ret = write(fd, , 1);
+   close(fd);
+   tracecmd_put_tracing_file(path);
+   if (ret < 1)
+   die("failed to write 0 to tracing_on\n");
+
+   path = get_instance_file(_instance, "set_event");
+   fd = open(path, O_WRONLY);
+   if (fd < 0)
+   die("failed to open '%s'", path);
+   close(fd);
+   tracecmd_put_tracing_file(path);
+}
+
+void top_exit(void)
+{
+   int i;
+   /*
+* free all the dynamic memories which could have been allocated by
+* this program
+*/
+   if (kbuf)
+   kbuffer_free(kbuf);
+   if (record)
+   free_record(record);
+   if (pevent)
+   pevent_free(pevent);


The above three functions all can handle a NULL pointer. Remove the if
statements.


ok




+   if (top_task_stats) {
+   for (i = 0; i < top_task_cnt; i++)
+   free(top_task_stats[i].comm);
+   free(top_task_stats);
+   }
+
+   top_disable_events();
+   remove(TRACE_CMD_TOP_PID_FILE);
+}
+
+void top_initialize_events(void)
+{
+   char *path;
+   int fd;
+   char c;
+   int ret;
+   char *buf;
+   int size;
+   enum kbuffer_long_size long_size;
+   enum kbuffer_endian endian;
+   char id_str[8];
+


Might want to look at trace_stream_init() and use that instead. It is
made for live streaming of events.


OK, will look into trace_stream_init().




+   /* trace data is read from the trace_raw_pipe directly */
+   if (tracecmd_host_bigendian())
+   endian = KBUFFER_ENDIAN_BIG;
+   else
+   endian = KBUFFER_ENDIAN_LITTLE;
+


[...]


+void top_start(void)
+{
+   pid_t pid;
+   int fd, ret;
+
+   top_disable_events();
+   top_initialize_events();
+   page_size = getpagesize();
+
+   pid = fork();
+   if (!pid) {
+   do {
+   top_process_events();
+   sleep(1);


We should probably just sleep on the buffer, waiting for data.


OK




+   if (set_print_only) {
+   top_print_stats();
+   set_print_only = false;
+   }
+   if (set_print_and_exit) {
+   top_print_stats();
+   top_exit();
+   break;
+   }
+   } while (1);
+   } else {
+   fd = open(TRACE_CMD_TOP_PID_FILE, O_WRONLY | O_CREAT);
+   if (fd < 0)
+   goto 

Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory

2017-04-27 Thread Pratyush Anand

Hi Steve,

On Wednesday 26 April 2017 07:31 PM, Steven Rostedt wrote:


Sorry for the late reply. I finally have time to start looking at
trace-cmd again.


Thanks a lot for your review :-)



On Wed,  1 Mar 2017 20:32:37 +0530
Pratyush Anand  wrote:



[...]


This is a nice idea, but it needs some cleanups. I tried it out just
to play with it.

First, I just did:

 trace-cmd top

And nothing happened. Either a help message needs to be printed, or it
starts something. Maybe have it be interactive, that is, print out data
as it comes in?



[...]



But then for some reason -p stopped doing anything, and -e didn't
stop it either. After killing it manually and removing the pid
file, it appeared to work normally again. I can't seem to repeat it,
so let's not worry about that now.



Yes, this version shared info between the two processes using files. I have a 
new version [1] which shares it through a socket and works better. The help 
issue is also fixed in that version.



More below.



[...]


+void top_disable_events(void)
+{
+   char *path;
+   char c;
+   int fd;
+   int ret;
+
+   /*
+* WARNING: We want only a few events related to memory allocation
+* to be enabled. Disable all events here. Later, selective events
+* will be enabled.


trace-cmd reset does this. You may want to call that instead of doing
it manually here. Someone else wanted to pull out trace-cmd reset into
its own file. I'll have to get those patches. But then we can call that
and enable tracing again.

Or better yet, if we are only tracing memory events, just create your
own instance and use that. Why bother with normal tracing?

 mkdir /sys/kernel/debug/tracing/instances/trace-cmd-memory


Yes, it seems a good idea. Will do.




+*/
+   c = '0';
+   path = get_instance_file(_instance, "events/enable");
+   fd = open(path, O_WRONLY);
+   if (fd < 0)
+   die("failed to open '%s'", path);
+   ret = write(fd, &c, 1);
+   close(fd);
+   tracecmd_put_tracing_file(path);
+   if (ret < 1)
+   die("failed to write 0 to events/enable");
+
+   path = get_instance_file(_instance, "tracing_on");
+   fd = open(path, O_WRONLY);
+   if (fd < 0)
+   die("failed to open '%s'", path);
+   ret = write(fd, &c, 1);
+   close(fd);
+   tracecmd_put_tracing_file(path);
+   if (ret < 1)
+   die("failed to write 0 to tracing_on\n");
+
+   path = get_instance_file(_instance, "set_event");
+   fd = open(path, O_WRONLY);
+   if (fd < 0)
+   die("failed to open '%s'", path);
+   close(fd);
+   tracecmd_put_tracing_file(path);
+}
+
+void top_exit(void)
+{
+   int i;
+   /*
+* free all dynamic memory that may have been allocated by
+* this program
+*/
+   if (kbuf)
+   kbuffer_free(kbuf);
+   if (record)
+   free_record(record);
+   if (pevent)
+   pevent_free(pevent);


The above three functions all can handle a NULL pointer. Remove the if
statements.


ok




+   if (top_task_stats) {
+   for (i = 0; i < top_task_cnt; i++)
+   free(top_task_stats[i].comm);
+   free(top_task_stats);
+   }
+
+   top_disable_events();
+   remove(TRACE_CMD_TOP_PID_FILE);
+}
+
+void top_initialize_events(void)
+{
+   char *path;
+   int fd;
+   char c;
+   int ret;
+   char *buf;
+   int size;
+   enum kbuffer_long_size long_size;
+   enum kbuffer_endian endian;
+   char id_str[8];
+


Might want to look at trace_stream_init() and use that instead. It is
made for live streaming of events.


OK, will look into trace_stream_init().




+   /* trace data is read from the trace_raw_pipe directly */
+   if (tracecmd_host_bigendian())
+   endian = KBUFFER_ENDIAN_BIG;
+   else
+   endian = KBUFFER_ENDIAN_LITTLE;
+


[...]


+void top_start(void)
+{
+   pid_t pid;
+   int fd, ret;
+
+   top_disable_events();
+   top_initialize_events();
+   page_size = getpagesize();
+
+   pid = fork();
+   if (!pid) {
+   do {
+   top_process_events();
+   sleep(1);


We should probably just sleep on the buffer, waiting for data.


OK




+   if (set_print_only) {
+   top_print_stats();
+   set_print_only = false;
+   }
+   if (set_print_and_exit) {
+   top_print_stats();
+   top_exit();
+   break;
+   }
+   } while (1);
+   } else {
+   fd = open(TRACE_CMD_TOP_PID_FILE, O_WRONLY | O_CREAT);
+   if (fd < 0)
+   goto kill_child;
+  

Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory

2017-04-26 Thread Steven Rostedt

Sorry for the late reply. I finally have time to start looking at
trace-cmd again.

On Wed,  1 Mar 2017 20:32:37 +0530
Pratyush Anand  wrote:

> This patch introduces a new interface, "trace-cmd top", to report the
> peak memory usage of any task.
> 
> $ trace-cmd top -s
> It starts gathering statistics for all processes scheduled after this
> command is executed.
> $ trace-cmd top -p
> It prints the peak memory usage (in KB) of each scheduled task.
> $ trace-cmd top -e
> It prints the peak memory usage (in KB) of each scheduled task and
> stops gathering statistics. It also kills the child process started
> by the -s option.
> 
> $ ./trace-cmd top -s
> $ ./trace-cmd top -e
> 
> comm            peak memory (in KB)
> NetworkManager  15912
> gnome-shell     114292
> gnome-shell     186356
> firefox         853244
> bash            5984
> thunderbird     1896396
> kworker/3:0     0
> kworker/u16:1   0
> trace-cmd       256

This is a nice idea, but it needs some cleanups. I tried it out just
to play with it.

First, I just did:

 trace-cmd top

And nothing happened. Either a help message needs to be printed, or it
starts something. Maybe have it be interactive, that is, print out data
as it comes in?

Next I did:

# trace-cmd top -s
# trace-cmd top -p

comm            peak memory (in KB)
systemd         9436
rcuc/0          0
rcuc/1          0
rcuc/2          0
rcuc/3          0
irq/24-ahci[000 0
jbd2/dm-0-8     232
systemd-journal 88704
smartd          4208
avahi-daemon    3960
dbus-daemon     4504
in:imjournal    51888
gdbus           15468
irq/26-em1      0
dhclient        19328
rs:main Q:Reg   51888
sendmail        4272
sendmail        3792
sshd            5676
bash            14796
kworker/2:2     1448
kworker/u8:0    904
emacs           66852
trace-cmd       20792


But then for some reason -p stopped doing anything, and -e didn't
stop it either. After killing it manually and removing the pid
file, it appeared to work normally again. I can't seem to repeat it,
so let's not worry about that now.

More below.

> 
> Signed-off-by: Pratyush Anand 
> ---
>  Makefile   |   2 +-
>  trace-cmd.c|   4 +
>  trace-local.h  |   3 +
>  trace-record.c |   2 +-
>  trace-top.c| 620 
> +
>  trace-usage.c  |   8 +
>  6 files changed, 637 insertions(+), 2 deletions(-)
>  create mode 100644 trace-top.c
> 
> diff --git a/Makefile b/Makefile
> index 62cb25b6c1bd..5324c141d3d6 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -331,7 +331,7 @@ TRACE_GUI_OBJS = trace-filter.o trace-compat.o 
> trace-filter-hash.o trace-dialog.
>   trace-xml.o
>  TRACE_CMD_OBJS = trace-cmd.o trace-record.o trace-read.o trace-split.o 
> trace-listen.o \
>trace-stack.o trace-hist.o trace-mem.o trace-snapshot.o trace-stat.o \
> -  trace-hash.o trace-profile.o trace-stream.o
> +  trace-hash.o trace-profile.o trace-stream.o trace-top.o
>  TRACE_VIEW_OBJS = trace-view.o trace-view-store.o
>  TRACE_GRAPH_OBJS = trace-graph.o trace-plot.o trace-plot-cpu.o 
> trace-plot-task.o
>  TRACE_VIEW_MAIN_OBJS = trace-view-main.o $(TRACE_VIEW_OBJS) $(TRACE_GUI_OBJS)
> diff --git a/trace-cmd.c b/trace-cmd.c
> index 1a62faaf71ca..66e0f7cd37bc 100644
> --- a/trace-cmd.c
> +++ b/trace-cmd.c
> @@ -495,6 +495,10 @@ int main (int argc, char **argv)
>   trace_stat(argc, argv);
>   exit(0);
>  
> + } else if (strcmp(argv[1], "top") == 0) {
> + trace_top(argc, argv);
> + exit(0);
> +
>   } else if (strcmp(argv[1], "options") == 0) {
>   show_plugin_options();
>   exit(0);
> diff --git a/trace-local.h b/trace-local.h
> index 62363d0f9248..c9eac07da79a 100644
> --- a/trace-local.h
> +++ b/trace-local.h
> @@ -74,6 +74,8 @@ void trace_mem(int argc, char **argv);
>  
>  void trace_stat(int argc, char **argv);
>  
> +void trace_top(int argc, char **argv);
> +
>  struct hook_list;
>  
>  int trace_profile_record(struct tracecmd_input *handle,
> @@ -184,5 +186,6 @@ void update_first_instance(struct buffer_instance 
> *instance, int topt);
>  
>  void show_instance_file(struct buffer_instance *instance, const char *name);
>  int count_cpus(void);
> +char *read_file(char *file, int *psize);
>  
>  #endif /* __TRACE_LOCAL_H */
> diff --git a/trace-record.c b/trace-record.c
> index 22b6835c0d5b..b2ce59e7eb4f 100644
> --- a/trace-record.c
> +++ b/trace-record.c
> @@ -3323,7 +3323,7 @@ static char *read_instance_file(struct buffer_instance 
> *instance, char *file, in
>   return buf;
>  }
>  
> -static char *read_file(char *file, int *psize)
> +char *read_file(char *file, int *psize)
