Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory
On Thu, 3 Jan 2019 23:02:07 -0500 Steven Rostedt wrote: > On Thu, 27 Apr 2017 23:03:24 +0530 > Pratyush Anand wrote: > > Hi Pratyush, > > I was digging through old email, and came across this. Our thread sorta > just died. Are you still interested in implementing "trace-cmd top"? > > We've changed the code structure quite a bit since this came out so it > would require a rewrite. But would actually fit better now. > And it appears you don't work for Red Hat anymore! -- Steve > > > > On Thursday 27 April 2017 10:19 PM, Steven Rostedt wrote: > > > On Thu, 27 Apr 2017 19:32:43 +0530 > > > Pratyush Anand wrote: > > > > > >> I will implement your review comments and will send next revision. > > >> However, I had couple of observation which I was unable to justify: > > >> > > >> # ./trace-cmd top -s /tmp/test > > >> # ./trace-cmd top -p /tmp/test | grep trace-cmd > > >>15292 trace-cmd 22144 15808 > > > > > > What does it give for your /tmp/test ? > > > > /tmp/test is name of socket through which both trace aggregator (-s) and > > printer (-p) process are talking to each other. > > > > > > > > Note, tracing doesn't start till after trace-cmd is loaded. Everything > > > before is not going to be seen. > > > > Yes. So with `./trace-cmd top -s /tmp/test` tracing starts and it > > aggregates > > stats for all processes including self process as well. After some time I > > execute `./trace-cmd top -p /tmp/test` which prints stats for all the > > processes including self. In the above example, I see that peak memory > > calculated for self process (`./trace-cmd top -s /tmp/test`) was 22144KB. > > > > > > > >> Here,15292 is the pid of trace-cmd task > > >> 22144 KB is the peak memory usage > > >> 15808 KB is the current memory usage > > >> > > >> Now check rss component from statm > > >> # cat /proc/15292/statm > > >>50 35 23 7 0 12 0 36 > > >> > > >> This is a result on ARM64/64KB page size. Therefore, as per statm rss is > > >> 35 > > >> pages = 35*64 = 2240KB > > >> I patched my kernel [2] for test purpose, so that statm reports peak > > >> memory as > > >> well. Here, the last extra entry in statm output is peak memory and it > > >> is 36 > > >> pages = 2304KB. > > >> So, this is a huge difference between what has been reported by statm > > >> and what > > >> we get from trace-cmd. > > >> I understand that `trace-cmd top` output would only be approximate, > > >> because > > >> many of the memory could be allocated by task and freed in interrupt > > >> context. > > >> So, many a time it can differ. But, is such a huge difference justified? > > >> If > > >> yes, can we count on the output of this utility to find early boot time > > >> oom > > >> issues? > > > > > > Doesn't it only just trace the memory usage of when the tracing starts? > > > > > > > So,should be less than what is being reported by statm, right? But, here we > > see that value reported by trace-cmd is far more than rss value of statm. > > > > Well, there is one known bug in the code, which I am trying to find out a > > way. > > But, that should not cause this huge difference. > > > > When we receive a trace entry for any new process, we read rss value from > > statm. We think,that these are the memory consumption by that process > > before > > tracing started. So,we initialize peak and current memory value for that > > process with rss from statm, and then we increase current when there is an > > entry for mm_page_alloc() and decrease when an entry for mm_page_free(). > > peak > > value of current at any point is reported as peak memory. > > > > Now, lets consider a process p1 which was existing before tracing started. > > its > > rss was having value r1 KB. we receive first trace entry for p1, which says > > 8KB was allocated for p1. > > > > p1->cur = r1 + 8; > > p->peak = r1 + 8; > > > > We receive another entry with 4KB de-allocation. > > > > p1->cur = r1 + 4; > > p->peak = r1 + 8; > > > > There would have been some of these entries which have also been taken care > > in > > statm/rss (r1 here). We do not know how many entries were already > > considered. > > Currently, these n (unknown) entries [which had been generated between (a) > > 1st > > tarce entry and (b) when trace-cmd top reads statm/rss] are considered > > twice, > > which is a bug. > > > > ~Pratyush > > > > >> > > >> [1] > > >> https://github.com/pratyushanand/trace-cmd/commit/602c2cd96aa613633ad20c6d382e41f7db37a2a4 > > >> [2] > > >> https://github.com/pratyushanand/linux/commit/197e2045361b6b70085c46c78ea6607d8afce517 > > >> > > > >
Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory
On Thu, 27 Apr 2017 23:03:24 +0530 Pratyush Anand wrote: Hi Pratyush, I was digging through old email, and came across this. Our thread sorta just died. Are you still interested in implementing "trace-cmd top"? We've changed the code structure quite a bit since this came out so it would require a rewrite. But would actually fit better now. -- Steve > On Thursday 27 April 2017 10:19 PM, Steven Rostedt wrote: > > On Thu, 27 Apr 2017 19:32:43 +0530 > > Pratyush Anand wrote: > > > >> I will implement your review comments and will send next revision. > >> However, I had couple of observation which I was unable to justify: > >> > >> # ./trace-cmd top -s /tmp/test > >> # ./trace-cmd top -p /tmp/test | grep trace-cmd > >>15292 trace-cmd 22144 15808 > > > > What does it give for your /tmp/test ? > > /tmp/test is name of socket through which both trace aggregator (-s) and > printer (-p) process are talking to each other. > > > > > Note, tracing doesn't start till after trace-cmd is loaded. Everything > > before is not going to be seen. > > Yes. So with `./trace-cmd top -s /tmp/test` tracing starts and it aggregates > stats for all processes including self process as well. After some time I > execute `./trace-cmd top -p /tmp/test` which prints stats for all the > processes including self. In the above example, I see that peak memory > calculated for self process (`./trace-cmd top -s /tmp/test`) was 22144KB. > > > > >> Here,15292 is the pid of trace-cmd task > >> 22144 KB is the peak memory usage > >> 15808 KB is the current memory usage > >> > >> Now check rss component from statm > >> # cat /proc/15292/statm > >>50 35 23 7 0 12 0 36 > >> > >> This is a result on ARM64/64KB page size. Therefore, as per statm rss is 35 > >> pages = 35*64 = 2240KB > >> I patched my kernel [2] for test purpose, so that statm reports peak > >> memory as > >> well. Here, the last extra entry in statm output is peak memory and it is > >> 36 > >> pages = 2304KB. > >> So, this is a huge difference between what has been reported by statm and > >> what > >> we get from trace-cmd. > >> I understand that `trace-cmd top` output would only be approximate, because > >> many of the memory could be allocated by task and freed in interrupt > >> context. > >> So, many a time it can differ. But, is such a huge difference justified? If > >> yes, can we count on the output of this utility to find early boot time oom > >> issues? > > > > Doesn't it only just trace the memory usage of when the tracing starts? > > So,should be less than what is being reported by statm, right? But, here we > see that value reported by trace-cmd is far more than rss value of statm. > > Well, there is one known bug in the code, which I am trying to find out a > way. > But, that should not cause this huge difference. > > When we receive a trace entry for any new process, we read rss value from > statm. We think,that these are the memory consumption by that process before > tracing started. So,we initialize peak and current memory value for that > process with rss from statm, and then we increase current when there is an > entry for mm_page_alloc() and decrease when an entry for mm_page_free(). peak > value of current at any point is reported as peak memory. > > Now, lets consider a process p1 which was existing before tracing started. > its > rss was having value r1 KB. we receive first trace entry for p1, which says > 8KB was allocated for p1. > > p1->cur = r1 + 8; > p->peak = r1 + 8; > > We receive another entry with 4KB de-allocation. > > p1->cur = r1 + 4; > p->peak = r1 + 8; > > There would have been some of these entries which have also been taken care > in > statm/rss (r1 here). We do not know how many entries were already considered. > Currently, these n (unknown) entries [which had been generated between (a) > 1st > tarce entry and (b) when trace-cmd top reads statm/rss] are considered twice, > which is a bug. > > ~Pratyush > > >> > >> [1] > >> https://github.com/pratyushanand/trace-cmd/commit/602c2cd96aa613633ad20c6d382e41f7db37a2a4 > >> [2] > >> https://github.com/pratyushanand/linux/commit/197e2045361b6b70085c46c78ea6607d8afce517 > >> > >
Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory
On Thu, 27 Apr 2017 23:03:24 +0530 Pratyush Anandwrote: > On Thursday 27 April 2017 10:19 PM, Steven Rostedt wrote: > > On Thu, 27 Apr 2017 19:32:43 +0530 > > Pratyush Anand wrote: > > > >> I will implement your review comments and will send next revision. > >> However, I had couple of observation which I was unable to justify: > >> > >> # ./trace-cmd top -s /tmp/test > >> # ./trace-cmd top -p /tmp/test | grep trace-cmd > >>15292 trace-cmd 22144 15808 > > > > What does it give for your /tmp/test ? > > /tmp/test is name of socket through which both trace aggregator (-s) and > printer (-p) process are talking to each other. Ah ok, I see what you are doing. I rather not need to have a socket that is specified. Perhaps just something like normal "top". > > > > > Note, tracing doesn't start till after trace-cmd is loaded. Everything > > before is not going to be seen. > > Yes. So with `./trace-cmd top -s /tmp/test` tracing starts and it aggregates > stats for all processes including self process as well. After some time I > execute `./trace-cmd top -p /tmp/test` which prints stats for all the > processes including self. In the above example, I see that peak memory > calculated for self process (`./trace-cmd top -s /tmp/test`) was 22144KB. > > > > >> Here,15292 is the pid of trace-cmd task > >> 22144 KB is the peak memory usage > >> 15808 KB is the current memory usage > >> > >> Now check rss component from statm > >> # cat /proc/15292/statm > >>50 35 23 7 0 12 0 36 > >> > >> This is a result on ARM64/64KB page size. Therefore, as per statm rss is 35 > >> pages = 35*64 = 2240KB > >> I patched my kernel [2] for test purpose, so that statm reports peak > >> memory as > >> well. Here, the last extra entry in statm output is peak memory and it is > >> 36 > >> pages = 2304KB. > >> So, this is a huge difference between what has been reported by statm and > >> what > >> we get from trace-cmd. > >> I understand that `trace-cmd top` output would only be approximate, because > >> many of the memory could be allocated by task and freed in interrupt > >> context. > >> So, many a time it can differ. But, is such a huge difference justified? If > >> yes, can we count on the output of this utility to find early boot time oom > >> issues? > > > > Doesn't it only just trace the memory usage of when the tracing starts? > > So,should be less than what is being reported by statm, right? But, here we > see that value reported by trace-cmd is far more than rss value of statm. > > Well, there is one known bug in the code, which I am trying to find out a > way. > But, that should not cause this huge difference. > > When we receive a trace entry for any new process, we read rss value from > statm. We think,that these are the memory consumption by that process before > tracing started. So,we initialize peak and current memory value for that > process with rss from statm, and then we increase current when there is an > entry for mm_page_alloc() and decrease when an entry for mm_page_free(). peak > value of current at any point is reported as peak memory. > > Now, lets consider a process p1 which was existing before tracing started. > its > rss was having value r1 KB. we receive first trace entry for p1, which says > 8KB was allocated for p1. > > p1->cur = r1 + 8; > p->peak = r1 + 8; > > We receive another entry with 4KB de-allocation. > > p1->cur = r1 + 4; > p->peak = r1 + 8; > > There would have been some of these entries which have also been taken care > in > statm/rss (r1 here). We do not know how many entries were already considered. > Currently, these n (unknown) entries [which had been generated between (a) > 1st > tarce entry and (b) when trace-cmd top reads statm/rss] are considered twice, > which is a bug. > Note, it appears you are tracing all the alloc and frees of pages. This can be very deceiving. There can be a lot of cases where pages are allocated by the application, and then freed via rcu. That is very common. Those that are freed via rcu will not be accounted for. -- Steve
Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory
On Thu, 27 Apr 2017 23:03:24 +0530 Pratyush Anand wrote: > On Thursday 27 April 2017 10:19 PM, Steven Rostedt wrote: > > On Thu, 27 Apr 2017 19:32:43 +0530 > > Pratyush Anand wrote: > > > >> I will implement your review comments and will send next revision. > >> However, I had couple of observation which I was unable to justify: > >> > >> # ./trace-cmd top -s /tmp/test > >> # ./trace-cmd top -p /tmp/test | grep trace-cmd > >>15292 trace-cmd 22144 15808 > > > > What does it give for your /tmp/test ? > > /tmp/test is name of socket through which both trace aggregator (-s) and > printer (-p) process are talking to each other. Ah ok, I see what you are doing. I rather not need to have a socket that is specified. Perhaps just something like normal "top". > > > > > Note, tracing doesn't start till after trace-cmd is loaded. Everything > > before is not going to be seen. > > Yes. So with `./trace-cmd top -s /tmp/test` tracing starts and it aggregates > stats for all processes including self process as well. After some time I > execute `./trace-cmd top -p /tmp/test` which prints stats for all the > processes including self. In the above example, I see that peak memory > calculated for self process (`./trace-cmd top -s /tmp/test`) was 22144KB. > > > > >> Here,15292 is the pid of trace-cmd task > >> 22144 KB is the peak memory usage > >> 15808 KB is the current memory usage > >> > >> Now check rss component from statm > >> # cat /proc/15292/statm > >>50 35 23 7 0 12 0 36 > >> > >> This is a result on ARM64/64KB page size. Therefore, as per statm rss is 35 > >> pages = 35*64 = 2240KB > >> I patched my kernel [2] for test purpose, so that statm reports peak > >> memory as > >> well. Here, the last extra entry in statm output is peak memory and it is > >> 36 > >> pages = 2304KB. > >> So, this is a huge difference between what has been reported by statm and > >> what > >> we get from trace-cmd. > >> I understand that `trace-cmd top` output would only be approximate, because > >> many of the memory could be allocated by task and freed in interrupt > >> context. > >> So, many a time it can differ. But, is such a huge difference justified? If > >> yes, can we count on the output of this utility to find early boot time oom > >> issues? > > > > Doesn't it only just trace the memory usage of when the tracing starts? > > So,should be less than what is being reported by statm, right? But, here we > see that value reported by trace-cmd is far more than rss value of statm. > > Well, there is one known bug in the code, which I am trying to find out a > way. > But, that should not cause this huge difference. > > When we receive a trace entry for any new process, we read rss value from > statm. We think,that these are the memory consumption by that process before > tracing started. So,we initialize peak and current memory value for that > process with rss from statm, and then we increase current when there is an > entry for mm_page_alloc() and decrease when an entry for mm_page_free(). peak > value of current at any point is reported as peak memory. > > Now, lets consider a process p1 which was existing before tracing started. > its > rss was having value r1 KB. we receive first trace entry for p1, which says > 8KB was allocated for p1. > > p1->cur = r1 + 8; > p->peak = r1 + 8; > > We receive another entry with 4KB de-allocation. > > p1->cur = r1 + 4; > p->peak = r1 + 8; > > There would have been some of these entries which have also been taken care > in > statm/rss (r1 here). We do not know how many entries were already considered. > Currently, these n (unknown) entries [which had been generated between (a) > 1st > tarce entry and (b) when trace-cmd top reads statm/rss] are considered twice, > which is a bug. > Note, it appears you are tracing all the alloc and frees of pages. This can be very deceiving. There can be a lot of cases where pages are allocated by the application, and then freed via rcu. That is very common. Those that are freed via rcu will not be accounted for. -- Steve
Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory
On Thursday 27 April 2017 10:19 PM, Steven Rostedt wrote: On Thu, 27 Apr 2017 19:32:43 +0530 Pratyush Anandwrote: I will implement your review comments and will send next revision. However, I had couple of observation which I was unable to justify: # ./trace-cmd top -s /tmp/test # ./trace-cmd top -p /tmp/test | grep trace-cmd 15292 trace-cmd 22144 15808 What does it give for your /tmp/test ? /tmp/test is name of socket through which both trace aggregator (-s) and printer (-p) process are talking to each other. Note, tracing doesn't start till after trace-cmd is loaded. Everything before is not going to be seen. Yes. So with `./trace-cmd top -s /tmp/test` tracing starts and it aggregates stats for all processes including self process as well. After some time I execute `./trace-cmd top -p /tmp/test` which prints stats for all the processes including self. In the above example, I see that peak memory calculated for self process (`./trace-cmd top -s /tmp/test`) was 22144KB. Here,15292 is the pid of trace-cmd task 22144 KB is the peak memory usage 15808 KB is the current memory usage Now check rss component from statm # cat /proc/15292/statm 50 35 23 7 0 12 0 36 This is a result on ARM64/64KB page size. Therefore, as per statm rss is 35 pages = 35*64 = 2240KB I patched my kernel [2] for test purpose, so that statm reports peak memory as well. Here, the last extra entry in statm output is peak memory and it is 36 pages = 2304KB. So, this is a huge difference between what has been reported by statm and what we get from trace-cmd. I understand that `trace-cmd top` output would only be approximate, because many of the memory could be allocated by task and freed in interrupt context. So, many a time it can differ. But, is such a huge difference justified? If yes, can we count on the output of this utility to find early boot time oom issues? Doesn't it only just trace the memory usage of when the tracing starts? So,should be less than what is being reported by statm, right? But, here we see that value reported by trace-cmd is far more than rss value of statm. Well, there is one known bug in the code, which I am trying to find out a way. But, that should not cause this huge difference. When we receive a trace entry for any new process, we read rss value from statm. We think,that these are the memory consumption by that process before tracing started. So,we initialize peak and current memory value for that process with rss from statm, and then we increase current when there is an entry for mm_page_alloc() and decrease when an entry for mm_page_free(). peak value of current at any point is reported as peak memory. Now, lets consider a process p1 which was existing before tracing started. its rss was having value r1 KB. we receive first trace entry for p1, which says 8KB was allocated for p1. p1->cur = r1 + 8; p->peak = r1 + 8; We receive another entry with 4KB de-allocation. p1->cur = r1 + 4; p->peak = r1 + 8; There would have been some of these entries which have also been taken care in statm/rss (r1 here). We do not know how many entries were already considered. Currently, these n (unknown) entries [which had been generated between (a) 1st tarce entry and (b) when trace-cmd top reads statm/rss] are considered twice, which is a bug. ~Pratyush [1] https://github.com/pratyushanand/trace-cmd/commit/602c2cd96aa613633ad20c6d382e41f7db37a2a4 [2] https://github.com/pratyushanand/linux/commit/197e2045361b6b70085c46c78ea6607d8afce517
Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory
On Thursday 27 April 2017 10:19 PM, Steven Rostedt wrote: On Thu, 27 Apr 2017 19:32:43 +0530 Pratyush Anand wrote: I will implement your review comments and will send next revision. However, I had couple of observation which I was unable to justify: # ./trace-cmd top -s /tmp/test # ./trace-cmd top -p /tmp/test | grep trace-cmd 15292 trace-cmd 22144 15808 What does it give for your /tmp/test ? /tmp/test is name of socket through which both trace aggregator (-s) and printer (-p) process are talking to each other. Note, tracing doesn't start till after trace-cmd is loaded. Everything before is not going to be seen. Yes. So with `./trace-cmd top -s /tmp/test` tracing starts and it aggregates stats for all processes including self process as well. After some time I execute `./trace-cmd top -p /tmp/test` which prints stats for all the processes including self. In the above example, I see that peak memory calculated for self process (`./trace-cmd top -s /tmp/test`) was 22144KB. Here,15292 is the pid of trace-cmd task 22144 KB is the peak memory usage 15808 KB is the current memory usage Now check rss component from statm # cat /proc/15292/statm 50 35 23 7 0 12 0 36 This is a result on ARM64/64KB page size. Therefore, as per statm rss is 35 pages = 35*64 = 2240KB I patched my kernel [2] for test purpose, so that statm reports peak memory as well. Here, the last extra entry in statm output is peak memory and it is 36 pages = 2304KB. So, this is a huge difference between what has been reported by statm and what we get from trace-cmd. I understand that `trace-cmd top` output would only be approximate, because many of the memory could be allocated by task and freed in interrupt context. So, many a time it can differ. But, is such a huge difference justified? If yes, can we count on the output of this utility to find early boot time oom issues? Doesn't it only just trace the memory usage of when the tracing starts? So,should be less than what is being reported by statm, right? But, here we see that value reported by trace-cmd is far more than rss value of statm. Well, there is one known bug in the code, which I am trying to find out a way. But, that should not cause this huge difference. When we receive a trace entry for any new process, we read rss value from statm. We think,that these are the memory consumption by that process before tracing started. So,we initialize peak and current memory value for that process with rss from statm, and then we increase current when there is an entry for mm_page_alloc() and decrease when an entry for mm_page_free(). peak value of current at any point is reported as peak memory. Now, lets consider a process p1 which was existing before tracing started. its rss was having value r1 KB. we receive first trace entry for p1, which says 8KB was allocated for p1. p1->cur = r1 + 8; p->peak = r1 + 8; We receive another entry with 4KB de-allocation. p1->cur = r1 + 4; p->peak = r1 + 8; There would have been some of these entries which have also been taken care in statm/rss (r1 here). We do not know how many entries were already considered. Currently, these n (unknown) entries [which had been generated between (a) 1st tarce entry and (b) when trace-cmd top reads statm/rss] are considered twice, which is a bug. ~Pratyush [1] https://github.com/pratyushanand/trace-cmd/commit/602c2cd96aa613633ad20c6d382e41f7db37a2a4 [2] https://github.com/pratyushanand/linux/commit/197e2045361b6b70085c46c78ea6607d8afce517
Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory
On Thu, 27 Apr 2017 19:32:43 +0530 Pratyush Anandwrote: > I will implement your review comments and will send next revision. > However, I had couple of observation which I was unable to justify: > > # ./trace-cmd top -s /tmp/test > # ./trace-cmd top -p /tmp/test | grep trace-cmd >15292 trace-cmd 22144 15808 What does it give for your /tmp/test ? Note, tracing doesn't start till after trace-cmd is loaded. Everything before is not going to be seen. > Here,15292 is the pid of trace-cmd task > 22144 KB is the peak memory usage > 15808 KB is the current memory usage > > Now check rss component from statm > # cat /proc/15292/statm >50 35 23 7 0 12 0 36 > > This is a result on ARM64/64KB page size. Therefore, as per statm rss is 35 > pages = 35*64 = 2240KB > I patched my kernel [2] for test purpose, so that statm reports peak memory > as > well. Here, the last extra entry in statm output is peak memory and it is 36 > pages = 2304KB. > So, this is a huge difference between what has been reported by statm and > what > we get from trace-cmd. > I understand that `trace-cmd top` output would only be approximate, because > many of the memory could be allocated by task and freed in interrupt context. > So, many a time it can differ. But, is such a huge difference justified? If > yes, can we count on the output of this utility to find early boot time oom > issues? Doesn't it only just trace the memory usage of when the tracing starts? -- Steve > > > [1] > https://github.com/pratyushanand/trace-cmd/commit/602c2cd96aa613633ad20c6d382e41f7db37a2a4 > [2] > https://github.com/pratyushanand/linux/commit/197e2045361b6b70085c46c78ea6607d8afce517
Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory
On Thu, 27 Apr 2017 19:32:43 +0530 Pratyush Anand wrote: > I will implement your review comments and will send next revision. > However, I had couple of observation which I was unable to justify: > > # ./trace-cmd top -s /tmp/test > # ./trace-cmd top -p /tmp/test | grep trace-cmd >15292 trace-cmd 22144 15808 What does it give for your /tmp/test ? Note, tracing doesn't start till after trace-cmd is loaded. Everything before is not going to be seen. > Here,15292 is the pid of trace-cmd task > 22144 KB is the peak memory usage > 15808 KB is the current memory usage > > Now check rss component from statm > # cat /proc/15292/statm >50 35 23 7 0 12 0 36 > > This is a result on ARM64/64KB page size. Therefore, as per statm rss is 35 > pages = 35*64 = 2240KB > I patched my kernel [2] for test purpose, so that statm reports peak memory > as > well. Here, the last extra entry in statm output is peak memory and it is 36 > pages = 2304KB. > So, this is a huge difference between what has been reported by statm and > what > we get from trace-cmd. > I understand that `trace-cmd top` output would only be approximate, because > many of the memory could be allocated by task and freed in interrupt context. > So, many a time it can differ. But, is such a huge difference justified? If > yes, can we count on the output of this utility to find early boot time oom > issues? Doesn't it only just trace the memory usage of when the tracing starts? -- Steve > > > [1] > https://github.com/pratyushanand/trace-cmd/commit/602c2cd96aa613633ad20c6d382e41f7db37a2a4 > [2] > https://github.com/pratyushanand/linux/commit/197e2045361b6b70085c46c78ea6607d8afce517
Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory
Hi Steve, On Wednesday 26 April 2017 07:31 PM, Steven Rostedt wrote: Sorry for the late reply. I finally have time to start looking at trace-cmd again. Thanks a lot for your review :-) On Wed, 1 Mar 2017 20:32:37 +0530 Pratyush Anandwrote: [...] This is as nice idea, but it needs some clean ups. I tried it out just to play with it. First, I just did: trace-cmd top And nothing happened. Either a help message needs to be printed, or it starts something. Maybe have it be interactive, that is, print out data as it comes in? [...] But then for some reason it -p stopped doing anything. And -e didn't stop it either. But after killing it manually and removing the pid file, it appeared to work normal again. I can't seem to repeat it, so lets not worry about that now. Yes, this version was sharing info between two processes using files. I have a new version [1] which has sharing through socket and that works better. Help issue is also fixed in that version. More below. [...] +void top_disable_events(void) +{ + char *path; + char c; + int fd; + int ret; + + /* +* WARNING: We want only few events related to memory allocation to +* be enabled. Disable all events here. Latter, selective events +* would be enabled trace-cmd reset does this. You may want to call that instead of doing it manually here. Someone else wanted to pull out trace-cmd reset into its own file. I'll have to get those patches. But then we can call that and enable tracing again. Or better yet, if we are only tracing memory events, just create your own instance and use that. Why bother with normal tracing? mkdir /sys/kernel/debug/tracing/instances/trace-cmd-memory Yes, it seems a good idea. Will do. +*/ + c = '0'; + path = get_instance_file(_instance, "events/enable"); + fd = open(path, O_WRONLY); + if (fd < 0) + die("failed to open '%s'", path); + ret = write(fd, , 1); + close(fd); + tracecmd_put_tracing_file(path); + if (ret < 1) + die("failed to write 0 to events/enable"); + + path = get_instance_file(_instance, "tracing_on"); + fd = open(path, O_WRONLY); + if (fd < 0) + die("failed to open '%s'", path); + ret = write(fd, , 1); + close(fd); + tracecmd_put_tracing_file(path); + if (ret < 1) + die("failed to write 0 to tracing_on\n"); + + path = get_instance_file(_instance, "set_event"); + fd = open(path, O_WRONLY); + if (fd < 0) + die("failed to open '%s'", path); + close(fd); + tracecmd_put_tracing_file(path); +} + +void top_exit(void) +{ + int i; + /* +* free all the dynamic memories which could have been allocated by +* this program +*/ + if (kbuf) + kbuffer_free(kbuf); + if (record) + free_record(record); + if (pevent) + pevent_free(pevent); The above three functions all can handle a NULL pointer. Remove the if statements. ok + if (top_task_stats) { + for (i = 0; i < top_task_cnt; i++) + free(top_task_stats[i].comm); + free(top_task_stats); + } + + top_disable_events(); + remove(TRACE_CMD_TOP_PID_FILE); +} + +void top_initialize_events(void) +{ + char *path; + int fd; + char c; + int ret; + char *buf; + int size; + enum kbuffer_long_size long_size; + enum kbuffer_endian endian; + char id_str[8]; + Might want to look at trace_stream_init() and use that instead. It is made for live streaming of events. OK, will look into trace_stream_init(). + /* trace data is read from the trace_raw_pipe directly */ + if (tracecmd_host_bigendian()) + endian = KBUFFER_ENDIAN_BIG; + else + endian = KBUFFER_ENDIAN_LITTLE; + [...] +void top_start(void) +{ + pid_t pid; + int fd, ret; + + top_disable_events(); + top_initialize_events(); + page_size = getpagesize(); + + pid = fork(); + if (!pid) { + do { + top_process_events(); + sleep(1); We should probably just sleep on the buffer, waiting for data. OK + if (set_print_only) { + top_print_stats(); + set_print_only = false; + } + if (set_print_and_exit) { + top_print_stats(); + top_exit(); + break; + } + } while (1); + } else { + fd = open(TRACE_CMD_TOP_PID_FILE, O_WRONLY | O_CREAT); + if (fd < 0) + goto
Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory
Hi Steve, On Wednesday 26 April 2017 07:31 PM, Steven Rostedt wrote: Sorry for the late reply. I finally have time to start looking at trace-cmd again. Thanks a lot for your review :-) On Wed, 1 Mar 2017 20:32:37 +0530 Pratyush Anand wrote: [...] This is as nice idea, but it needs some clean ups. I tried it out just to play with it. First, I just did: trace-cmd top And nothing happened. Either a help message needs to be printed, or it starts something. Maybe have it be interactive, that is, print out data as it comes in? [...] But then for some reason it -p stopped doing anything. And -e didn't stop it either. But after killing it manually and removing the pid file, it appeared to work normal again. I can't seem to repeat it, so lets not worry about that now. Yes, this version was sharing info between two processes using files. I have a new version [1] which has sharing through socket and that works better. Help issue is also fixed in that version. More below. [...] +void top_disable_events(void) +{ + char *path; + char c; + int fd; + int ret; + + /* +* WARNING: We want only few events related to memory allocation to +* be enabled. Disable all events here. Latter, selective events +* would be enabled trace-cmd reset does this. You may want to call that instead of doing it manually here. Someone else wanted to pull out trace-cmd reset into its own file. I'll have to get those patches. But then we can call that and enable tracing again. Or better yet, if we are only tracing memory events, just create your own instance and use that. Why bother with normal tracing? mkdir /sys/kernel/debug/tracing/instances/trace-cmd-memory Yes, it seems a good idea. Will do. +*/ + c = '0'; + path = get_instance_file(_instance, "events/enable"); + fd = open(path, O_WRONLY); + if (fd < 0) + die("failed to open '%s'", path); + ret = write(fd, , 1); + close(fd); + tracecmd_put_tracing_file(path); + if (ret < 1) + die("failed to write 0 to events/enable"); + + path = get_instance_file(_instance, "tracing_on"); + fd = open(path, O_WRONLY); + if (fd < 0) + die("failed to open '%s'", path); + ret = write(fd, , 1); + close(fd); + tracecmd_put_tracing_file(path); + if (ret < 1) + die("failed to write 0 to tracing_on\n"); + + path = get_instance_file(_instance, "set_event"); + fd = open(path, O_WRONLY); + if (fd < 0) + die("failed to open '%s'", path); + close(fd); + tracecmd_put_tracing_file(path); +} + +void top_exit(void) +{ + int i; + /* +* free all the dynamic memories which could have been allocated by +* this program +*/ + if (kbuf) + kbuffer_free(kbuf); + if (record) + free_record(record); + if (pevent) + pevent_free(pevent); The above three functions all can handle a NULL pointer. Remove the if statements. ok + if (top_task_stats) { + for (i = 0; i < top_task_cnt; i++) + free(top_task_stats[i].comm); + free(top_task_stats); + } + + top_disable_events(); + remove(TRACE_CMD_TOP_PID_FILE); +} + +void top_initialize_events(void) +{ + char *path; + int fd; + char c; + int ret; + char *buf; + int size; + enum kbuffer_long_size long_size; + enum kbuffer_endian endian; + char id_str[8]; + Might want to look at trace_stream_init() and use that instead. It is made for live streaming of events. OK, will look into trace_stream_init(). + /* trace data is read from the trace_raw_pipe directly */ + if (tracecmd_host_bigendian()) + endian = KBUFFER_ENDIAN_BIG; + else + endian = KBUFFER_ENDIAN_LITTLE; + [...] +void top_start(void) +{ + pid_t pid; + int fd, ret; + + top_disable_events(); + top_initialize_events(); + page_size = getpagesize(); + + pid = fork(); + if (!pid) { + do { + top_process_events(); + sleep(1); We should probably just sleep on the buffer, waiting for data. OK + if (set_print_only) { + top_print_stats(); + set_print_only = false; + } + if (set_print_and_exit) { + top_print_stats(); + top_exit(); + break; + } + } while (1); + } else { + fd = open(TRACE_CMD_TOP_PID_FILE, O_WRONLY | O_CREAT); + if (fd < 0) + goto kill_child; +
Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory
Sorry for the late reply. I finally have time to start looking at trace-cmd again. On Wed, 1 Mar 2017 20:32:37 +0530 Pratyush Anandwrote: > A new interface "trace-cmd top" has been introduced in this patch to > know peak memory usage of any task. > > $ trace-cmd top -s > It will start to gather statistics of all the process which will be > scheduled after this command execution. > $ trace-cmd top -p > It will print peak memory usage(in KB) against each scheduled task. > $ trace-cmd top -e > It will print peak memory usage(in KB) against each scheduled task and > will stop gathering statistics. It will also kill the child process > initiated by -s option. > > $ ./trace-cmd top -s > $ ./trace-cmd top -e > > commpeak memory(in KB) > NetworkManager 15912 > gnome-shell 114292 > gnome-shell 186356 > firefox 853244 > bash5984 > thunderbird 1896396 > kworker/3:0 0 > kworker/u16:1 0 > trace-cmd 256 This is as nice idea, but it needs some clean ups. I tried it out just to play with it. First, I just did: trace-cmd top And nothing happened. Either a help message needs to be printed, or it starts something. Maybe have it be interactive, that is, print out data as it comes in? Next I did: # trace-cmd top -s # trace-cmd top -p commpeak memory(in KB) systemd 9436 rcuc/0 0 rcuc/1 0 rcuc/2 0 rcuc/3 0 irq/24-ahci[000 0 jbd2/dm-0-8 232 systemd-journal 88704 smartd 4208 avahi-daemon3960 dbus-daemon 4504 in:imjournal51888 gdbus 15468 irq/26-em1 0 dhclient19328 rs:main Q:Reg 51888 sendmail4272 sendmail3792 sshd5676 bash14796 kworker/2:2 1448 kworker/u8:0904 emacs 66852 trace-cmd 20792 But then for some reason it -p stopped doing anything. And -e didn't stop it either. But after killing it manually and removing the pid file, it appeared to work normal again. I can't seem to repeat it, so lets not worry about that now. More below. > > Signed-off-by: Pratyush Anand > --- > Makefile | 2 +- > trace-cmd.c| 4 + > trace-local.h | 3 + > trace-record.c | 2 +- > trace-top.c| 620 > + > trace-usage.c | 8 + > 6 files changed, 637 insertions(+), 2 deletions(-) > create mode 100644 trace-top.c > > diff --git a/Makefile b/Makefile > index 62cb25b6c1bd..5324c141d3d6 100644 > --- a/Makefile > +++ b/Makefile > @@ -331,7 +331,7 @@ TRACE_GUI_OBJS = trace-filter.o trace-compat.o > trace-filter-hash.o trace-dialog. > trace-xml.o > TRACE_CMD_OBJS = trace-cmd.o trace-record.o trace-read.o trace-split.o > trace-listen.o \ >trace-stack.o trace-hist.o trace-mem.o trace-snapshot.o trace-stat.o \ > - trace-hash.o trace-profile.o trace-stream.o > + trace-hash.o trace-profile.o trace-stream.o trace-top.o > TRACE_VIEW_OBJS = trace-view.o trace-view-store.o > TRACE_GRAPH_OBJS = trace-graph.o trace-plot.o trace-plot-cpu.o > trace-plot-task.o > TRACE_VIEW_MAIN_OBJS = trace-view-main.o $(TRACE_VIEW_OBJS) $(TRACE_GUI_OBJS) > diff --git a/trace-cmd.c b/trace-cmd.c > index 1a62faaf71ca..66e0f7cd37bc 100644 > --- a/trace-cmd.c > +++ b/trace-cmd.c > @@ -495,6 +495,10 @@ int main (int argc, char **argv) > trace_stat(argc, argv); > exit(0); > > + } else if (strcmp(argv[1], "top") == 0) { > + trace_top(argc, argv); > + exit(0); > + > } else if (strcmp(argv[1], "options") == 0) { > show_plugin_options(); > exit(0); > diff --git a/trace-local.h b/trace-local.h > index 62363d0f9248..c9eac07da79a 100644 > --- a/trace-local.h > +++ b/trace-local.h > @@ -74,6 +74,8 @@ void trace_mem(int argc, char **argv); > > void trace_stat(int argc, char **argv); > > +void trace_top(int argc, char **argv); > + > struct hook_list; > > int trace_profile_record(struct tracecmd_input *handle, > @@ -184,5 +186,6 @@ void update_first_instance(struct buffer_instance > *instance, int topt); > > void show_instance_file(struct buffer_instance *instance, const char *name); > int count_cpus(void); > +char *read_file(char *file, int *psize); > > #endif /* __TRACE_LOCAL_H */ > diff --git a/trace-record.c b/trace-record.c > index 22b6835c0d5b..b2ce59e7eb4f 100644 > --- a/trace-record.c > +++ b/trace-record.c > @@ -3323,7 +3323,7 @@ static char *read_instance_file(struct buffer_instance > *instance, char *file, in > return buf; > } > > -static char *read_file(char *file, int *psize) > +char *read_file(char *file, int
Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory
Sorry for the late reply. I finally have time to start looking at trace-cmd again. On Wed, 1 Mar 2017 20:32:37 +0530 Pratyush Anand wrote: > A new interface "trace-cmd top" has been introduced in this patch to > know peak memory usage of any task. > > $ trace-cmd top -s > It will start to gather statistics of all the process which will be > scheduled after this command execution. > $ trace-cmd top -p > It will print peak memory usage(in KB) against each scheduled task. > $ trace-cmd top -e > It will print peak memory usage(in KB) against each scheduled task and > will stop gathering statistics. It will also kill the child process > initiated by -s option. > > $ ./trace-cmd top -s > $ ./trace-cmd top -e > > commpeak memory(in KB) > NetworkManager 15912 > gnome-shell 114292 > gnome-shell 186356 > firefox 853244 > bash5984 > thunderbird 1896396 > kworker/3:0 0 > kworker/u16:1 0 > trace-cmd 256 This is as nice idea, but it needs some clean ups. I tried it out just to play with it. First, I just did: trace-cmd top And nothing happened. Either a help message needs to be printed, or it starts something. Maybe have it be interactive, that is, print out data as it comes in? Next I did: # trace-cmd top -s # trace-cmd top -p commpeak memory(in KB) systemd 9436 rcuc/0 0 rcuc/1 0 rcuc/2 0 rcuc/3 0 irq/24-ahci[000 0 jbd2/dm-0-8 232 systemd-journal 88704 smartd 4208 avahi-daemon3960 dbus-daemon 4504 in:imjournal51888 gdbus 15468 irq/26-em1 0 dhclient19328 rs:main Q:Reg 51888 sendmail4272 sendmail3792 sshd5676 bash14796 kworker/2:2 1448 kworker/u8:0904 emacs 66852 trace-cmd 20792 But then for some reason it -p stopped doing anything. And -e didn't stop it either. But after killing it manually and removing the pid file, it appeared to work normal again. I can't seem to repeat it, so lets not worry about that now. More below. > > Signed-off-by: Pratyush Anand > --- > Makefile | 2 +- > trace-cmd.c| 4 + > trace-local.h | 3 + > trace-record.c | 2 +- > trace-top.c| 620 > + > trace-usage.c | 8 + > 6 files changed, 637 insertions(+), 2 deletions(-) > create mode 100644 trace-top.c > > diff --git a/Makefile b/Makefile > index 62cb25b6c1bd..5324c141d3d6 100644 > --- a/Makefile > +++ b/Makefile > @@ -331,7 +331,7 @@ TRACE_GUI_OBJS = trace-filter.o trace-compat.o > trace-filter-hash.o trace-dialog. > trace-xml.o > TRACE_CMD_OBJS = trace-cmd.o trace-record.o trace-read.o trace-split.o > trace-listen.o \ >trace-stack.o trace-hist.o trace-mem.o trace-snapshot.o trace-stat.o \ > - trace-hash.o trace-profile.o trace-stream.o > + trace-hash.o trace-profile.o trace-stream.o trace-top.o > TRACE_VIEW_OBJS = trace-view.o trace-view-store.o > TRACE_GRAPH_OBJS = trace-graph.o trace-plot.o trace-plot-cpu.o > trace-plot-task.o > TRACE_VIEW_MAIN_OBJS = trace-view-main.o $(TRACE_VIEW_OBJS) $(TRACE_GUI_OBJS) > diff --git a/trace-cmd.c b/trace-cmd.c > index 1a62faaf71ca..66e0f7cd37bc 100644 > --- a/trace-cmd.c > +++ b/trace-cmd.c > @@ -495,6 +495,10 @@ int main (int argc, char **argv) > trace_stat(argc, argv); > exit(0); > > + } else if (strcmp(argv[1], "top") == 0) { > + trace_top(argc, argv); > + exit(0); > + > } else if (strcmp(argv[1], "options") == 0) { > show_plugin_options(); > exit(0); > diff --git a/trace-local.h b/trace-local.h > index 62363d0f9248..c9eac07da79a 100644 > --- a/trace-local.h > +++ b/trace-local.h > @@ -74,6 +74,8 @@ void trace_mem(int argc, char **argv); > > void trace_stat(int argc, char **argv); > > +void trace_top(int argc, char **argv); > + > struct hook_list; > > int trace_profile_record(struct tracecmd_input *handle, > @@ -184,5 +186,6 @@ void update_first_instance(struct buffer_instance > *instance, int topt); > > void show_instance_file(struct buffer_instance *instance, const char *name); > int count_cpus(void); > +char *read_file(char *file, int *psize); > > #endif /* __TRACE_LOCAL_H */ > diff --git a/trace-record.c b/trace-record.c > index 22b6835c0d5b..b2ce59e7eb4f 100644 > --- a/trace-record.c > +++ b/trace-record.c > @@ -3323,7 +3323,7 @@ static char *read_instance_file(struct buffer_instance > *instance, char *file, in > return buf; > } > > -static char *read_file(char *file, int *psize) > +char *read_file(char *file, int *psize) > { > return