Re: [RFC] scripts: Include postprocessing script for memory allocation tracing

2016-10-27 Thread Janani Ravichandran

> On Oct 21, 2016, at 3:08 AM, Michal Hocko  wrote:
> 
> Interesting.
> $ cat /debug/tracing/available_tracers 
> function_graph preemptirqsoff preemptoff irqsoff function nop
> 
> Do I have to configure anything specially? And if I do why isn't it any
> better to simply add a start tracepoint and make this available also to
> older kernels?

Well, you just need to enable the function_graph tracer in the tracing
directory:
echo function_graph > current_tracer

set funcgraph-proc in trace_options to get process information:
echo funcgraph-proc > trace_options

set funcgraph-abstime in trace_options to get timestamps:
echo funcgraph-abstime > trace_options

set all the functions we'd like to observe as filters, for example:
echo __alloc_pages_nodemask > set_ftrace_filter

and enable the tracepoints we would like to get information from.
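
Put together, the setup boils down to something like this (a sketch,
assuming tracefs is mounted at /sys/kernel/debug/tracing, that the
mm_page_alloc tracepoint sits under events/kmem, and with an arbitrary
output file name):

cd /sys/kernel/debug/tracing
# set the filter first so we do not graph every function in between
echo __alloc_pages_nodemask > set_ftrace_filter
echo function_graph > current_tracer
echo funcgraph-proc > trace_options
echo funcgraph-abstime > trace_options
# the tracepoint embedded in __alloc_pages_nodemask()
echo 1 > events/kmem/mm_page_alloc/enable
cat trace_pipe > alloc_trace.dat    # arbitrary output file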

I didn't add a begin tracepoint for this because Steven Rostedt had suggested
using function graph instead of begin/end tracepoints (we already have
a tracepoint in __alloc_pages_nodemask(), trace_mm_page_alloc, to get
some information about the allocation, and we can just use function graph
to see how long __alloc_pages_nodemask() takes).

Janani

> -- 
> Michal Hocko
> SUSE Labs



Re: [RFC] scripts: Include postprocessing script for memory allocation tracing

2016-10-21 Thread Michal Hocko
On Thu 20-10-16 18:10:37, Janani Ravichandran wrote:
> Michal,
> 
> > On Oct 18, 2016, at 8:13 AM, Michal Hocko  wrote:
> > 
> >> 
> > 
> > yes, function_graph tracer will give you _some_ information but it will
> > not have the context you are looking for, right? See the following
> > example
> > 
> > --
> > 0) x-www-b-22756  =>  x-termi-4083 
> > --
> > 
> > 0)   |  __alloc_pages_nodemask() {
> > 0)   |  /* mm_page_alloc: page=ea000411b380 pfn=1066702 
> > order=0 migratetype=0 gfp_flags=GFP_KERNEL */
> > 0)   3.328 us|  }
> > 3)   |  __alloc_pages_nodemask() {
> > 3)   |  /* mm_page_alloc: page=ea0008f1f6c0 pfn=2344923 
> > order=0 migratetype=0 gfp_flags=GFP_KERNEL */
> > 3)   1.011 us|  }
> > 0)   |  __alloc_pages_nodemask() {
> > 0)   |  /* mm_page_alloc: page=ea000411b380 pfn=1066702 
> > order=0 migratetype=0 gfp_flags=GFP_KERNEL */
> > 0)   0.587 us|  }
> > 3)   |  __alloc_pages_nodemask() {
> > 3)   |  /* mm_page_alloc: page=ea0008f1f6c0 pfn=2344923 
> > order=0 migratetype=0 gfp_flags=GFP_KERNEL */
> > 3)   1.125 us|  }
> > 
> > How do I know which process has performed those allocations? I know that
> > CPU0 should be running x-termi-4083 but what is running on other CPUs?
> > 
> > Let me explain the usecase I am very interested in. Say that a userspace
> > application is not performing well. I would like to see some statistics
> > about memory allocations performed for that app - are there a few outliers
> > or do the allocation stalls increase gradually? Where do we spend time during
> > that allocation? Reclaim LRU pages? Compaction or the slab shrinkers?
> > 
> > To answer those questions I need to track particular events (allocation,
> > reclaim, compaction) to the process and know how long each step
> > took. Maybe we can reconstruct something from the above output but it is
> > a major PITA.  If we either had start/stop pairs for each step (which
> > we already do have for reclaim, compaction AFAIR) then this is easy
> > scripting. Another option would be to have only a single tracepoint for
> > each step with timing information.
> > 
> > See my point?
> 
> Yes, if we want to know what processes are running on what CPUs,
> echo funcgraph-proc > trace_options in the tracing directory should give us
> what we want.

Interesting.
$ cat /debug/tracing/available_tracers 
function_graph preemptirqsoff preemptoff irqsoff function nop

Do I have to configure anything specially? And if I do why isn't it any
better to simply add a start tracepoint and make this available also to
older kernels?
-- 
Michal Hocko
SUSE Labs


Re: [RFC] scripts: Include postprocessing script for memory allocation tracing

2016-10-20 Thread Janani Ravichandran
Michal,

> On Oct 18, 2016, at 8:13 AM, Michal Hocko  wrote:
> 
>> 
> 
> yes, function_graph tracer will give you _some_ information but it will
> not have the context you are looking for, right? See the following
> example
> 
> --
> 0) x-www-b-22756  =>  x-termi-4083 
> --
> 
> 0)   |  __alloc_pages_nodemask() {
> 0)   |  /* mm_page_alloc: page=ea000411b380 pfn=1066702 
> order=0 migratetype=0 gfp_flags=GFP_KERNEL */
> 0)   3.328 us|  }
> 3)   |  __alloc_pages_nodemask() {
> 3)   |  /* mm_page_alloc: page=ea0008f1f6c0 pfn=2344923 
> order=0 migratetype=0 gfp_flags=GFP_KERNEL */
> 3)   1.011 us|  }
> 0)   |  __alloc_pages_nodemask() {
> 0)   |  /* mm_page_alloc: page=ea000411b380 pfn=1066702 
> order=0 migratetype=0 gfp_flags=GFP_KERNEL */
> 0)   0.587 us|  }
> 3)   |  __alloc_pages_nodemask() {
> 3)   |  /* mm_page_alloc: page=ea0008f1f6c0 pfn=2344923 
> order=0 migratetype=0 gfp_flags=GFP_KERNEL */
> 3)   1.125 us|  }
> 
> How do I know which process has performed those allocations? I know that
> CPU0 should be running x-termi-4083 but what is running on other CPUs?
> 
> Let me explain the usecase I am very interested in. Say that a userspace
> application is not performing well. I would like to see some statistics
> about memory allocations performed for that app - are there a few outliers
> or do the allocation stalls increase gradually? Where do we spend time during
> that allocation? Reclaim LRU pages? Compaction or the slab shrinkers?
> 
> To answer those questions I need to track particular events (allocation,
> reclaim, compaction) to the process and know how long each step
> took. Maybe we can reconstruct something from the above output but it is
> a major PITA.  If we either had start/stop pairs for each step (which
> we already do have for reclaim, compaction AFAIR) then this is easy
> scripting. Another option would be to have only a single tracepoint for
> each step with timing information.
> 
> See my point?

Yes, if we want to know what processes are running on what CPUs,
echo funcgraph-proc > trace_options in the tracing directory should give us
what we want.

The bash script which is part of this patch does this kind of setup for you.
As a result, the output you get is something like what you see here:

https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/no_tp_no_threshold.txt

Does this answer your question? Let me know if not.

Janani.

> -- 
> Michal Hocko
> SUSE Labs



Re: [RFC] scripts: Include postprocessing script for memory allocation tracing

2016-10-18 Thread Michal Hocko
On Mon 17-10-16 13:31:57, Janani Ravichandran wrote:
> 
> > On Oct 17, 2016, at 1:24 PM, Janani Ravichandran wrote:
> > 
> > 
> > On Sun, Oct 16, 2016 at 3:33 AM, Michal Hocko wrote:
> > 
> > trace_mm_page_alloc will tell you details about the allocation, like
> > gfp mask, order but it doesn't tell you how long the allocation took in
> > its current form. So either you have to note jiffies at the allocation
> > start and then add the end-start in the trace point or we really need
> > another trace point to note the start. The latter has an advantage that
> > we do not add unnecessary load for jiffies when the tracepoint is
> > disabled.
> 
> The function graph tracer can tell us how long __alloc_pages_nodemask() took.
> Can’t that, combined with the context information given by trace_mm_page_alloc
> give us what we want? Correct me if I am wrong.

yes, function_graph tracer will give you _some_ information but it will
not have the context you are looking for, right? See the following
example

 --
 0) x-www-b-22756  =>  x-termi-4083 
 --

 0)   |  __alloc_pages_nodemask() {
 0)   |  /* mm_page_alloc: page=ea000411b380 pfn=1066702 
order=0 migratetype=0 gfp_flags=GFP_KERNEL */
 0)   3.328 us|  }
 3)   |  __alloc_pages_nodemask() {
 3)   |  /* mm_page_alloc: page=ea0008f1f6c0 pfn=2344923 
order=0 migratetype=0 gfp_flags=GFP_KERNEL */
 3)   1.011 us|  }
 0)   |  __alloc_pages_nodemask() {
 0)   |  /* mm_page_alloc: page=ea000411b380 pfn=1066702 
order=0 migratetype=0 gfp_flags=GFP_KERNEL */
 0)   0.587 us|  }
 3)   |  __alloc_pages_nodemask() {
 3)   |  /* mm_page_alloc: page=ea0008f1f6c0 pfn=2344923 
order=0 migratetype=0 gfp_flags=GFP_KERNEL */
 3)   1.125 us|  }

How do I know which process has performed those allocations? I know that
CPU0 should be running x-termi-4083 but what is running on other CPUs?

Let me explain the usecase I am very interested in. Say that a userspace
application is not performing well. I would like to see some statistics
about memory allocations performed for that app - are there a few outliers
or do the allocation stalls increase gradually? Where do we spend time during
that allocation? Reclaim LRU pages? Compaction or the slab shrinkers?

To answer those questions I need to track particular events (allocation,
reclaim, compaction) to the process and know how long each step
took. Maybe we can reconstruct something from the above output but it is
a major PITA.  If we either had start/stop pairs for each step (which
we already do have for reclaim, compaction AFAIR) then this is easy
scripting. Another option would be to have only a single tracepoint for
each step with timing information.

See my point?
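
To illustrate the "easy scripting" part, a rough sketch that pairs the
existing direct reclaim begin/end tracepoints per PID and prints how long
each direct reclaim episode took. Assumptions: tracefs is mounted at
/sys/kernel/debug/tracing, current_tracer is nop (so trace_pipe shows the
default "TASK-PID CPU FLAGS TIMESTAMP: EVENT: ..." layout), and it runs as
root:

#!/bin/bash
T=/sys/kernel/debug/tracing
echo 1 > $T/events/vmscan/mm_vmscan_direct_reclaim_begin/enable
echo 1 > $T/events/vmscan/mm_vmscan_direct_reclaim_end/enable

awk '
/mm_vmscan_direct_reclaim_begin/ {
        pid = $1; sub(/.*-/, "", pid)   # "comm-pid" -> pid
        ts = $4; sub(/:$/, "", ts)      # timestamp, strip trailing colon
        begin[pid] = ts
}
/mm_vmscan_direct_reclaim_end/ {
        pid = $1; sub(/.*-/, "", pid)
        ts = $4; sub(/:$/, "", ts)
        if (pid in begin) {
                printf "pid %s: direct reclaim stalled %.6f s\n", pid, ts - begin[pid]
                delete begin[pid]
        }
}' "$T/trace_pipe"
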
-- 
Michal Hocko
SUSE Labs


Re: [RFC] scripts: Include postprocessing script for memory allocation tracing

2016-10-16 Thread Michal Hocko
On Sat 15-10-16 19:31:22, Janani Ravichandran wrote:
> 
> > On Oct 11, 2016, at 10:43 AM, Janani Ravichandran wrote:
> > 
> > Alright. I’ll add a starting tracepoint, change the script accordingly and 
> > send a v2. Thanks!
> > 
> I looked at it again and I think that the context information we need 
> can be obtained from the tracepoint trace_mm_page_alloc in
> __alloc_pages_nodemask().

trace_mm_page_alloc will tell you details about the allocation, like
gfp mask, order but it doesn't tell you how long the allocation took in
its current form. So either you have to note jiffies at the allocation
start and then add the end-start in the trace point or we really need
another trace point to note the start. The latter has an advantage that
we do not add unnecessary load for jiffies when the tracepoint is
disabled.
-- 
Michal Hocko
SUSE Labs


Re: [RFC] scripts: Include postprocessing script for memory allocation tracing

2016-10-15 Thread Janani Ravichandran

> On Oct 11, 2016, at 10:43 AM, Janani Ravichandran wrote:
> 
> Alright. I’ll add a starting tracepoint, change the script accordingly and 
> send a v2. Thanks!
> 
I looked at it again and I think that the context information we need
can be obtained from the tracepoint trace_mm_page_alloc in
__alloc_pages_nodemask().

I’ll include that tracepoint in the script and send it along with the other
changes you suggested, if you’re fine with it.

Thanks!
Janani.
>> 
> 



Re: [RFC] scripts: Include postprocessing script for memory allocation tracing

2016-10-11 Thread Janani Ravichandran
Hi Michal,

> 
Extremely sorry for the delayed response.

> Then I really think that we need a starting trace point. I think that
> having the full context information is really helpful in order to
> understand latencies induced by allocations.
> — 

Alright. I’ll add a starting tracepoint, change the script accordingly and 
send a v2. Thanks!

Regards,
Janani.
> 



Re: [RFC] scripts: Include postprocessing script for memory allocation tracing

2016-10-06 Thread Michal Hocko
On Fri 23-09-16 10:07:09, Michal Hocko wrote:
> On Thu 22-09-16 11:30:36, Janani Ravichandran wrote:
> > 
> > > On Sep 19, 2016, at 5:42 AM, Michal Hocko  wrote:
> > > 
> > > On Tue 13-09-16 14:04:49, Janani Ravichandran wrote:
> > >> 
> > >>> On Sep 12, 2016, at 8:16 AM, Michal Hocko  wrote:
> > >> 
> > >> I’m using the function graph tracer to see how long 
> > >> __alloc_pages_nodemask()
> > >> took.
> > > 
> > > How can you map the function graph tracer to a specific context? Let's say
> > > I would like to know why a particular allocation took so long. Would
> > > that be possible?
> > 
> > Maybe not. If the latencies are due to direct reclaim or memory compaction, 
> > you
> > get some information from the tracepoints (like 
> > mm_vmscan_direct_reclaim_begin,
> > mm_compaction_begin, etc). But otherwise, you don’t get any context 
> > information. 
> > Function graph only gives the time spent in alloc_pages_nodemask() in that 
> > case.
> 
> Then I really think that we need a starting trace point. I think that
> having the full context information is really helpful in order to
> understand latencies induced by allocations.

Are you planning to pursue this path?
-- 
Michal Hocko
SUSE Labs


Re: [RFC] scripts: Include postprocessing script for memory allocation tracing

2016-09-23 Thread Michal Hocko
On Thu 22-09-16 11:30:36, Janani Ravichandran wrote:
> 
> > On Sep 19, 2016, at 5:42 AM, Michal Hocko  wrote:
> > 
> > On Tue 13-09-16 14:04:49, Janani Ravichandran wrote:
> >> 
> >>> On Sep 12, 2016, at 8:16 AM, Michal Hocko  wrote:
> >> 
> >> I’m using the function graph tracer to see how long 
> >> __alloc_pages_nodemask()
> >> took.
> > 
> > How can you map the function graph tracer to a specific context? Let's say
> > I would like to know why a particular allocation took so long. Would
> > that be possible?
> 
> Maybe not. If the latencies are due to direct reclaim or memory compaction, 
> you
> get some information from the tracepoints (like 
> mm_vmscan_direct_reclaim_begin,
> mm_compaction_begin, etc). But otherwise, you don’t get any context 
> information. 
> Function graph only gives the time spent in alloc_pages_nodemask() in that 
> case.

Then I really think that we need a starting trace point. I think that
having the full context information is really helpful in order to
understand latencies induced by allocations.
-- 
Michal Hocko
SUSE Labs


Re: [RFC] scripts: Include postprocessing script for memory allocation tracing

2016-09-22 Thread Janani Ravichandran

> On Sep 19, 2016, at 5:42 AM, Michal Hocko  wrote:
> 
> On Tue 13-09-16 14:04:49, Janani Ravichandran wrote:
>> 
>>> On Sep 12, 2016, at 8:16 AM, Michal Hocko  wrote:
>> 
>> I’m using the function graph tracer to see how long __alloc_pages_nodemask()
>> took.
> 
> How can you map the function graph tracer to a specific context? Let's say
> I would like to know why a particular allocation took so long. Would
> that be possible?

Maybe not. If the latencies are due to direct reclaim or memory compaction, you
get some information from the tracepoints (like mm_vmscan_direct_reclaim_begin,
mm_compaction_begin, etc). But otherwise, you don’t get any context 
information. 
Function graph only gives the time spent in __alloc_pages_nodemask() in that case.
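
For reference, those begin/end pairs live under the vmscan and compaction
event groups and can be listed and enabled along these lines (a sketch,
assuming the default tracing directory):

cd /sys/kernel/debug/tracing
grep -E 'vmscan|compaction' available_events
echo 1 > events/vmscan/mm_vmscan_direct_reclaim_begin/enable
echo 1 > events/vmscan/mm_vmscan_direct_reclaim_end/enable
echo 1 > events/compaction/mm_compaction_begin/enable
echo 1 > events/compaction/mm_compaction_end/enable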


Regards,
Janani.
> 
> -- 
> Michal Hocko
> SUSE Labs



Re: [RFC] scripts: Include postprocessing script for memory allocation tracing

2016-09-19 Thread Michal Hocko
On Tue 13-09-16 14:04:49, Janani Ravichandran wrote:
> 
> > On Sep 12, 2016, at 8:16 AM, Michal Hocko  wrote:
> > 
> > Hi,
> 
> Hello Michal,
> 
> > I am sorry I didn't follow up on the previous submission.
> 
> That’s alright :)
> 
> > I find this
> > _really_ helpful. It is great that you could build on top of existing
> > tracepoints but one thing is not entirely clear to me. Without a begin
> > marker in __alloc_pages_nodemask we cannot really tell how long the
> > whole allocation took, which would be extremely useful. Or do you use
> > any graph tracer tricks to deduce that?
> 
> I’m using the function graph tracer to see how long __alloc_pages_nodemask()
> took.

How can you map the function graph tracer to a specific context? Let's say
I would like to know why a particular allocation took so long. Would
that be possible?

-- 
Michal Hocko
SUSE Labs


Re: [RFC] scripts: Include postprocessing script for memory allocation tracing

2016-09-13 Thread Janani Ravichandran

> On Sep 12, 2016, at 8:16 AM, Michal Hocko  wrote:
> 
> Hi,

Hello Michal,

> I am sorry I didn't follow up on the previous submission.

That’s alright :)

> I find this
> _really_ helpful. It is great that you could build on top of existing
> tracepoints but one thing is not entirely clear to me. Without a begin
> marker in __alloc_pages_nodemask we cannot really tell how long the
> whole allocation took, which would be extremely useful. Or do you use
> any graph tracer tricks to deduce that?

I’m using the function graph tracer to see how long __alloc_pages_nodemask()
took.


> There is a note in your
> changelog but I cannot seem to find that in the changelog. And FWIW I
> would be open to adding a tracepoint like that. It would make our life
> so much easier…

The line
echo __alloc_pages_nodemask > set_ftrace_filter in setup_alloc_trace.sh
sets __alloc_pages_nodemask as a function graph filter and this should help
us observe how long the function took.
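
Combined with tracing_thresh (mentioned in the changelog), only the calls
slower than the threshold show up in the graph output; a minimal sketch,
assuming the default tracing directory and the 134 us threshold from the
usage example:

cd /sys/kernel/debug/tracing
echo __alloc_pages_nodemask > set_ftrace_filter
echo function_graph > current_tracer
echo 134 > tracing_thresh    # threshold in microseconds
cat trace_pipe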

> 
> On Sun 11-09-16 18:24:12, Janani Ravichandran wrote:
> [...]
>> allocation_postprocess.py is a script which reads from trace_pipe. It
>> does the following to filter out info from tracepoints that may not
>> be important:
>> 
>> 1. Displays mm_vmscan_direct_reclaim_begin and
>> mm_vmscan_direct_reclaim_end only when try_to_free_pages has
>> exceeded the threshold.
>> 2. Displays mm_compaction_begin and mm_compaction_end only when
>> compact_zone has exceeded the threshold.
>> 3. Displays mm_compaction_try_to_compact_pages only when
>> try_to_compact_pages has exceeded the threshold.
>> 4. Displays mm_shrink_slab_start and mm_shrink_slab_end only when
>> the time elapsed between them exceeds the threshold.
>> 5. Displays mm_vmscan_lru_shrink_inactive only when shrink_inactive_list
>> has exceeded the threshold.
>> 
>> When CTRL+C is pressed, the script shows the times taken by the
>> shrinkers. However, currently it is not possible to differentiate among
>> the
>> superblock shrinkers.
>> 
>> Sample output:
>> ^Ci915_gem_shrinker_scan : total time = 8.731000 ms, max latency =
>> 0.278000 ms
>> ext4_es_scan : total time = 0.97 ms, max latency = 0.129000 ms
>> scan_shadow_nodes : total time = 1.15 ms, max latency = 0.175000 ms
>> super_cache_scan : total time = 8.455000 ms, max latency = 0.466000 ms
>> deferred_split_scan : total time = 25.767000 ms, max latency = 25.485000
>> ms
> 
> Would it be possible to group those per the context?

Absolutely!
> I mean a single
> allocation/per-process drop down values rather than mixing all those
> values together? For example, if I see that a process is stalling due to
> direct reclaim I would love to see what is the worst case allocation
> stall and what is the most probable source of that stall. Mixing kswapd
> traces would be misleading here.
> 

True. I’ll do that and send a v2. Thanks for the suggestions!

Janani
> 



Re: [RFC] scripts: Include postprocessing script for memory allocation tracing

2016-09-12 Thread Michal Hocko
Hi,
I am sorry I didn't follow up on the previous submission. I find this
_really_ helpful. It is great that you could build on top of existing
tracepoints but one thing is not entirely clear to me. Without a begin
marker in __alloc_pages_nodemask we cannot really tell how long the
whole allocation took, which would be extremely useful. Or do you use
any graph tracer tricks to deduce that? There is a note in your
changelog but I cannot seem to find that in the changelog. And FWIW I
would be open to adding a tracepoint like that. It would make our life
so much easier...

On Sun 11-09-16 18:24:12, Janani Ravichandran wrote:
[...]
> allocation_postprocess.py is a script which reads from trace_pipe. It
> does the following to filter out info from tracepoints that may not
> be important:
> 
> 1. Displays mm_vmscan_direct_reclaim_begin and
> mm_vmscan_direct_reclaim_end only when try_to_free_pages has
> exceeded the threshold.
> 2. Displays mm_compaction_begin and mm_compaction_end only when
> compact_zone has exceeded the threshold.
> 3. Displays mm_compaction_try_to_compact_pages only when
> try_to_compact_pages has exceeded the threshold.
> 4. Displays mm_shrink_slab_start and mm_shrink_slab_end only when
> the time elapsed between them exceeds the threshold.
> 5. Displays mm_vmscan_lru_shrink_inactive only when shrink_inactive_list
> has exceeded the threshold.
> 
> When CTRL+C is pressed, the script shows the times taken by the
> shrinkers. However, currently it is not possible to differentiate among
> the
> superblock shrinkers.
> 
> Sample output:
> ^Ci915_gem_shrinker_scan : total time = 8.731000 ms, max latency =
> 0.278000 ms
> ext4_es_scan : total time = 0.97 ms, max latency = 0.129000 ms
> scan_shadow_nodes : total time = 1.15 ms, max latency = 0.175000 ms
> super_cache_scan : total time = 8.455000 ms, max latency = 0.466000 ms
> deferred_split_scan : total time = 25.767000 ms, max latency = 25.485000
> ms

Would it be possible to group those per the context? I mean a single
allocation/per-process drop down values rather than mixing all those
values together? For example, if I see that a process is stalling due to
direct reclaim I would love to see what is the worst case allocation
stall and what is the most probable source of that stall. Mixing kswapd
traces would be misleading here.

Thanks!
-- 
Michal Hocko
SUSE Labs


[RFC] scripts: Include postprocessing script for memory allocation tracing

2016-09-11 Thread Janani Ravichandran
The objective of this patch is to help users observe latencies in memory
allocation.
The function graph tracer is great for seeing how long functions took to
execute. And often, tracepoints, along with the tracer help understand
situations better. However, while it is possible to set a threshold for
function graph and have only the functions that exceed a certain
threshold appear in the output (by echoing the threshold value in
tracing_thresh in the tracing directory), there is no method to filter
out tracepoints that are not associated with functions whose latencies
exceed the threshold.

When the threshold is set high, it is possible that a lot of information
that is of little interest to the user is printed from the tracepoints.
Limiting this information can help reduce disk I/O significantly.

This patch deals with latencies in memory allocation and more
specifically, direct reclaim and compaction.

setup_alloc_trace.sh is a bash script which handles the initial setup of
the function graph tracer, specifies which functions to include in the output and
enables some tracepoints of interest. Upon exit, it clears all the
values set.

The functions traced currently are __alloc_pages_nodemask,
try_to_free_pages, mem_cgroup_soft_limit_reclaim, shrink_node,
shrink_node_memcg, shrink_slab, shrink_active_list,
shrink_inactive_list, compact_zone and try_to_compact_pages.
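
For reference, installing those as function graph filters boils down to
something like the following (a sketch only; the authoritative list is in
setup_alloc_trace.sh, and the path assumes the default tracing directory):

cd /sys/kernel/debug/tracing
for f in __alloc_pages_nodemask try_to_free_pages \
         mem_cgroup_soft_limit_reclaim shrink_node shrink_node_memcg \
         shrink_slab shrink_active_list shrink_inactive_list \
         compact_zone try_to_compact_pages; do
        echo "$f" >> set_ftrace_filter    # '>>' appends to the existing filters
done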

The tracepoints enabled are mm_shrink_slab_start,
mm_shrink_slab_end, mm_vmscan_direct_reclaim_begin,
mm_vmscan_direct_reclaim_end, mm_vmscan_lru_shrink_inactive,
mm_compaction_begin, mm_compaction_end,
mm_compaction_try_to_compact_pages.

More functions can be traced as desired by making changes to
setup_alloc_trace.sh accordingly.

allocation_postprocess.py is a script which reads from trace_pipe. It
does the following to filter out info from tracepoints that may not
be important:

1. Displays mm_vmscan_direct_reclaim_begin and
mm_vmscan_direct_reclaim_end only when try_to_free_pages has
exceeded the threshold.
2. Displays mm_compaction_begin and mm_compaction_end only when
compact_zone has exceeded the threshold.
3. Displays mm_compaction_try_to_compact_pages only when
try_to_compact_pages has exceeded the threshold.
4. Displays mm_shrink_slab_start and mm_shrink_slab_end only when
the time elapsed between them exceeds the threshold.
5. Displays mm_vmscan_lru_shrink_inactive only when shrink_inactive_list
has exceeded the threshold.

When CTRL+C is pressed, the script shows the times taken by the
shrinkers. However, currently it is not possible to differentiate among the
superblock shrinkers.

Sample output:
^Ci915_gem_shrinker_scan : total time = 8.731000 ms, max latency = 0.278000 ms
ext4_es_scan : total time = 0.97 ms, max latency = 0.129000 ms
scan_shadow_nodes : total time = 1.15 ms, max latency = 0.175000 ms
super_cache_scan : total time = 8.455000 ms, max latency = 0.466000 ms
deferred_split_scan : total time = 25.767000 ms, max latency = 25.485000 ms

Usage:

# ./setup_alloc_trace.sh -t 134 -s /sys/kernel/debug/tracing > file.dat

Where -t represents the threshold (134 us in this case) and -s represents
the path to the tracing directory. The default is
/sys/kernel/debug/tracing, which is used when no path is specified.
The threshold, on the other hand, must be set.

Ideas/comments/suggestions are welcome, especially on adherence to
the Python coding style followed by the Linux community and on the
choice of functions enabled for tracing.

Thanks.

Signed-off-by: Janani Ravichandran 
---
 scripts/tracing/allocation_postprocess.py | 267 ++
 scripts/tracing/setup_alloc_trace.sh  |  88 ++
 2 files changed, 355 insertions(+)
 create mode 100755 scripts/tracing/allocation_postprocess.py
 create mode 100755 scripts/tracing/setup_alloc_trace.sh

diff --git a/scripts/tracing/allocation_postprocess.py 
b/scripts/tracing/allocation_postprocess.py
new file mode 100755
index 000..2f65457
--- /dev/null
+++ b/scripts/tracing/allocation_postprocess.py
@@ -0,0 +1,267 @@
+#!/usr/bin/env python
+# python 2.7
+
+"""
+This script uses function graph and some of the existing
+tracepoints to help people observe how long page allocations and some
+functions in the direct reclaim and compaction paths take.
+The functions and tracepoints enabled can be seen in setup_alloc_trace.sh.
+It reads from trace_pipe of the tracing directory and prints
+only those tracepoints and functions which are associated
+with latencies greater than the threshold specified.
+When CTRL+C is pressed, the times spent in the various shrinkers are displayed.
+The setup of trace is done in setup_alloc_trace.sh, from where this script is
+invoked.
+"""
+
+import argparse
+import re
+import sys
+import signal
+
+from collections import defaultdict
+
+# Constants for tracepoints
+
+DIRECT_RECLAIM_BEGIN= 1
+DIRECT_RECLAIM_END  = 2
+SHRINK_SLAB_START   = 3
+SHRINK_INACTIVE_LIST= 5
+TRY_TO_COMPACT  = 6
+COMPACTION_BEGIN

[RFC] scripts: Include postprocessing script for memory allocation tracing

2016-08-19 Thread Janani Ravichandran
The objective of this patch is to help users observe latencies in memory
allocation.
The function graph tracer is great for seeing how long functions took to
execute. And often, tracepoints, along with the tracer help understand
situations better. However, while it is possible to set a threshold for
function graph and have only the functions that exceed a certain
threshold appear in the output (by echoing the threshold value in
tracing_thresh in the tracing directory), there is no method to filter
out tracepoints that are not associated with functions whose latencies
exceed the threshold.

When the threshold is set high, it is possible that a lot of information
that is of little interest to the user is printed from the tracepoints.
Limiting this information can help reduce disk I/O significantly.

This patch deals with latencies in memory allocation and more
specifically, direct reclaim and compaction.

setup_alloc_trace.sh is a bash script which handles the initial setup of
the function graph tracer, specifies which functions to include in the output and
enables some tracepoints of interest. Upon exit, it clears all the
values set.

The functions traced currently are __alloc_pages_nodemask,
try_to_free_pages, mem_cgroup_soft_limit_reclaim, shrink_node,
shrink_node_memcg, shrink_slab, shrink_active_list,
shrink_inactive_list, compact_zone and try_to_compact_pages.

The tracepoints enabled are mm_shrink_slab_start,
mm_shrink_slab_end, mm_vmscan_direct_reclaim_begin,
mm_vmscan_direct_reclaim_end, mm_vmscan_lru_shrink_inactive,
mm_compaction_begin, mm_compaction_end,
mm_compaction_try_to_compact_pages.

More functions can be traced as desired by making changes to
setup_alloc_trace.sh accordingly.

allocation_postprocess.py is a script which reads from trace_pipe. It
does the following to filter out info from tracepoints that may not
be important:

1. Displays mm_vmscan_direct_reclaim_begin and
mm_vmscan_direct_reclaim_end only when try_to_free_pages has
exceeded the threshold.
2. Displays mm_compaction_begin and mm_compaction_end only when
compact_zone has exceeded the threshold.
3. Displays mm_compaction_try_to_compact_pages only when
try_to_compact_pages has exceeded the threshold.
4. Displays mm_shrink_slab_start and mm_shrink_slab_end only when
the time elapsed between them exceeds the threshold.
5. Displays mm_vmscan_lru_shrink_inactive only when shrink_inactive_list
has exceeded the threshold.

When CTRL+C is pressed, the script shows the times taken by the
shrinkers. However, currently it is not possible to differentiate among the
superblock shrinkers.

Sample output:
^Ci915_gem_shrinker_scan : total time = 8.731000 ms, max latency = 0.278000 ms
ext4_es_scan : total time = 0.97 ms, max latency = 0.129000 ms
scan_shadow_nodes : total time = 1.15 ms, max latency = 0.175000 ms
super_cache_scan : total time = 8.455000 ms, max latency = 0.466000 ms
deferred_split_scan : total time = 25.767000 ms, max latency = 25.485000 ms

Usage:

# ./setup_alloc_trace.sh -t 134 -s /sys/kernel/debug/tracing > file.dat

Where -t represents the threshold (134 us in this case) and -s represents
the path to the tracing directory. The default is
/sys/kernel/debug/tracing, which is used when no path is specified.
The threshold, on the other hand, must be set.

Ideas/comments/suggestions are welcome, especially on adherence to
the Python coding style followed by the Linux community and on the
choice of functions enabled for tracing.

Thanks.

Signed-off-by: Janani Ravichandran 
---
 scripts/tracing/allocation_postprocess.py | 267 ++
 scripts/tracing/setup_alloc_trace.sh  |  88 ++
 2 files changed, 355 insertions(+)
 create mode 100755 scripts/tracing/allocation_postprocess.py
 create mode 100755 scripts/tracing/setup_alloc_trace.sh

diff --git a/scripts/tracing/allocation_postprocess.py 
b/scripts/tracing/allocation_postprocess.py
new file mode 100755
index 000..2f65457
--- /dev/null
+++ b/scripts/tracing/allocation_postprocess.py
@@ -0,0 +1,267 @@
+#!/usr/bin/env python
+# python 2.7
+
+"""
+This script uses function graph and some of the existing
+tracepoints to help people observe how long page allocations and some
+functions in the direct reclaim and compaction paths take.
+The functions and tracepoints enabled can be seen in setup_alloc_trace.sh.
+It reads from trace_pipe of the tracing directory and prints
+only those tracepoints and functions which are associated
+with latencies greater than the threshold specified.
+When CTRL+C is pressed, the times spent in the various shrinkers are displayed.
+The setup of trace is done in setup_alloc_trace.sh, from where this script is
+invoked.
+"""
+
+import argparse
+import re
+import sys
+import signal
+
+from collections import defaultdict
+
+# Constants for tracepoints
+
+DIRECT_RECLAIM_BEGIN= 1
+DIRECT_RECLAIM_END  = 2
+SHRINK_SLAB_START   = 3
+SHRINK_INACTIVE_LIST= 5
+TRY_TO_COMPACT  = 6
+COMPACTION_BEGIN
