Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2021-01-11 Thread Qais Yousef
On 01/11/21 15:04, Peter Zijlstra wrote:
> On Mon, Jan 04, 2021 at 06:26:42PM +, Qais Yousef wrote:
> 
> > So I have a proper patch for that now, that actually turned out to be really
> > tiny once you untangle exactly what is missing.
> > 
> > Peter, bpf programs aren't considered ABIs AFAIK, do you have concerns about
> > that?
> 
> In order to use these you need to rely on BTF to get anything useful
> done right? And anything that relies on BTF cannot be ABI.

Yes. To decode struct rq for instance one has to either hardcode it in their
program or use BTF to get the definition dynamically.

The worry is if we modify the function signature of the tracepoint. Alexei did
confirm this can't be an ABI and I'm adding additional documentation to make
this crystal clear.

Thanks

--
Qais Yousef


Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2021-01-11 Thread Peter Zijlstra
On Mon, Jan 04, 2021 at 06:26:42PM +, Qais Yousef wrote:

> So I have a proper patch for that now, that actually turned out to be really
> tiny once you untangle exactly what is missing.
> 
> Peter, bpf programs aren't considered ABIs AFAIK, do you have concerns about
> that?

In order to use these you need to rely on BTF to get anything useful
done right? And anything that relies on BTF cannot be ABI.


Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2021-01-07 Thread Qais Yousef
On 01/06/21 15:42, Andrii Nakryiko wrote:
> On Wed, Jan 6, 2021 at 3:27 AM Qais Yousef  wrote:
> >
> > On 01/05/21 08:44, Alexei Starovoitov wrote:
> > > > Any pointer to an example test I could base this on?
> > >
> > > selftests/bpf/
> >
> > I was hoping for something more elaborate. I thought there's something 
> > already
> > there that do some verification for raw tracepoint that I could either 
> > extend
> > or replicate. Otherwise this could end up being a time sink for me and I'm 
> > not
> > keen on jumping down this rabbit hole.
> 
> One way would be to add either another custom tracepoint definition to
> a test module or modify the existing one to be a bare tracepoint. See
> links below.
> 
> If it's easy to trigger those tracepoints from user-space on demand,
> writing a similar (to module_attach) selftest for in-kernel tracepoint
> is trivial.
> 
>   [0] 
> https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h
>   [1] 
> https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/progs/test_module_attach.c#L12-L18
>   [2] 
> https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/prog_tests/module_attach.c

Thanks a lot Andrii. That will make it much easier to figure out something.

Cheers

--
Qais Yousef


Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2021-01-06 Thread Andrii Nakryiko
On Wed, Jan 6, 2021 at 3:27 AM Qais Yousef  wrote:
>
> On 01/05/21 08:44, Alexei Starovoitov wrote:
> > > Any pointer to an example test I could base this on?
> >
> > selftests/bpf/
>
> I was hoping for something more elaborate. I thought there's something already
> there that do some verification for raw tracepoint that I could either extend
> or replicate. Otherwise this could end up being a time sink for me and I'm not
> keen on jumping down this rabbit hole.

One way would be to add either another custom tracepoint definition to
a test module or modify the existing one to be a bare tracepoint. See
links below.

If it's easy to trigger those tracepoints from user-space on demand,
writing a similar (to module_attach) selftest for in-kernel tracepoint
is trivial.

  [0] 
https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h
  [1] 
https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/progs/test_module_attach.c#L12-L18
  [2] 
https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/prog_tests/module_attach.c

>
> > > > - add a doc with contents from commit log.
> > >
> > > You're referring to the ABI part of the changelog, right?
> > >
> > > > The "Does bpf make things into an abi ?" question keeps coming back
> > > > over and over again.
> > > > Everytime we have the same answer that No, bpf cannot bake things into 
> > > > abi.
> > > > I think once it's spelled out somewhere in Documentation/ it would be 
> > > > easier to
> > > > repeat this message.
> > >
> > > How about a new Documentation/bpf/ABI.rst? I can write something up 
> > > initially
> > > for us to discuss in detail when I post.
> >
> > There is Documentation/bpf/bpf_design_QA.rst
> > and we already have this text in there that was added back in 2017:
> >
> > Q: Does BPF have a stable ABI?
> > --
> > A: YES. BPF instructions, arguments to BPF programs, set of helper
> > functions and their arguments, recognized return codes are all part
> > of ABI. However there is one specific exception to tracing programs
> > which are using helpers like bpf_probe_read() to walk kernel internal
> > data structures and compile with kernel internal headers. Both of these
> > kernel internals are subject to change and can break with newer kernels
> > such that the program needs to be adapted accordingly.
> >
> > I'm suggesting to add an additional section to this Q/A doc to include
> > more or less
> > the same text you had in the commit log.
>
> Works for me.
>
> Thanks
>
> --
> Qais Yousef


Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2021-01-06 Thread Qais Yousef
On 01/05/21 08:44, Alexei Starovoitov wrote:
> > Any pointer to an example test I could base this on?
> 
> selftests/bpf/

I was hoping for something more elaborate. I thought there's something already
there that do some verification for raw tracepoint that I could either extend
or replicate. Otherwise this could end up being a time sink for me and I'm not
keen on jumping down this rabbit hole.

> > > - add a doc with contents from commit log.
> >
> > You're referring to the ABI part of the changelog, right?
> >
> > > The "Does bpf make things into an abi ?" question keeps coming back
> > > over and over again.
> > > Everytime we have the same answer that No, bpf cannot bake things into 
> > > abi.
> > > I think once it's spelled out somewhere in Documentation/ it would be 
> > > easier to
> > > repeat this message.
> >
> > How about a new Documentation/bpf/ABI.rst? I can write something up 
> > initially
> > for us to discuss in detail when I post.
> 
> There is Documentation/bpf/bpf_design_QA.rst
> and we already have this text in there that was added back in 2017:
> 
> Q: Does BPF have a stable ABI?
> --
> A: YES. BPF instructions, arguments to BPF programs, set of helper
> functions and their arguments, recognized return codes are all part
> of ABI. However there is one specific exception to tracing programs
> which are using helpers like bpf_probe_read() to walk kernel internal
> data structures and compile with kernel internal headers. Both of these
> kernel internals are subject to change and can break with newer kernels
> such that the program needs to be adapted accordingly.
> 
> I'm suggesting to add an additional section to this Q/A doc to include
> more or less
> the same text you had in the commit log.

Works for me.

Thanks

--
Qais Yousef


Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2021-01-05 Thread Alexei Starovoitov
On Tue, Jan 5, 2021 at 3:39 AM Qais Yousef  wrote:
>
> On 01/04/21 10:59, Alexei Starovoitov wrote:
> > > > > I did have a patch that allowed that. It might be worth trying to 
> > > > > upstream it.
> > > > > It just required a new macro which could be problematic.
> > > > >
> > > > > https://github.com/qais-yousef/linux/commit/fb9fea29edb8af327e6b2bf3bc41469a8e66df8b
> > > > >
> > > > > With the above I could attach using bpf::RAW_TRACEPOINT mechanism.
> > > > >
> > > >
> > > > Yeah, that could work. I meant there was no way to do it with what was 
> > > > there :)
> > > >
> > > > In our initial attempts at using BPF to get at nr_running (which I was 
> > > > not
> > > > involved in and don't have all the details...) there were issues being 
> > > > able to
> > > > keep up and losing events.  That may have been an implementation issue, 
> > > > but
> > > > using the module and trace-cmd doesn't have that problem. Hopefully you 
> > > > don't
> > > > see that using RAW_TRACEPOINTs.
> > >
> > > So I have a proper patch for that now, that actually turned out to be 
> > > really
> > > tiny once you untangle exactly what is missing.
> > >
> > > Peter, bpf programs aren't considered ABIs AFAIK, do you have concerns 
> > > about
> > > that?
> > >
> > > Thanks
> > >
> > > --
> > > Qais Yousef
> > >
> > > -->8--
> > >
> > > From cf81de8c7db03d62730939aa902579339e2fc859 Mon Sep 17 00:00:00 2001
> > > From: Qais Yousef 
> > > Date: Wed, 30 Dec 2020 22:20:34 +
> > > Subject: [PATCH] trace: bpf: Allow bpf to attach to bare tracepoints
> > >
> > > Some subsystems only have bare tracepoints (a tracepoint with no
> > > associated trace event) to avoid the problem of trace events being an
> > > ABI that can't be changed.
> > >
> > > From bpf presepective, bare tracepoints are what it calls
> > > RAW_TRACEPOINT().
> > >
> > > Since bpf assumed there's 1:1 mapping, it relied on hooking to
> > > DEFINE_EVENT() macro to create bpf mapping of the tracepoints. Since
> > > bare tracepoints use DECLARE_TRACE() to create the tracepoint, bpf had
> > > no knowledge about their existence.
> > >
> > > By teaching bpf_probe.h to parse DECLARE_TRACE() in a similar fashion to
> > > DEFINE_EVENT(), bpf can find and attach to the new raw tracepoints.
> > >
> > > Enabling that comes with the contract that changes to raw tracepoints
> > > don't constitute a regression if they break existing bpf programs.
> > > We need the ability to continue to morph and modify these raw
> > > tracepoints without worrying about any ABI.
> > >
> > > Signed-off-by: Qais Yousef 
> > > ---
> > >  include/trace/bpf_probe.h | 12 ++--
> > >  1 file changed, 10 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
> > > index cd74bffed5c6..a23be89119aa 100644
> > > --- a/include/trace/bpf_probe.h
> > > +++ b/include/trace/bpf_probe.h
> > > @@ -55,8 +55,7 @@
> > >  /* tracepoints with more than 12 arguments will hit build error */
> > >  #define CAST_TO_U64(...) CONCATENATE(__CAST, 
> > > COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
> > >
> > > -#undef DECLARE_EVENT_CLASS
> > > -#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
> > > +#define __BPF_DECLARE_TRACE(call, proto, args) \
> > >  static notrace void\
> > >  __bpf_trace_##call(void *__data, proto)  
> > >   \
> > >  {  \
> > > @@ -64,6 +63,10 @@ __bpf_trace_##call(void *__data, proto)
> > >   \
> > > CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, 
> > > CAST_TO_U64(args));  \
> > >  }
> > >
> > > +#undef DECLARE_EVENT_CLASS
> > > +#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
> > > +   __BPF_DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))
> > > +
> > >  /*
> > >   * This part is compiled out, it is only here as a build time check
> > >   * to make sure that if the tracepoint handling changes, the
> > > @@ -111,6 +114,11 @@ __DEFINE_EVENT(template, call, PARAMS(proto), 
> > > PARAMS(args), size)
> > >  #define DEFINE_EVENT_PRINT(template, name, proto, args, print) \
> > > DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
> > >
> > > +#undef DECLARE_TRACE
> > > +#define DECLARE_TRACE(call, proto, args)   \
> > > +   __BPF_DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))  \
> > > +   __DEFINE_EVENT(call, call, PARAMS(proto), PARAMS(args), 0)
> > > +
> > >  #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
> >
> > The patch looks fine to me.
> > Please add a few things:
> > - selftests to make sure it gets routinely tested with bpf CI.
>
> Any pointer to an example test I could base this on?

selftests/bpf/

> > - add a doc with contents from commit log.
>
> You're referring to the ABI part of the changelog, right?
>
> > 

Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2021-01-05 Thread Qais Yousef
On 01/04/21 10:59, Alexei Starovoitov wrote:
> > > > I did have a patch that allowed that. It might be worth trying to 
> > > > upstream it.
> > > > It just required a new macro which could be problematic.
> > > >
> > > > https://github.com/qais-yousef/linux/commit/fb9fea29edb8af327e6b2bf3bc41469a8e66df8b
> > > >
> > > > With the above I could attach using bpf::RAW_TRACEPOINT mechanism.
> > > >
> > >
> > > Yeah, that could work. I meant there was no way to do it with what was 
> > > there :)
> > >
> > > In our initial attempts at using BPF to get at nr_running (which I was not
> > > involved in and don't have all the details...) there were issues being 
> > > able to
> > > keep up and losing events.  That may have been an implementation issue, 
> > > but
> > > using the module and trace-cmd doesn't have that problem. Hopefully you 
> > > don't
> > > see that using RAW_TRACEPOINTs.
> >
> > So I have a proper patch for that now, that actually turned out to be really
> > tiny once you untangle exactly what is missing.
> >
> > Peter, bpf programs aren't considered ABIs AFAIK, do you have concerns about
> > that?
> >
> > Thanks
> >
> > --
> > Qais Yousef
> >
> > -->8--
> >
> > From cf81de8c7db03d62730939aa902579339e2fc859 Mon Sep 17 00:00:00 2001
> > From: Qais Yousef 
> > Date: Wed, 30 Dec 2020 22:20:34 +
> > Subject: [PATCH] trace: bpf: Allow bpf to attach to bare tracepoints
> >
> > Some subsystems only have bare tracepoints (a tracepoint with no
> > associated trace event) to avoid the problem of trace events being an
> > ABI that can't be changed.
> >
> > From bpf presepective, bare tracepoints are what it calls
> > RAW_TRACEPOINT().
> >
> > Since bpf assumed there's 1:1 mapping, it relied on hooking to
> > DEFINE_EVENT() macro to create bpf mapping of the tracepoints. Since
> > bare tracepoints use DECLARE_TRACE() to create the tracepoint, bpf had
> > no knowledge about their existence.
> >
> > By teaching bpf_probe.h to parse DECLARE_TRACE() in a similar fashion to
> > DEFINE_EVENT(), bpf can find and attach to the new raw tracepoints.
> >
> > Enabling that comes with the contract that changes to raw tracepoints
> > don't constitute a regression if they break existing bpf programs.
> > We need the ability to continue to morph and modify these raw
> > tracepoints without worrying about any ABI.
> >
> > Signed-off-by: Qais Yousef 
> > ---
> >  include/trace/bpf_probe.h | 12 ++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
> > index cd74bffed5c6..a23be89119aa 100644
> > --- a/include/trace/bpf_probe.h
> > +++ b/include/trace/bpf_probe.h
> > @@ -55,8 +55,7 @@
> >  /* tracepoints with more than 12 arguments will hit build error */
> >  #define CAST_TO_U64(...) CONCATENATE(__CAST, 
> > COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
> >
> > -#undef DECLARE_EVENT_CLASS
> > -#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
> > +#define __BPF_DECLARE_TRACE(call, proto, args) \
> >  static notrace void\
> >  __bpf_trace_##call(void *__data, proto)
> > \
> >  {  \
> > @@ -64,6 +63,10 @@ __bpf_trace_##call(void *__data, proto)  
> > \
> > CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, 
> > CAST_TO_U64(args));  \
> >  }
> >
> > +#undef DECLARE_EVENT_CLASS
> > +#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
> > +   __BPF_DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))
> > +
> >  /*
> >   * This part is compiled out, it is only here as a build time check
> >   * to make sure that if the tracepoint handling changes, the
> > @@ -111,6 +114,11 @@ __DEFINE_EVENT(template, call, PARAMS(proto), 
> > PARAMS(args), size)
> >  #define DEFINE_EVENT_PRINT(template, name, proto, args, print) \
> > DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
> >
> > +#undef DECLARE_TRACE
> > +#define DECLARE_TRACE(call, proto, args)   \
> > +   __BPF_DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))  \
> > +   __DEFINE_EVENT(call, call, PARAMS(proto), PARAMS(args), 0)
> > +
> >  #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
> 
> The patch looks fine to me.
> Please add a few things:
> - selftests to make sure it gets routinely tested with bpf CI.

Any pointer to an example test I could base this on?

> - add a doc with contents from commit log.

You're referring to the ABI part of the changelog, right?

> The "Does bpf make things into an abi ?" question keeps coming back
> over and over again.
> Everytime we have the same answer that No, bpf cannot bake things into abi.
> I think once it's spelled out somewhere in Documentation/ it would be easier 
> to
> repeat this message.

How about a new Documentation/bpf/ABI.rst? 

Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2021-01-04 Thread Alexei Starovoitov
On Mon, Jan 4, 2021 at 10:29 AM Qais Yousef  wrote:
>
> On 09/08/20 09:19, Phil Auld wrote:
> > Hi Quais,
> >
> > On Mon, Sep 07, 2020 at 12:02:24PM +0100 Qais Yousef wrote:
> > > On 09/02/20 09:54, Phil Auld wrote:
> > > > >
> > > > > I think this decoupling is not necessary. The natural place for those
> > > > > scheduler trace_event based on trace_points extension files is
> > > > > kernel/sched/ and here the internal sched.h can just be included.
> > > > >
> > > > > If someone really wants to build this as an out-of-tree module there 
> > > > > is
> > > > > an easy way to make kernel/sched/sched.h visible.
> > > > >
> > > >
> > > > It's not so much that we really _want_ to do this in an external module.
> > > > But we aren't adding more trace events and my (limited) knowledge of
> > > > BPF let me to the conclusion that its raw tracepoint functionality
> > > > requires full events.  I didn't see any other way to do it.
> > >
> > > I did have a patch that allowed that. It might be worth trying to 
> > > upstream it.
> > > It just required a new macro which could be problematic.
> > >
> > > https://github.com/qais-yousef/linux/commit/fb9fea29edb8af327e6b2bf3bc41469a8e66df8b
> > >
> > > With the above I could attach using bpf::RAW_TRACEPOINT mechanism.
> > >
> >
> > Yeah, that could work. I meant there was no way to do it with what was 
> > there :)
> >
> > In our initial attempts at using BPF to get at nr_running (which I was not
> > involved in and don't have all the details...) there were issues being able 
> > to
> > keep up and losing events.  That may have been an implementation issue, but
> > using the module and trace-cmd doesn't have that problem. Hopefully you 
> > don't
> > see that using RAW_TRACEPOINTs.
>
> So I have a proper patch for that now, that actually turned out to be really
> tiny once you untangle exactly what is missing.
>
> Peter, bpf programs aren't considered ABIs AFAIK, do you have concerns about
> that?
>
> Thanks
>
> --
> Qais Yousef
>
> -->8--
>
> From cf81de8c7db03d62730939aa902579339e2fc859 Mon Sep 17 00:00:00 2001
> From: Qais Yousef 
> Date: Wed, 30 Dec 2020 22:20:34 +
> Subject: [PATCH] trace: bpf: Allow bpf to attach to bare tracepoints
>
> Some subsystems only have bare tracepoints (a tracepoint with no
> associated trace event) to avoid the problem of trace events being an
> ABI that can't be changed.
>
> From bpf presepective, bare tracepoints are what it calls
> RAW_TRACEPOINT().
>
> Since bpf assumed there's 1:1 mapping, it relied on hooking to
> DEFINE_EVENT() macro to create bpf mapping of the tracepoints. Since
> bare tracepoints use DECLARE_TRACE() to create the tracepoint, bpf had
> no knowledge about their existence.
>
> By teaching bpf_probe.h to parse DECLARE_TRACE() in a similar fashion to
> DEFINE_EVENT(), bpf can find and attach to the new raw tracepoints.
>
> Enabling that comes with the contract that changes to raw tracepoints
> don't constitute a regression if they break existing bpf programs.
> We need the ability to continue to morph and modify these raw
> tracepoints without worrying about any ABI.
>
> Signed-off-by: Qais Yousef 
> ---
>  include/trace/bpf_probe.h | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
> index cd74bffed5c6..a23be89119aa 100644
> --- a/include/trace/bpf_probe.h
> +++ b/include/trace/bpf_probe.h
> @@ -55,8 +55,7 @@
>  /* tracepoints with more than 12 arguments will hit build error */
>  #define CAST_TO_U64(...) CONCATENATE(__CAST, 
> COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
>
> -#undef DECLARE_EVENT_CLASS
> -#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
> +#define __BPF_DECLARE_TRACE(call, proto, args) \
>  static notrace void\
>  __bpf_trace_##call(void *__data, proto)  
>   \
>  {  \
> @@ -64,6 +63,10 @@ __bpf_trace_##call(void *__data, proto)
>   \
> CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, 
> CAST_TO_U64(args));  \
>  }
>
> +#undef DECLARE_EVENT_CLASS
> +#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
> +   __BPF_DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))
> +
>  /*
>   * This part is compiled out, it is only here as a build time check
>   * to make sure that if the tracepoint handling changes, the
> @@ -111,6 +114,11 @@ __DEFINE_EVENT(template, call, PARAMS(proto), 
> PARAMS(args), size)
>  #define DEFINE_EVENT_PRINT(template, name, proto, args, print) \
> DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
>
> +#undef DECLARE_TRACE
> +#define DECLARE_TRACE(call, proto, args)   \
> +   __BPF_DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))  \
> +   __DEFINE_EVENT(call, call, 

Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2021-01-04 Thread Qais Yousef
On 09/08/20 09:19, Phil Auld wrote:
> Hi Quais,
> 
> On Mon, Sep 07, 2020 at 12:02:24PM +0100 Qais Yousef wrote:
> > On 09/02/20 09:54, Phil Auld wrote:
> > > > 
> > > > I think this decoupling is not necessary. The natural place for those
> > > > scheduler trace_event based on trace_points extension files is
> > > > kernel/sched/ and here the internal sched.h can just be included.
> > > > 
> > > > If someone really wants to build this as an out-of-tree module there is
> > > > an easy way to make kernel/sched/sched.h visible.
> > > >
> > > 
> > > It's not so much that we really _want_ to do this in an external module.
> > > But we aren't adding more trace events and my (limited) knowledge of
> > > BPF let me to the conclusion that its raw tracepoint functionality
> > > requires full events.  I didn't see any other way to do it.
> > 
> > I did have a patch that allowed that. It might be worth trying to upstream 
> > it.
> > It just required a new macro which could be problematic.
> > 
> > https://github.com/qais-yousef/linux/commit/fb9fea29edb8af327e6b2bf3bc41469a8e66df8b
> > 
> > With the above I could attach using bpf::RAW_TRACEPOINT mechanism.
> >
> 
> Yeah, that could work. I meant there was no way to do it with what was there 
> :)
> 
> In our initial attempts at using BPF to get at nr_running (which I was not
> involved in and don't have all the details...) there were issues being able to
> keep up and losing events.  That may have been an implementation issue, but
> using the module and trace-cmd doesn't have that problem. Hopefully you don't
> see that using RAW_TRACEPOINTs.

So I have a proper patch for that now, that actually turned out to be really
tiny once you untangle exactly what is missing.

Peter, bpf programs aren't considered ABIs AFAIK, do you have concerns about
that?

Thanks

--
Qais Yousef

-->8--

>From cf81de8c7db03d62730939aa902579339e2fc859 Mon Sep 17 00:00:00 2001
From: Qais Yousef 
Date: Wed, 30 Dec 2020 22:20:34 +
Subject: [PATCH] trace: bpf: Allow bpf to attach to bare tracepoints

Some subsystems only have bare tracepoints (a tracepoint with no
associated trace event) to avoid the problem of trace events being an
ABI that can't be changed.

>From bpf presepective, bare tracepoints are what it calls
RAW_TRACEPOINT().

Since bpf assumed there's 1:1 mapping, it relied on hooking to
DEFINE_EVENT() macro to create bpf mapping of the tracepoints. Since
bare tracepoints use DECLARE_TRACE() to create the tracepoint, bpf had
no knowledge about their existence.

By teaching bpf_probe.h to parse DECLARE_TRACE() in a similar fashion to
DEFINE_EVENT(), bpf can find and attach to the new raw tracepoints.

Enabling that comes with the contract that changes to raw tracepoints
don't constitute a regression if they break existing bpf programs.
We need the ability to continue to morph and modify these raw
tracepoints without worrying about any ABI.

Signed-off-by: Qais Yousef 
---
 include/trace/bpf_probe.h | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index cd74bffed5c6..a23be89119aa 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -55,8 +55,7 @@
 /* tracepoints with more than 12 arguments will hit build error */
 #define CAST_TO_U64(...) CONCATENATE(__CAST, 
COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
 
-#undef DECLARE_EVENT_CLASS
-#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
+#define __BPF_DECLARE_TRACE(call, proto, args) \
 static notrace void\
 __bpf_trace_##call(void *__data, proto)
\
 {  \
@@ -64,6 +63,10 @@ __bpf_trace_##call(void *__data, proto)  
\
CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));  
\
 }
 
+#undef DECLARE_EVENT_CLASS
+#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
+   __BPF_DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))
+
 /*
  * This part is compiled out, it is only here as a build time check
  * to make sure that if the tracepoint handling changes, the
@@ -111,6 +114,11 @@ __DEFINE_EVENT(template, call, PARAMS(proto), 
PARAMS(args), size)
 #define DEFINE_EVENT_PRINT(template, name, proto, args, print) \
DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
 
+#undef DECLARE_TRACE
+#define DECLARE_TRACE(call, proto, args)   \
+   __BPF_DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))  \
+   __DEFINE_EVENT(call, call, PARAMS(proto), PARAMS(args), 0)
+
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
 
 #undef DEFINE_EVENT_WRITABLE
-- 
2.25.1



Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2021-01-04 Thread Qais Yousef
On 09/07/20 13:13, pet...@infradead.org wrote:
> On Mon, Sep 07, 2020 at 11:48:45AM +0100, Qais Yousef wrote:
> > IMHO the above is a hack. Out-of-tree modules should rely on public headers 
> > and
> > exported functions only. What you propose means that people who want to use
> > these tracepoints in meaningful way must have a prebuilt kernel handy. 
> > Which is
> > maybe true for us who work in the embedded world. But users who run normal
> > distro kernels (desktop/servers) will fail to build against
> 
> But this isn't really aimed at regular users. We're aiming this at
> developers (IIUC) so I dont really see this as a problem.
> 
> > FWIW, I did raise this concern with Peter in 2019 OSPM and he was okay with 
> > the
> > exports as it's still not a contract and they can disappear anytime we want.
> > Migrating to using BTF is the right way forward IMO. I don't think what we 
> > have
> > here is out-of-control yet. Though I agree they're annoying.
> 
> Right, we're hiding behind the explicit lack of ABI for modules.
> 
> Anyway, CTF/BTF/random other crap that isn't DWARFs should work fine to
> replace all this muck. Just no idea what the state of any of that is.

So I have updated my reference module to remove the dependency on those
functions:


https://github.com/qais-yousef/tracepoints-helpers/tree/pelt-tps-v3-create-events/sched_tp

When building against a prebuilt kernel tree, I use pahole + vmlinux to generate
vmlinux.h with all the internal struct we depend on. The kernel must be
compiled with CONFIG_DEBUG_INFO for pahole to work.

When building against a running kernel (ie: exported headers only available),
we try to use /sys/kernel/btf/vmlinux to generate vmlinux.h.

One outstanding issue is that pahole + BTF don't generate alignment info so
while the module compiles but I get a crash when I load the module and enable
one of the tracepoints. bpftool generates alignment info with BTF, but it dumps
everything which leads to another set of problems.

Barring fixing the BTF issue which I'm talking with Arnaldo about, I think we
should be in good position to remove these exported functions soon.

In the module we have to maintain helper functions (see sched_tp_helpers.h) but
I think that's much easier than maintaining out of tree copy of the structs.

It also enabled me to add uclamp trace events which had dependency on internal
details that I wasn't keen on exporting in mainline.

Is this a good/better compromise?

Thanks

--
Qais Yousef


Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2020-09-08 Thread Qais Yousef
On 09/08/20 09:19, Phil Auld wrote:
> Hi Quais,
> 
> On Mon, Sep 07, 2020 at 12:02:24PM +0100 Qais Yousef wrote:
> > On 09/02/20 09:54, Phil Auld wrote:
> > > > 
> > > > I think this decoupling is not necessary. The natural place for those
> > > > scheduler trace_event based on trace_points extension files is
> > > > kernel/sched/ and here the internal sched.h can just be included.
> > > > 
> > > > If someone really wants to build this as an out-of-tree module there is
> > > > an easy way to make kernel/sched/sched.h visible.
> > > >
> > > 
> > > It's not so much that we really _want_ to do this in an external module.
> > > But we aren't adding more trace events and my (limited) knowledge of
> > > BPF let me to the conclusion that its raw tracepoint functionality
> > > requires full events.  I didn't see any other way to do it.
> > 
> > I did have a patch that allowed that. It might be worth trying to upstream 
> > it.
> > It just required a new macro which could be problematic.
> > 
> > https://github.com/qais-yousef/linux/commit/fb9fea29edb8af327e6b2bf3bc41469a8e66df8b
> > 
> > With the above I could attach using bpf::RAW_TRACEPOINT mechanism.
> >
> 
> Yeah, that could work. I meant there was no way to do it with what was there 
> :)
> 
> In our initial attempts at using BPF to get at nr_running (which I was not
> involved in and don't have all the details...) there were issues being able to
> keep up and losing events.  That may have been an implementation issue, but
> using the module and trace-cmd doesn't have that problem. Hopefully you don't
> see that using RAW_TRACEPOINTs.

I haven't played with that since then tbh. I use BPF every now and then, but
rather simplistically. So if I encoutnered such a problem I wouldn't have
noticed.

> Fwiw, I don't think these little helper routines are all that hard to 
> maintain.
> If something changes in those fields, which seems moderately unlikely at least
> for many of them, the compiler will complain.
> 
> And I agree with you about preferring to use the public headers for the 
> module.
> I think we can work around it though, if needed.

+1

Cheers

--
Qais Yousef


Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2020-09-08 Thread Qais Yousef
On 09/08/20 13:17, Dietmar Eggemann wrote:
> On 07/09/2020 16:51, Qais Yousef wrote:
> > On 09/07/20 13:13, pet...@infradead.org wrote:
> >> On Mon, Sep 07, 2020 at 11:48:45AM +0100, Qais Yousef wrote:
> >>> IMHO the above is a hack. Out-of-tree modules should rely on public 
> >>> headers and
> >>> exported functions only. What you propose means that people who want to 
> >>> use
> >>> these tracepoints in meaningful way must have a prebuilt kernel handy. 
> >>> Which is
> >>> maybe true for us who work in the embedded world. But users who run normal
> >>> distro kernels (desktop/servers) will fail to build against
> >>
> >> But this isn't really aimed at regular users. We're aiming this at
> >> developers (IIUC) so I dont really see this as a problem.
> 
> This is what I thought as well. All these helpers can be coded directly
> in these tracepoint-2-traceevent (tp-2-te) converters. As long as they
> are build from within kernel/sched/ there is no issue with the export
> via kernel/sched/sched.h. Otherwise this little trick would be necessary.
> But since it is a tool for developers I guess we can assume that they
> can build it from within kernel/sched/.

I think this will reduce the usefulness of these tracepoints. But if you really
want to remove them, I am certainly not strongly attached to them and they were
meant to be removable anyway. So fine by me :-)

Cheers

--
Qais Yousef


Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2020-09-08 Thread Dietmar Eggemann
On 08/09/2020 17:17, Qais Yousef wrote:
> On 09/08/20 13:17, Dietmar Eggemann wrote:
>> On 07/09/2020 16:51, Qais Yousef wrote:
>>> On 09/07/20 13:13, pet...@infradead.org wrote:
 On Mon, Sep 07, 2020 at 11:48:45AM +0100, Qais Yousef wrote:
> IMHO the above is a hack. Out-of-tree modules should rely on public 
> headers and
> exported functions only. What you propose means that people who want to 
> use
> these tracepoints in meaningful way must have a prebuilt kernel handy. 
> Which is
> maybe true for us who work in the embedded world. But users who run normal
> distro kernels (desktop/servers) will fail to build against

 But this isn't really aimed at regular users. We're aiming this at
 developers (IIUC) so I dont really see this as a problem.
>>
>> This is what I thought as well. All these helpers can be coded directly
>> in these tracepoint-2-traceevent (tp-2-te) converters. As long as they
>> are build from within kernel/sched/ there is no issue with the export
>> via kernel/sched/sched.h. Otherwise this little trick would be necessary.
>> But since it is a tool for developers I guess we can assume that they
>> can build it from within kernel/sched/.
> 
> I think this will reduce the usefulness of these tracepoints. But if you 
> really
> want to remove them, I am certainly not strongly attached to them and they 
> were
> meant to be removable anyway. So fine by me :-)

I would like to see them go. Less stuff to maintain. And as we see with
the new cpu_capacity tp there are always more helper functions coming.

IMHO, the ability to build those modules via public headers is less
important since they are meant for developers.


Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2020-09-08 Thread Phil Auld
Hi Quais,

On Mon, Sep 07, 2020 at 12:02:24PM +0100 Qais Yousef wrote:
> On 09/02/20 09:54, Phil Auld wrote:
> > > 
> > > I think this decoupling is not necessary. The natural place for those
> > > scheduler trace_event based on trace_points extension files is
> > > kernel/sched/ and here the internal sched.h can just be included.
> > > 
> > > If someone really wants to build this as an out-of-tree module there is
> > > an easy way to make kernel/sched/sched.h visible.
> > >
> > 
> > It's not so much that we really _want_ to do this in an external module.
> > But we aren't adding more trace events and my (limited) knowledge of
> > BPF let me to the conclusion that its raw tracepoint functionality
> > requires full events.  I didn't see any other way to do it.
> 
> I did have a patch that allowed that. It might be worth trying to upstream it.
> It just required a new macro which could be problematic.
> 
> https://github.com/qais-yousef/linux/commit/fb9fea29edb8af327e6b2bf3bc41469a8e66df8b
> 
> With the above I could attach using bpf::RAW_TRACEPOINT mechanism.
>

Yeah, that could work. I meant there was no way to do it with what was there :)

In our initial attempts at using BPF to get at nr_running (which I was not
involved in and don't have all the details...) there were issues being able to
keep up and losing events.  That may have been an implementation issue, but
using the module and trace-cmd doesn't have that problem. Hopefully you don't
see that using RAW_TRACEPOINTs.


Fwiw, I don't think these little helper routines are all that hard to maintain.
If something changes in those fields, which seems moderately unlikely at least
for many of them, the compiler will complain.

And I agree with you about preferring to use the public headers for the module.
I think we can work around it though, if needed.


Cheers,
Phil

> Cheers
> 
> --
> Qais Yousef
> 

-- 



Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2020-09-08 Thread Dietmar Eggemann
On 07/09/2020 16:51, Qais Yousef wrote:
> On 09/07/20 13:13, pet...@infradead.org wrote:
>> On Mon, Sep 07, 2020 at 11:48:45AM +0100, Qais Yousef wrote:
>>> IMHO the above is a hack. Out-of-tree modules should rely on public headers 
>>> and
>>> exported functions only. What you propose means that people who want to use
>>> these tracepoints in meaningful way must have a prebuilt kernel handy. 
>>> Which is
>>> maybe true for us who work in the embedded world. But users who run normal
>>> distro kernels (desktop/servers) will fail to build against
>>
>> But this isn't really aimed at regular users. We're aiming this at
>> developers (IIUC) so I dont really see this as a problem.

This is what I thought as well. All these helpers can be coded directly
in these tracepoint-2-traceevent (tp-2-te) converters. As long as they
are build from within kernel/sched/ there is no issue with the export
via kernel/sched/sched.h. Otherwise this little trick would be necessary.
But since it is a tool for developers I guess we can assume that they
can build it from within kernel/sched/.

I tested:

https://lkml.kernel.org/r/20200907091717.26116-1-dietmar.eggem...@arm.com

with our EAS integration which provides one of these tp-2-t2 converter
(sched_tp.c).

http://linux-arm.org/git?p=linux-power.git;a=shortlog;h=refs/heads/topic/tracepoints

[...]


Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2020-09-07 Thread Qais Yousef
On 09/02/20 09:54, Phil Auld wrote:
> > 
> > I think this decoupling is not necessary. The natural place for those
> > scheduler trace_event based on trace_points extension files is
> > kernel/sched/ and here the internal sched.h can just be included.
> > 
> > If someone really wants to build this as an out-of-tree module there is
> > an easy way to make kernel/sched/sched.h visible.
> >
> 
> It's not so much that we really _want_ to do this in an external module.
> But we aren't adding more trace events and my (limited) knowledge of
> BPF let me to the conclusion that its raw tracepoint functionality
> requires full events.  I didn't see any other way to do it.

I did have a patch that allowed that. It might be worth trying to upstream it.
It just required a new macro which could be problematic.

https://github.com/qais-yousef/linux/commit/fb9fea29edb8af327e6b2bf3bc41469a8e66df8b

With the above I could attach using bpf::RAW_TRACEPOINT mechanism.

Cheers

--
Qais Yousef


Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2020-09-07 Thread peterz
On Mon, Sep 07, 2020 at 11:48:45AM +0100, Qais Yousef wrote:
> IMHO the above is a hack. Out-of-tree modules should rely on public headers 
> and
> exported functions only. What you propose means that people who want to use
> these tracepoints in meaningful way must have a prebuilt kernel handy. Which 
> is
> maybe true for us who work in the embedded world. But users who run normal
> distro kernels (desktop/servers) will fail to build against

But this isn't really aimed at regular users. We're aiming this at
developers (IIUC) so I dont really see this as a problem.

> FWIW, I did raise this concern with Peter in 2019 OSPM and he was okay with 
> the
> exports as it's still not a contract and they can disappear anytime we want.
> Migrating to using BTF is the right way forward IMO. I don't think what we 
> have
> here is out-of-control yet. Though I agree they're annoying.

Right, we're hiding behind the explicit lack of ABI for modules.

Anyway, CTF/BTF/random other crap that isn't DWARFs should work fine to
replace all this muck. Just no idea what the state of any of that is.


Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2020-09-07 Thread Qais Yousef
On 09/07/20 13:13, pet...@infradead.org wrote:
> On Mon, Sep 07, 2020 at 11:48:45AM +0100, Qais Yousef wrote:
> > IMHO the above is a hack. Out-of-tree modules should rely on public headers 
> > and
> > exported functions only. What you propose means that people who want to use
> > these tracepoints in meaningful way must have a prebuilt kernel handy. 
> > Which is
> > maybe true for us who work in the embedded world. But users who run normal
> > distro kernels (desktop/servers) will fail to build against
> 
> But this isn't really aimed at regular users. We're aiming this at
> developers (IIUC) so I dont really see this as a problem.
> 
> > FWIW, I did raise this concern with Peter in 2019 OSPM and he was okay with 
> > the
> > exports as it's still not a contract and they can disappear anytime we want.
> > Migrating to using BTF is the right way forward IMO. I don't think what we 
> > have
> > here is out-of-control yet. Though I agree they're annoying.
> 
> Right, we're hiding behind the explicit lack of ABI for modules.
> 
> Anyway, CTF/BTF/random other crap that isn't DWARFs should work fine to
> replace all this muck. Just no idea what the state of any of that is.

So I was thinking of having a function that allows a module to read member of
struct rq (or any struct for that matters), but I think that's the harder
(though neater) way around.

Just compiled a kernel with CONFIG_DEBUG_INFO_BTF_INFO; and doing

$ pahole rq
struct rq {
raw_spinlock_t lock; /* 0 4 */
unsigned int   nr_running;   /* 4 4 */
long unsigned int  last_blocked_load_update_tick; /* 8 
8 */
unsigned int   has_blocked_load; /*16 4 */

/* XXX 12 bytes hole, try to pack */

call_single_data_t nohz_csd; /*3232 */
/* --- cacheline 1 boundary (64 bytes) --- */
unsigned int   nohz_tick_stopped;/*64 4 */
atomic_t   nohz_flags;   /*68 4 */
unsigned int   ttwu_pending; /*72 4 */

/* XXX 4 bytes hole, try to pack */

u64nr_switches;  /*80 8 */
.
.
.
}

dumps the struct rq {...}; which means one can easily use that to autogenerate
a header containing the structs they care about accessing for their running
kernel.

pahole automatically knows how to find /sys/kernel/btf/vmlinux to parse the
debug info btw.

The only caveat is that one has to recompile the module for each running
kernel; but that's acceptable I think. Not sure how many allow loading a module
that's not compiled for that particular kernel version anyway.

Note to try this you'll need pahole v1.16 or newer. And compiling pahole on
Ubuntu is a pain. I had to create a fedora docker image to compile it in.

So I think we have this already solved. Though not sure how to document it..

Thanks

--
Qais Yousef


Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2020-09-07 Thread Qais Yousef
Hi Dietmar

On 09/02/20 12:44, Dietmar Eggemann wrote:
> + Phil Auld 
> 
> On 28/08/2020 19:26, Qais Yousef wrote:
> > On 08/28/20 19:10, Dietmar Eggemann wrote:
> >> On 28/08/2020 12:27, Qais Yousef wrote:
> >>> On 08/28/20 10:00, vincent.donnef...@arm.com wrote:
>  From: Vincent Donnefort 
> 
> [...]
> 
> >> Can you remind me why we have all these helper functions like
> >> sched_trace_rq_cpu_capacity?
> > 
> > struct rq is defined in kernel/sched/sched.h. It's not exported. Exporting
> > these helper functions was the agreement to help modules trace internal 
> > info.
> > By passing generic info you decouple the tracepoint from giving specific 
> > info
> > and allow the modules to extract all the info they need from the same
> > tracepoint. IE: if you need more than just cpu_capacity from this 
> > tracepoint,
> > you can get that without having to continuously add extra arguments 
> > everytime
> > you need an extra piece of info. Unless this info is not in the rq of 
> > course.
> 
> I think this decoupling is not necessary. The natural place for those
> scheduler trace_event based on trace_points extension files is
> kernel/sched/ and here the internal sched.h can just be included.
> 
> If someone really wants to build this as an out-of-tree module there is
> an easy way to make kernel/sched/sched.h visible.
> 
> CFLAGS_sched_tp.o := -I$KERNEL_SRC/kernel/sched
> 
> all:
> make -C $KERNEL_SRC M=$(PWD) modules

Sorry for the late response. Was on holiday.

IMHO the above is a hack. Out-of-tree modules should rely on public headers and
exported functions only. What you propose means that people who want to use
these tracepoints in meaningful way must have a prebuilt kernel handy. Which is
maybe true for us who work in the embedded world. But users who run normal
distro kernels (desktop/servers) will fail to build against
`/lib/modules/$(uname -r)/build` where that internal header file is not
exported. IOW, we're putting extra hoops for a large class of users here to be
able to access these internal data. They have to maintain their out-of-tree
definition of these structures.

FWIW, I did raise this concern with Peter in 2019 OSPM and he was okay with the
exports as it's still not a contract and they can disappear anytime we want.
Migrating to using BTF is the right way forward IMO. I don't think what we have
here is out-of-control yet. Though I agree they're annoying.

Cheers

--
Qais Yousef


Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2020-09-02 Thread Phil Auld
On Wed, Sep 02, 2020 at 12:44:42PM +0200 Dietmar Eggemann wrote:
> + Phil Auld 
>

Thanks Dietmar.


> On 28/08/2020 19:26, Qais Yousef wrote:
> > On 08/28/20 19:10, Dietmar Eggemann wrote:
> >> On 28/08/2020 12:27, Qais Yousef wrote:
> >>> On 08/28/20 10:00, vincent.donnef...@arm.com wrote:
>  From: Vincent Donnefort 
> 
> [...]
> 
> >> Can you remind me why we have all these helper functions like
> >> sched_trace_rq_cpu_capacity?
> > 
> > struct rq is defined in kernel/sched/sched.h. It's not exported. Exporting
> > these helper functions was the agreement to help modules trace internal 
> > info.
> > By passing generic info you decouple the tracepoint from giving specific 
> > info
> > and allow the modules to extract all the info they need from the same
> > tracepoint. IE: if you need more than just cpu_capacity from this 
> > tracepoint,
> > you can get that without having to continuously add extra arguments 
> > everytime
> > you need an extra piece of info. Unless this info is not in the rq of 
> > course.
> 
> I think this decoupling is not necessary. The natural place for those
> scheduler trace_event based on trace_points extension files is
> kernel/sched/ and here the internal sched.h can just be included.
> 
> If someone really wants to build this as an out-of-tree module there is
> an easy way to make kernel/sched/sched.h visible.
>

It's not so much that we really _want_ to do this in an external module.
But we aren't adding more trace events and my (limited) knowledge of
BPF let me to the conclusion that its raw tracepoint functionality
requires full events.  I didn't see any other way to do it.

We could put sched_tp in the tree under a debug CONFIG :)

> CFLAGS_sched_tp.o := -I$KERNEL_SRC/kernel/sched
> 
> all:
> make -C $KERNEL_SRC M=$(PWD) modules
> 
> This allowed me to build our trace_event extension module (sched_tp.c,
> sched_events.h) out-of-tree and I was able to get rid of all the
> sched_trace_foo() functions (in fair.c, include/linux/sched.h) and code
> there content directly in foo.c
> 
> There are two things we would need exported from the kernel:
> 
> (1) cfs_rq_tg_path() to print the path of a taskgroup cfs_rq or se.
> 
> (2) sched_uclamp_used so uclamp_rq_util_with() can be used in
> sched_events.h.
> 
> I put Phil Auld on cc because of his trace_point
> sched_update_nr_running_tp. I think Phil was using sched_tp as a base so
> I can't see an issue why we can't also remove sched_trace_rq_nr_running().
>

Our Perf team is now actively using this in downstream, using sched_tp, and
finding it very useful.

But I have no problem if this is all simpler in the kernel tree.

> >> In case we would let the extra code (which transforms trace points into
> >> trace events) know the internals of struct rq we could handle those
> >> things in the TRACE_EVENT and/or the register_trace_##name(void
> >> (*probe)(data_proto), void *data) thing.
> >> We always said when the internal things will change this extra code will
> >> break. So that's not an issue.
> > 
> > The problem is that you need to export struct rq in a public header. Which 
> > we
> > don't want to do. I have been trying to find out how to use BTF so we can
> > remove these functions. Haven't gotten far away yet - but it should be 
> > doable
> > and it's a question of me finding enough time to understand what was 
> > currently
> > done and if I can re-use something or need to come up with extra 
> > infrastructure
> > first.
> 
> Let's keep the footprint of these trace points as small as possible in
> the scheduler code.
> 
> I'm putting the changes I described above in our monthly EAS integration
> right now and when this worked out nicely I will share the patches on lkml.
> 

Sounds good, thanks!


Cheers,
Phil

-- 



Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2020-09-02 Thread Dietmar Eggemann
+ Phil Auld 

On 28/08/2020 19:26, Qais Yousef wrote:
> On 08/28/20 19:10, Dietmar Eggemann wrote:
>> On 28/08/2020 12:27, Qais Yousef wrote:
>>> On 08/28/20 10:00, vincent.donnef...@arm.com wrote:
 From: Vincent Donnefort 

[...]

>> Can you remind me why we have all these helper functions like
>> sched_trace_rq_cpu_capacity?
> 
> struct rq is defined in kernel/sched/sched.h. It's not exported. Exporting
> these helper functions was the agreement to help modules trace internal info.
> By passing generic info you decouple the tracepoint from giving specific info
> and allow the modules to extract all the info they need from the same
> tracepoint. IE: if you need more than just cpu_capacity from this tracepoint,
> you can get that without having to continuously add extra arguments everytime
> you need an extra piece of info. Unless this info is not in the rq of course.

I think this decoupling is not necessary. The natural place for those
scheduler trace_event based on trace_points extension files is
kernel/sched/ and here the internal sched.h can just be included.

If someone really wants to build this as an out-of-tree module there is
an easy way to make kernel/sched/sched.h visible.

CFLAGS_sched_tp.o := -I$KERNEL_SRC/kernel/sched

all:
make -C $KERNEL_SRC M=$(PWD) modules

This allowed me to build our trace_event extension module (sched_tp.c,
sched_events.h) out-of-tree and I was able to get rid of all the
sched_trace_foo() functions (in fair.c, include/linux/sched.h) and code
there content directly in foo.c

There are two things we would need exported from the kernel:

(1) cfs_rq_tg_path() to print the path of a taskgroup cfs_rq or se.

(2) sched_uclamp_used so uclamp_rq_util_with() can be used in
sched_events.h.

I put Phil Auld on cc because of his trace_point
sched_update_nr_running_tp. I think Phil was using sched_tp as a base so
I can't see an issue why we can't also remove sched_trace_rq_nr_running().

>> In case we would let the extra code (which transforms trace points into
>> trace events) know the internals of struct rq we could handle those
>> things in the TRACE_EVENT and/or the register_trace_##name(void
>> (*probe)(data_proto), void *data) thing.
>> We always said when the internal things will change this extra code will
>> break. So that's not an issue.
> 
> The problem is that you need to export struct rq in a public header. Which we
> don't want to do. I have been trying to find out how to use BTF so we can
> remove these functions. Haven't gotten far away yet - but it should be doable
> and it's a question of me finding enough time to understand what was currently
> done and if I can re-use something or need to come up with extra 
> infrastructure
> first.

Let's keep the footprint of these trace points as small as possible in
the scheduler code.

I'm putting the changes I described above in our monthly EAS integration
right now and when this worked out nicely I will share the patches on lkml.


Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2020-08-28 Thread Qais Yousef
On 08/28/20 19:10, Dietmar Eggemann wrote:
> On 28/08/2020 12:27, Qais Yousef wrote:
> > On 08/28/20 10:00, vincent.donnef...@arm.com wrote:
> >> From: Vincent Donnefort 
> >>
> >> rq->cpu_capacity is a key element in several scheduler parts, such as EAS
> >> task placement and load balancing. Tracking this value enables testing
> >> and/or debugging by a toolkit.
> >>
> >> Signed-off-by: Vincent Donnefort 
> >>
> >> diff --git a/include/linux/sched.h b/include/linux/sched.h
> > 
> > [...]
> > 
> >> +int sched_trace_rq_cpu_capacity(struct rq *rq)
> >> +{
> >> +  return rq ?
> >> +#ifdef CONFIG_SMP
> >> +  rq->cpu_capacity
> >> +#else
> >> +  SCHED_CAPACITY_SCALE
> >> +#endif
> >> +  : -1;
> >> +}
> >> +EXPORT_SYMBOL_GPL(sched_trace_rq_cpu_capacity);
> >> +
> > 
> > The placement of this #ifdef looks odd to me. But FWIW
> > 
> > Reviewed-by: Qais Yousef 
> 
> Returning -1 for cpu_capacity? It makes sense for sched_trace_rq_cpu()
> but for cpu_capacity?

If rq is NULL you return -1, an error the way I read it. rq is passed as an
argument, so better ensure we handle NULL and not blindly dereference rq and
crash.

> 
> Can you remind me why we have all these helper functions like
> sched_trace_rq_cpu_capacity?

struct rq is defined in kernel/sched/sched.h. It's not exported. Exporting
these helper functions was the agreement to help modules trace internal info.
By passing generic info you decouple the tracepoint from giving specific info
and allow the modules to extract all the info they need from the same
tracepoint. IE: if you need more than just cpu_capacity from this tracepoint,
you can get that without having to continuously add extra arguments everytime
you need an extra piece of info. Unless this info is not in the rq of course.

> 
> In case we would let the extra code (which transforms trace points into
> trace events) know the internals of struct rq we could handle those
> things in the TRACE_EVENT and/or the register_trace_##name(void
> (*probe)(data_proto), void *data) thing.
> We always said when the internal things will change this extra code will
> break. So that's not an issue.

The problem is that you need to export struct rq in a public header. Which we
don't want to do. I have been trying to find out how to use BTF so we can
remove these functions. Haven't gotten far away yet - but it should be doable
and it's a question of me finding enough time to understand what was currently
done and if I can re-use something or need to come up with extra infrastructure
first.

Thanks

--
Qais Yousef


Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2020-08-28 Thread Dietmar Eggemann
On 28/08/2020 12:27, Qais Yousef wrote:
> On 08/28/20 10:00, vincent.donnef...@arm.com wrote:
>> From: Vincent Donnefort 
>>
>> rq->cpu_capacity is a key element in several scheduler parts, such as EAS
>> task placement and load balancing. Tracking this value enables testing
>> and/or debugging by a toolkit.
>>
>> Signed-off-by: Vincent Donnefort 
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
> 
> [...]
> 
>> +int sched_trace_rq_cpu_capacity(struct rq *rq)
>> +{
>> +return rq ?
>> +#ifdef CONFIG_SMP
>> +rq->cpu_capacity
>> +#else
>> +SCHED_CAPACITY_SCALE
>> +#endif
>> +: -1;
>> +}
>> +EXPORT_SYMBOL_GPL(sched_trace_rq_cpu_capacity);
>> +
> 
> The placement of this #ifdef looks odd to me. But FWIW
> 
> Reviewed-by: Qais Yousef 

Returning -1 for cpu_capacity? It makes sense for sched_trace_rq_cpu()
but for cpu_capacity?

Can you remind me why we have all these helper functions like
sched_trace_rq_cpu_capacity?

In case we would let the extra code (which transforms trace points into
trace events) know the internals of struct rq we could handle those
things in the TRACE_EVENT and/or the register_trace_##name(void
(*probe)(data_proto), void *data) thing.
We always said when the internal things will change this extra code will
break. So that's not an issue.


Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2020-08-28 Thread Qais Yousef
On 08/28/20 10:00, vincent.donnef...@arm.com wrote:
> From: Vincent Donnefort 
> 
> rq->cpu_capacity is a key element in several scheduler parts, such as EAS
> task placement and load balancing. Tracking this value enables testing
> and/or debugging by a toolkit.
> 
> Signed-off-by: Vincent Donnefort 
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h

[...]

> +int sched_trace_rq_cpu_capacity(struct rq *rq)
> +{
> + return rq ?
> +#ifdef CONFIG_SMP
> + rq->cpu_capacity
> +#else
> + SCHED_CAPACITY_SCALE
> +#endif
> + : -1;
> +}
> +EXPORT_SYMBOL_GPL(sched_trace_rq_cpu_capacity);
> +

The placement of this #ifdef looks odd to me. But FWIW

Reviewed-by: Qais Yousef 

Cheers

--
Qais Yousef


[PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity

2020-08-28 Thread vincent . donnefort
From: Vincent Donnefort 

rq->cpu_capacity is a key element in several scheduler parts, such as EAS
task placement and load balancing. Tracking this value enables testing
and/or debugging by a toolkit.

Signed-off-by: Vincent Donnefort 

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 021ad7b..7e19d59 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2055,6 +2055,7 @@ const struct sched_avg *sched_trace_rq_avg_dl(struct rq 
*rq);
 const struct sched_avg *sched_trace_rq_avg_irq(struct rq *rq);
 
 int sched_trace_rq_cpu(struct rq *rq);
+int sched_trace_rq_cpu_capacity(struct rq *rq);
 int sched_trace_rq_nr_running(struct rq *rq);
 
 const struct cpumask *sched_trace_rd_span(struct root_domain *rd);
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 8ab48b3..f94ddd1 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -630,6 +630,10 @@ DECLARE_TRACE(pelt_se_tp,
TP_PROTO(struct sched_entity *se),
TP_ARGS(se));
 
+DECLARE_TRACE(sched_cpu_capacity_tp,
+   TP_PROTO(struct rq *rq),
+   TP_ARGS(rq));
+
 DECLARE_TRACE(sched_overutilized_tp,
TP_PROTO(struct root_domain *rd, bool overutilized),
TP_ARGS(rd, overutilized));
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 06b0a40..e468271 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -36,6 +36,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(pelt_rt_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(pelt_dl_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(pelt_irq_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(pelt_se_tp);
+EXPORT_TRACEPOINT_SYMBOL_GPL(sched_cpu_capacity_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_overutilized_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_util_est_cfs_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_util_est_se_tp);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 44f7a0b..27f4392 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8116,6 +8116,8 @@ static void update_cpu_capacity(struct sched_domain *sd, 
int cpu)
capacity = 1;
 
cpu_rq(cpu)->cpu_capacity = capacity;
+   trace_sched_cpu_capacity_tp(cpu_rq(cpu));
+
sdg->sgc->capacity = capacity;
sdg->sgc->min_capacity = capacity;
sdg->sgc->max_capacity = capacity;
@@ -11318,6 +11320,18 @@ int sched_trace_rq_cpu(struct rq *rq)
 }
 EXPORT_SYMBOL_GPL(sched_trace_rq_cpu);
 
+int sched_trace_rq_cpu_capacity(struct rq *rq)
+{
+   return rq ?
+#ifdef CONFIG_SMP
+   rq->cpu_capacity
+#else
+   SCHED_CAPACITY_SCALE
+#endif
+   : -1;
+}
+EXPORT_SYMBOL_GPL(sched_trace_rq_cpu_capacity);
+
 const struct cpumask *sched_trace_rd_span(struct root_domain *rd)
 {
 #ifdef CONFIG_SMP
-- 
2.7.4