Re: [PATCH v2 bpf-next 1/4] tracing/probe: Add PERF_EVENT_IOC_QUERY_PROBE ioctl

2019-08-13 Thread Daniel Xu
On Tue, Aug 13, 2019, at 2:47 PM, Song Liu wrote:
> On Fri, Aug 9, 2019 at 2:48 PM Daniel Xu  wrote:
> >
> > It's useful to know [uk]probe's nmissed and nhit stats. For example with
> > tracing tools, it's important to know when events may have been lost.
> > debugfs currently exposes a control file to get this information, but
> > it is not compatible with probes registered with the perf API.
> >
> > While bpf programs may be able to manually count nhit, there is no way
> > to gather nmissed. In other words, it is currently not possible to
> > retrieve information about FD-based probes.
> >
> > This patch adds a new ioctl that lets users query nmissed (as well as
> > nhit for completeness). We currently only add support for [uk]probes
> > but leave the possibility open for other probes like tracepoint.
> >
> > Signed-off-by: Daniel Xu 
> > ---
> [...]
> 
> > +int perf_uprobe_event_query(struct perf_event *event, void __user *info)
> > +{
> > +   struct perf_event_query_probe __user *uquery = info;
> > +   struct perf_event_query_probe query = {};
> > +   struct trace_event_call *call = event->tp_event;
> > +   struct trace_uprobe *tu = (struct trace_uprobe *)call->data;
> > +   u64 nmissed, nhit;
> > +
> > +   if (!capable(CAP_SYS_ADMIN))
> > +   return -EPERM;
> > +   if (copy_from_user(, uquery, sizeof(query)))
> > +   return -EFAULT;
> > +
> > +   nhit = tu->nhit;
> > +   nmissed = 0;
> 
> Blindly return 0 is a little weird. Maybe return 0x so
> that the user
> can tell this is not a valid 0. Or some other idea?
> 
> Thanks,
> Song
>

My (maybe flawed) understanding is that uprobes cannot really miss the same way
a kprobe can. From skimming the code a little, it seems the main reason kprobes
can miss is when the processing of one kprobe results in hitting another kprobe.
The latter cannot be handled for whatever reason. The same cannot really happen
for uprobes as kernel doesn't call into userspace. That's why I made it 0 (that 
and
the fact I didn't see any accounting for uprobe misses).

cc Srikar who authored the uprobe patches.

Srikar, do you mind clarifying if uprobes can miss?

Thanks,
Daniel


Re: [PATCH v2 bpf-next 1/4] tracing/probe: Add PERF_EVENT_IOC_QUERY_PROBE ioctl

2019-08-13 Thread Song Liu
On Fri, Aug 9, 2019 at 2:48 PM Daniel Xu  wrote:
>
> It's useful to know [uk]probe's nmissed and nhit stats. For example with
> tracing tools, it's important to know when events may have been lost.
> debugfs currently exposes a control file to get this information, but
> it is not compatible with probes registered with the perf API.
>
> While bpf programs may be able to manually count nhit, there is no way
> to gather nmissed. In other words, it is currently not possible to
> retrieve information about FD-based probes.
>
> This patch adds a new ioctl that lets users query nmissed (as well as
> nhit for completeness). We currently only add support for [uk]probes
> but leave the possibility open for other probes like tracepoint.
>
> Signed-off-by: Daniel Xu 
> ---
[...]

> +int perf_uprobe_event_query(struct perf_event *event, void __user *info)
> +{
> +   struct perf_event_query_probe __user *uquery = info;
> +   struct perf_event_query_probe query = {};
> +   struct trace_event_call *call = event->tp_event;
> +   struct trace_uprobe *tu = (struct trace_uprobe *)call->data;
> +   u64 nmissed, nhit;
> +
> +   if (!capable(CAP_SYS_ADMIN))
> +   return -EPERM;
> +   if (copy_from_user(, uquery, sizeof(query)))
> +   return -EFAULT;
> +
> +   nhit = tu->nhit;
> +   nmissed = 0;

Blindly return 0 is a little weird. Maybe return 0x so
that the user
can tell this is not a valid 0. Or some other idea?

Thanks,
Song


Re: [PATCH v2 bpf-next 1/4] tracing/probe: Add PERF_EVENT_IOC_QUERY_PROBE ioctl

2019-08-13 Thread Andrii Nakryiko
On Mon, Aug 12, 2019 at 5:39 PM Daniel Xu  wrote:
>
> On Mon, Aug 12, 2019, at 8:57 AM, Andrii Nakryiko wrote:
> > On Fri, Aug 9, 2019 at 2:47 PM Daniel Xu  wrote:
> > >
> > > It's useful to know [uk]probe's nmissed and nhit stats. For example with
> > > tracing tools, it's important to know when events may have been lost.
> > > debugfs currently exposes a control file to get this information, but
> > > it is not compatible with probes registered with the perf API.
> > >
> > > While bpf programs may be able to manually count nhit, there is no way
> > > to gather nmissed. In other words, it is currently not possible to
> > > retrieve information about FD-based probes.
> > >
> > > This patch adds a new ioctl that lets users query nmissed (as well as
> > > nhit for completeness). We currently only add support for [uk]probes
> > > but leave the possibility open for other probes like tracepoint.
> > >
> > > Signed-off-by: Daniel Xu 
> > > ---
> > >  include/linux/trace_events.h| 12 
> > >  include/uapi/linux/perf_event.h | 19 +++
> > >  kernel/events/core.c| 20 
> > >  kernel/trace/trace_kprobe.c | 23 +++
> > >  kernel/trace/trace_uprobe.c | 23 +++
> > >  5 files changed, 97 insertions(+)
> > >
> [...]
> > > +   struct trace_kprobe *tk = (struct trace_kprobe *)call->data;
> > > +   u64 nmissed, nhit;
> > > +
> > > +   if (!capable(CAP_SYS_ADMIN))
> > > +   return -EPERM;
> > > +   if (copy_from_user(, uquery, sizeof(query)))

Not sure why we are reading that struct in, if we never use that? With
size as a first argument (see below about compatiblity), I'd also read
just first 4 or 8 bytes only.

> >
> > what about forward/backward compatibility? Didn't you have a size
> > field for perf_event_query_probe?
>
> I initially did, yes. But after thinking about it more, I'm not convinced it
> is necessary. It seems the last change to the debugfs counterpart was in
> the initial comit cd7e7bd5e4, ~10 years ago. I cannot think of any other
> information that would be useful off the top of my head, so I figured it'd
> be best if we didn't make more complicated something that doesn't seem
> likely to change. If we really needed something else, I figured adding
> another ioctl is pretty cheap.
>
> If you (or anyone) feels strongly about adding it back, I can make it a
> u64 so there's no holes.

Debugfs is not stable API, so I guess that matters less. I think we
should support this forward/backward compatibility mechanism that
kernel implements for a lot of other stable APIs.

>
> >
> > > +   return -EFAULT;
> > > +
> > > +   nhit = trace_kprobe_nhit(tk);
> > > +   nmissed = tk->rp.kp.nmissed;
> > > +
> > > +   if (put_user(nmissed, >nmissed) ||
> > > +   put_user(nhit, >nhit))
> >
> > Wouldn't it be nicer to just do one user put for entire struct (or at
> > least relevant part of it with backward/forward compatibility?).
>
> Not sure how that didn't occur to me. Thanks.

Once you add back size field for compatibility, doing it with one call
will make it easier to write only first N requested bytes.


Re: [PATCH v2 bpf-next 1/4] tracing/probe: Add PERF_EVENT_IOC_QUERY_PROBE ioctl

2019-08-12 Thread Daniel Xu
On Mon, Aug 12, 2019, at 8:57 AM, Andrii Nakryiko wrote:
> On Fri, Aug 9, 2019 at 2:47 PM Daniel Xu  wrote:
> >
> > It's useful to know [uk]probe's nmissed and nhit stats. For example with
> > tracing tools, it's important to know when events may have been lost.
> > debugfs currently exposes a control file to get this information, but
> > it is not compatible with probes registered with the perf API.
> >
> > While bpf programs may be able to manually count nhit, there is no way
> > to gather nmissed. In other words, it is currently not possible to
> > retrieve information about FD-based probes.
> >
> > This patch adds a new ioctl that lets users query nmissed (as well as
> > nhit for completeness). We currently only add support for [uk]probes
> > but leave the possibility open for other probes like tracepoint.
> >
> > Signed-off-by: Daniel Xu 
> > ---
> >  include/linux/trace_events.h| 12 
> >  include/uapi/linux/perf_event.h | 19 +++
> >  kernel/events/core.c| 20 
> >  kernel/trace/trace_kprobe.c | 23 +++
> >  kernel/trace/trace_uprobe.c | 23 +++
> >  5 files changed, 97 insertions(+)
> >
[...]
> > +   struct trace_kprobe *tk = (struct trace_kprobe *)call->data;
> > +   u64 nmissed, nhit;
> > +
> > +   if (!capable(CAP_SYS_ADMIN))
> > +   return -EPERM;
> > +   if (copy_from_user(, uquery, sizeof(query)))
> 
> what about forward/backward compatibility? Didn't you have a size
> field for perf_event_query_probe?

I initially did, yes. But after thinking about it more, I'm not convinced it
is necessary. It seems the last change to the debugfs counterpart was in
the initial comit cd7e7bd5e4, ~10 years ago. I cannot think of any other
information that would be useful off the top of my head, so I figured it'd
be best if we didn't make more complicated something that doesn't seem
likely to change. If we really needed something else, I figured adding
another ioctl is pretty cheap.

If you (or anyone) feels strongly about adding it back, I can make it a
u64 so there's no holes.

> 
> > +   return -EFAULT;
> > +
> > +   nhit = trace_kprobe_nhit(tk);
> > +   nmissed = tk->rp.kp.nmissed;
> > +
> > +   if (put_user(nmissed, >nmissed) ||
> > +   put_user(nhit, >nhit))
> 
> Wouldn't it be nicer to just do one user put for entire struct (or at
> least relevant part of it with backward/forward compatibility?).

Not sure how that didn't occur to me. Thanks.


Re: [PATCH v2 bpf-next 1/4] tracing/probe: Add PERF_EVENT_IOC_QUERY_PROBE ioctl

2019-08-12 Thread Andrii Nakryiko
On Fri, Aug 9, 2019 at 2:47 PM Daniel Xu  wrote:
>
> It's useful to know [uk]probe's nmissed and nhit stats. For example with
> tracing tools, it's important to know when events may have been lost.
> debugfs currently exposes a control file to get this information, but
> it is not compatible with probes registered with the perf API.
>
> While bpf programs may be able to manually count nhit, there is no way
> to gather nmissed. In other words, it is currently not possible to
> retrieve information about FD-based probes.
>
> This patch adds a new ioctl that lets users query nmissed (as well as
> nhit for completeness). We currently only add support for [uk]probes
> but leave the possibility open for other probes like tracepoint.
>
> Signed-off-by: Daniel Xu 
> ---
>  include/linux/trace_events.h| 12 
>  include/uapi/linux/perf_event.h | 19 +++
>  kernel/events/core.c| 20 
>  kernel/trace/trace_kprobe.c | 23 +++
>  kernel/trace/trace_uprobe.c | 23 +++
>  5 files changed, 97 insertions(+)
>
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 5150436783e8..61558f19696a 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -586,6 +586,12 @@ extern int bpf_get_kprobe_info(const struct perf_event 
> *event,
>u32 *fd_type, const char **symbol,
>u64 *probe_offset, u64 *probe_addr,
>bool perf_type_tracepoint);
> +extern int perf_kprobe_event_query(struct perf_event *event, void __user 
> *info);
> +#else
> +int perf_kprobe_event_query(struct perf_event *event, void __user *info)
> +{
> +   return -EOPNOTSUPP;
> +}
>  #endif
>  #ifdef CONFIG_UPROBE_EVENTS
>  extern int  perf_uprobe_init(struct perf_event *event,
> @@ -594,6 +600,12 @@ extern void perf_uprobe_destroy(struct perf_event 
> *event);
>  extern int bpf_get_uprobe_info(const struct perf_event *event,
>u32 *fd_type, const char **filename,
>u64 *probe_offset, bool perf_type_tracepoint);
> +extern int perf_uprobe_event_query(struct perf_event *event, void __user 
> *info);
> +#else
> +int perf_uprobe_event_query(struct perf_event *event, void __user *info)
> +{
> +   return -EOPNOTSUPP;
> +}
>  #endif
>  extern int  ftrace_profile_set_filter(struct perf_event *event, int event_id,
>  char *filter_str);
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 7198ddd0c6b1..65faa9b2a3b4 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -447,6 +447,24 @@ struct perf_event_query_bpf {
> __u32   ids[0];
>  };
>
> +/*
> + * Structure used by below PERF_EVENT_IOC_QUERY_PROBE command
> + * to query information about the probe attached to the perf
> + * event. Currently only supports [uk]probes.
> + */
> +struct perf_event_query_probe {
> +   /*
> +* Set by the kernel to indicate number of times this probe
> +* was temporarily disabled
> +*/
> +   __u64   nmissed;
> +   /*
> +* Set by the kernel to indicate number of times this probe
> +* was hit
> +*/
> +   __u64   nhit;
> +};
> +
>  /*
>   * Ioctls that can be done on a perf event fd:
>   */
> @@ -462,6 +480,7 @@ struct perf_event_query_bpf {
>  #define PERF_EVENT_IOC_PAUSE_OUTPUT_IOW('$', 9, __u32)
>  #define PERF_EVENT_IOC_QUERY_BPF   _IOWR('$', 10, struct 
> perf_event_query_bpf *)
>  #define PERF_EVENT_IOC_MODIFY_ATTRIBUTES   _IOW('$', 11, struct 
> perf_event_attr *)
> +#define PERF_EVENT_IOC_QUERY_PROBE _IOR('$', 12, struct 
> perf_event_query_probe *)
>
>  enum perf_event_ioc_flags {
> PERF_IOC_FLAG_GROUP = 1U << 0,
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 026a14541a38..3e0fe6eaaad0 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5060,6 +5060,8 @@ static int perf_event_set_filter(struct perf_event 
> *event, void __user *arg);
>  static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd);
>  static int perf_copy_attr(struct perf_event_attr __user *uattr,
>   struct perf_event_attr *attr);
> +static int perf_probe_event_query(struct perf_event *event,
> +   void __user *info);
>
>  static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned 
> long arg)
>  {
> @@ -5143,6 +5145,10 @@ static long _perf_ioctl(struct perf_event *event, 
> unsigned int cmd, unsigned lon
>
> return perf_event_modify_attr(event,  _attr);
> }
> +#if defined(CONFIG_KPROBE_EVENTS) || defined(CONFIG_UPROBE_EVENTS)
> +   case PERF_EVENT_IOC_QUERY_PROBE:
> +   return 

[PATCH v2 bpf-next 1/4] tracing/probe: Add PERF_EVENT_IOC_QUERY_PROBE ioctl

2019-08-09 Thread Daniel Xu
It's useful to know [uk]probe's nmissed and nhit stats. For example with
tracing tools, it's important to know when events may have been lost.
debugfs currently exposes a control file to get this information, but
it is not compatible with probes registered with the perf API.

While bpf programs may be able to manually count nhit, there is no way
to gather nmissed. In other words, it is currently not possible to
retrieve information about FD-based probes.

This patch adds a new ioctl that lets users query nmissed (as well as
nhit for completeness). We currently only add support for [uk]probes
but leave the possibility open for other probes like tracepoint.

Signed-off-by: Daniel Xu 
---
 include/linux/trace_events.h| 12 
 include/uapi/linux/perf_event.h | 19 +++
 kernel/events/core.c| 20 
 kernel/trace/trace_kprobe.c | 23 +++
 kernel/trace/trace_uprobe.c | 23 +++
 5 files changed, 97 insertions(+)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 5150436783e8..61558f19696a 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -586,6 +586,12 @@ extern int bpf_get_kprobe_info(const struct perf_event 
*event,
   u32 *fd_type, const char **symbol,
   u64 *probe_offset, u64 *probe_addr,
   bool perf_type_tracepoint);
+extern int perf_kprobe_event_query(struct perf_event *event, void __user 
*info);
+#else
+int perf_kprobe_event_query(struct perf_event *event, void __user *info)
+{
+   return -EOPNOTSUPP;
+}
 #endif
 #ifdef CONFIG_UPROBE_EVENTS
 extern int  perf_uprobe_init(struct perf_event *event,
@@ -594,6 +600,12 @@ extern void perf_uprobe_destroy(struct perf_event *event);
 extern int bpf_get_uprobe_info(const struct perf_event *event,
   u32 *fd_type, const char **filename,
   u64 *probe_offset, bool perf_type_tracepoint);
+extern int perf_uprobe_event_query(struct perf_event *event, void __user 
*info);
+#else
+int perf_uprobe_event_query(struct perf_event *event, void __user *info)
+{
+   return -EOPNOTSUPP;
+}
 #endif
 extern int  ftrace_profile_set_filter(struct perf_event *event, int event_id,
 char *filter_str);
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 7198ddd0c6b1..65faa9b2a3b4 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -447,6 +447,24 @@ struct perf_event_query_bpf {
__u32   ids[0];
 };
 
+/*
+ * Structure used by below PERF_EVENT_IOC_QUERY_PROBE command
+ * to query information about the probe attached to the perf
+ * event. Currently only supports [uk]probes.
+ */
+struct perf_event_query_probe {
+   /*
+* Set by the kernel to indicate number of times this probe
+* was temporarily disabled
+*/
+   __u64   nmissed;
+   /*
+* Set by the kernel to indicate number of times this probe
+* was hit
+*/
+   __u64   nhit;
+};
+
 /*
  * Ioctls that can be done on a perf event fd:
  */
@@ -462,6 +480,7 @@ struct perf_event_query_bpf {
 #define PERF_EVENT_IOC_PAUSE_OUTPUT_IOW('$', 9, __u32)
 #define PERF_EVENT_IOC_QUERY_BPF   _IOWR('$', 10, struct 
perf_event_query_bpf *)
 #define PERF_EVENT_IOC_MODIFY_ATTRIBUTES   _IOW('$', 11, struct 
perf_event_attr *)
+#define PERF_EVENT_IOC_QUERY_PROBE _IOR('$', 12, struct 
perf_event_query_probe *)
 
 enum perf_event_ioc_flags {
PERF_IOC_FLAG_GROUP = 1U << 0,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 026a14541a38..3e0fe6eaaad0 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5060,6 +5060,8 @@ static int perf_event_set_filter(struct perf_event 
*event, void __user *arg);
 static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd);
 static int perf_copy_attr(struct perf_event_attr __user *uattr,
  struct perf_event_attr *attr);
+static int perf_probe_event_query(struct perf_event *event,
+   void __user *info);
 
 static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned 
long arg)
 {
@@ -5143,6 +5145,10 @@ static long _perf_ioctl(struct perf_event *event, 
unsigned int cmd, unsigned lon
 
return perf_event_modify_attr(event,  _attr);
}
+#if defined(CONFIG_KPROBE_EVENTS) || defined(CONFIG_UPROBE_EVENTS)
+   case PERF_EVENT_IOC_QUERY_PROBE:
+   return perf_probe_event_query(event, (void __user *)arg);
+#endif
default:
return -ENOTTY;
}
@@ -8833,6 +8839,20 @@ static inline void perf_tp_register(void)
 #endif
 }
 
+static int perf_probe_event_query(struct perf_event *event,
+   void __user