Re: [PATCH bpf-next RFC v3 2/6] bpf: add BPF_MAP_DUMP command to dump more than one entry per call

2019-07-05 Thread Y Song
On Wed, Jul 3, 2019 at 10:03 AM Brian Vazquez  wrote:
>
> This introduces a new command to retrieve a variable number of entries
> from a bpf map wrapping the existing bpf methods:
> map_get_next_key and map_lookup_elem
>
> To start dumping the map from the beginning you must specify NULL as
> the prev_key.
>
> The new API returns 0 when it successfully copied all the elements
> requested, or when it copied fewer because there were no more elements to
> retrieve (err == -ENOENT). In the latter scenario err will be masked to 0.
>
> On a successful call buf and buf_len will contain correct data, and if
> prev_key was provided (not on the first walk, since prev_key is NULL) it
> will contain the last key copied into buf, which simplifies the next call.
>
> Only when it can't find a single element will it return -ENOENT, meaning
> that the map has been entirely walked. When an error is returned, buf,
> buf_len and prev_key shouldn't be read or used.
>
> Because maps can be accessed from userspace and kernel code, this function
> can hit a scenario where the next_key was found but, by the time we try to
> retrieve the value, the element is no longer there. In this case the
> function continues and tries to get a new next_key value, skipping the
> deleted key. If at some point the function finds itself trapped in a loop,
> it will return -EINTR.
>
> The function will try to fit as much as possible in the buf provided and
> will return -EINVAL if buf_len is smaller than elem_size.
>
> QUEUE and STACK maps are not supported.
>
> Note that map_dump doesn't guarantee that reading the entire table is
> consistent, since this function is always racing with kernel and user code,
> but the same behaviour is found when the entire table is walked using
> the current interfaces: map_get_next_key + map_lookup_elem.
> It is also important to note that for a locked map the lock is grabbed for
> one entry at a time, meaning that the buf returned might or might not be
> consistent.

First, thanks for the RFC. I do think there are use cases where
batch dumping helps.
Some comments below.
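
For readers following along, here is a rough sketch of how user space might
drive the proposed command once it lands. Everything below is hypothetical:
bpf_map_dump() stands in for a thin wrapper that fills the RFC's "dump"
struct and issues BPF_MAP_DUMP via the bpf(2) syscall, and MAX_KEY_SIZE,
process_entries() and copy_last_key() are placeholders for the caller's own
key/value handling.

	void *prev_key = NULL;		/* NULL: start from the beginning */
	char last_key[MAX_KEY_SIZE];	/* big enough for this map's key */
	char buf[4096];
	__u32 buf_len;
	int err;

	for (;;) {
		buf_len = sizeof(buf);	/* in: capacity, out: bytes copied */
		err = bpf_map_dump(map_fd, prev_key, buf, &buf_len, 0);
		if (err) {
			/* -ENOENT: map fully walked; anything else
			 * (-EINVAL, -EINTR, ...) means buf, buf_len and
			 * prev_key must not be used.
			 */
			break;
		}
		process_entries(buf, buf_len);	/* parse key/value pairs */
		/* Resume after the last key seen: extract it from buf on
		 * the first iteration; on later calls the kernel writes it
		 * back through prev_key as described in the commit message.
		 */
		copy_last_key(last_key, buf, buf_len);
		prev_key = last_key;
	}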

>
> Suggested-by: Stanislav Fomichev 
> Signed-off-by: Brian Vazquez 
> ---
>  include/uapi/linux/bpf.h |   9 +++
>  kernel/bpf/syscall.c | 118 +++
>  2 files changed, 127 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index cffea1826a1f..cc589570a639 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -106,6 +106,7 @@ enum bpf_cmd {
> BPF_TASK_FD_QUERY,
> BPF_MAP_LOOKUP_AND_DELETE_ELEM,
> BPF_MAP_FREEZE,
> +   BPF_MAP_DUMP,
>  };
>
>  enum bpf_map_type {
> @@ -388,6 +389,14 @@ union bpf_attr {
> __u64   flags;
> };
>
> +   struct { /* struct used by BPF_MAP_DUMP command */
> +   __u32   map_fd;
> +   __aligned_u64   prev_key;
> +   __aligned_u64   buf;
> +   __aligned_u64   buf_len; /* input/output: len of buf */
> +   __u64   flags;
> +   } dump;

Maybe you can swap map_fd and flags?
This way, you won't have a hole right after map_fd.
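That is, one way to read the suggestion (just a sketch of the layout; with
the lone __u32 moved to the end, the only padding left is at the tail of the
struct):

	struct { /* struct used by BPF_MAP_DUMP command */
		__u64		flags;
		__aligned_u64	prev_key;
		__aligned_u64	buf;
		__aligned_u64	buf_len; /* input/output: len of buf */
		__u32		map_fd;
	} dump;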

> +
> struct { /* anonymous struct used by BPF_PROG_LOAD command */
> __u32   prog_type;  /* one of enum bpf_prog_type */
> __u32   insn_cnt;
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index d200d2837ade..78d55463fc76 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1097,6 +1097,121 @@ static int map_get_next_key(union bpf_attr *attr)
> return err;
>  }
>
> +/* last field in 'union bpf_attr' used by this command */
> +#define BPF_MAP_DUMP_LAST_FIELD dump.buf_len
> +
> +static int map_dump(union bpf_attr *attr)
> +{
> +   void __user *ukey = u64_to_user_ptr(attr->dump.prev_key);
> +   void __user *ubuf = u64_to_user_ptr(attr->dump.buf);
> +   u32 __user *ubuf_len = u64_to_user_ptr(attr->dump.buf_len);
> +   int ufd = attr->dump.map_fd;
> +   struct bpf_map *map;
> +   void *buf, *prev_key, *key, *value;
> +   u32 value_size, elem_size, buf_len, cp_len;
> +   struct fd f;
> +   int err;
> +   bool first_key = false;
> +
> +   if (CHECK_ATTR(BPF_MAP_DUMP))
> +   return -EINVAL;
> +
> +   attr->flags = 0;

Why do you set attr->flags? That refers to the flags field of the anonymous
struct used by the BPF_MAP_*_ELEM commands, not to attr->dump.flags.

> +   if (attr->dump.flags & ~BPF_F_LOCK)
> +   return -EINVAL;
> +
> +   f = fdget(ufd);
> +   map = __bpf_map_get(f);
> +   if (IS_ERR(map))
> +   return PTR_ERR(map);
> +   if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ)) {
> +   err = -EPERM;
> +   goto err_put;
> +   }
> +
> +   if ((attr->dump.flags & BPF_F_LOCK) &&
> +   !map_value_has_spin_lock(map)) {
> +   err = -EINVAL;
> +   goto err_put;
> +   }
> +
> +

Re: [PATCH bpf-next v4 1/4] bpf: sock ops: add netns ino and dev in bpf context

2019-05-24 Thread Y Song
On Fri, May 24, 2019 at 9:01 AM Iago López Galeiras  wrote:
>
> From: Alban Crequy 
>
> sockops programs can now access the network namespace inode and device
> via (struct bpf_sock_ops)->netns_ino and ->netns_dev. This can be useful
> to apply different policies on different network namespaces.
>
> In the unlikely case where network namespaces are not compiled in
> (CONFIG_NET_NS=n), the verifier will return netns_dev as usual and will
> return 0 for netns_ino.
>
> The generated BPF bytecode for netns_ino is loading the correct inode
> number at the time of execution.
>
> However, the generated BPF bytecode for netns_dev is loading an
> immediate value determined at BPF-load-time by looking at the initial
> network namespace. In practice, this works because all netns currently
> use the same virtual device. If this was to change, this code would need
> to be updated too.
>
> Co-authored-by: Iago López Galeiras 
> Signed-off-by: Alban Crequy 
> Signed-off-by: Iago López Galeiras 
>
> ---
>
> Changes since v1:
> - add netns_dev (review from Alexei)
>
> Changes since v2:
> - replace __u64 by u64 in kernel code (review from Y Song)
> - remove unneeded #else branch: program would be rejected in
>   is_valid_access (review from Y Song)
> - allow partial reads (
> Changes since v3:
> - return netns_dev unconditionally and set netns_ino to 0 if
>   CONFIG_NET_NS is not enabled (review from Jakub Kicinski)
> - use bpf_ctx_record_field_size and bpf_ctx_narrow_access_ok instead of
>   manually deal with partial reads (review from Y Song)
> - update commit message to reflect new code and remove note about
>   partial reads since it was discussed in the review
> - use bpf_ctx_range() and offsetofend()
> ---
>  include/uapi/linux/bpf.h |  2 ++
>  net/core/filter.c| 70 
>  2 files changed, 72 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 63e0cf66f01a..e64066a09a5f 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -3261,6 +3261,8 @@ struct bpf_sock_ops {
> __u32 sk_txhash;
> __u64 bytes_received;
> __u64 bytes_acked;
> +   __u64 netns_dev;
> +   __u64 netns_ino;
>  };
>
>  /* Definitions for bpf_sock_ops_cb_flags */
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 55bfc941d17a..2b1552a8dd74 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -76,6 +76,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>
>  /**
>   * sk_filter_trim_cap - run a packet through a socket filter
> @@ -6822,6 +6824,18 @@ static bool sock_ops_is_valid_access(int off, int size,
> }
> } else {
> switch (off) {
> +   case bpf_ctx_range(struct bpf_sock_ops, netns_dev):
> +   if (off >= offsetofend(struct bpf_sock_ops, netns_dev))
> +   return false;

This is not needed.
#define bpf_ctx_range(TYPE, MEMBER)                                         \
        offsetof(TYPE, MEMBER) ... offsetofend(TYPE, MEMBER) - 1
off should never be >= offsetofend(struct bpf_sock_ops, netns_dev).

> +
> +   bpf_ctx_record_field_size(info, sizeof(u64));
> +   if (!bpf_ctx_narrow_access_ok(off, size, sizeof(u64)))
> +   return false;
> +   break;
> +   case offsetof(struct bpf_sock_ops, netns_ino):
> +   if (size != sizeof(u64))
> +   return false;
> +   break;
> case bpf_ctx_range_till(struct bpf_sock_ops, bytes_received,
> bytes_acked):
> if (size != sizeof(__u64))
> @@ -7739,6 +7753,11 @@ static u32 sock_addr_convert_ctx_access(enum 
> bpf_access_type type,
> return insn - insn_buf;
>  }
>
> +static struct ns_common *sockops_netns_cb(void *private_data)
> +{
> +   return &init_net.ns;
> +}
> +
>  static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
>const struct bpf_insn *si,
>struct bpf_insn *insn_buf,
> @@ -7747,6 +7766,10 @@ static u32 sock_ops_convert_ctx_access(enum 
> bpf_access_type type,
>  {
> struct bpf_insn *insn = insn_buf;
> int off;
> +   struct inode *ns_inode;
> +   struct path ns_path;
> +   u64 netns_dev;
> +   void *res;
>
>  /* Helper macro for adding read access to tcp_sock or sock fields. */
>  #define SOCK_OPS_GET_FIE

Re: [PATCH v2] tools/bpf: fix perf build error with uClibc (seen on ARC)

2019-05-02 Thread Y Song
On Thu, May 2, 2019 at 8:57 AM Vineet Gupta  wrote:
>
> When build perf for ARC recently, there was a build failure due to lack
> of __NR_bpf.
>
> | Auto-detecting system features:
> |
> | ... get_cpuid: [ OFF ]
> | ...   bpf: [ on  ]
> |
> | #  error __NR_bpf not defined. libbpf does not support your arch.
> ^
> | bpf.c: In function 'sys_bpf':
> | bpf.c:66:17: error: '__NR_bpf' undeclared (first use in this function)
> |  return syscall(__NR_bpf, cmd, attr, size);
> | ^~~~
> | sys_bpf
>
> Signed-off-by: Vineet Gupta 

Acked-by: Yonghong Song 

> ---
> v1 -> v2
>   - Only add syscall nr for ARC, as asm-generic won't work with arm/sh [Y 
> Song]
> ---
>  tools/lib/bpf/bpf.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 9cd015574e83..d82edadf7589 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -46,6 +46,8 @@
>  #  define __NR_bpf 349
>  # elif defined(__s390__)
>  #  define __NR_bpf 351
> +# elif defined(__arc__)
> +#  define __NR_bpf 280
>  # else
>  #  error __NR_bpf not defined. libbpf does not support your arch.
>  # endif
> --
> 2.7.4
>


Re: [PATCH V2] bpf: fix warning about using plain integer as NULL

2019-03-07 Thread Y Song
On Thu, Mar 7, 2019 at 10:12 PM Bo YU  wrote:
>
> Sparse warning below:
>
> sudo make C=2 CF=-D__CHECK_ENDIAN__ M=net/bpf/
> CHECK   net/bpf//test_run.c
> net/bpf//test_run.c:19:77: warning: Using plain integer as NULL pointer
> ./include/linux/bpf-cgroup.h:295:77: warning: Using plain integer as NULL 
> pointer
>
> Fixes: 8bad74f9840f ("bpf: extend cgroup bpf core to allow multiple
> cgroup storage types")
>
> Acked-by: Yonghong Song 
> Signed-off-by: Bo YU 

There should be no empty line between the tags ("Fixes" and "Acked-by"),
and the "Fixes" tag should be all on one line. My previous reply may have
been wrapped by the email client, which was not my intention.
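
In other words, the tag block should read (addresses elided here as
elsewhere in this archive):

	Fixes: 8bad74f9840f ("bpf: extend cgroup bpf core to allow multiple cgroup storage types")
	Acked-by: Yonghong Song <...>
	Signed-off-by: Bo YU <...>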

> ---
> V2: Add fix-up tag from Yonghong
> ---
>  include/linux/bpf-cgroup.h | 2 +-
>  net/bpf/test_run.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> index 695b2a880d9a..a4c644c1c091 100644
> --- a/include/linux/bpf-cgroup.h
> +++ b/include/linux/bpf-cgroup.h
> @@ -292,7 +292,7 @@ static inline int bpf_cgroup_storage_assign(struct 
> bpf_prog *prog,
>  static inline void bpf_cgroup_storage_release(struct bpf_prog *prog,
>   struct bpf_map *map) {}
>  static inline struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(
> -   struct bpf_prog *prog, enum bpf_cgroup_storage_type stype) { return 0; }
> +   struct bpf_prog *prog, enum bpf_cgroup_storage_type stype) { return NULL; }
>  static inline void bpf_cgroup_storage_free(
> struct bpf_cgroup_storage *storage) {}
>  static inline int bpf_percpu_cgroup_storage_copy(struct bpf_map *map, void 
> *key,
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index da7051d62727..fab142b796ef 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -16,7 +16,7 @@
>  static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
> u32 *retval, u32 *time)
>  {
> -   struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = { 0 };
> +   struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = { NULL };
> enum bpf_cgroup_storage_type stype;
> u64 time_start, time_spent = 0;
> int ret = 0;
> --
> 2.11.0
>


Re: bpf: BPF_PROG_TEST_RUN leads to unkillable process

2019-02-04 Thread Y Song
On Mon, Feb 4, 2019 at 9:49 AM Stanislav Fomichev  wrote:
>
> On 02/01, Dmitry Vyukov wrote:
> > Hello,
> >
> > The following program leads to an unkillable process that eats CPU in
> > an infinite loop in BPF_PROG_TEST_RUN syscall. But kernel does not
> > self-detect cpu/rcu/task stalls either. The program contains max
> > number of repetitions, but as far as I see the intention is that it
> > should be killable. I see that bpf_test_run() checks for
> > signal_pending(current), but it does so only if need_resched() is also
> > set. Can need_resched() be not set for prolonged periods of time?
> > /proc/pid/stack is empty, not sure what other info I can provide.
> There is a bunch of places in the kernel where we do the same nested check:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/broadcom/tg3.c#n12059
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/char/hw_random/s390-trng.c#n80
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/char/random.c#n1049
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/s390/crypto/prng.c#n470
>
> So it's not something unusual we do. OTOH, in the kernel/bpf/verifier.c
> do_check() we do signal_pending() and need_resched() sequentially. In
> theory, it should not hurt to do them in sequence. Any thoughts about
> the patch below? I think we also need to properly return -ERESTARTSYS
> when returning from signal_pending().

I think return value -ERESTARTSYS should be okay.
For the test_run attributes,

struct { /* anonymous struct used by BPF_PROG_TEST_RUN command */
__u32   prog_fd;
__u32   retval;
__u32   data_size_in;   /* input: len of data_in */
__u32   data_size_out;  /* input/output: len of data_out
 *   returns ENOSPC if data_out
 *   is too small.
 */
__aligned_u64   data_in;
__aligned_u64   data_out;
__u32   repeat;
__u32   duration;
} test;

The field data_size_out could be changed during the system call.
But that only happens at bpf_test_finish(). At the time when
-ERESTARTSYS is returned, no attributes have been changed.

>
> --
>
> From ce360c909ce4f3caf8eb69f2ad5ce0d3eee1515d Mon Sep 17 00:00:00 2001
> Message-Id: 
> 
> From: Stanislav Fomichev 
> Date: Mon, 4 Feb 2019 09:17:37 -0800
> Subject: [PATCH bpf] bpf/test_run: properly handle signal_pending
>
> Syzbot found out that running BPF_PROG_TEST_RUN with repeat=0x
> makes the process unkillable. Let's move signal_pending out of need_resched
> and properly return -ERESTARTSYS to userspace.
>
> In the kernel/bpf/verifier.c do_check() we do:
> if (signal_pending())
> ...
> if (need_resched())
> ...
>
> Reported-by: syzbot 
> Signed-off-by: Stanislav Fomichev 
> ---
>  net/bpf/test_run.c | 15 +--
>  1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index fa2644d276ef..a891c60cf248 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -28,12 +28,13 @@ static __always_inline u32 bpf_test_run_one(struct 
> bpf_prog *prog, void *ctx,
> return ret;
>  }
>
> -static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 
> *ret,
> -   u32 *time)
> +static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
> +   u32 *retval, u32 *time)
>  {
> struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = { 0 
> };
> enum bpf_cgroup_storage_type stype;
> u64 time_start, time_spent = 0;
> +   int ret = 0;
> u32 i;
>
> for_each_cgroup_storage_type(stype) {
> @@ -50,10 +51,12 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, 
> u32 repeat, u32 *ret,
> repeat = 1;
> time_start = ktime_get_ns();
> for (i = 0; i < repeat; i++) {
> -   *ret = bpf_test_run_one(prog, ctx, storage);
> +   *retval = bpf_test_run_one(prog, ctx, storage);
> +   if (signal_pending(current)) {
> +   ret = -ERESTARTSYS;
> +   break;
> +   }
> if (need_resched()) {
> -   if (signal_pending(current))
> -   break;
> time_spent += ktime_get_ns() - time_start;
> cond_resched();
> time_start = ktime_get_ns();
> @@ -66,7 +69,7 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, 
> u32 repeat, u32 *ret,
> for_each_cgroup_storage_type(stype)
> bpf_c

Re: [PATCH 2/4] net/bpf: refactor freeing of executable allocations

2018-11-18 Thread Y Song
On Sun, Nov 18, 2018 at 3:55 PM Ard Biesheuvel
 wrote:
>
> On Sat, 17 Nov 2018 at 23:47, Y Song  wrote:
> >
> > On Sat, Nov 17, 2018 at 6:58 PM Ard Biesheuvel
> >  wrote:
> > >
> > > All arch overrides of the __weak bpf_jit_free() amount to the same
> > > thing: the allocated memory was never mapped read-only, and so
> > > it does not have to be remapped to read-write before being freed.
> > >
> > > So in preparation of permitting arches to serve allocations for BPF
> > > JIT programs from other regions than the module region, refactor
> > > the existing bpf_jit_free() implementations to use the shared code
> > > where possible, and only specialize the remap and free operations.
> > >
> > > Signed-off-by: Ard Biesheuvel 
> > > ---
> > >  arch/mips/net/bpf_jit.c   |  7 ++-
> > >  arch/powerpc/net/bpf_jit_comp.c   |  7 ++-
> > >  arch/powerpc/net/bpf_jit_comp64.c |  9 +++--
> > >  arch/sparc/net/bpf_jit_comp_32.c  |  7 ++-
> > >  kernel/bpf/core.c | 15 +--
> > >  5 files changed, 14 insertions(+), 31 deletions(-)
> > >
> > > diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
> > > index 1b69897274a1..5696bd7dccc7 100644
> > > --- a/arch/mips/net/bpf_jit.c
> > > +++ b/arch/mips/net/bpf_jit.c
> > > @@ -1261,10 +1261,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
> > > kfree(ctx.offsets);
> > >  }
> > >
> > > -void bpf_jit_free(struct bpf_prog *fp)
> > > +void bpf_jit_binary_free(struct bpf_binary_header *hdr)
> > >  {
> > > -   if (fp->jited)
> > > -   bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
> > > -
> > > -   bpf_prog_unlock_free(fp);
> > > +   module_memfree(hdr);
> > >  }
> > > diff --git a/arch/powerpc/net/bpf_jit_comp.c 
> > > b/arch/powerpc/net/bpf_jit_comp.c
> > > index a1ea1ea6b40d..5b5ce4a1b44b 100644
> > > --- a/arch/powerpc/net/bpf_jit_comp.c
> > > +++ b/arch/powerpc/net/bpf_jit_comp.c
> > > @@ -680,10 +680,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
> > > return;
> > >  }
> > >
> > > -void bpf_jit_free(struct bpf_prog *fp)
> > > +void bpf_jit_binary_free(struct bpf_binary_header *hdr)
> > >  {
> > > -   if (fp->jited)
> > > -   bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
> > > -
> > > -   bpf_prog_unlock_free(fp);
> > > +   module_memfree(hdr);
> > >  }
> > > diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
> > > b/arch/powerpc/net/bpf_jit_comp64.c
> > > index 84c8f013a6c6..f64f1294bd62 100644
> > > --- a/arch/powerpc/net/bpf_jit_comp64.c
> > > +++ b/arch/powerpc/net/bpf_jit_comp64.c
> > > @@ -1021,11 +1021,8 @@ struct bpf_prog *bpf_int_jit_compile(struct 
> > > bpf_prog *fp)
> > > return fp;
> > >  }
> > >
> > > -/* Overriding bpf_jit_free() as we don't set images read-only. */
> > > -void bpf_jit_free(struct bpf_prog *fp)
> > > +/* Overriding bpf_jit_binary_free() as we don't set images read-only. */
> > > +void bpf_jit_binary_free(struct bpf_binary_header *hdr)
> > >  {
> > > -   if (fp->jited)
> > > -   bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
> > > -
> > > -   bpf_prog_unlock_free(fp);
> > > +   module_memfree(hdr);
> > >  }
> > > diff --git a/arch/sparc/net/bpf_jit_comp_32.c 
> > > b/arch/sparc/net/bpf_jit_comp_32.c
> > > index 01bda6bc9e7f..589950d152cc 100644
> > > --- a/arch/sparc/net/bpf_jit_comp_32.c
> > > +++ b/arch/sparc/net/bpf_jit_comp_32.c
> > > @@ -756,10 +756,7 @@ cond_branch:   f_offset = 
> > > addrs[i + filter[i].jf];
> > > return;
> > >  }
> > >
> > > -void bpf_jit_free(struct bpf_prog *fp)
> > > +void bpf_jit_binary_free(struct bpf_binary_header *hdr)
> > >  {
> > > -   if (fp->jited)
> > > -   bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
> > > -
> > > -   bpf_prog_unlock_free(fp);
> > > +   module_memfree(hdr);
> > >  }
> > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > > index 1a796e0799ec..29f766dac203 100644
> > > --- a/kernel/bpf/core.c
> > > +++ b/kernel/bpf/core.c
> > > @@ -646,25 +646,20 @@ bpf_jit_binary_alloc(unsi

Re: [PATCH bpf] bpf: allocate local storage buffers using GFP_ATOMIC

2018-11-16 Thread Y Song
On Wed, Nov 14, 2018 at 6:01 PM Roman Gushchin  wrote:
>
> Naresh reported an issue with the non-atomic memory allocation of
> cgroup local storage buffers:
>
> [   73.047526] BUG: sleeping function called from invalid context at
> /srv/oe/build/tmp-rpb-glibc/work-shared/intel-corei7-64/kernel-source/mm/slab.h:421
> [   73.060915] in_atomic(): 1, irqs_disabled(): 0, pid: 3157, name: 
> test_cgroup_sto
> [   73.068342] INFO: lockdep is turned off.
> [   73.072293] CPU: 2 PID: 3157 Comm: test_cgroup_sto Not tainted
> 4.20.0-rc2-next-20181113 #1
> [   73.080548] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
> 2.0b 07/27/2017
> [   73.088018] Call Trace:
> [   73.090463]  dump_stack+0x70/0xa5
> [   73.093783]  ___might_sleep+0x152/0x240
> [   73.097619]  __might_sleep+0x4a/0x80
> [   73.101191]  __kmalloc_node+0x1cf/0x2f0
> [   73.105031]  ? cgroup_storage_update_elem+0x46/0x90
> [   73.109909]  cgroup_storage_update_elem+0x46/0x90
>
> cgroup_storage_update_elem() (as well as other map update
> callbacks) is called with preemption disabled, so a GFP_ATOMIC
> allocation should be used: see e.g. alloc_htab_elem() in hashtab.c.
>
> Reported-by: Naresh Kamboju 
> Tested-by: Naresh Kamboju 
> Signed-off-by: Roman Gushchin 
> Cc: Alexei Starovoitov 
> Cc: Daniel Borkmann 

Acked-by: Yonghong Song 

> ---
>  kernel/bpf/local_storage.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
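
The gist of the fix (a sketch only; the actual hunk is not quoted in this
thread, and the allocation site below is an assumption about
kernel/bpf/local_storage.c) is to make the buffer allocation in
cgroup_storage_update_elem() non-sleeping:

	/* sketch: update callbacks run with preemption disabled, so the
	 * allocation must not sleep; use GFP_ATOMIC, plus __GFP_NOWARN
	 * since atomic allocations may fail under memory pressure
	 */
	new = kmalloc_node(sizeof(struct bpf_storage_buffer) + map->value_size,
			   __GFP_ZERO | GFP_ATOMIC | __GFP_NOWARN,
			   map->numa_node);
	if (!new)
		return -ENOMEM;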


Re: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino

2018-05-25 Thread Y Song
On Fri, May 25, 2018 at 8:21 AM, Alban Crequy  wrote:
> On Wed, May 23, 2018 at 4:34 AM Y Song  wrote:
>
>> I did a quick prototyping and the above interface seems working fine.
>
> Thanks! I gave your kernel patch & userspace program a try and it works for
> me on cgroup-v2.
>
> Also, I found out how to get my containers to use both cgroup-v1 and
> cgroup-v2 (by enabling systemd's hybrid cgroup mode and docker's
> '--exec-opt native.cgroupdriver=systemd' option). So I should be able to
> use the BPF helper function without having to add support for all the
> cgroup-v1 hierarchies.

Great. Will submit a formal patch soon.

>
>> The kernel change:
>> ===
>
>> [yhs@localhost bpf-next]$ git diff
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 97446bbe2ca5..669b7383fddb 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -1976,7 +1976,8 @@ union bpf_attr {
>>  FN(fib_lookup), \
>>  FN(sock_hash_update),   \
>>  FN(msg_redirect_hash),  \
>> -   FN(sk_redirect_hash),
>> +   FN(sk_redirect_hash),   \
>> +   FN(get_current_cgroup_id),
>
>>   /* integer value in 'imm' field of BPF_CALL instruction selects which
> helper
>>* function eBPF program intends to call
>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index ce2cbbff27e4..e11e3298f911 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -493,6 +493,21 @@ static const struct bpf_func_proto
>> bpf_current_task_under_cgroup_proto = {
>>  .arg2_type  = ARG_ANYTHING,
>>   };
>
>> +BPF_CALL_0(bpf_get_current_cgroup_id)
>> +{
>> +   struct cgroup *cgrp = task_dfl_cgroup(current);
>> +   if (!cgrp)
>> +   return -EINVAL;
>> +
>> +   return cgrp->kn->id.id;
>> +}
>> +
>> +static const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
>> +   .func   = bpf_get_current_cgroup_id,
>> +   .gpl_only   = false,
>> +   .ret_type   = RET_INTEGER,
>> +};
>> +
>>   BPF_CALL_3(bpf_probe_read_str, void *, dst, u32, size,
>> const void *, unsafe_ptr)
>>   {
>> @@ -563,6 +578,8 @@ tracing_func_proto(enum bpf_func_id func_id, const
>> struct bpf_prog *prog)
>>  return &bpf_get_prandom_u32_proto;
>>  case BPF_FUNC_probe_read_str:
>>  return &bpf_probe_read_str_proto;
>> +   case BPF_FUNC_get_current_cgroup_id:
>> +   return &bpf_get_current_cgroup_id_proto;
>>  default:
>>  return NULL;
>>  }
>
>> The following program can be used to print out a cgroup id given a cgroup
> path.
>> [yhs@localhost cg]$ cat get_cgroup_id.c
>> #define _GNU_SOURCE
>> #include 
>> #include 
>> #include 
>> #include 
>> #include 
>
>> int main(int argc, char **argv)
>> {
>>  int dirfd, err, flags, mount_id, fhsize;
>>  struct file_handle *fhp;
>>  char *pathname;
>
>>  if (argc != 2) {
>>  printf("usage: %s \n", argv[0]);
>>  return 1;
>>  }
>
>>  pathname = argv[1];
>>  dirfd = AT_FDCWD;
>>  flags = 0;
>
>>  fhsize = sizeof(*fhp);
>>  fhp = malloc(fhsize);
>>  if (!fhp)
>>  return 1;
>
>>  err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
>>  if (err >= 0) {
>>  printf("error\n");
>>  return 1;
>>  }
>
>>  fhsize = sizeof(struct file_handle) + fhp->handle_bytes;
>>  fhp = realloc(fhp, fhsize);
>>  if (!fhp)
>>  return 1;
>
>>  err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
>>  if (err < 0)
>>  perror("name_to_handle_at");
>>  else {
>>  int i;
>
>>  printf("dir = %s, mount_id = %d\n", pathname, mount_id);
>>  printf("handle_bytes = %d, handle_type = %d\n", fhp->handle_bytes,
>>  fhp->handle_type);
>>  if (fhp->handle_bytes != 8)
>>  return 1;
>
>>  printf("cgroup_id = 0x%llx\n", *(unsigned long long
> *)fhp->f_handle);
>>  }
>
>>  return 0;
>> }
>> [yhs@localhost cg]$
>
>&

Re: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino

2018-05-22 Thread Y Song
On Tue, May 22, 2018 at 8:35 PM, Alexei Starovoitov
 wrote:
> On Tue, May 22, 2018 at 08:33:24PM -0700, Y Song wrote:
>> +   struct cgroup *cgrp = task_dfl_cgroup(current);
>> +   if (!cgrp)
>> +   return -EINVAL;
>
> why this check is needed?

No reason :-) Originally I was concerned about whether it is possible for
cgrp to be NULL. Looking at the code, it SEEMS to me that it cannot be NULL,
but I am not 100% sure (as I am not a cgroup expert). Since you are asking,
that probably means it cannot be NULL, so I will remove the check in the
formal upstream patch.


Re: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino

2018-05-22 Thread Y Song
I did a quick prototyping and the above interface seems working fine.

The kernel change:
===

[yhs@localhost bpf-next]$ git diff
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 97446bbe2ca5..669b7383fddb 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1976,7 +1976,8 @@ union bpf_attr {
FN(fib_lookup), \
FN(sock_hash_update),   \
FN(msg_redirect_hash),  \
-   FN(sk_redirect_hash),
+   FN(sk_redirect_hash),   \
+   FN(get_current_cgroup_id),

 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ce2cbbff27e4..e11e3298f911 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -493,6 +493,21 @@ static const struct bpf_func_proto
bpf_current_task_under_cgroup_proto = {
.arg2_type  = ARG_ANYTHING,
 };

+BPF_CALL_0(bpf_get_current_cgroup_id)
+{
+   struct cgroup *cgrp = task_dfl_cgroup(current);
+   if (!cgrp)
+   return -EINVAL;
+
+   return cgrp->kn->id.id;
+}
+
+static const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
+   .func   = bpf_get_current_cgroup_id,
+   .gpl_only   = false,
+   .ret_type   = RET_INTEGER,
+};
+
 BPF_CALL_3(bpf_probe_read_str, void *, dst, u32, size,
   const void *, unsafe_ptr)
 {
@@ -563,6 +578,8 @@ tracing_func_proto(enum bpf_func_id func_id, const
struct bpf_prog *prog)
return &bpf_get_prandom_u32_proto;
case BPF_FUNC_probe_read_str:
return &bpf_probe_read_str_proto;
+   case BPF_FUNC_get_current_cgroup_id:
+   return &bpf_get_current_cgroup_id_proto;
default:
return NULL;
}

The following program can be used to print out a cgroup id given a cgroup path.
[yhs@localhost cg]$ cat get_cgroup_id.c
#define _GNU_SOURCE
#include 
#include 
#include 
#include 
#include 

int main(int argc, char **argv)
{
int dirfd, err, flags, mount_id, fhsize;
struct file_handle *fhp;
char *pathname;

if (argc != 2) {
printf("usage: %s \n", argv[0]);
return 1;
}

pathname = argv[1];
dirfd = AT_FDCWD;
flags = 0;

fhsize = sizeof(*fhp);
fhp = malloc(fhsize);
if (!fhp)
return 1;

err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
if (err >= 0) {
printf("error\n");
return 1;
}

fhsize = sizeof(struct file_handle) + fhp->handle_bytes;
fhp = realloc(fhp, fhsize);
if (!fhp)
return 1;

err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
if (err < 0)
perror("name_to_handle_at");
else {
int i;

printf("dir = %s, mount_id = %d\n", pathname, mount_id);
printf("handle_bytes = %d, handle_type = %d\n", fhp->handle_bytes,
fhp->handle_type);
if (fhp->handle_bytes != 8)
return 1;

printf("cgroup_id = 0x%llx\n", *(unsigned long long *)fhp->f_handle);
}

return 0;
}
[yhs@localhost cg]$

Given a cgroup path, the user can get the cgroup_id and use it in their bpf
program for filtering purposes.

I ran a simple program t.c
   int main() { while(1) sleep(1); return 0; }
in the cgroup v2 directory /home/yhs/tmp/yhs
   none on /home/yhs/tmp type cgroup2 (rw,relatime,seclabel)

$ ./get_cgroup_id /home/yhs/tmp/yhs
dir = /home/yhs/tmp/yhs, mount_id = 124
handle_bytes = 8, handle_type = 1
cgroup_id = 0x106b2

// the below command to get cgroup_id from the kernel for the
// process compiled with t.c and ran under /home/yhs/tmp/yhs:
$ sudo ./trace.py -p 4067 '__x64_sys_nanosleep "cgid = %llx", $cgid'
PID TID COMMFUNC -
40674067a.out   __x64_sys_nanosleep cgid = 106b2
40674067a.out   __x64_sys_nanosleep cgid = 106b2
40674067a.out   __x64_sys_nanosleep cgid = 106b2
^C[yhs@localhost tools]$

The kernel and user space cgids match. Will provide a
formal patch later.




On Mon, May 21, 2018 at 5:24 PM, Y Song  wrote:
> On Mon, May 21, 2018 at 9:26 AM, Alexei Starovoitov
>  wrote:
>> On Sun, May 13, 2018 at 07:33:18PM +0200, Alban Crequy wrote:
>>>
>>> +BPF_CALL_2(bpf_get_current_cgroup_ino, u32, hierarchy, u64, flags)
>>> +{
>>> + // TODO: pick the correct hierarchy instead of the mem controller
>>> + struct cgroup *cgrp = task_cgroup(current, memory_cgrp_id);
>>> +
>>> + if (unlikely(!cgrp))
>>> + return -EINVAL;
>>> + if (unlikely(hierarchy))
>>> + return -EINVAL;
>>> +

Re: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino

2018-05-21 Thread Y Song
On Mon, May 21, 2018 at 9:26 AM, Alexei Starovoitov
 wrote:
> On Sun, May 13, 2018 at 07:33:18PM +0200, Alban Crequy wrote:
>>
>> +BPF_CALL_2(bpf_get_current_cgroup_ino, u32, hierarchy, u64, flags)
>> +{
>> + // TODO: pick the correct hierarchy instead of the mem controller
>> + struct cgroup *cgrp = task_cgroup(current, memory_cgrp_id);
>> +
>> + if (unlikely(!cgrp))
>> + return -EINVAL;
>> + if (unlikely(hierarchy))
>> + return -EINVAL;
>> + if (unlikely(flags))
>> + return -EINVAL;
>> +
>> + return cgrp->kn->id.ino;
>
> ino only is not enough to identify cgroup. It needs generation number too.
> I don't quite see how hierarchy and flags can be used in the future.
> Also why limit it to memcg?
>
> How about something like this instead:
>
> BPF_CALL_2(bpf_get_current_cgroup_id)
> {
> struct cgroup *cgrp = task_dfl_cgroup(current);
>
> return cgrp->kn->id.id;
> }
> The user space can use fhandle api to get the same 64-bit id.

I think this should work. This will also be useful to bcc, as user space
can encode the desired id in the bpf program and compare that id to the
current cgroup id, so we can have cgroup-level tracing (esp. stat
collection) support. To cope with the cgroup hierarchy, the user can use a
cgroup-array based approach or explicitly compare against multiple cgroup
ids.
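
As a concrete illustration of the filtering idea, here is a minimal sketch
of a tracing program. It assumes the helper ends up exposed to programs as
bpf_get_current_cgroup_id() (e.g. via bpf_helpers.h), and that the target
id has been obtained from user space with the fhandle trick shown above
(the 0x106b2 value is just the example from this thread):

#include <linux/bpf.h>
#include "bpf_helpers.h"

#define TARGET_CGROUP_ID 0x106b2ULL	/* discovered from user space */

SEC("kprobe/__x64_sys_nanosleep")
int count_nanosleep(void *ctx)
{
	__u64 cgid = bpf_get_current_cgroup_id();

	if (cgid != TARGET_CGROUP_ID)
		return 0;		/* some other cgroup: ignore */

	/* ... update a per-cgroup counter map here ... */
	return 0;
}

char _license[] SEC("license") = "GPL";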


Re: [PATCH v4 3/3] bpf: add selftest for lirc_mode2 type program

2018-05-18 Thread Y Song
On Fri, May 18, 2018 at 1:17 PM, Y Song  wrote:
> On Fri, May 18, 2018 at 7:07 AM, Sean Young  wrote:
>> This is simple test over rc-loopback.
>>
>> Signed-off-by: Sean Young 
>
> Acked-by: Yonghong Song 

Just one minor thing. You need to add "test_lirc_mode2_user" to
tools/testing/selftests/bpf/.gitignore
so it will not show up when you do "git status".

If the patch needs a respin, you can add this in the new revision.
Otherwise, I think a followup patch to fix this should be fine.
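
For context, the overall shape of such a lirc_mode2 program is roughly the
following (a sketch only, not the selftest from this series; the pulse
threshold, protocol and scancode are made-up values, while LIRC_IS_PULSE()
and LIRC_VALUE() come from linux/lirc.h and bpf_rc_keydown() is the helper
added in patch 2/3):

#include <linux/bpf.h>
#include <linux/lirc.h>
#include "bpf_helpers.h"

SEC("lirc_mode2")
int ir_decoder(unsigned int *sample)
{
	if (LIRC_IS_PULSE(*sample)) {
		unsigned int duration = LIRC_VALUE(*sample);

		/* made-up decode rule: report a fixed scancode for any
		 * sufficiently long pulse
		 */
		if (duration > 20000)
			bpf_rc_keydown(sample, 64 /* protocol */,
				       0x0123 /* scancode */, 0 /* toggle */);
	}
	return 0;
}

char _license[] SEC("license") = "GPL";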

>
>> ---
>>  tools/bpf/bpftool/prog.c  |   1 +
>>  tools/include/uapi/linux/bpf.h|  53 -
>>  tools/include/uapi/linux/lirc.h   | 217 ++
>>  tools/lib/bpf/libbpf.c|   1 +
>>  tools/testing/selftests/bpf/Makefile  |   8 +-
>>  tools/testing/selftests/bpf/bpf_helpers.h |   6 +
>>  .../testing/selftests/bpf/test_lirc_mode2.sh  |  28 +++
>>  .../selftests/bpf/test_lirc_mode2_kern.c  |  23 ++
>>  .../selftests/bpf/test_lirc_mode2_user.c  | 154 +
>>  9 files changed, 487 insertions(+), 4 deletions(-)
>>  create mode 100644 tools/include/uapi/linux/lirc.h
>>  create mode 100755 tools/testing/selftests/bpf/test_lirc_mode2.sh
>>  create mode 100644 tools/testing/selftests/bpf/test_lirc_mode2_kern.c
>>  create mode 100644 tools/testing/selftests/bpf/test_lirc_mode2_user.c
>>
>> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
>> index 9bdfdf2d3fbe..07f1ace39a46 100644
>> --- a/tools/bpf/bpftool/prog.c
>> +++ b/tools/bpf/bpftool/prog.c
>> @@ -71,6 +71,7 @@ static const char * const prog_type_name[] = {
>> [BPF_PROG_TYPE_SK_MSG]  = "sk_msg",
>> [BPF_PROG_TYPE_RAW_TRACEPOINT]  = "raw_tracepoint",
>> [BPF_PROG_TYPE_CGROUP_SOCK_ADDR] = "cgroup_sock_addr",
>> +   [BPF_PROG_TYPE_LIRC_MODE2]  = "lirc_mode2",
>>  };
>>
>>  static void print_boot_time(__u64 nsecs, char *buf, unsigned int size)
>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>> index d94d333a8225..8227832b713e 100644
>> --- a/tools/include/uapi/linux/bpf.h
>> +++ b/tools/include/uapi/linux/bpf.h
>> @@ -141,6 +141,7 @@ enum bpf_prog_type {
>> BPF_PROG_TYPE_SK_MSG,
>> BPF_PROG_TYPE_RAW_TRACEPOINT,
>> BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
>> +   BPF_PROG_TYPE_LIRC_MODE2,
>>  };
>>
>>  enum bpf_attach_type {
>> @@ -158,6 +159,7 @@ enum bpf_attach_type {
>> BPF_CGROUP_INET6_CONNECT,
>> BPF_CGROUP_INET4_POST_BIND,
>> BPF_CGROUP_INET6_POST_BIND,
>> +   BPF_LIRC_MODE2,
>> __MAX_BPF_ATTACH_TYPE
>>  };
>>
>> @@ -1902,6 +1904,53 @@ union bpf_attr {
>>   * egress otherwise). This is the only flag supported for now.
>>   * Return
>>   * **SK_PASS** on success, or **SK_DROP** on error.
>> + *
>> + * int bpf_rc_keydown(void *ctx, u32 protocol, u64 scancode, u32 toggle)
>> + * Description
>> + * This helper is used in programs implementing IR decoding, to
>> + * report a successfully decoded key press with *scancode*,
>> + * *toggle* value in the given *protocol*. The scancode will be
>> + * translated to a keycode using the rc keymap, and reported as
>> + * an input key down event. After a period a key up event is
>> + * generated. This period can be extended by calling either
>> + * **bpf_rc_keydown** () with the same values, or calling
>> + * **bpf_rc_repeat** ().
>> + *
>> + * Some protocols include a toggle bit, in case the button
>> + * was released and pressed again between consecutive scancodes
>> + *
>> + * The *ctx* should point to the lirc sample as passed into
>> + * the program.
>> + *
>> + * The *protocol* is the decoded protocol number (see
>> + * **enum rc_proto** for some predefined values).
>> + *
> > + * This helper is only available if the kernel was compiled with
>> + * the **CONFIG_BPF_LIRC_MODE2** configuration option set to
>> + * "**y**".
>> + *
>> + * Return
>> + * 0
>> + *
>> + * int bpf_rc_repeat(void *ctx)
>> + * Description
>> + * This helper is used in programs implementing IR decoding, to
>> + * report a successfully decoded 

Re: [PATCH v4 3/3] bpf: add selftest for lirc_mode2 type program

2018-05-18 Thread Y Song
On Fri, May 18, 2018 at 7:07 AM, Sean Young  wrote:
> This is simple test over rc-loopback.
>
> Signed-off-by: Sean Young 

Acked-by: Yonghong Song 

> ---
>  tools/bpf/bpftool/prog.c  |   1 +
>  tools/include/uapi/linux/bpf.h|  53 -
>  tools/include/uapi/linux/lirc.h   | 217 ++
>  tools/lib/bpf/libbpf.c|   1 +
>  tools/testing/selftests/bpf/Makefile  |   8 +-
>  tools/testing/selftests/bpf/bpf_helpers.h |   6 +
>  .../testing/selftests/bpf/test_lirc_mode2.sh  |  28 +++
>  .../selftests/bpf/test_lirc_mode2_kern.c  |  23 ++
>  .../selftests/bpf/test_lirc_mode2_user.c  | 154 +
>  9 files changed, 487 insertions(+), 4 deletions(-)
>  create mode 100644 tools/include/uapi/linux/lirc.h
>  create mode 100755 tools/testing/selftests/bpf/test_lirc_mode2.sh
>  create mode 100644 tools/testing/selftests/bpf/test_lirc_mode2_kern.c
>  create mode 100644 tools/testing/selftests/bpf/test_lirc_mode2_user.c
>
> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index 9bdfdf2d3fbe..07f1ace39a46 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -71,6 +71,7 @@ static const char * const prog_type_name[] = {
> [BPF_PROG_TYPE_SK_MSG]  = "sk_msg",
> [BPF_PROG_TYPE_RAW_TRACEPOINT]  = "raw_tracepoint",
> [BPF_PROG_TYPE_CGROUP_SOCK_ADDR] = "cgroup_sock_addr",
> +   [BPF_PROG_TYPE_LIRC_MODE2]  = "lirc_mode2",
>  };
>
>  static void print_boot_time(__u64 nsecs, char *buf, unsigned int size)
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index d94d333a8225..8227832b713e 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -141,6 +141,7 @@ enum bpf_prog_type {
> BPF_PROG_TYPE_SK_MSG,
> BPF_PROG_TYPE_RAW_TRACEPOINT,
> BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
> +   BPF_PROG_TYPE_LIRC_MODE2,
>  };
>
>  enum bpf_attach_type {
> @@ -158,6 +159,7 @@ enum bpf_attach_type {
> BPF_CGROUP_INET6_CONNECT,
> BPF_CGROUP_INET4_POST_BIND,
> BPF_CGROUP_INET6_POST_BIND,
> +   BPF_LIRC_MODE2,
> __MAX_BPF_ATTACH_TYPE
>  };
>
> @@ -1902,6 +1904,53 @@ union bpf_attr {
>   * egress otherwise). This is the only flag supported for now.
>   * Return
>   * **SK_PASS** on success, or **SK_DROP** on error.
> + *
> + * int bpf_rc_keydown(void *ctx, u32 protocol, u64 scancode, u32 toggle)
> + * Description
> + * This helper is used in programs implementing IR decoding, to
> + * report a successfully decoded key press with *scancode*,
> + * *toggle* value in the given *protocol*. The scancode will be
> + * translated to a keycode using the rc keymap, and reported as
> + * an input key down event. After a period a key up event is
> + * generated. This period can be extended by calling either
> + * **bpf_rc_keydown** () with the same values, or calling
> + * **bpf_rc_repeat** ().
> + *
> + * Some protocols include a toggle bit, in case the button
> + * was released and pressed again between consecutive scancodes
> + *
> + * The *ctx* should point to the lirc sample as passed into
> + * the program.
> + *
> + * The *protocol* is the decoded protocol number (see
> + * **enum rc_proto** for some predefined values).
> + *
> + * This helper is only available if the kernel was compiled with
> + * the **CONFIG_BPF_LIRC_MODE2** configuration option set to
> + * "**y**".
> + *
> + * Return
> + * 0
> + *
> + * int bpf_rc_repeat(void *ctx)
> + * Description
> + * This helper is used in programs implementing IR decoding, to
> + * report a successfully decoded repeat key message. This delays
> + * the generation of a key up event for previously generated
> + * key down event.
> + *
> + * Some IR protocols like NEC have a special IR message for
> + * repeating last button, for when a button is held down.
> + *
> + * The *ctx* should point to the lirc sample as passed into
> + * the program.
> + *
> + * This helper is only available if the kernel was compiled with
> + * the **CONFIG_BPF_LIRC_MODE2** configuration option set to
> + * "**y**".
> + *
> + * Return
> + * 0
>   */
>  #define __BPF_FUNC_MAPPER(FN)  \
> FN(unspec), \
> @@ -1976,7 +2025,9 @@ union bpf_attr {
> FN(fib_lookup), \
> FN(sock_hash_update),   \
> FN(msg_redirect_hash),  \
> -   FN(sk_redirect_hash),
> +   FN(sk_redirect_hash),   \
> +   FN(rc_repeat),  \
> +   F

Re: [PATCH v4 2/3] media: rc: introduce BPF_PROG_LIRC_MODE2

2018-05-18 Thread Y Song
On Fri, May 18, 2018 at 7:07 AM, Sean Young  wrote:
> Add support for BPF_PROG_LIRC_MODE2. This type of BPF program can call
> rc_keydown() to reported decoded IR scancodes, or rc_repeat() to report
> that the last key should be repeated.
>
> The bpf program can be attached to using the bpf(BPF_PROG_ATTACH) syscall;
> the target_fd must be the /dev/lircN device.
>
> Signed-off-by: Sean Young 

Acked-by: Yonghong Song 

> ---
>  drivers/media/rc/Kconfig|  13 ++
>  drivers/media/rc/Makefile   |   1 +
>  drivers/media/rc/bpf-lirc.c | 308 
>  drivers/media/rc/lirc_dev.c |  30 
>  drivers/media/rc/rc-core-priv.h |  22 +++
>  drivers/media/rc/rc-ir-raw.c|  12 +-
>  include/linux/bpf_rcdev.h   |  30 
>  include/linux/bpf_types.h   |   3 +
>  include/uapi/linux/bpf.h|  53 +-
>  kernel/bpf/syscall.c|   7 +
>  10 files changed, 476 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/media/rc/bpf-lirc.c
>  create mode 100644 include/linux/bpf_rcdev.h
>
> diff --git a/drivers/media/rc/Kconfig b/drivers/media/rc/Kconfig
> index eb2c3b6eca7f..d5b35a6ba899 100644
> --- a/drivers/media/rc/Kconfig
> +++ b/drivers/media/rc/Kconfig
> @@ -25,6 +25,19 @@ config LIRC
>passes raw IR to and from userspace, which is needed for
>IR transmitting (aka "blasting") and for the lirc daemon.
>
> +config BPF_LIRC_MODE2
> +   bool "Support for eBPF programs attached to lirc devices"
> +   depends on BPF_SYSCALL
> +   depends on RC_CORE=y
> +   depends on LIRC
> +   help
> +  Allow attaching eBPF programs to a lirc device using the bpf(2)
> +  syscall command BPF_PROG_ATTACH. This is supported for raw IR
> +  receivers.
> +
> +  These eBPF programs can be used to decode IR into scancodes, for
> +  IR protocols not supported by the kernel decoders.
> +
>  menuconfig RC_DECODERS
> bool "Remote controller decoders"
> depends on RC_CORE
> diff --git a/drivers/media/rc/Makefile b/drivers/media/rc/Makefile
> index 2e1c87066f6c..e0340d043fe8 100644
> --- a/drivers/media/rc/Makefile
> +++ b/drivers/media/rc/Makefile
> @@ -5,6 +5,7 @@ obj-y += keymaps/
>  obj-$(CONFIG_RC_CORE) += rc-core.o
>  rc-core-y := rc-main.o rc-ir-raw.o
>  rc-core-$(CONFIG_LIRC) += lirc_dev.o
> +rc-core-$(CONFIG_BPF_LIRC_MODE2) += bpf-lirc.o
>  obj-$(CONFIG_IR_NEC_DECODER) += ir-nec-decoder.o
>  obj-$(CONFIG_IR_RC5_DECODER) += ir-rc5-decoder.o
>  obj-$(CONFIG_IR_RC6_DECODER) += ir-rc6-decoder.o
> diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
> new file mode 100644
> index ..c9673df2d9cd
> --- /dev/null
> +++ b/drivers/media/rc/bpf-lirc.c
> @@ -0,0 +1,308 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// bpf-lirc.c - handles bpf
> +//
> +// Copyright (C) 2018 Sean Young 
> +
> +#include 
> +#include 
> +#include 
> +#include "rc-core-priv.h"
> +
> +/*
> + * BPF interface for raw IR
> + */
> +const struct bpf_prog_ops lirc_mode2_prog_ops = {
> +};
> +
> +BPF_CALL_1(bpf_rc_repeat, u32*, sample)
> +{
> +   struct ir_raw_event_ctrl *ctrl;
> +
> +   ctrl = container_of(sample, struct ir_raw_event_ctrl, bpf_sample);
> +
> +   rc_repeat(ctrl->dev);
> +
> +   return 0;
> +}
> +
> +static const struct bpf_func_proto rc_repeat_proto = {
> +   .func  = bpf_rc_repeat,
> +   .gpl_only  = true, /* rc_repeat is EXPORT_SYMBOL_GPL */
> +   .ret_type  = RET_INTEGER,
> +   .arg1_type = ARG_PTR_TO_CTX,
> +};
> +
> +/*
> + * Currently rc-core does not support 64-bit scancodes, but there are many
> + * known protocols with more than 32 bits. So, define the interface as u64
> + * as a future-proof.
> + */
> +BPF_CALL_4(bpf_rc_keydown, u32*, sample, u32, protocol, u64, scancode,
> +  u32, toggle)
> +{
> +   struct ir_raw_event_ctrl *ctrl;
> +
> +   ctrl = container_of(sample, struct ir_raw_event_ctrl, bpf_sample);
> +
> +   rc_keydown(ctrl->dev, protocol, scancode, toggle != 0);
> +
> +   return 0;
> +}
> +
> +static const struct bpf_func_proto rc_keydown_proto = {
> +   .func  = bpf_rc_keydown,
> +   .gpl_only  = true, /* rc_keydown is EXPORT_SYMBOL_GPL */
> +   .ret_type  = RET_INTEGER,
> +   .arg1_type = ARG_PTR_TO_CTX,
> +   .arg2_type = ARG_ANYTHING,
> +   .arg3_type = ARG_ANYTHING,
> +   .arg4_type = ARG_ANYTHING,
> +};
> +
> +static const struct bpf_func_proto *
> +lirc_mode2_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> +{
> +   switch (func_id) {
> +   case BPF_FUNC_rc_repeat:
> +   return &rc_repeat_proto;
> +   case BPF_FUNC_rc_keydown:
> +   return &rc_keydown_proto;
> +   case BPF_FUNC_map_lookup_elem:
> +   return &bpf_map_lookup_elem_proto;
> +   case BPF_FUNC_map_update_elem:
> +   return &bpf_map_update_elem_proto;
> +   case BPF_FUNC_map_delete_elem:
> +

Re: [PATCH v4 1/3] bpf: bpf_prog_array_copy() should return -ENOENT if exclude_prog not found

2018-05-18 Thread Y Song
On Fri, May 18, 2018 at 7:07 AM, Sean Young  wrote:
> This makes it possible for bpf prog detach to return -ENOENT.
>
> Signed-off-by: Sean Young 

Acked-by: Yonghong Song 


Re: [PATCH v3 1/2] media: rc: introduce BPF_PROG_RAWIR_EVENT

2018-05-17 Thread Y Song
On Thu, May 17, 2018 at 2:45 PM, Sean Young  wrote:
> Hi,
>
> Again, thanks for a thoughtful review. This will definitely improve
> the code.
>
> On Thu, May 17, 2018 at 10:02:52AM -0700, Y Song wrote:
>> On Wed, May 16, 2018 at 2:04 PM, Sean Young  wrote:
>> > Add support for BPF_PROG_RAWIR_EVENT. This type of BPF program can call
>> > rc_keydown() to reported decoded IR scancodes, or rc_repeat() to report
>> > that the last key should be repeated.
>> >
>> > The bpf program can be attached to using the bpf(BPF_PROG_ATTACH) syscall;
>> > the target_fd must be the /dev/lircN device.
>> >
>> > Signed-off-by: Sean Young 
>> > ---
>> >  drivers/media/rc/Kconfig   |  13 ++
>> >  drivers/media/rc/Makefile  |   1 +
>> >  drivers/media/rc/bpf-rawir-event.c | 363 +
>> >  drivers/media/rc/lirc_dev.c|  24 ++
>> >  drivers/media/rc/rc-core-priv.h|  24 ++
>> >  drivers/media/rc/rc-ir-raw.c   |  14 +-
>> >  include/linux/bpf_rcdev.h  |  30 +++
>> >  include/linux/bpf_types.h  |   3 +
>> >  include/uapi/linux/bpf.h   |  55 -
>> >  kernel/bpf/syscall.c   |   7 +
>> >  10 files changed, 531 insertions(+), 3 deletions(-)
>> >  create mode 100644 drivers/media/rc/bpf-rawir-event.c
>> >  create mode 100644 include/linux/bpf_rcdev.h
>> >
>> > diff --git a/drivers/media/rc/Kconfig b/drivers/media/rc/Kconfig
>> > index eb2c3b6eca7f..2172d65b0213 100644
>> > --- a/drivers/media/rc/Kconfig
>> > +++ b/drivers/media/rc/Kconfig
>> > @@ -25,6 +25,19 @@ config LIRC
>> >passes raw IR to and from userspace, which is needed for
>> >IR transmitting (aka "blasting") and for the lirc daemon.
>> >
>> > +config BPF_RAWIR_EVENT
>> > +   bool "Support for eBPF programs attached to lirc devices"
>> > +   depends on BPF_SYSCALL
>> > +   depends on RC_CORE=y
>> > +   depends on LIRC
>> > +   help
>> > +  Allow attaching eBPF programs to a lirc device using the bpf(2)
>> > +  syscall command BPF_PROG_ATTACH. This is supported for raw IR
>> > +  receivers.
>> > +
>> > +  These eBPF programs can be used to decode IR into scancodes, for
>> > +  IR protocols not supported by the kernel decoders.
>> > +
>> >  menuconfig RC_DECODERS
>> > bool "Remote controller decoders"
>> > depends on RC_CORE
>> > diff --git a/drivers/media/rc/Makefile b/drivers/media/rc/Makefile
>> > index 2e1c87066f6c..74907823bef8 100644
>> > --- a/drivers/media/rc/Makefile
>> > +++ b/drivers/media/rc/Makefile
>> > @@ -5,6 +5,7 @@ obj-y += keymaps/
>> >  obj-$(CONFIG_RC_CORE) += rc-core.o
>> >  rc-core-y := rc-main.o rc-ir-raw.o
>> >  rc-core-$(CONFIG_LIRC) += lirc_dev.o
>> > +rc-core-$(CONFIG_BPF_RAWIR_EVENT) += bpf-rawir-event.o
>> >  obj-$(CONFIG_IR_NEC_DECODER) += ir-nec-decoder.o
>> >  obj-$(CONFIG_IR_RC5_DECODER) += ir-rc5-decoder.o
>> >  obj-$(CONFIG_IR_RC6_DECODER) += ir-rc6-decoder.o
>> > diff --git a/drivers/media/rc/bpf-rawir-event.c 
>> > b/drivers/media/rc/bpf-rawir-event.c
>> > new file mode 100644
>> > index ..7cb48b8d87b5
>> > --- /dev/null
>> > +++ b/drivers/media/rc/bpf-rawir-event.c
>> > @@ -0,0 +1,363 @@
>> > +// SPDX-License-Identifier: GPL-2.0
>> > +// bpf-rawir-event.c - handles bpf
>> > +//
>> > +// Copyright (C) 2018 Sean Young 
>> > +
>> > +#include 
>> > +#include 
>> > +#include 
>> > +#include "rc-core-priv.h"
>> > +
>> > +/*
>> > + * BPF interface for raw IR
>> > + */
>> > +const struct bpf_prog_ops rawir_event_prog_ops = {
>> > +};
>> > +
>> > +BPF_CALL_1(bpf_rc_repeat, struct bpf_rawir_event*, event)
>> > +{
>> > +   struct ir_raw_event_ctrl *ctrl;
>> > +
>> > +   ctrl = container_of(event, struct ir_raw_event_ctrl, 
>> > bpf_rawir_event);
>> > +
>> > +   rc_repeat(ctrl->dev);
>> > +
>> > +   return 0;
>> > +}
>> > +
>> > +static const struct bpf_func_proto rc_repeat_proto = {
>> > +   .func  = bpf_rc_repeat,
>> > +   .gpl_only  = true, /* rc_repeat is EXPORT_SYMBOL_GPL */
>> > 

Re: [PATCH v3 2/2] bpf: add selftest for rawir_event type program

2018-05-17 Thread Y Song
On Wed, May 16, 2018 at 2:04 PM, Sean Young  wrote:
> This is simple test over rc-loopback.
>
> Signed-off-by: Sean Young 
> ---
>  tools/bpf/bpftool/prog.c  |   1 +
>  tools/include/uapi/linux/bpf.h|  57 +++-
>  tools/lib/bpf/libbpf.c|   1 +
>  tools/testing/selftests/bpf/Makefile  |   8 +-
>  tools/testing/selftests/bpf/bpf_helpers.h |   6 +
>  tools/testing/selftests/bpf/test_rawir.sh |  37 +
>  .../selftests/bpf/test_rawir_event_kern.c |  26 
>  .../selftests/bpf/test_rawir_event_user.c | 130 ++
>  8 files changed, 261 insertions(+), 5 deletions(-)
>  create mode 100755 tools/testing/selftests/bpf/test_rawir.sh
>  create mode 100644 tools/testing/selftests/bpf/test_rawir_event_kern.c
>  create mode 100644 tools/testing/selftests/bpf/test_rawir_event_user.c

Could you copy the include/uapi/linux/lirc.h file into the tools tree as
well (e.g. to tools/include/uapi/linux/)? Otherwise, I get the following
compilation error:

gcc -Wall -O2 -I../../../include/uapi -I../../../lib
-I../../../lib/bpf -I../../../../include/generated  -I../../../include
   test_rawir_event_user.c
/home/yhs/work/bpf-next/tools/testing/selftests/bpf/libbpf.a -lcap
-lelf -lrt -lpthread -o
/home/yhs/work/bpf-next/tools/testing/selftests/bpf/test_rawir_event_user
test_rawir_event_user.c: In function ‘main’:
test_rawir_event_user.c:60:15: error: ‘LIRC_MODE_SCANCODE’ undeclared
(first use in this function); did you mean ‘LIRC_MODE_LIRCCODE’?
mode = LIRC_MODE_SCANCODE;
   ^~
   LIRC_MODE_LIRCCODE
test_rawir_event_user.c:60:15: note: each undeclared identifier is
reported only once for each function it appears in
test_rawir_event_user.c:93:29: error: storage size of ‘lsc’ isn’t known
struct lirc_scancode lsc;
 ^~~
test_rawir_event_user.c:93:29: warning: unused variable ‘lsc’
[-Wunused-variable]

>
> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index 9bdfdf2d3fbe..8889a4ee8577 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -71,6 +71,7 @@ static const char * const prog_type_name[] = {
> [BPF_PROG_TYPE_SK_MSG]  = "sk_msg",
> [BPF_PROG_TYPE_RAW_TRACEPOINT]  = "raw_tracepoint",
> [BPF_PROG_TYPE_CGROUP_SOCK_ADDR] = "cgroup_sock_addr",
> +   [BPF_PROG_TYPE_RAWIR_EVENT] = "rawir_event",
>  };
>
>  static void print_boot_time(__u64 nsecs, char *buf, unsigned int size)
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 1205d86a7a29..243e141e8a5b 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -141,6 +141,7 @@ enum bpf_prog_type {
> BPF_PROG_TYPE_SK_MSG,
> BPF_PROG_TYPE_RAW_TRACEPOINT,
> BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
> +   BPF_PROG_TYPE_RAWIR_EVENT,
>  };
>
>  enum bpf_attach_type {
> @@ -158,6 +159,7 @@ enum bpf_attach_type {
> BPF_CGROUP_INET6_CONNECT,
> BPF_CGROUP_INET4_POST_BIND,
> BPF_CGROUP_INET6_POST_BIND,
> +   BPF_RAWIR_EVENT,
> __MAX_BPF_ATTACH_TYPE
>  };
>
> @@ -1829,7 +1831,6 @@ union bpf_attr {
>   * Return
>   * 0 on success, or a negative error in case of failure.
>   *
> - *
>   * int bpf_fib_lookup(void *ctx, struct bpf_fib_lookup *params, int plen, 
> u32 flags)
>   * Description
>   * Do FIB lookup in kernel tables using parameters in *params*.
> @@ -1856,6 +1857,7 @@ union bpf_attr {
>   * Egress device index on success, 0 if packet needs to continue
>   * up the stack for further processing or a negative error in 
> case
>   * of failure.
> + *
>   * int bpf_sock_hash_update(struct bpf_sock_ops_kern *skops, struct bpf_map 
> *map, void *key, u64 flags)
>   * Description
>   * Add an entry to, or update a sockhash *map* referencing 
> sockets.
> @@ -1902,6 +1904,35 @@ union bpf_attr {
>   * egress otherwise). This is the only flag supported for now.
>   * Return
>   * **SK_PASS** on success, or **SK_DROP** on error.
> + *
> + * int bpf_rc_keydown(void *ctx, u32 protocol, u32 scancode, u32 toggle)
> + * Description
> + * Report decoded scancode with toggle value. For use in
> + * BPF_PROG_TYPE_RAWIR_EVENT, to report a successfully
> + * decoded scancode. This will generate a keydown event,
> + * and a keyup event once the scancode is no longer repeated.
> + *
> + * *ctx* pointer to bpf_rawir_event, *protocol* is decoded
> + * protocol (see RC_PROTO_* enum).
> + *
> + * Some protocols include a toggle bit, in case the button
> + * was released and pressed again between consecutive scancodes,
> + * copy this bit into *toggle* if it exists, else set to 0.
> + *
> + * Return
> + * Always return 0 (for now)
>

Re: [PATCH v3 1/2] media: rc: introduce BPF_PROG_RAWIR_EVENT

2018-05-17 Thread Y Song
On Wed, May 16, 2018 at 2:04 PM, Sean Young  wrote:
> Add support for BPF_PROG_RAWIR_EVENT. This type of BPF program can call
> rc_keydown() to reported decoded IR scancodes, or rc_repeat() to report
> that the last key should be repeated.
>
> The bpf program can be attached to using the bpf(BPF_PROG_ATTACH) syscall;
> the target_fd must be the /dev/lircN device.
>
> Signed-off-by: Sean Young 
> ---
>  drivers/media/rc/Kconfig   |  13 ++
>  drivers/media/rc/Makefile  |   1 +
>  drivers/media/rc/bpf-rawir-event.c | 363 +
>  drivers/media/rc/lirc_dev.c|  24 ++
>  drivers/media/rc/rc-core-priv.h|  24 ++
>  drivers/media/rc/rc-ir-raw.c   |  14 +-
>  include/linux/bpf_rcdev.h  |  30 +++
>  include/linux/bpf_types.h  |   3 +
>  include/uapi/linux/bpf.h   |  55 -
>  kernel/bpf/syscall.c   |   7 +
>  10 files changed, 531 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/media/rc/bpf-rawir-event.c
>  create mode 100644 include/linux/bpf_rcdev.h
>
> diff --git a/drivers/media/rc/Kconfig b/drivers/media/rc/Kconfig
> index eb2c3b6eca7f..2172d65b0213 100644
> --- a/drivers/media/rc/Kconfig
> +++ b/drivers/media/rc/Kconfig
> @@ -25,6 +25,19 @@ config LIRC
>passes raw IR to and from userspace, which is needed for
>IR transmitting (aka "blasting") and for the lirc daemon.
>
> +config BPF_RAWIR_EVENT
> +   bool "Support for eBPF programs attached to lirc devices"
> +   depends on BPF_SYSCALL
> +   depends on RC_CORE=y
> +   depends on LIRC
> +   help
> +  Allow attaching eBPF programs to a lirc device using the bpf(2)
> +  syscall command BPF_PROG_ATTACH. This is supported for raw IR
> +  receivers.
> +
> +  These eBPF programs can be used to decode IR into scancodes, for
> +  IR protocols not supported by the kernel decoders.
> +
>  menuconfig RC_DECODERS
> bool "Remote controller decoders"
> depends on RC_CORE
> diff --git a/drivers/media/rc/Makefile b/drivers/media/rc/Makefile
> index 2e1c87066f6c..74907823bef8 100644
> --- a/drivers/media/rc/Makefile
> +++ b/drivers/media/rc/Makefile
> @@ -5,6 +5,7 @@ obj-y += keymaps/
>  obj-$(CONFIG_RC_CORE) += rc-core.o
>  rc-core-y := rc-main.o rc-ir-raw.o
>  rc-core-$(CONFIG_LIRC) += lirc_dev.o
> +rc-core-$(CONFIG_BPF_RAWIR_EVENT) += bpf-rawir-event.o
>  obj-$(CONFIG_IR_NEC_DECODER) += ir-nec-decoder.o
>  obj-$(CONFIG_IR_RC5_DECODER) += ir-rc5-decoder.o
>  obj-$(CONFIG_IR_RC6_DECODER) += ir-rc6-decoder.o
> diff --git a/drivers/media/rc/bpf-rawir-event.c 
> b/drivers/media/rc/bpf-rawir-event.c
> new file mode 100644
> index ..7cb48b8d87b5
> --- /dev/null
> +++ b/drivers/media/rc/bpf-rawir-event.c
> @@ -0,0 +1,363 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// bpf-rawir-event.c - handles bpf
> +//
> +// Copyright (C) 2018 Sean Young 
> +
> +#include 
> +#include 
> +#include 
> +#include "rc-core-priv.h"
> +
> +/*
> + * BPF interface for raw IR
> + */
> +const struct bpf_prog_ops rawir_event_prog_ops = {
> +};
> +
> +BPF_CALL_1(bpf_rc_repeat, struct bpf_rawir_event*, event)
> +{
> +   struct ir_raw_event_ctrl *ctrl;
> +
> +   ctrl = container_of(event, struct ir_raw_event_ctrl, bpf_rawir_event);
> +
> +   rc_repeat(ctrl->dev);
> +
> +   return 0;
> +}
> +
> +static const struct bpf_func_proto rc_repeat_proto = {
> +   .func  = bpf_rc_repeat,
> +   .gpl_only  = true, /* rc_repeat is EXPORT_SYMBOL_GPL */
> +   .ret_type  = RET_INTEGER,
> +   .arg1_type = ARG_PTR_TO_CTX,
> +};
> +
> +BPF_CALL_4(bpf_rc_keydown, struct bpf_rawir_event*, event, u32, protocol,
> +  u32, scancode, u32, toggle)
> +{
> +   struct ir_raw_event_ctrl *ctrl;
> +
> +   ctrl = container_of(event, struct ir_raw_event_ctrl, bpf_rawir_event);
> +
> +   rc_keydown(ctrl->dev, protocol, scancode, toggle != 0);
> +
> +   return 0;
> +}
> +
> +static const struct bpf_func_proto rc_keydown_proto = {
> +   .func  = bpf_rc_keydown,
> +   .gpl_only  = true, /* rc_keydown is EXPORT_SYMBOL_GPL */
> +   .ret_type  = RET_INTEGER,
> +   .arg1_type = ARG_PTR_TO_CTX,
> +   .arg2_type = ARG_ANYTHING,
> +   .arg3_type = ARG_ANYTHING,
> +   .arg4_type = ARG_ANYTHING,
> +};
> +
> +static const struct bpf_func_proto *
> +rawir_event_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> +{
> +   switch (func_id) {
> +   case BPF_FUNC_rc_repeat:
> +   return &rc_repeat_proto;
> +   case BPF_FUNC_rc_keydown:
> +   return &rc_keydown_proto;
> +   case BPF_FUNC_map_lookup_elem:
> +   return &bpf_map_lookup_elem_proto;
> +   case BPF_FUNC_map_update_elem:
> +   return &bpf_map_update_elem_proto;
> +   case BPF_FUNC_map_delete_elem:
> +   return &bpf_map_delete_elem_proto;
> +   case BPF_FUNC_ktime_get_ns:
> +   return &bpf

Re: [PATCH v1 4/4] samples/bpf: an example of a raw IR decoder

2018-05-14 Thread Y Song
On Mon, May 14, 2018 at 2:11 PM, Sean Young  wrote:
> This implements the grundig-16 IR protocol.
>
> Signed-off-by: Sean Young 
> ---
>  samples/bpf/Makefile  |   4 +
>  samples/bpf/bpf_load.c|   9 +-
>  samples/bpf/grundig_decoder_kern.c| 112 ++
>  samples/bpf/grundig_decoder_user.c|  54 +++
>  tools/bpf/bpftool/prog.c  |   1 +
>  tools/include/uapi/linux/bpf.h|  17 +++-
>  tools/testing/selftests/bpf/bpf_helpers.h |   6 ++
>  7 files changed, 200 insertions(+), 3 deletions(-)
>  create mode 100644 samples/bpf/grundig_decoder_kern.c
>  create mode 100644 samples/bpf/grundig_decoder_user.c
>
> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> index 4d6a6edd4bf6..c6fa111f103a 100644
> --- a/samples/bpf/Makefile
> +++ b/samples/bpf/Makefile
> @@ -44,6 +44,7 @@ hostprogs-y += xdp_monitor
>  hostprogs-y += xdp_rxq_info
>  hostprogs-y += syscall_tp
>  hostprogs-y += cpustat
> +hostprogs-y += grundig_decoder
>
>  # Libbpf dependencies
>  LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o
> @@ -95,6 +96,7 @@ xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o
>  xdp_rxq_info-objs := bpf_load.o $(LIBBPF) xdp_rxq_info_user.o
>  syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
>  cpustat-objs := bpf_load.o $(LIBBPF) cpustat_user.o
> +grundig_decoder-objs := bpf_load.o $(LIBBPF) grundig_decoder_user.o
>
>  # Tell kbuild to always build the programs
>  always := $(hostprogs-y)
> @@ -148,6 +150,7 @@ always += xdp_rxq_info_kern.o
>  always += xdp2skb_meta_kern.o
>  always += syscall_tp_kern.o
>  always += cpustat_kern.o
> +always += grundig_decoder_kern.o
>
>  HOSTCFLAGS += -I$(objtree)/usr/include
>  HOSTCFLAGS += -I$(srctree)/tools/lib/
> @@ -193,6 +196,7 @@ HOSTLOADLIBES_xdp_monitor += -lelf
>  HOSTLOADLIBES_xdp_rxq_info += -lelf
>  HOSTLOADLIBES_syscall_tp += -lelf
>  HOSTLOADLIBES_cpustat += -lelf
> +HOSTLOADLIBES_grundig_decoder += -lelf
>
>  # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on 
> cmdline:
>  #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc 
> CLANG=~/git/llvm/build/bin/clang
> diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
> index bebe4188b4b3..0fd389e95bb9 100644
> --- a/samples/bpf/bpf_load.c
> +++ b/samples/bpf/bpf_load.c
> @@ -69,6 +69,7 @@ static int load_and_attach(const char *event, struct 
> bpf_insn *prog, int size)
> bool is_sockops = strncmp(event, "sockops", 7) == 0;
> bool is_sk_skb = strncmp(event, "sk_skb", 6) == 0;
> bool is_sk_msg = strncmp(event, "sk_msg", 6) == 0;
> +   bool is_ir_decoder = strncmp(event, "ir_decoder", 10) == 0;
> size_t insns_cnt = size / sizeof(struct bpf_insn);
> enum bpf_prog_type prog_type;
> char buf[256];
> @@ -102,6 +103,8 @@ static int load_and_attach(const char *event, struct 
> bpf_insn *prog, int size)
> prog_type = BPF_PROG_TYPE_SK_SKB;
> } else if (is_sk_msg) {
> prog_type = BPF_PROG_TYPE_SK_MSG;
> +   } else if (is_ir_decoder) {
> +   prog_type = BPF_PROG_TYPE_RAWIR_DECODER;
> } else {
> printf("Unknown event '%s'\n", event);
> return -1;
> @@ -116,7 +119,8 @@ static int load_and_attach(const char *event, struct 
> bpf_insn *prog, int size)
>
> prog_fd[prog_cnt++] = fd;
>
> -   if (is_xdp || is_perf_event || is_cgroup_skb || is_cgroup_sk)
> +   if (is_xdp || is_perf_event || is_cgroup_skb || is_cgroup_sk ||
> +   is_ir_decoder)
> return 0;
>
> if (is_socket || is_sockops || is_sk_skb || is_sk_msg) {
> @@ -607,7 +611,8 @@ static int do_load_bpf_file(const char *path, 
> fixup_map_cb fixup_map)
> memcmp(shname, "cgroup/", 7) == 0 ||
> memcmp(shname, "sockops", 7) == 0 ||
> memcmp(shname, "sk_skb", 6) == 0 ||
> -   memcmp(shname, "sk_msg", 6) == 0) {
> +   memcmp(shname, "sk_msg", 6) == 0 ||
> +   memcmp(shname, "ir_decoder", 10) == 0) {
> ret = load_and_attach(shname, data->d_buf,
>   data->d_size);
> if (ret != 0)
> diff --git a/samples/bpf/grundig_decoder_kern.c 
> b/samples/bpf/grundig_decoder_kern.c
> new file mode 100644
> index ..c80f2c9cc69a
> --- /dev/null
> +++ b/samples/bpf/grundig_decoder_kern.c
> @@ -0,0 +1,112 @@
> +
> +#include 
> +#include 
> +#include "bpf_helpers.h"
> +#include 
> +
> +enum grundig_state {
> +   STATE_INACTIVE,
> +   STATE_HEADER_SPACE,
> +   STATE_LEADING_PULSE,
> +   STATE_BITS_SPACE,
> +   STATE_BITS_PULSE,
> +};
> +
> +struct decoder_state {
> +   u32 bits;
> +   enum grundig_state state;
> +   u32 count;
> +   u32 last_space;
> +};
> +
> +struct bpf_map_def SEC("maps") decoder_state_map 

Re: [PATCH v1 3/4] media: rc bpf: move ir_raw_event to uapi

2018-05-14 Thread Y Song
On Mon, May 14, 2018 at 2:11 PM, Sean Young  wrote:
> The context provided to a BPF_PROG_RAWIR_DECODER is a struct ir_raw_event;
> ensure user space has a a definition.
>
> Signed-off-by: Sean Young 
> ---
>  include/media/rc-core.h| 19 +--
>  include/uapi/linux/bpf_rcdev.h | 24 

Patch #2 already referenced this file. So if Patches #1 and #2 are
applied, there will be a compilation error. Could you re-arrange your
patchset so that after sequentially applying each patch there is no
compilation error?

>  2 files changed, 25 insertions(+), 18 deletions(-)
>  create mode 100644 include/uapi/linux/bpf_rcdev.h
>
> diff --git a/include/media/rc-core.h b/include/media/rc-core.h
> index 6742fd86ff65..5d31e31d8ade 100644
> --- a/include/media/rc-core.h
> +++ b/include/media/rc-core.h
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>
>  /**
> @@ -299,24 +300,6 @@ void rc_keydown_notimeout(struct rc_dev *dev, enum 
> rc_proto protocol,
>  void rc_keyup(struct rc_dev *dev);
>  u32 rc_g_keycode_from_table(struct rc_dev *dev, u32 scancode);
>
> -/*
> - * From rc-raw.c
> - * The Raw interface is specific to InfraRed. It may be a good idea to
> - * split it later into a separate header.
> - */
> -struct ir_raw_event {
> -   union {
> -   u32 duration;
> -   u32 carrier;
> -   };
> -   u8  duty_cycle;
> -
> -   unsignedpulse:1;
> -   unsignedreset:1;
> -   unsignedtimeout:1;
> -   unsignedcarrier_report:1;
> -};
> -
>  #define DEFINE_IR_RAW_EVENT(event) struct ir_raw_event event = {}
>
>  static inline void init_ir_raw_event(struct ir_raw_event *ev)
> diff --git a/include/uapi/linux/bpf_rcdev.h b/include/uapi/linux/bpf_rcdev.h
> new file mode 100644
> index ..d8629ff2b960
> --- /dev/null
> +++ b/include/uapi/linux/bpf_rcdev.h
> @@ -0,0 +1,24 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/* Copyright (c) 2018 Sean Young 
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + */
> +#ifndef _UAPI__LINUX_BPF_RCDEV_H__
> +#define _UAPI__LINUX_BPF_RCDEV_H__
> +
> +struct ir_raw_event {
> +   union {
> +   __u32   duration;
> +   __u32   carrier;
> +   };
> +   __u8duty_cycle;
> +
> +   unsignedpulse:1;
> +   unsignedreset:1;
> +   unsignedtimeout:1;
> +   unsignedcarrier_report:1;
> +};
> +
> +#endif /* _UAPI__LINUX_BPF_RCDEV_H__ */
> --
> 2.17.0
>


Re: [PATCH v1 2/4] media: bpf: allow raw IR decoder bpf programs to be used

2018-05-14 Thread Y Song
On Mon, May 14, 2018 at 2:10 PM, Sean Young  wrote:
> This implements attaching, detaching, querying and execution. The target
> fd has to be the /dev/lircN device.
>
> Signed-off-by: Sean Young 
> ---
>  drivers/media/rc/ir-bpf-decoder.c | 191 ++
>  drivers/media/rc/lirc_dev.c   |  30 +
>  drivers/media/rc/rc-core-priv.h   |  15 +++
>  drivers/media/rc/rc-ir-raw.c  |   5 +
>  include/uapi/linux/bpf.h  |   1 +
>  kernel/bpf/syscall.c  |   7 ++
>  6 files changed, 249 insertions(+)
>
> diff --git a/drivers/media/rc/ir-bpf-decoder.c 
> b/drivers/media/rc/ir-bpf-decoder.c
> index aaa5e208b1a5..651590a14772 100644
> --- a/drivers/media/rc/ir-bpf-decoder.c
> +++ b/drivers/media/rc/ir-bpf-decoder.c
> @@ -91,3 +91,194 @@ const struct bpf_verifier_ops ir_decoder_verifier_ops = {
> .get_func_proto  = ir_decoder_func_proto,
> .is_valid_access = ir_decoder_is_valid_access
>  };
> +
> +#define BPF_MAX_PROGS 64
> +
> +int rc_dev_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog, u32 flags)

flags is not used in this function.

> +{
> +   struct ir_raw_event_ctrl *raw;
> +   struct bpf_prog_array __rcu *old_array;
> +   struct bpf_prog_array *new_array;
> +   int ret;
> +
> +   if (rcdev->driver_type != RC_DRIVER_IR_RAW)
> +   return -EINVAL;
> +
> +   ret = mutex_lock_interruptible(&rcdev->lock);
> +   if (ret)
> +   return ret;
> +
> +   raw = rcdev->raw;
> +
> +   if (raw->progs && bpf_prog_array_length(raw->progs) >= BPF_MAX_PROGS) 
> {
> +   ret = -E2BIG;
> +   goto out;
> +   }
> +
> +   old_array = raw->progs;
> +   ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array);
> +   if (ret < 0)
> +   goto out;
> +
> +   rcu_assign_pointer(raw->progs, new_array);
> +   bpf_prog_array_free(old_array);
> +out:
> +   mutex_unlock(&rcdev->lock);
> +   return ret;
> +}
> +
> +int rc_dev_bpf_detach(struct rc_dev *rcdev, struct bpf_prog *prog, u32 flags)

flags is not used in this function.

> +{
> +   struct ir_raw_event_ctrl *raw;
> +   struct bpf_prog_array __rcu *old_array;
> +   struct bpf_prog_array *new_array;
> +   int ret;
> +
> +   if (rcdev->driver_type != RC_DRIVER_IR_RAW)
> +   return -EINVAL;
> +
> +   ret = mutex_lock_interruptible(&rcdev->lock);
> +   if (ret)
> +   return ret;
> +
> +   raw = rcdev->raw;
> +
> +   old_array = raw->progs;
> +   ret = bpf_prog_array_copy(old_array, prog, NULL, &new_array);
> +   if (ret < 0) {
> +   bpf_prog_array_delete_safe(old_array, prog);
> +   } else {
> +   rcu_assign_pointer(raw->progs, new_array);
> +   bpf_prog_array_free(old_array);
> +   }
> +
> +   bpf_prog_put(prog);
> +   mutex_unlock(&rcdev->lock);
> +   return 0;
> +}
> +
> +void rc_dev_bpf_run(struct rc_dev *rcdev)
> +{
> +   struct ir_raw_event_ctrl *raw = rcdev->raw;
> +
> +   if (raw->progs)
> +   BPF_PROG_RUN_ARRAY(raw->progs, &raw->prev_ev, BPF_PROG_RUN);
> +}
> +
> +void rc_dev_bpf_put(struct rc_dev *rcdev)
> +{
> +   struct bpf_prog_array *progs = rcdev->raw->progs;
> +   int i, size;
> +
> +   if (!progs)
> +   return;
> +
> +   size = bpf_prog_array_length(progs);
> +   for (i = 0; i < size; i++)
> +   bpf_prog_put(progs->progs[i]);
> +
> +   bpf_prog_array_free(rcdev->raw->progs);
> +}
> +
> +int rc_dev_prog_attach(const union bpf_attr *attr)
> +{
> +   struct bpf_prog *prog;
> +   struct rc_dev *rcdev;
> +   int ret;
> +
> +   if (attr->attach_flags & BPF_F_ALLOW_OVERRIDE)
> +   return -EINVAL;

Looks like you really did not use flags except here.
BPF_F_ALLOW_OVERRIDE was originally intended for the cgroup type of
attachment, and its comment explicitly says so.

In the query below, the flags value "0" is copied to userspace.

In your case, I think you can just disallow any value, i.e.,
attr->attach_flags must be 0, and then further down check that if the
prog is already in the array, you just return an error.
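A rough sketch of those two checks (illustrative only, not the actual patch;
it assumes rcdev->lock is held as in rc_dev_bpf_attach() above, and the __rcu
annotation on raw->progs is elided for brevity):

static bool ir_bpf_prog_attached(struct bpf_prog_array *array,
				 struct bpf_prog *prog)
{
	struct bpf_prog **p;

	if (!array)
		return false;
	for (p = array->progs; *p; p++)	/* the prog array is NULL-terminated */
		if (*p == prog)
			return true;
	return false;
}

	/* in rc_dev_prog_attach(): no attach flags are defined for lirc */
	if (attr->attach_flags)
		return -EINVAL;

	/* in rc_dev_bpf_attach(), before bpf_prog_array_copy(): */
	if (ir_bpf_prog_attached(raw->progs, prog)) {
		ret = -EEXIST;
		goto out;
	}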

> +
> +   prog = bpf_prog_get_type(attr->attach_bpf_fd,
> +BPF_PROG_TYPE_RAWIR_DECODER);
> +   if (IS_ERR(prog))
> +   return PTR_ERR(prog);
> +
> +   rcdev = rc_dev_get_from_fd(attr->target_fd);
> +   if (IS_ERR(rcdev)) {
> +   bpf_prog_put(prog);
> +   return PTR_ERR(rcdev);
> +   }
> +
> +   ret = rc_dev_bpf_attach(rcdev, prog, attr->attach_flags);
> +   if (ret)
> +   bpf_prog_put(prog);
> +
> +   put_device(&rcdev->dev);
> +
> +   return ret;
> +}
> +
> +int rc_dev_prog_detach(const union bpf_attr *attr)
> +{
> +   struct bpf_prog *prog;
> +   struct rc_dev *rcdev;
> +   int ret;
> +
> +   if (attr->attach_flags & BPF_F_ALLOW_OVERRIDE)
>

Re: [PATCH v1 1/4] media: rc: introduce BPF_PROG_IR_DECODER

2018-05-14 Thread Y Song
On Mon, May 14, 2018 at 2:10 PM, Sean Young  wrote:
> Add support for BPF_PROG_IR_DECODER. This type of BPF program can call
> rc_keydown() to reported decoded IR scancodes, or rc_repeat() to report
> that the last key should be repeated.
>
> Signed-off-by: Sean Young 
> ---
>  drivers/media/rc/Kconfig  |  8 +++
>  drivers/media/rc/Makefile |  1 +
>  drivers/media/rc/ir-bpf-decoder.c | 93 +++
>  include/linux/bpf_types.h |  3 +
>  include/uapi/linux/bpf.h  | 16 +-
>  5 files changed, 120 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/media/rc/ir-bpf-decoder.c
>
> diff --git a/drivers/media/rc/Kconfig b/drivers/media/rc/Kconfig
> index eb2c3b6eca7f..10ad6167d87c 100644
> --- a/drivers/media/rc/Kconfig
> +++ b/drivers/media/rc/Kconfig
> @@ -120,6 +120,14 @@ config IR_IMON_DECODER
>remote control and you would like to use it with a raw IR
>receiver, or if you wish to use an encoder to transmit this IR.
>
> +config IR_BPF_DECODER
> +   bool "Enable IR raw decoder using BPF"
> +   depends on BPF_SYSCALL
> +   depends on RC_CORE=y
> +   help
> +  Enable this option to make it possible to load custom IR
> +  decoders written in BPF.
> +
>  endif #RC_DECODERS
>
>  menuconfig RC_DEVICES
> diff --git a/drivers/media/rc/Makefile b/drivers/media/rc/Makefile
> index 2e1c87066f6c..12e1118430d0 100644
> --- a/drivers/media/rc/Makefile
> +++ b/drivers/media/rc/Makefile
> @@ -5,6 +5,7 @@ obj-y += keymaps/
>  obj-$(CONFIG_RC_CORE) += rc-core.o
>  rc-core-y := rc-main.o rc-ir-raw.o
>  rc-core-$(CONFIG_LIRC) += lirc_dev.o
> +rc-core-$(CONFIG_IR_BPF_DECODER) += ir-bpf-decoder.o
>  obj-$(CONFIG_IR_NEC_DECODER) += ir-nec-decoder.o
>  obj-$(CONFIG_IR_RC5_DECODER) += ir-rc5-decoder.o
>  obj-$(CONFIG_IR_RC6_DECODER) += ir-rc6-decoder.o
> diff --git a/drivers/media/rc/ir-bpf-decoder.c 
> b/drivers/media/rc/ir-bpf-decoder.c
> new file mode 100644
> index ..aaa5e208b1a5
> --- /dev/null
> +++ b/drivers/media/rc/ir-bpf-decoder.c
> @@ -0,0 +1,93 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// ir-bpf-decoder.c - handles bpf decoders
> +//
> +// Copyright (C) 2018 Sean Young 
> +
> +#include 
> +#include 
> +#include "rc-core-priv.h"
> +
> +/*
> + * BPF interface for raw IR decoder
> + */
> +const struct bpf_prog_ops ir_decoder_prog_ops = {
> +};
> +
> +BPF_CALL_1(bpf_rc_repeat, struct ir_raw_event*, event)
> +{
> +   struct ir_raw_event_ctrl *ctrl;
> +
> +   ctrl = container_of(event, struct ir_raw_event_ctrl, prev_ev);
> +
> +   rc_repeat(ctrl->dev);
> +   return 0;
> +}
> +
> +static const struct bpf_func_proto rc_repeat_proto = {
> +   .func  = bpf_rc_repeat,
> +   .gpl_only  = true, // rc_repeat is EXPORT_SYMBOL_GPL
> +   .ret_type  = RET_VOID,

I suggest using RET_INTEGER here since we do return an integer 0.
RET_INTEGER will also make it easy to extend if you want to return
a non-zero value for an error code or other reasons (see the sketch
below this hunk).

> +   .arg1_type = ARG_PTR_TO_CTX,
> +};
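For reference, the proto with that change applied (the later v3 posting of
this series, quoted earlier in this archive, already uses RET_INTEGER):

static const struct bpf_func_proto rc_repeat_proto = {
	.func	   = bpf_rc_repeat,
	.gpl_only  = true, /* rc_repeat is EXPORT_SYMBOL_GPL */
	.ret_type  = RET_INTEGER,
	.arg1_type = ARG_PTR_TO_CTX,
};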
> +
> +BPF_CALL_4(bpf_rc_keydown, struct ir_raw_event*, event, u32, protocol,
> +  u32, scancode, u32, toggle)
> +{
> +   struct ir_raw_event_ctrl *ctrl;
> +
> +   ctrl = container_of(event, struct ir_raw_event_ctrl, prev_ev);
> +   rc_keydown(ctrl->dev, protocol, scancode, toggle != 0);
> +   return 0;
> +}
> +
> +static const struct bpf_func_proto rc_keydown_proto = {
> +   .func  = bpf_rc_keydown,
> +   .gpl_only  = true, // rc_keydown is EXPORT_SYMBOL_GPL
> +   .ret_type  = RET_VOID,

ditto. RET_INTEGER is preferable.

> +   .arg1_type = ARG_PTR_TO_CTX,
> +   .arg2_type = ARG_ANYTHING,
> +   .arg3_type = ARG_ANYTHING,
> +   .arg4_type = ARG_ANYTHING,
> +};
> +
> +static const struct bpf_func_proto *ir_decoder_func_proto(enum bpf_func_id 
> func_id, const struct bpf_prog *prog)
> +{
> +   switch (func_id) {
> +   case BPF_FUNC_rc_repeat:
> +   return &rc_repeat_proto;
> +   case BPF_FUNC_rc_keydown:
> +   return &rc_keydown_proto;
> +   case BPF_FUNC_map_lookup_elem:
> +   return &bpf_map_lookup_elem_proto;
> +   case BPF_FUNC_map_update_elem:
> +   return &bpf_map_update_elem_proto;
> +   case BPF_FUNC_map_delete_elem:
> +   return &bpf_map_delete_elem_proto;
> +   case BPF_FUNC_ktime_get_ns:
> +   return &bpf_ktime_get_ns_proto;
> +   case BPF_FUNC_tail_call:
> +   return &bpf_tail_call_proto;
> +   case BPF_FUNC_get_prandom_u32:
> +   return &bpf_get_prandom_u32_proto;
> +   default:
> +   return NULL;
> +   }
> +}
> +
> +static bool ir_decoder_is_valid_access(int off, int size,
> +  enum bpf_access_type type,
> +  const struct bpf_prog *prog,
> +  struct bpf_insn_access_aux *info

Re: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino

2018-05-14 Thread Y Song
On Sun, May 13, 2018 at 10:33 AM, Alban Crequy  wrote:
> From: Alban Crequy 
>
> bpf_get_current_cgroup_ino() allows BPF trace programs to get the inode
> of the cgroup where the current process resides.
>
> My use case is to get statistics about syscalls done by a specific
> Kubernetes container. I have a tracepoint on raw_syscalls/sys_enter and
> a BPF map containing the cgroup inode that I want to trace. I use
> bpf_get_current_cgroup_ino() and I quickly return from the tracepoint if
> the inode is not in the BPF hash map.

Alternatively, the kernel already has the bpf_current_task_under_cgroup
helper, which uses a cgroup array to store cgroup fd's. If the current
task is in the hierarchy of a particular cgroup, the helper will return
true.

One difference between your helper and bpf_current_task_under_cgroup() is
that your helper tests against a particular cgroup, not including its
children, while bpf_current_task_under_cgroup() will return true even if
the task is in a nested cgroup.

Maybe this will work for you?
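A minimal sketch of that alternative in samples/bpf style (map name, index
and section are illustrative; user space would store the container's cgroup
fd into the array before attaching):

#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"

struct bpf_map_def SEC("maps") cgroup_map = {
	.type        = BPF_MAP_TYPE_CGROUP_ARRAY,
	.key_size    = sizeof(__u32),
	.value_size  = sizeof(__u32),
	.max_entries = 1,
};

SEC("tracepoint/raw_syscalls/sys_enter")
int trace_sys_enter(void *ctx)
{
	/* returns 1 if current is in (or below) the cgroup stored at index 0 */
	if (bpf_current_task_under_cgroup(&cgroup_map, 0) != 1)
		return 0;

	/* ... collect per-container syscall statistics here ... */
	return 0;
}

char _license[] SEC("license") = "GPL";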

>
> Without this BPF helper, I would need to keep track of all pids in the
> container. The Netlink proc connector can be used to follow process
> creation and destruction but it is racy.
>
> This patch only looks at the memory cgroup, which was enough for me
> since each Kubernetes container is placed in a different mem cgroup.
> For a generic implementation, I'm not sure how to proceed: it seems I
> would need to use 'for_each_root(root)' (see example in
> proc_cgroup_show() from kernel/cgroup/cgroup.c) but I don't know if
> taking the cgroup mutex is possible in the BPF helper function. It might
> be ok in the tracepoint raw_syscalls/sys_enter but could the mutex
> already be taken in some other tracepoints?

mutex is not allowed in a helper since it can block.

>
> Signed-off-by: Alban Crequy 
> ---
>  include/uapi/linux/bpf.h | 11 ++-
>  kernel/trace/bpf_trace.c | 25 +
>  2 files changed, 35 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index c5ec89732a8d..38ac3959cdf3 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -755,6 +755,14 @@ union bpf_attr {
>   * @addr: pointer to struct sockaddr to bind socket to
>   * @addr_len: length of sockaddr structure
>   * Return: 0 on success or negative error code
> + *
> + * u64 bpf_get_current_cgroup_ino(hierarchy, flags)
> + * Get the cgroup{1,2} inode of current task under the specified 
> hierarchy.
> + * @hierarchy: cgroup hierarchy

Not sure what is the value to specify hierarchy here.
A cgroup directory fd?

> + * @flags: reserved for future use
> + * Return:
> + *   == 0 error

looks like < 0 means error.

> + *> 0 inode of the cgroup
   >= 0 means good?
>   */
>  #define __BPF_FUNC_MAPPER(FN)  \
> FN(unspec), \
> @@ -821,7 +829,8 @@ union bpf_attr {
> FN(msg_apply_bytes),\
> FN(msg_cork_bytes), \
> FN(msg_pull_data),  \
> -   FN(bind),
> +   FN(bind),   \
> +   FN(get_current_cgroup_ino),
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 56ba0f2a01db..9bf92a786639 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -524,6 +524,29 @@ static const struct bpf_func_proto 
> bpf_probe_read_str_proto = {
> .arg3_type  = ARG_ANYTHING,
>  };
>
> +BPF_CALL_2(bpf_get_current_cgroup_ino, u32, hierarchy, u64, flags)
> +{
> +   // TODO: pick the correct hierarchy instead of the mem controller
> +   struct cgroup *cgrp = task_cgroup(current, memory_cgrp_id);
> +
> +   if (unlikely(!cgrp))
> +   return -EINVAL;
> +   if (unlikely(hierarchy))
> +   return -EINVAL;
> +   if (unlikely(flags))
> +   return -EINVAL;
> +
> +   return cgrp->kn->id.ino;
> +}
> +
> +static const struct bpf_func_proto bpf_get_current_cgroup_ino_proto = {
> +   .func   = bpf_get_current_cgroup_ino,
> +   .gpl_only   = false,
> +   .ret_type   = RET_INTEGER,
> +   .arg1_type  = ARG_DONTCARE,
> +   .arg2_type  = ARG_DONTCARE,
> +};
> +
>  static const struct bpf_func_proto *
>  tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  {
> @@ -564,6 +587,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct 
> bpf_prog *prog)
> return &bpf_get_prandom_u32_proto;
> case BPF_FUNC_probe_read_str:
> return &bpf_probe_read_str_proto;
> +   case BPF_FUNC_get_current_cgroup_ino:
> +   return &bpf_get_current_cgroup_ino_proto;
> default:
> return NULL;
> }
> --
> 2.14.3
>


Re: possible deadlock in perf_event_detach_bpf_prog

2018-04-08 Thread Y Song
On Thu, Mar 29, 2018 at 2:18 PM, Daniel Borkmann  wrote:
> On 03/29/2018 11:04 PM, syzbot wrote:
>> Hello,
>>
>> syzbot hit the following crash on upstream commit
>> 3eb2ce825ea1ad89d20f7a3b5780df850e4be274 (Sun Mar 25 22:44:30 2018 +)
>> Linux 4.16-rc7
>> syzbot dashboard link: 
>> https://syzkaller.appspot.com/bug?extid=dc5ca0e4c9bfafaf2bae
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>> Raw console output: 
>> https://syzkaller.appspot.com/x/log.txt?id=4742532743299072
>> Kernel config: 
>> https://syzkaller.appspot.com/x/.config?id=-8440362230543204781
>> compiler: gcc (GCC) 7.1.1 20170620
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+dc5ca0e4c9bfafaf2...@syzkaller.appspotmail.com
>> It will help syzbot understand when the bug is fixed. See footer for details.
>> If you forward the report, please keep this part and the footer.
>>
>>
>> ==
>> WARNING: possible circular locking dependency detected
>> 4.16.0-rc7+ #3 Not tainted
>> --
>> syz-executor7/24531 is trying to acquire lock:
>>  (bpf_event_mutex){+.+.}, at: [<8a849b07>] 
>> perf_event_detach_bpf_prog+0x92/0x3d0 kernel/trace/bpf_trace.c:854
>>
>> but task is already holding lock:
>>  (&mm->mmap_sem){}, at: [<38768f87>] vm_mmap_pgoff+0x198/0x280 
>> mm/util.c:353
>>
>> which lock already depends on the new lock.
>>
>>
>> the existing dependency chain (in reverse order) is:
>>
>> -> #1 (&mm->mmap_sem){}:
>>__might_fault+0x13a/0x1d0 mm/memory.c:4571
>>_copy_to_user+0x2c/0xc0 lib/usercopy.c:25
>>copy_to_user include/linux/uaccess.h:155 [inline]
>>bpf_prog_array_copy_info+0xf2/0x1c0 kernel/bpf/core.c:1694
>>perf_event_query_prog_array+0x1c7/0x2c0 kernel/trace/bpf_trace.c:891
>
> Looks like we should move the two copy_to_user() outside of
> bpf_event_mutex section to avoid the deadlock.

This was introduced by one of my previous patches. The above suggested fix
makes sense. I will craft a patch and send it to the mailing list for the
bpf branch soon.
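The shape of such a fix would roughly be: snapshot whatever is needed into
kernel memory while holding bpf_event_mutex, drop the mutex, and only then
call copy_to_user(). A hypothetical sketch (names are illustrative and
collect_prog_ids() is a made-up helper, not the actual patch):

static int query_attached_prog_ids(u32 __user *uids, u32 __user *ucnt, u32 max)
{
	u32 *ids, cnt;
	int ret = 0;

	ids = kcalloc(max, sizeof(*ids), GFP_KERNEL);
	if (!ids)
		return -ENOMEM;

	mutex_lock(&bpf_event_mutex);
	cnt = collect_prog_ids(ids, max);	/* hypothetical helper */
	mutex_unlock(&bpf_event_mutex);

	/* copy_to_user() may fault and take mmap_sem; keep it lock-free */
	if (copy_to_user(ucnt, &cnt, sizeof(cnt)) ||
	    copy_to_user(uids, ids, cnt * sizeof(*ids)))
		ret = -EFAULT;

	kfree(ids);
	return ret;
}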

>
>>_perf_ioctl kernel/events/core.c:4750 [inline]
>>perf_ioctl+0x3e1/0x1480 kernel/events/core.c:4770
>>vfs_ioctl fs/ioctl.c:46 [inline]
>>do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
>>SYSC_ioctl fs/ioctl.c:701 [inline]
>>SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
>>do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
>>entry_SYSCALL_64_after_hwframe+0x42/0xb7
>>
>> -> #0 (bpf_event_mutex){+.+.}:
>>lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
>>__mutex_lock_common kernel/locking/mutex.c:756 [inline]
>>__mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893
>>mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
>>perf_event_detach_bpf_prog+0x92/0x3d0 kernel/trace/bpf_trace.c:854
>>perf_event_free_bpf_prog kernel/events/core.c:8147 [inline]
>>_free_event+0xbdb/0x10f0 kernel/events/core.c:4116
>>put_event+0x24/0x30 kernel/events/core.c:4204
>>perf_mmap_close+0x60d/0x1010 kernel/events/core.c:5172
>>remove_vma+0xb4/0x1b0 mm/mmap.c:172
>>remove_vma_list mm/mmap.c:2490 [inline]
>>do_munmap+0x82a/0xdf0 mm/mmap.c:2731
>>mmap_region+0x59e/0x15a0 mm/mmap.c:1646
>>do_mmap+0x6c0/0xe00 mm/mmap.c:1483
>>do_mmap_pgoff include/linux/mm.h:2223 [inline]
>>vm_mmap_pgoff+0x1de/0x280 mm/util.c:355
>>SYSC_mmap_pgoff mm/mmap.c:1533 [inline]
>>SyS_mmap_pgoff+0x462/0x5f0 mm/mmap.c:1491
>>SYSC_mmap arch/x86/kernel/sys_x86_64.c:100 [inline]
>>SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:91
>>do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
>>entry_SYSCALL_64_after_hwframe+0x42/0xb7
>>
>> other info that might help us debug this:
>>
>>  Possible unsafe locking scenario:
>>
>>CPU0CPU1
>>
>>   lock(&mm->mmap_sem);
>>lock(bpf_event_mutex);
>>lock(&mm->mmap_sem);
>>   lock(bpf_event_mutex);
>>
>>  *** DEADLOCK ***
>>
>> 1 lock held by syz-executor7/24531:
>>  #0:  (&mm->mmap_sem){}, at: [<38768f87>] 
>> vm_mmap_pgoff+0x198/0x280 mm/util.c:353
>>
>> stack backtrace:
>> CPU: 0 PID: 24531 Comm: syz-executor7 Not tainted 4.16.0-rc7+ #3
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
>> Google 01/01/2011
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:17 [inline]
>>  dump_stack+0x194/0x24d lib/dump_stack.c:53
>>  print_circular_bug.isra.38+0x2cd/0x2dc kernel/locking/lockdep.c:1223
>>  check_prev_add kernel/locking/lockdep.c:1863 [inline]
>>  check_prevs_add kernel/locking/lockdep.c:1976 [inline]
>>  validate_chain kernel/locking/lockdep.c:2417 [inline]
>>  __lock_acquire+0x30a8/0x3e00 kernel/l

Re: [RFC PATCH] bpf: Add helpers to read useful task_struct members

2017-11-07 Thread Y Song
On Tue, Nov 7, 2017 at 4:29 PM, Atish Patra  wrote:
> On 11/07/2017 04:42 PM, Y Song wrote:
>>
>> On Tue, Nov 7, 2017 at 2:04 PM, Alexei Starovoitov  wrote:
>>>
>>> On 11/8/17 6:47 AM, Y Song wrote:
>>>>
>>>>
>>>> On Tue, Nov 7, 2017 at 1:39 PM, Alexei Starovoitov  wrote:
>>>>>
>>>>>
>>>>> On 11/8/17 6:14 AM, Y Song wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 7, 2017 at 12:37 AM, Naveen N. Rao
>>>>>>  wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Alexei Starovoitov wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11/7/17 12:55 AM, Naveen N. Rao wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I thought such struct shouldn't change layout.
>>>>>>>>>> If it is we need to fix include/linux/compiler-clang.h to do that
>>>>>>>>>> anon struct as well.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We considered that, but it looked to be very dependent on the
>>>>>>>>> version
>>>>>>>>> of
>>>>>>>>> gcc used to build the kernel. But, this may be a simpler approach
>>>>>>>>> for
>>>>>>>>> the shorter term.
>>>>>>>>>
>>>>>>>>
>>>>>>>> why it would depend on version of gcc?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> From what I can see, randomized_struct_fields_start is defined only
>>>>>>> for
>>>>>>> gcc
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> = 4.6. For older versions, it does not get mapped to an anonymous
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> structure. We may not care for older gcc versions, but..
>>>>>>>
>>>>>>> The other issue was that __randomize_layout maps to __designated_init
>>>>>>> when
>>>>>>> randstruct plugin is not enabled, which is in turn an attribute on
>>>>>>> gcc
>>>>>>>>
>>>>>>>> =
>>>>>>>
>>>>>>> v5.1, but not otherwise.
>>>>>>>
>>>>>>>> We just need this, no?
>>>>>>>>
>>>>>>>> diff --git a/include/linux/compiler-clang.h
>>>>>>>> b/include/linux/compiler-clang.h
>>>>>>>> index de179993e039..4e29ab6187cb 100644
>>>>>>>> --- a/include/linux/compiler-clang.h
>>>>>>>> +++ b/include/linux/compiler-clang.h
>>>>>>>> @@ -15,3 +15,6 @@
>>>>>>>>* with any version that can compile the kernel
>>>>>>>>*/
>>>>>>>>   #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix),
>>>>>>>> __COUNTER__)
>>>>>>>> +
>>>>>>>> +#define randomized_struct_fields_start struct {
>>>>>>>> +#define randomized_struct_fields_end   };
>>>>>>>>
>>>>>>>> since offsets are mandated by C standard.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Yes, this is what we're testing with and is probably sufficient for
>>>>>>> our
>>>>>>> purposes.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Just tested this with bcc. bcc actually complains. the rewriter
>>>>>> is not able to rewrite prev->pid where prev is "struct task_struct
>>>>>> *prev".
>>>>>> I will change bcc rewriter to see whether the field value is correct
>>>>>> or
>>>>>> not.
>>>>>>
>>>>>> Not sure my understanding is correct or not, but I am afraid that
>>>>>> the above approach for clang compiler change may not work.
>>>>>> If clang calculates the field offset based on header file, the offset
>>>>>> may not be the same as kernel one
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> why is that?
>>>>> When randomization is off both gcc and clang must generate the same
>>>>> offsets, since it's C standard.
>>>>
>>>>
>>>>
>>>> The patch changed compiler-clang.h, so gcc still do randomization.
>>>
>>>
>>>
>>> gcc_plugins are off by default and randomization will not be
>>> turned on for any sane distro or datacenter that cares about
>>> performance and stability.
>>> So imo above compiler-clang.h patch together with bcc fix would
>>> be enough.
>>
>>
>> Agree that short time the suggested fix should be enough.
>> Long time, disto could become "insane" someday :-)
>>
>
> Yup it works with clang compiler & bcc hack. Thanks :).
> I verified it with task_switch.py & runqlat.py.
>
> Are you going to push the bcc hacks to github repo ?

I will need a proper implementation rather than a hack. Yes, once done, I will
push it into the bcc repo. Should be done soon.

>
> Regards,
> Atish


Re: [RFC PATCH] bpf: Add helpers to read useful task_struct members

2017-11-07 Thread Y Song
On Tue, Nov 7, 2017 at 2:04 PM, Alexei Starovoitov  wrote:
> On 11/8/17 6:47 AM, Y Song wrote:
>>
>> On Tue, Nov 7, 2017 at 1:39 PM, Alexei Starovoitov  wrote:
>>>
>>> On 11/8/17 6:14 AM, Y Song wrote:
>>>>
>>>>
>>>> On Tue, Nov 7, 2017 at 12:37 AM, Naveen N. Rao
>>>>  wrote:
>>>>>
>>>>>
>>>>> Alexei Starovoitov wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 11/7/17 12:55 AM, Naveen N. Rao wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I thought such struct shouldn't change layout.
>>>>>>>> If it is we need to fix include/linux/compiler-clang.h to do that
>>>>>>>> anon struct as well.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> We considered that, but it looked to be very dependent on the version
>>>>>>> of
>>>>>>> gcc used to build the kernel. But, this may be a simpler approach for
>>>>>>> the shorter term.
>>>>>>>
>>>>>>
>>>>>> why it would depend on version of gcc?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> From what I can see, randomized_struct_fields_start is defined only for
>>>>> gcc
>>>>>>
>>>>>>
>>>>>> = 4.6. For older versions, it does not get mapped to an anonymous
>>>>>
>>>>>
>>>>> structure. We may not care for older gcc versions, but..
>>>>>
>>>>> The other issue was that __randomize_layout maps to __designated_init
>>>>> when
>>>>> randstruct plugin is not enabled, which is in turn an attribute on gcc
>>>>> >=
>>>>> v5.1, but not otherwise.
>>>>>
>>>>>> We just need this, no?
>>>>>>
>>>>>> diff --git a/include/linux/compiler-clang.h
>>>>>> b/include/linux/compiler-clang.h
>>>>>> index de179993e039..4e29ab6187cb 100644
>>>>>> --- a/include/linux/compiler-clang.h
>>>>>> +++ b/include/linux/compiler-clang.h
>>>>>> @@ -15,3 +15,6 @@
>>>>>>* with any version that can compile the kernel
>>>>>>*/
>>>>>>   #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix),
>>>>>> __COUNTER__)
>>>>>> +
>>>>>> +#define randomized_struct_fields_start struct {
>>>>>> +#define randomized_struct_fields_end   };
>>>>>>
>>>>>> since offsets are mandated by C standard.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Yes, this is what we're testing with and is probably sufficient for our
>>>>> purposes.
>>>>
>>>>
>>>>
>>>> Just tested this with bcc. bcc actually complains. the rewriter
>>>> is not able to rewrite prev->pid where prev is "struct task_struct
>>>> *prev".
>>>> I will change bcc rewriter to see whether the field value is correct or
>>>> not.
>>>>
>>>> Not sure my understanding is correct or not, but I am afraid that
>>>> the above approach for clang compiler change may not work.
>>>> If clang calculates the field offset based on header file, the offset
>>>> may not be the same as kernel one
>>>
>>>
>>>
>>> why is that?
>>> When randomization is off both gcc and clang must generate the same
>>> offsets, since it's C standard.
>>
>>
>> The patch changed compiler-clang.h, so gcc still do randomization.
>
>
> gcc_plugins are off by default and randomization will not be
> turned on for any sane distro or datacenter that cares about
> performance and stability.
> So imo above compiler-clang.h patch together with bcc fix would
> be enough.

Agree that in the short term the suggested fix should be enough.
In the long term, a distro could become "insane" someday :-)


Re: [RFC PATCH] bpf: Add helpers to read useful task_struct members

2017-11-07 Thread Y Song
On Tue, Nov 7, 2017 at 1:39 PM, Alexei Starovoitov  wrote:
> On 11/8/17 6:14 AM, Y Song wrote:
>>
>> On Tue, Nov 7, 2017 at 12:37 AM, Naveen N. Rao
>>  wrote:
>>>
>>> Alexei Starovoitov wrote:
>>>>
>>>>
>>>> On 11/7/17 12:55 AM, Naveen N. Rao wrote:
>>>>>>
>>>>>>
>>>>>> I thought such struct shouldn't change layout.
>>>>>> If it is we need to fix include/linux/compiler-clang.h to do that
>>>>>> anon struct as well.
>>>>>
>>>>>
>>>>>
>>>>> We considered that, but it looked to be very dependent on the version
>>>>> of
>>>>> gcc used to build the kernel. But, this may be a simpler approach for
>>>>> the shorter term.
>>>>>
>>>>
>>>> why it would depend on version of gcc?
>>>
>>>
>>>
>>> From what I can see, randomized_struct_fields_start is defined only for
>>> gcc
>>>>
>>>> = 4.6. For older versions, it does not get mapped to an anonymous
>>>
>>> structure. We may not care for older gcc versions, but..
>>>
>>> The other issue was that __randomize_layout maps to __designated_init
>>> when
>>> randstruct plugin is not enabled, which is in turn an attribute on gcc >=
>>> v5.1, but not otherwise.
>>>
>>>> We just need this, no?
>>>>
>>>> diff --git a/include/linux/compiler-clang.h
>>>> b/include/linux/compiler-clang.h
>>>> index de179993e039..4e29ab6187cb 100644
>>>> --- a/include/linux/compiler-clang.h
>>>> +++ b/include/linux/compiler-clang.h
>>>> @@ -15,3 +15,6 @@
>>>>* with any version that can compile the kernel
>>>>*/
>>>>   #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix),
>>>> __COUNTER__)
>>>> +
>>>> +#define randomized_struct_fields_start struct {
>>>> +#define randomized_struct_fields_end   };
>>>>
>>>> since offsets are mandated by C standard.
>>>
>>>
>>>
>>> Yes, this is what we're testing with and is probably sufficient for our
>>> purposes.
>>
>>
>> Just tested this with bcc. bcc actually complains. the rewriter
>> is not able to rewrite prev->pid where prev is "struct task_struct *prev".
>> I will change bcc rewriter to see whether the field value is correct or
>> not.
>>
>> Not sure my understanding is correct or not, but I am afraid that
>> the above approach for clang compiler change may not work.
>> If clang calculates the field offset based on header file, the offset
>> may not be the same as kernel one
>
>
> why is that?
> When randomization is off both gcc and clang must generate the same
> offsets, since it's C standard.

The patch changed compiler-clang.h, so gcc still does randomization.

>
> bcc rewriter issue is odd. I suspect it was broken from day one.
> Meaning that bcc didn't support poking into anonymous union and structs.

This seems right.

>
>> I verified that the drawf info with randomized structure config does not
>> match randomized structure member offset. Specifically, I tried
>> linux/proc_ns.h struct proc_ns_operations,
>> dwarf says:
>>   field name: offset 0
>>   field real_ns_name: offset 8
>> But if you print out the real offset at runtime, you get 40 and 16
>> respectively.
>
>
> thanks for confirming. It means that gcc randomization plugin is broken
> and has to be fixed with regard to adjusting debug info while
> randomizing the fields.
>


Re: [RFC PATCH] bpf: Add helpers to read useful task_struct members

2017-11-07 Thread Y Song
On Tue, Nov 7, 2017 at 1:31 PM, Atish Patra  wrote:
> On 11/07/2017 03:14 PM, Y Song wrote:
>>
>> On Tue, Nov 7, 2017 at 12:37 AM, Naveen N. Rao
>>  wrote:
>>>
>>> Alexei Starovoitov wrote:
>>>>
>>>>
>>>> On 11/7/17 12:55 AM, Naveen N. Rao wrote:
>>>>>>
>>>>>>
>>>>>> I thought such struct shouldn't change layout.
>>>>>> If it is we need to fix include/linux/compiler-clang.h to do that
>>>>>> anon struct as well.
>>>>>
>>>>>
>>>>>
>>>>> We considered that, but it looked to be very dependent on the version
>>>>> of
>>>>> gcc used to build the kernel. But, this may be a simpler approach for
>>>>> the shorter term.
>>>>>
>>>>
>>>> why it would depend on version of gcc?
>>>
>>>
>>>
>>> From what I can see, randomized_struct_fields_start is defined only for
>>> gcc
>>>>
>>>> = 4.6. For older versions, it does not get mapped to an anonymous
>>>
>>> structure. We may not care for older gcc versions, but..
>>>
>>> The other issue was that __randomize_layout maps to __designated_init
>>> when
>>> randstruct plugin is not enabled, which is in turn an attribute on gcc >=
>>> v5.1, but not otherwise.
>>>
>>>> We just need this, no?
>>>>
>>>> diff --git a/include/linux/compiler-clang.h
>>>> b/include/linux/compiler-clang.h
>>>> index de179993e039..4e29ab6187cb 100644
>>>> --- a/include/linux/compiler-clang.h
>>>> +++ b/include/linux/compiler-clang.h
>>>> @@ -15,3 +15,6 @@
>>>>* with any version that can compile the kernel
>>>>*/
>>>>   #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix),
>>>> __COUNTER__)
>>>> +
>>>> +#define randomized_struct_fields_start struct {
>>>> +#define randomized_struct_fields_end   };
>>>>
>>>> since offsets are mandated by C standard.
>>>
>>>
>>>
>>> Yes, this is what we're testing with and is probably sufficient for our
>>> purposes.
>>
>>
>> Just tested this with bcc. bcc actually complains. the rewriter
>> is not able to rewrite prev->pid where prev is "struct task_struct *prev".
>> I will change bcc rewriter to see whether the field value is correct or
>> not.

Just verified in bcc with the following hack:
--- a/examples/tracing/task_switch.py
+++ b/examples/tracing/task_switch.py
@@ -20,7 +20,8 @@ int count_sched(struct pt_regs *ctx, struct
task_struct *prev) {
   u64 zero = 0, *val;

   key.curr_pid = bpf_get_current_pid_tgid();
-  key.prev_pid = prev->pid;
+  bpf_probe_read(&key.prev_pid, 4, &prev->pid);
+  // key.prev_pid = prev->pid;

   val = stats.lookup_or_init(&key, &zero);
   (*val)++;
diff --git a/src/cc/frontends/clang/b_frontend_action.cc
b/src/cc/frontends/clang/b_frontend_action.cc
index c11d89e..df48115 100644
--- a/src/cc/frontends/clang/b_frontend_action.cc
+++ b/src/cc/frontends/clang/b_frontend_action.cc
@@ -827,6 +827,7 @@ void
BTypeConsumer::HandleTranslationUnit(ASTContext &Context) {
*/
   for (it = DC->decls_begin(); it != DC->decls_end(); it++) {
 Decl *D = *it;
+#if 0
 if (FunctionDecl *F = dyn_cast(D)) {
   if (fe_.is_rewritable_ext_func(F)) {
 for (auto arg : F->parameters()) {
@@ -836,6 +837,7 @@ void
BTypeConsumer::HandleTranslationUnit(ASTContext &Context) {
 probe_visitor_.TraverseDecl(D);
   }
 }
+#endif

Basically, this explicitly uses bpf_probe_read instead of letting the
rewriter change it, and also disables the probe rewriting part in the
frontend.

I confirmed that the value of prev->pid is always 0 (wrong).

The linux patch I have is:
diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
index 54dfef70a072..5d9609ea8e30 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -16,3 +16,6 @@
  * with any version that can compile the kernel
  */
 #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__)
+
+#define randomized_struct_fields_start struct {
+#define randomized_struct_fields_end   };

Therefore, letting the bpf program access a randomized structure
through either BTF or fixed dwarf info may be the best option.
I know dwarf info is too big, so the smaller BTF is more desirable.

>>
>> Not sure my understanding is correct or not, but I am afraid that
>> the above approach for clang compiler change may not work.
>> 

Re: [RFC PATCH] bpf: Add helpers to read useful task_struct members

2017-11-07 Thread Y Song
On Tue, Nov 7, 2017 at 12:37 AM, Naveen N. Rao
 wrote:
> Alexei Starovoitov wrote:
>>
>> On 11/7/17 12:55 AM, Naveen N. Rao wrote:

 I thought such struct shouldn't change layout.
 If it is we need to fix include/linux/compiler-clang.h to do that
 anon struct as well.
>>>
>>>
>>> We considered that, but it looked to be very dependent on the version of
>>> gcc used to build the kernel. But, this may be a simpler approach for
>>> the shorter term.
>>>
>>
>> why it would depend on version of gcc?
>
>
> From what I can see, randomized_struct_fields_start is defined only for gcc
>>= 4.6. For older versions, it does not get mapped to an anonymous
> structure. We may not care for older gcc versions, but..
>
> The other issue was that __randomize_layout maps to __designated_init when
> randstruct plugin is not enabled, which is in turn an attribute on gcc >=
> v5.1, but not otherwise.
>
>> We just need this, no?
>>
>> diff --git a/include/linux/compiler-clang.h
>> b/include/linux/compiler-clang.h
>> index de179993e039..4e29ab6187cb 100644
>> --- a/include/linux/compiler-clang.h
>> +++ b/include/linux/compiler-clang.h
>> @@ -15,3 +15,6 @@
>>* with any version that can compile the kernel
>>*/
>>   #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix),
>> __COUNTER__)
>> +
>> +#define randomized_struct_fields_start struct {
>> +#define randomized_struct_fields_end   };
>>
>> since offsets are mandated by C standard.
>
>
> Yes, this is what we're testing with and is probably sufficient for our
> purposes.

Just tested this with bcc. bcc actually complains: the rewriter
is not able to rewrite prev->pid where prev is "struct task_struct *prev".
I will change the bcc rewriter to see whether the field value is correct or not.

Not sure whether my understanding is correct, but I am afraid that
the above approach for the clang compiler change may not work.
If clang calculates the field offset based on the header file, the offset
may not be the same as the kernel one.

I verified that the dwarf info with the randomized structure config does not
match the randomized structure member offsets. Specifically, I tried
struct proc_ns_operations from linux/proc_ns.h.
dwarf says:
  field name: offset 0
  field real_ns_name: offset 8
But if you print out the real offsets at runtime, you get 40 and 16 respectively.

>
> - Naveen
>
>


Re: [iovisor-dev] [PATCH v3 net-next 00/12] bpf: rewrite value tracking in verifier

2017-07-14 Thread Y Song
I did an experiment with one of our internal bpf programs.
The program has 1563 insns.

Without Edward's patch:
processed 13634 insns, stack depth 160

With Edward's patch:
processed 15807 insns, stack depth 160

So the number of processed insns regressed by roughly 16%.
Has anybody done similar experiments to quantify the patch's
impact on verification performance (e.g., in terms of processed insns)?
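For anyone wanting to reproduce this kind of measurement: the
"processed N insns" line comes from the verifier log, which user space can
capture by passing a log buffer to BPF_PROG_LOAD. A minimal sketch using the
raw bpf(2) syscall on a 64-bit Linux box (the trivial two-instruction program
is only there to keep the example self-contained; error handling is minimal):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

static char log_buf[1 << 20];

int main(void)
{
	/* r0 = 0; exit; */
	struct bpf_insn insns[2] = {
		{ .code = 0xb7, .dst_reg = 0, .imm = 0 },	/* BPF_ALU64|BPF_MOV|BPF_K */
		{ .code = 0x95 },				/* BPF_JMP|BPF_EXIT */
	};
	union bpf_attr attr;
	const char *line;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
	attr.insns     = (unsigned long)insns;
	attr.insn_cnt  = 2;
	attr.license   = (unsigned long)"GPL";
	attr.log_buf   = (unsigned long)log_buf;
	attr.log_size  = sizeof(log_buf);
	attr.log_level = 1;

	fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));

	line = strstr(log_buf, "processed");	/* verifier prints "processed N insns, ..." */
	if (line)
		printf("%s\n", line);
	if (fd >= 0)
		close(fd);
	return 0;
}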

On Tue, Jun 27, 2017 at 5:53 AM, Edward Cree via iovisor-dev
 wrote:
> This series simplifies alignment tracking, generalises bounds tracking and
>  fixes some bounds-tracking bugs in the BPF verifier.  Pointer arithmetic on
>  packet pointers, stack pointers, map value pointers and context pointers has
>  been unified, and bounds on these pointers are only checked when the pointer
>  is dereferenced.
> Operations on pointers which destroy all relation to the original pointer
>  (such as multiplies and shifts) are disallowed if !env->allow_ptr_leaks,
>  otherwise they convert the pointer to an unknown scalar and feed it to the
>  normal scalar arithmetic handling.
> Pointer types have been unified with the corresponding adjusted-pointer types
>  where those existed (e.g. PTR_TO_MAP_VALUE[_ADJ] or FRAME_PTR vs
>  PTR_TO_STACK); similarly, CONST_IMM and UNKNOWN_VALUE have been unified into
>  SCALAR_VALUE.
> Pointer types (except CONST_PTR_TO_MAP, PTR_TO_MAP_VALUE_OR_NULL and
>  PTR_TO_PACKET_END, which do not allow arithmetic) have a 'fixed offset' and
>  a 'variable offset'; the former is used when e.g. adding an immediate or a
>  known-constant register, as long as it does not overflow.  Otherwise the
>  latter is used, and any operation creating a new variable offset creates a
>  new 'id' (and, for PTR_TO_PACKET, clears the 'range').
> SCALAR_VALUEs use the 'variable offset' fields to track the range of possible
>  values; the 'fixed offset' should never be set on a scalar.
>
> As of patch 12/12, all tests of tools/testing/selftests/bpf/test_verifier
>  and tools/testing/selftests/bpf/test_align pass.
>
> v3: added a few more tests; removed RFC tags.
>
> v2: fixed nfp build, made test_align pass again and extended it with a few
>  new tests (though still need to add more).
>
> Edward Cree (12):
>   selftests/bpf: add test for mixed signed and unsigned bounds checks
>   bpf/verifier: rework value tracking
>   nfp: change bpf verifier hooks to match new verifier data structures
>   bpf/verifier: track signed and unsigned min/max values
>   bpf/verifier: more concise register state logs for constant var_off
>   selftests/bpf: change test_verifier expectations
>   selftests/bpf: rewrite test_align
>   selftests/bpf: add a test to test_align
>   selftests/bpf: add test for bogus operations on pointers
>   selftests/bpf: don't try to access past MAX_PACKET_OFF in
> test_verifier
>   selftests/bpf: add tests for subtraction & negative numbers
>   selftests/bpf: variable offset negative tests
>
>  drivers/net/ethernet/netronome/nfp/bpf/verifier.c |   24 +-
>  include/linux/bpf.h   |   34 +-
>  include/linux/bpf_verifier.h  |   56 +-
>  include/linux/tnum.h  |   81 +
>  kernel/bpf/Makefile   |2 +-
>  kernel/bpf/tnum.c |  180 ++
>  kernel/bpf/verifier.c | 1943 
> -
>  tools/testing/selftests/bpf/test_align.c  |  462 -
>  tools/testing/selftests/bpf/test_verifier.c   |  293 ++--
>  9 files changed, 2034 insertions(+), 1041 deletions(-)
>  create mode 100644 include/linux/tnum.h
>  create mode 100644 kernel/bpf/tnum.c
>
> ___
> iovisor-dev mailing list
> iovisor-...@lists.iovisor.org
> https://lists.iovisor.org/mailman/listinfo/iovisor-dev