Re: [RFC PATCH bpf-next] libbpf: make bpf_object__open default to UNSPEC
On 2018/11/23 5:52, Daniel Borkmann wrote:
> [ +Wang ]
>
> On 11/22/2018 07:03 AM, Nikita V. Shirokov wrote:
>> Currently, libbpf's bpf_object__open by default requires a BPF
>> program to specify a version in its code because of two things:
>> 1) the default prog type is set to KPROBE
>> 2) KPROBE requires (in kernel/bpf/syscall.c) the version to be
>>    specified
>>
>> In this RFC I'm proposing to change the default to UNSPEC and also to
>> change libbpf's logic so that it reflects what we have today in the
>> kernel (i.e. only the KPROBE type requires the version to be
>> explicitly set).
>>
>> Reason for the change:
>> currently only libbpf requires the version to be explicitly set by
>> default. It would be really hard for maintainers of other custom BPF
>> loaders to migrate to libbpf (as they don't control users' code, and
>> migration to the new loader (libbpf) wouldn't be transparent for end
>> users).
>>
>> What is going to be broken after this change:
>> if someone was relying on the default being KPROBE for
>> bpf_object__open, their code will stop working. However, I really
>> doubt that anyone is using this for kprobe-type programs (instead of,
>> say, bcc or other tracing frameworks).
>>
>> Other possible solutions (for discussion; they would require more
>> machinery): add another function like bpf_object__open with a default
>> of UNSPEC.
>>
>> Signed-off-by: Nikita V. Shirokov
>> ---
>>  tools/lib/bpf/libbpf.c | 8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>> index 0f14f7c074c2..ed4212a4c5f9 100644
>> --- a/tools/lib/bpf/libbpf.c
>> +++ b/tools/lib/bpf/libbpf.c
>> @@ -333,7 +333,7 @@ bpf_program__init(void *data, size_t size, char *section_name, int idx,
>>  	prog->idx = idx;
>>  	prog->instances.fds = NULL;
>>  	prog->instances.nr = -1;
>> -	prog->type = BPF_PROG_TYPE_KPROBE;
>> +	prog->type = BPF_PROG_TYPE_UNSPEC;
>>  	prog->btf_fd = -1;
>
> Seems this was mostly for historic reasons, but for a generic library
> this would indeed be an odd convention as a default. Wang, given
> 5f44e4c810bf ("tools lib bpf: New API to adjust type of a BPF
> program"), are you in any way relying on this default, or are you
> using things like bpf_program__set_kprobe() instead, which you added
> there? If the latter, I'd say we should change it better now than
> later when there's even more lib usage (and in particular before we
> add official ABI versioning).

OK. I don't rely on that now.

Thank you.
Re: [PATCH v2 net-next 3/6] tools/lib/bpf: expose bpf_program__set_type()
On 2017/4/1 11:18, Alexei Starovoitov wrote:
> On 3/31/17 7:29 PM, Wangnan (F) wrote:
>> On 2017/3/31 12:45, Alexei Starovoitov wrote:
>>> expose bpf_program__set_type() to set program type
>>>
>>> Signed-off-by: Alexei Starovoitov
>>> Acked-by: Daniel Borkmann
>>> Acked-by: Martin KaFai Lau
>>> ---
>>>  tools/lib/bpf/libbpf.c | 3 +--
>>>  tools/lib/bpf/libbpf.h | 2 ++
>>>  2 files changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>>> index ac6eb863b2a4..1a2c07eb7795 100644
>>> --- a/tools/lib/bpf/libbpf.c
>>> +++ b/tools/lib/bpf/libbpf.c
>>> @@ -1618,8 +1618,7 @@ int bpf_program__nth_fd(struct bpf_program *prog, int n)
>>>  	return fd;
>>>  }
>>>
>>> -static void bpf_program__set_type(struct bpf_program *prog,
>>> -				  enum bpf_prog_type type)
>>> +void bpf_program__set_type(struct bpf_program *prog, enum bpf_prog_type type)
>>>  {
>>>  	prog->type = type;
>>>  }
>>
>> Since it becomes a public interface, we need to check whether prog is
>> a NULL pointer and whether the value of type is okay, and let it
>> return an errno.
>
> I strongly disagree with such defensive programming. It's a cause of
> bugs for users of this library. I think you're trying to mimic C++
> setters/getters, but C++ never checks 'this != null', since passing
> null into a setter is a bug of the user of the library and not of the
> library. The setters also should have a 'void' return type when they
> cannot fail. That is exactly the case here.
>
> If, in the future, we decide that libbpf shouldn't support all BPF
> program types, then you'd need to change the prototype of this
> function to return an error code and change all places where this
> function is called to check for the error code. It may or may not be
> the right approach. For example, today the only user of the
> bpf_program__set*() methods is perf/util/bpf-loader.c, and it calls
> bpf_program__set_kprobe() and bpf_program__set_tracepoint() _without_
> checking the return value, which is the _correct_ thing to do.
> Instead, the current prototype 'int bpf_program__set_tracepoint(struct
> bpf_program *prog);' is not correct, and I suggest you fix it.
>
> You also need to do other cleanup. For instance, in
> bpf_object__elf_finish() you have:
>
> 	if (!obj_elf_valid(obj))
> 		return;
>
> 	if (obj->efile.elf) {
>
> which is redundant. It's another example where mistakes creep in due
> to defensive programming. There is another bug in bpf_object__close(),
> which does:
>
> 	if (!obj)
> 		return;
>
> Again defensive programming strikes, since you're not checking
> IS_ERR(obj), and that's what bpf_object__open() returns, so most users
> of the library (who don't read the source code and just use it based
> on the .h) will do:
>
> 	obj = bpf_object__open(...);
> 	bpf_object__close(obj);
>
> and the current 'if (!obj)' won't help and it will segfault. I hit
> this issue while developing this patch set.
>
>>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>>> index b30394f9947a..32c7252f734e 100644
>>> --- a/tools/lib/bpf/libbpf.h
>>> +++ b/tools/lib/bpf/libbpf.h
>>> @@ -25,6 +25,7 @@
>>>  #include
>>>  #include
>>>  #include <stddef.h> // for size_t
>>> +#include <linux/bpf.h>
>>>
>>>  enum libbpf_errno {
>>>  	__LIBBPF_ERRNO__START = 4000,
>>> @@ -185,6 +186,7 @@ int bpf_program__set_sched_cls(struct bpf_program *prog);
>>>  int bpf_program__set_sched_act(struct bpf_program *prog);
>>>  int bpf_program__set_xdp(struct bpf_program *prog);
>>>  int bpf_program__set_perf_event(struct bpf_program *prog);
>>> +void bpf_program__set_type(struct bpf_program *prog, enum bpf_prog_type type);
>>
>> The above bpf_program__set_xxx functions become redundant. They
>> should be generated as static inline functions using a macro.
>>
>>> bool bpf_program__is_socket_filter(struct bpf_program *prog);
>>> bool bpf_program__is_tracepoint(struct bpf_program *prog);
>>
>> bpf_program__is_xxx should be updated like bpf_program__set_xxx,
>> since enum bpf_prog_type is not a problem now.
>
> All of these suggestions are cleanups for your code that you need to
> do yourself. I actually suggest you kill all bpf_program__is*() and
> all but one of the bpf_program__set*() functions. The current user,
> perf/util/bpf-loader.c, should be converted to use
> bpf_program__set_type() with the _void_ return type that I'm
> introducing here.
>
> Overall, I think tools/lib/bpf/ is a nice library and it can be used
> by many projects, but I suggest you stop making excuses based on your
> proprietary usage of it. Also, please cc me in the future on changes
> to the library. It still has my copyrights; a lot has changed since
> the last time I looked at it, and it's my fault for not paying
> attention earlier.

OK. Then let your patch be merged first, and then let me do the code
cleanup.

Thank you.
Re: [PATCH v2 net-next 3/6] tools/lib/bpf: expose bpf_program__set_type()
On 2017/3/31 12:45, Alexei Starovoitov wrote:
> expose bpf_program__set_type() to set program type
>
> Signed-off-by: Alexei Starovoitov
> Acked-by: Daniel Borkmann
> Acked-by: Martin KaFai Lau
> ---
>  tools/lib/bpf/libbpf.c | 3 +--
>  tools/lib/bpf/libbpf.h | 2 ++
>  2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index ac6eb863b2a4..1a2c07eb7795 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -1618,8 +1618,7 @@ int bpf_program__nth_fd(struct bpf_program *prog, int n)
>  	return fd;
>  }
>
> -static void bpf_program__set_type(struct bpf_program *prog,
> -				  enum bpf_prog_type type)
> +void bpf_program__set_type(struct bpf_program *prog, enum bpf_prog_type type)
> {
> 	prog->type = type;
> }

Since it becomes a public interface, we need to check whether prog is a
NULL pointer and whether the value of type is okay, and let it return an
errno.

> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index b30394f9947a..32c7252f734e 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -25,6 +25,7 @@
>  #include
>  #include
>  #include <stddef.h> // for size_t
> +#include <linux/bpf.h>
>
>  enum libbpf_errno {
>  	__LIBBPF_ERRNO__START = 4000,
> @@ -185,6 +186,7 @@ int bpf_program__set_sched_cls(struct bpf_program *prog);
>  int bpf_program__set_sched_act(struct bpf_program *prog);
>  int bpf_program__set_xdp(struct bpf_program *prog);
>  int bpf_program__set_perf_event(struct bpf_program *prog);
> +void bpf_program__set_type(struct bpf_program *prog, enum bpf_prog_type type);

The above bpf_program__set_xxx functions become redundant. They should be
generated as static inline functions using a macro.

> bool bpf_program__is_socket_filter(struct bpf_program *prog);
> bool bpf_program__is_tracepoint(struct bpf_program *prog);

bpf_program__is_xxx should be updated like bpf_program__set_xxx, since
enum bpf_prog_type is not a problem now.

Thank you.
Re: [PATCH v2 net-next 3/6] tools/lib/bpf: expose bpf_program__set_type()
Hi Alexei,

Please see the patch I sent. Since we export bpf_program__set_type(),
bpf_program__set_xxx() should be built based on it.

Thank you.

On 2017/3/31 12:45, Alexei Starovoitov wrote:
> expose bpf_program__set_type() to set program type
>
> Signed-off-by: Alexei Starovoitov
> Acked-by: Daniel Borkmann
> Acked-by: Martin KaFai Lau
> ---
Re: [PATCH v2 net-next 2/6] tools/lib/bpf: add support for BPF_PROG_TEST_RUN command
On 2017/3/31 12:45, Alexei Starovoitov wrote:
> add support for BPF_PROG_TEST_RUN command to libbpf.a
>
> Signed-off-by: Alexei Starovoitov
> Acked-by: Daniel Borkmann
> Acked-by: Martin KaFai Lau
> Acked-by: Wang Nan
> ---
>  tools/include/uapi/linux/bpf.h | 24 ++++++++++++++++++++++++
>  tools/lib/bpf/bpf.c            | 24 ++++++++++++++++++++++++
>  tools/lib/bpf/bpf.h            |  4 +++-
>  3 files changed, 51 insertions(+), 1 deletion(-)
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 1ea08ce35567..a1d95386f562 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -81,6 +81,7 @@ enum bpf_cmd {
>  	BPF_OBJ_GET,
>  	BPF_PROG_ATTACH,
>  	BPF_PROG_DETACH,
> +	BPF_PROG_TEST_RUN,
>  };
>
>  enum bpf_map_type {
> @@ -189,6 +190,17 @@ union bpf_attr {
>  		__u32		attach_type;
>  		__u32		attach_flags;
>  	};
> +
> +	struct { /* anonymous struct used by BPF_PROG_TEST_RUN command */
> +		__u32		prog_fd;
> +		__u32		retval;
> +		__u32		data_size_in;
> +		__u32		data_size_out;
> +		__aligned_u64	data_in;
> +		__aligned_u64	data_out;
> +		__u32		repeat;
> +		__u32		duration;
> +	} test;
>  } __attribute__((aligned(8)));
>
>  /* BPF helper function descriptions:
> @@ -459,6 +471,18 @@ union bpf_attr {
>   * Return:
>   *   > 0 length of the string including the trailing NUL on success
>   *   < 0 error
> + *
> + * u64 bpf_get_socket_cookie(skb)
> + *     Get the cookie for the socket stored inside sk_buff.
> + *     @skb: pointer to skb
> + *     Return: 8 Bytes non-decreasing number on success or 0 if the socket
> + *     field is missing inside sk_buff
> + *
> + * u32 bpf_get_socket_uid(skb)
> + *     Get the owner uid of the socket stored inside sk_buff.
> + *     @skb: pointer to skb
> + *     Return: uid of the socket owner on success or 0 if the socket pointer
> + *     inside sk_buff is NULL
>   */
>  #define __BPF_FUNC_MAPPER(FN)	\
>  	FN(unspec),		\
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 9b58d20e8c93..f84c398c11f4 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -209,3 +209,27 @@ int bpf_prog_detach(int target_fd, enum bpf_attach_type type)
>
>  	return sys_bpf(BPF_PROG_DETACH, &attr, sizeof(attr));
>  }
> +
> +int bpf_prog_test_run(int prog_fd, int repeat, void *data, __u32 size,
> +		      void *data_out, __u32 *size_out, __u32 *retval,
> +		      __u32 *duration)
> +{
> +	union bpf_attr attr;
> +	int ret;
> +
> +	bzero(&attr, sizeof(attr));
> +	attr.test.prog_fd = prog_fd;
> +	attr.test.data_in = ptr_to_u64(data);
> +	attr.test.data_out = ptr_to_u64(data_out);
> +	attr.test.data_size_in = size;
> +	attr.test.repeat = repeat;
> +
> +	ret = sys_bpf(BPF_PROG_TEST_RUN, &attr, sizeof(attr));
> +	if (size_out)
> +		*size_out = attr.test.data_size_out;
> +	if (retval)
> +		*retval = attr.test.retval;
> +	if (duration)
> +		*duration = attr.test.duration;
> +	return ret;
> +}
> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> index 93f021932623..edb4daeff7a5 100644
> --- a/tools/lib/bpf/bpf.h
> +++ b/tools/lib/bpf/bpf.h
> @@ -47,6 +47,8 @@ int bpf_obj_get(const char *pathname);
>  int bpf_prog_attach(int prog_fd, int attachable_fd, enum bpf_attach_type type,
>  		    unsigned int flags);
>  int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type);
> -
> +int bpf_prog_test_run(int prog_fd, int repeat, void *data, __u32 size,
> +		      void *data_out, __u32 *size_out, __u32 *retval,
> +		      __u32 *duration);
>  #endif
Re: [PATCH net-next 1/6] bpf: introduce BPF_PROG_TEST_RUN command
On 2017/3/31 11:24, Alexei Starovoitov wrote:
> On 3/30/17 8:12 PM, Wangnan (F) wrote:
>> On 2017/3/31 10:57, Alexei Starovoitov wrote:
>>> On 3/30/17 7:53 PM, Wangnan (F) wrote:
>>>> I suggest using a CONFIG option to enable/disable code in
>>>> test_run.o to reduce the attack plane.
>>>
>>> Attack plane? What attack do you see, and how does a config help?
>>
>> I think all testing features are not required to be compiled into a
>> production system. A feature which should never be used looks
>> dangerous to me.
>
> It is required on a production system, since XDP testing and XDP
> production have to use the same kernel. We cannot keep rebooting the
> server back and forth to test and then to run. It's not testing kernel
> features; it's testing BPF programs, which are technically user space
> components.

Okay. Now I understand it is a production feature.

Thank you.
Re: [PATCH net-next 2/6] tools/lib/bpf: add support for BPF_PROG_TEST_RUN command
On 2017/3/31 9:31, Alexei Starovoitov wrote:
> add support for BPF_PROG_TEST_RUN command to libbpf.a
>
> Signed-off-by: Alexei Starovoitov
> Acked-by: Daniel Borkmann
> Acked-by: Martin KaFai Lau
> ---
>  tools/include/uapi/linux/bpf.h | 24 ++++++++++++++++++++++++
>  tools/lib/bpf/bpf.c            | 24 ++++++++++++++++++++++++
>  tools/lib/bpf/bpf.h            |  4 +++-
>  3 files changed, 51 insertions(+), 1 deletion(-)
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 1ea08ce35567..a1d95386f562 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -81,6 +81,7 @@ enum bpf_cmd {
>  	BPF_OBJ_GET,
>  	BPF_PROG_ATTACH,
>  	BPF_PROG_DETACH,
> +	BPF_PROG_TEST_RUN,
>  };
>
>  enum bpf_map_type {
> @@ -189,6 +190,17 @@ union bpf_attr {
>  		__u32		attach_type;
>  		__u32		attach_flags;
>  	};
> +
> +	struct { /* anonymous struct used by BPF_PROG_TEST_RUN command */
> +		__u32		prog_fd;
> +		__u32		retval;
> +		__u32		data_size_in;
> +		__u32		data_size_out;
> +		__aligned_u64	data_in;
> +		__aligned_u64	data_out;
> +		__u32		repeat;
> +		__u32		duration;
> +	} test;
>  } __attribute__((aligned(8)));
>
>  /* BPF helper function descriptions:
> @@ -459,6 +471,18 @@ union bpf_attr {
>   * Return:
>   *   > 0 length of the string including the trailing NUL on success
>   *   < 0 error
> + *
> + * u64 bpf_get_socket_cookie(skb)
> + *     Get the cookie for the socket stored inside sk_buff.
> + *     @skb: pointer to skb
> + *     Return: 8 Bytes non-decreasing number on success or 0 if the socket
> + *     field is missing inside sk_buff
> + *
> + * u32 bpf_get_socket_uid(skb)
> + *     Get the owner uid of the socket stored inside sk_buff.
> + *     @skb: pointer to skb
> + *     Return: uid of the socket owner on success or 0 if the socket pointer
> + *     inside sk_buff is NULL
>   */
>  #define __BPF_FUNC_MAPPER(FN)	\
>  	FN(unspec),		\
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 9b58d20e8c93..b5ca5277e30c 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -209,3 +209,27 @@ int bpf_prog_detach(int target_fd, enum bpf_attach_type type)
>
>  	return sys_bpf(BPF_PROG_DETACH, &attr, sizeof(attr));
> }
> +
> +int bpf_program_test_run(int prog_fd, int repeat, void *data, __u32 size,
> +			 void *data_out, __u32 *size_out, __u32 *retval,
> +			 __u32 *duration)
> +{
> +	union bpf_attr attr;
> +	int ret;
> +
> +	bzero(&attr, sizeof(attr));
> +	attr.test.prog_fd = prog_fd;
> +	attr.test.data_in = ptr_to_u64(data);
> +	attr.test.data_out = ptr_to_u64(data_out);
> +	attr.test.data_size_in = size;
> +	attr.test.repeat = repeat;
> +
> +	ret = sys_bpf(BPF_PROG_TEST_RUN, &attr, sizeof(attr));
> +	if (size_out)
> +		*size_out = attr.test.data_size_out;
> +	if (retval)
> +		*retval = attr.test.retval;
> +	if (duration)
> +		*duration = attr.test.duration;
> +	return ret;
> +}
> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> index 93f021932623..adfb320ff21d 100644
> --- a/tools/lib/bpf/bpf.h
> +++ b/tools/lib/bpf/bpf.h
> @@ -47,6 +47,8 @@ int bpf_obj_get(const char *pathname);
>  int bpf_prog_attach(int prog_fd, int attachable_fd, enum bpf_attach_type type,
>  		    unsigned int flags);
>  int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type);
> -
> +int bpf_program_test_run(int prog_fd, int repeat, void *data, __u32 size,
> +			 void *data_out, __u32 *size_out, __u32 *retval,
> +			 __u32 *duration);

Please call it bpf_prog_test_run() so it looks uniform with the others.

Thank you.
Re: [PATCH net-next 1/6] bpf: introduce BPF_PROG_TEST_RUN command
On 2017/3/31 10:57, Alexei Starovoitov wrote:
> On 3/30/17 7:53 PM, Wangnan (F) wrote:
>> I suggest using a CONFIG option to enable/disable code in test_run.o
>> to reduce the attack plane.
>
> Attack plane? What attack do you see, and how does a config help?

I think all testing features are not required to be compiled into a
production system. A feature which should never be used looks dangerous
to me.

I suggest adding a CONFIG option like CONFIG_BPF_PROGRAM_TEST_RUN to
control whether the kernel should be compiled with this feature or not.
We can enable it by default and give people a chance to turn it off. At
least in my company, people tend to turn all unneeded features off. If
you don't provide a config option, they will make one themselves.

Thank you.
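A sketch of the Kconfig entry Wang proposes might look like the following. This is purely hypothetical: no such symbol was part of the patch set (net/bpf/test_run.o was merged as unconditional obj-y), and the help text paraphrases the thread's argument:

```kconfig
# Hypothetical sketch of Wang's proposed option: default-on, so
# behavior is unchanged unless a site explicitly disables it.
config BPF_PROGRAM_TEST_RUN
	bool "Enable BPF_PROG_TEST_RUN command for testing BPF programs"
	default y
	depends on NET && BPF_SYSCALL
	help
	  Allow user space to run BPF programs against supplied test
	  input via the bpf(2) BPF_PROG_TEST_RUN command. Could be
	  disabled on hardened kernels at the cost of losing the
	  ability to test programs on the production kernel.
```

The net/bpf/Makefile line would then become `obj-$(CONFIG_BPF_PROGRAM_TEST_RUN) := test_run.o` instead of the unconditional `obj-y` in the patch.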
Re: [PATCH net-next 1/6] bpf: introduce BPF_PROG_TEST_RUN command
On 2017/3/31 9:31, Alexei Starovoitov wrote:
> development and testing of networking bpf programs is quite
> cumbersome. Despite the availability of user space bpf interpreters,
> the kernel is the ultimate authority and execution environment.
> Current test frameworks for TC include creation of netns, veth and
> qdiscs, and use of various packet generators just to test the
> functionality of a bpf program. XDP testing is even more complicated,
> since qemu needs to be started with gro/gso disabled and precise queue
> configuration, transferring of the xdp program from host into guest,
> attaching to virtio/eth0 and generating traffic from the host while
> capturing the results from the guest.
>
> Moreover, analyzing performance bottlenecks in an XDP program is
> impossible in a virtio environment, since the cost of running the
> program is tiny compared to the overhead of virtio packet processing,
> so performance testing can only be done on a physical nic with another
> server generating traffic.
>
> Furthermore, ongoing changes to the user space control plane of
> production applications cannot be run on the test servers, leaving bpf
> programs stubbed out for testing. Last but not least, the upstream
> llvm changes are validated by the bpf backend testsuite, which has no
> ability to test the generated code.
>
> To improve this situation, introduce the BPF_PROG_TEST_RUN command to
> test and performance-benchmark bpf programs.
>
> Joint work with Daniel Borkmann.
>
> Signed-off-by: Alexei Starovoitov
> Acked-by: Daniel Borkmann
> Acked-by: Martin KaFai Lau
> ---
>  include/linux/bpf.h      |   7 ++
>  include/uapi/linux/bpf.h |  12
>  kernel/bpf/syscall.c     |  27 +++-
>  net/Makefile             |   2 +-
>  net/bpf/Makefile         |   1 +
>  net/bpf/test_run.c       | 172 +++
>  net/core/filter.c        |   5 ++
>  7 files changed, 223 insertions(+), 3 deletions(-)
>  create mode 100644 net/bpf/Makefile
>  create mode 100644 net/bpf/test_run.c
>
> [SNIP]
>
> diff --git a/net/Makefile b/net/Makefile
> index 9b681550e3a3..9086ffbb5085 100644
> --- a/net/Makefile
> +++ b/net/Makefile
> @@ -12,7 +12,7 @@ obj-$(CONFIG_NET) += $(tmp-y)
>
>  # LLC has to be linked before the files in net/802/
>  obj-$(CONFIG_LLC)		+= llc/
> -obj-$(CONFIG_NET)		+= ethernet/ 802/ sched/ netlink/
> +obj-$(CONFIG_NET)		+= ethernet/ 802/ sched/ netlink/ bpf/
>  obj-$(CONFIG_NETFILTER)	+= netfilter/
>  obj-$(CONFIG_INET)		+= ipv4/
>  obj-$(CONFIG_XFRM)		+= xfrm/
> diff --git a/net/bpf/Makefile b/net/bpf/Makefile
> new file mode 100644
> index ..27b2992a0692
> --- /dev/null
> +++ b/net/bpf/Makefile
> @@ -0,0 +1 @@
> +obj-y := test_run.o

I suggest using a CONFIG option to enable/disable code in test_run.o to
reduce the attack plane.

Thank you.
Re: [PATCH net-next 4/6] tools/lib/bpf: expose bpf_program__set_type()
On 2017/3/31 10:37, Alexei Starovoitov wrote:
> On 3/30/17 7:33 PM, Wangnan (F) wrote:
>>> +void bpf_program__set_type(struct bpf_program *prog, enum bpf_prog_type type);
>>
>> This makes libbpf.h depend on uapi/linux/bpf.h (because of enum
>> bpf_prog_type), which is not always available. What about defining
>> another enum inside libbpf.h?
>
> How about just including bpf.h? Or making it 'int' instead of an enum?

Including either kernel header into libbpf.h makes a lot of trouble,
because the kernel header and uapi have many other things we don't need,
which may conflict with existing code. Making it 'int' looks like a
backdoor: we would still need macros to define each program type.

Thank you.
Re: [PATCH net-next 4/6] tools/lib/bpf: expose bpf_program__set_type()
On 2017/3/31 9:31, Alexei Starovoitov wrote:
> expose bpf_program__set_type() to set program type
>
> Signed-off-by: Alexei Starovoitov
> Acked-by: Daniel Borkmann
> Acked-by: Martin KaFai Lau
> ---
>  tools/lib/bpf/libbpf.c | 3 +--
>  tools/lib/bpf/libbpf.h | 1 +
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index ac6eb863b2a4..1a2c07eb7795 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -1618,8 +1618,7 @@ int bpf_program__nth_fd(struct bpf_program *prog, int n)
>  	return fd;
>  }
>
> -static void bpf_program__set_type(struct bpf_program *prog,
> -				  enum bpf_prog_type type)
> +void bpf_program__set_type(struct bpf_program *prog, enum bpf_prog_type type)
>  {
>  	prog->type = type;
>  }
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index b30394f9947a..82adde30b696 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -185,6 +185,7 @@ int bpf_program__set_sched_cls(struct bpf_program *prog);
>  int bpf_program__set_sched_act(struct bpf_program *prog);
>  int bpf_program__set_xdp(struct bpf_program *prog);
>  int bpf_program__set_perf_event(struct bpf_program *prog);
> +void bpf_program__set_type(struct bpf_program *prog, enum bpf_prog_type type);

This makes libbpf.h depend on uapi/linux/bpf.h (because of enum
bpf_prog_type), which is not always available. What about defining
another enum inside libbpf.h?

Thank you.
Re: [PATCH net-next v1] bpf: Remove redundant ifdef
On 2017/2/15 1:07, David Miller wrote:
> From: "Wangnan (F)"
> Date: Mon, 13 Feb 2017 09:53:49 +0800
>
>> On 2017/2/12 3:37, Mickaël Salaün wrote:
>>> Remove a useless ifdef __NR_bpf as requested by Wang Nan. Inline
>>> one-line static functions as they were in the bpf_sys.h file.
>>>
>>> Signed-off-by: Mickaël Salaün
>>> Cc: Alexei Starovoitov
>>> Cc: Daniel Borkmann
>>> Cc: David S. Miller
>>> Cc: Wang Nan
>>> Link: https://lkml.kernel.org/r/828ab1ff-4dcf-53ff-c97b-074adb895...@huawei.com
>>> ---
>>>  tools/lib/bpf/bpf.c | 12 +++---------
>>>  1 file changed, 3 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
>>> index 50e04cc5..2de9c386989a 100644
>>> --- a/tools/lib/bpf/bpf.c
>>> +++ b/tools/lib/bpf/bpf.c
>>> @@ -42,21 +42,15 @@
>>>  # endif
>>>  #endif
>>>
>>> -static __u64 ptr_to_u64(const void *ptr)
>>> +static inline __u64 ptr_to_u64(const void *ptr)
>>>  {
>>>  	return (__u64) (unsigned long) ptr;
>>>  }
>>>
>>> -static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
>>> -		   unsigned int size)
>>> +static inline int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
>>> +			  unsigned int size)
>>>  {
>>> -#ifdef __NR_bpf
>>>  	return syscall(__NR_bpf, cmd, attr, size);
>>> -#else
>>> -	fprintf(stderr, "No bpf syscall, kernel headers too old?\n");
>>> -	errno = ENOSYS;
>>> -	return -1;
>>> -#endif
>>>  }
>>>
>>>  int bpf_create_map(enum bpf_map_type map_type, int key_size,
>>
>> Acked-by: Wang Nan
>>
>> However, it is better to merge this patch with commit
>> 702498a1426bc95b6f49f9c5fba616110cbd3947.
>
> I don't know where this commit ID is. Since this patch is targeting
> net-next, I would expect a commit ID with no context to be in that
> tree. Please always specify where the commit ID you mention is.

It is "bpf: Remove bpf_sys.h from selftests" in net-next.

> Furthermore, commits in net-next are permanent, so it is not possible
> afterwards to "merge this patch with commit X".

I understand. Maintainers sometimes reset their head to an early version
and amend a commit to make the history clean, but clearly net-next never
does this.

Thank you.
Re: [PATCH net-next v1] bpf: Remove redundant ifdef
On 2017/2/12 3:37, Mickaël Salaün wrote:
> Remove a useless ifdef __NR_bpf as requested by Wang Nan. Inline
> one-line static functions as they were in the bpf_sys.h file.
>
> Signed-off-by: Mickaël Salaün
> Cc: Alexei Starovoitov
> Cc: Daniel Borkmann
> Cc: David S. Miller
> Cc: Wang Nan
> Link: https://lkml.kernel.org/r/828ab1ff-4dcf-53ff-c97b-074adb895...@huawei.com
> ---
>  tools/lib/bpf/bpf.c | 12 +++---------
>  1 file changed, 3 insertions(+), 9 deletions(-)
>
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 50e04cc5..2de9c386989a 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -42,21 +42,15 @@
>  # endif
>  #endif
>
> -static __u64 ptr_to_u64(const void *ptr)
> +static inline __u64 ptr_to_u64(const void *ptr)
>  {
>  	return (__u64) (unsigned long) ptr;
>  }
>
> -static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
> -		   unsigned int size)
> +static inline int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
> +			  unsigned int size)
>  {
> -#ifdef __NR_bpf
>  	return syscall(__NR_bpf, cmd, attr, size);
> -#else
> -	fprintf(stderr, "No bpf syscall, kernel headers too old?\n");
> -	errno = ENOSYS;
> -	return -1;
> -#endif
>  }
>
>  int bpf_create_map(enum bpf_map_type map_type, int key_size,

Acked-by: Wang Nan

However, it is better to merge this patch with commit
702498a1426bc95b6f49f9c5fba616110cbd3947.

Thank you.
Re: [PATCH v4 0/3] Miscellaneous fixes for BPF (perf tree)
On 2017/2/9 4:27, Mickaël Salaün wrote:
> This series brings some fixes and small improvements to the BPF
> samples. It is intended for the perf tree and applies on 7a5980f9c006
> ("tools lib bpf: Add missing header to the library").
>
> Changes since v3:
> * remove applied patch 1/5
> * remove patch 2/5 on bpf_load_program() as requested by Wang Nan
>
> Changes since v2:
> * add this cover letter
>
> Changes since v1:
> * exclude patches not intended for the perf tree
>
> Regards,
>
> Mickaël Salaün (3):
>   samples/bpf: Ignore already processed ELF sections
>   samples/bpf: Reset global variables
>   samples/bpf: Add missing header
>
>  samples/bpf/bpf_load.c     | 7 +++++++
>  samples/bpf/tracex5_kern.c | 1 +
>  2 files changed, 8 insertions(+)

Looks good to me.

Thank you.
Re: [PATCH net-next v5 04/11] bpf: Use bpf_load_program() from the library
On 2017/2/10 10:25, Wangnan (F) wrote:
> On 2017/2/10 7:21, Mickaël Salaün wrote:
>> Replace bpf_prog_load() with bpf_load_program() calls.
>>
>> Signed-off-by: Mickaël Salaün
>> Cc: Alexei Starovoitov
>> Cc: Daniel Borkmann
>> Cc: Shuah Khan
>> ---
>>  tools/lib/bpf/bpf.c                         |  6 +++---
>>  tools/lib/bpf/bpf.h                         |  4 ++--
>>  tools/testing/selftests/bpf/Makefile        |  4 +++-
>>  tools/testing/selftests/bpf/bpf_sys.h       | 21 ---------------------
>>  tools/testing/selftests/bpf/test_tag.c      |  6 ++++--
>>  tools/testing/selftests/bpf/test_verifier.c |  8 +++++---
>>  6 files changed, 17 insertions(+), 32 deletions(-)
>>
>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
>> index 3ddb58a36d3c..58ce252073fa 100644
>> --- a/tools/lib/bpf/bpf.c
>> +++ b/tools/lib/bpf/bpf.c
>> @@ -42,7 +42,7 @@
>>  # endif
>>  #endif
>>
>> -static __u64 ptr_to_u64(void *ptr)
>> +static __u64 ptr_to_u64(const void *ptr)
>>  {
>>  	return (__u64) (unsigned long) ptr;
>>  }
>> @@ -69,8 +69,8 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size,
>>  	return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
>>  }
>>
>> -int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
>> -		     size_t insns_cnt, char *license,
>> +int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
>> +		     size_t insns_cnt, const char *license,
>>  		     __u32 kern_version, char *log_buf, size_t log_buf_sz)
>>  {
>>  	int fd;
>> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
>> index a2f9853dd882..bc959a2de023 100644
>> --- a/tools/lib/bpf/bpf.h
>> +++ b/tools/lib/bpf/bpf.h
>> @@ -28,8 +28,8 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
>>  /* Recommend log buffer size */
>>  #define BPF_LOG_BUF_SIZE 65536
>>
>> -int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
>> -		     size_t insns_cnt, char *license,
>> +int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
>> +		     size_t insns_cnt, const char *license,
>>  		     __u32 kern_version, char *log_buf, size_t log_buf_sz);
>
> For libbpf changes, and for similar code in patches 5-8:
>
> Acked-by: Wang Nan
>
> Thank you.

Acked-by: Wang Nan

Thank you.
Re: [PATCH net-next v5 10/11] bpf: Remove bpf_sys.h from selftests
On 2017/2/10 7:21, Mickaël Salaün wrote:
> Add required dependency headers.
>
> Signed-off-by: Mickaël Salaün
> Cc: Alexei Starovoitov
> Cc: Daniel Borkmann
> Cc: Shuah Khan
> ---
>  tools/lib/bpf/bpf.c                         |  6 ++++++
>  tools/testing/selftests/bpf/bpf_sys.h       | 27 ---------------------------
>  tools/testing/selftests/bpf/test_lpm_map.c  |  1 -
>  tools/testing/selftests/bpf/test_lru_map.c  |  1 -
>  tools/testing/selftests/bpf/test_maps.c     |  1 -
>  tools/testing/selftests/bpf/test_tag.c      |  3 +--
>  tools/testing/selftests/bpf/test_verifier.c |  4 ++--
>  7 files changed, 9 insertions(+), 34 deletions(-)
>  delete mode 100644 tools/testing/selftests/bpf/bpf_sys.h
>
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index f8a2b7fa7741..50e04cc5 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -50,7 +50,13 @@ static __u64 ptr_to_u64(const void *ptr)
>  static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
>  		   unsigned int size)
>  {
> +#ifdef __NR_bpf
>  	return syscall(__NR_bpf, cmd, attr, size);
> +#else
> +	fprintf(stderr, "No bpf syscall, kernel headers too old?\n");
> +	errno = ENOSYS;
> +	return -1;
> +#endif
>  }

We don't need to check __NR_bpf again. It has already been checked at
the top of this file:

	#ifndef __NR_bpf
	# if defined(__i386__)
	#  define __NR_bpf 357
	# elif defined(__x86_64__)
	#  define __NR_bpf 321
	# elif defined(__aarch64__)
	#  define __NR_bpf 280
	# else
	#  error __NR_bpf not defined. libbpf does not support your arch.
	# endif
	#endif

Thank you.
Re: [PATCH net-next v5 04/11] bpf: Use bpf_load_program() from the library
On 2017/2/10 7:21, Mickaël Salaün wrote:
> Replace bpf_prog_load() with bpf_load_program() calls.
>
> Signed-off-by: Mickaël Salaün
> Cc: Alexei Starovoitov
> Cc: Daniel Borkmann
> Cc: Shuah Khan
> ---
>  tools/lib/bpf/bpf.c                         |  6 +++---
>  tools/lib/bpf/bpf.h                         |  4 ++--
>  tools/testing/selftests/bpf/Makefile        |  4 +++-
>  tools/testing/selftests/bpf/bpf_sys.h       | 21 ---------------------
>  tools/testing/selftests/bpf/test_tag.c      |  6 ++++--
>  tools/testing/selftests/bpf/test_verifier.c |  8 +++++---
>  6 files changed, 17 insertions(+), 32 deletions(-)
>
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 3ddb58a36d3c..58ce252073fa 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -42,7 +42,7 @@
>  # endif
>  #endif
>
> -static __u64 ptr_to_u64(void *ptr)
> +static __u64 ptr_to_u64(const void *ptr)
>  {
>  	return (__u64) (unsigned long) ptr;
>  }
> @@ -69,8 +69,8 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size,
>  	return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
>  }
>
> -int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
> -		     size_t insns_cnt, char *license,
> +int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
> +		     size_t insns_cnt, const char *license,
>  		     __u32 kern_version, char *log_buf, size_t log_buf_sz)
>  {
>  	int fd;
> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> index a2f9853dd882..bc959a2de023 100644
> --- a/tools/lib/bpf/bpf.h
> +++ b/tools/lib/bpf/bpf.h
> @@ -28,8 +28,8 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
>  /* Recommend log buffer size */
>  #define BPF_LOG_BUF_SIZE 65536
>
> -int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
> -		     size_t insns_cnt, char *license,
> +int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
> +		     size_t insns_cnt, const char *license,
>  		     __u32 kern_version, char *log_buf, size_t log_buf_sz);

For libbpf changes:

Acked-by: Wang Nan

Thank you.
Re: [PATCH v2 1/5] bpf: Add missing header to the library
Please add me to the cc list of all five patches.

Thank you.

On 2017/2/7 4:40, Mickaël Salaün wrote:
> Include stddef.h to define size_t.
>
> Signed-off-by: Mickaël Salaün
> Cc: Alexei Starovoitov
> Cc: Arnaldo Carvalho de Melo
> Cc: Daniel Borkmann
> Cc: Wang Nan
> ---
>  tools/lib/bpf/bpf.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> index a2f9853dd882..df6e186da788 100644
> --- a/tools/lib/bpf/bpf.h
> +++ b/tools/lib/bpf/bpf.h
> @@ -22,6 +22,7 @@
>  #define __BPF_BPF_H
>
>  #include <linux/bpf.h>
> +#include <stddef.h>
>
>  int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
>  		   int max_entries, __u32 map_flags);
Re: [PATCH v3 1/5] bpf: Add missing header to the library
On 2017/2/8 4:56, Mickaël Salaün wrote: Include stddef.h to define size_t. Signed-off-by: Mickaël Salaün Cc: Alexei Starovoitov Cc: Arnaldo Carvalho de Melo Cc: Daniel Borkmann Cc: Wang Nan --- tools/lib/bpf/bpf.h | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index a2f9853dd882..df6e186da788 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -22,6 +22,7 @@ #define __BPF_BPF_H #include +#include int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size, int max_entries, __u32 map_flags); Looks good to me. Thank you.
Re: [PATCH v3 2/5] bpf: Simplify bpf_load_program() error handling in the library
On 2017/2/8 4:56, Mickaël Salaün wrote: Do not call bpf(2) a second time when a program load fails. BPF_PROG_LOAD should succeed most of the time. Setting log_level to 0 by default and requiring a log buffer only on failure makes the common case faster. Thank you. Signed-off-by: Mickaël Salaün Cc: Alexei Starovoitov Cc: Arnaldo Carvalho de Melo Cc: Daniel Borkmann Cc: Wang Nan --- tools/lib/bpf/bpf.c | 18 ++ 1 file changed, 6 insertions(+), 12 deletions(-) diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c index 3ddb58a36d3c..fda3f494f1cd 100644 --- a/tools/lib/bpf/bpf.c +++ b/tools/lib/bpf/bpf.c @@ -73,7 +73,6 @@ int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns, size_t insns_cnt, char *license, __u32 kern_version, char *log_buf, size_t log_buf_sz) { - int fd; union bpf_attr attr; bzero(&attr, sizeof(attr)); @@ -81,20 +80,15 @@ int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns, attr.insn_cnt = (__u32)insns_cnt; attr.insns = ptr_to_u64(insns); attr.license = ptr_to_u64(license); - attr.log_buf = ptr_to_u64(NULL); - attr.log_size = 0; - attr.log_level = 0; + attr.log_buf = ptr_to_u64(log_buf); + attr.log_size = log_buf_sz; attr.kern_version = kern_version; - fd = sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr)); - if (fd >= 0 || !log_buf || !log_buf_sz) - return fd; + if (log_buf && log_buf_sz > 0) { + attr.log_level = 1; + log_buf[0] = 0; + } - /* Try again with log */ - attr.log_buf = ptr_to_u64(log_buf); - attr.log_size = log_buf_sz; - attr.log_level = 1; - log_buf[0] = 0; return sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr)); }
Re: [PATCHv2 perf/core 5/7] tools lib bpf: Add bpf_program__pin()
On 2017/1/25 9:16, Joe Stringer wrote: On 24 January 2017 at 17:06, Wangnan (F) wrote: On 2017/1/25 9:04, Wangnan (F) wrote: On 2017/1/23 9:11, Joe Stringer wrote: Add a new API to pin a BPF program to the filesystem. The user can specify the path full path within a BPF filesystem to pin the program. Programs with multiple instances are pinned as 'foo', 'foo_1', 'foo_2', and so on. Signed-off-by: Joe Stringer --- v2: Don't automount BPF filesystem Split program, map, object pinning into separate APIs and separate patches. --- [SNIP] +int bpf_program__pin(struct bpf_program *prog, const char *path) In your next patch please let caller select one instance: int bpf_program__pin(struct bpf_program *prog, int instance, const char *path) (please choose a better name) Then your can wrap it with another function to pin all instances, implement naming schema (%s_%d) there. Then implement naming schema in bpf_object__pin like: %s/objectname_mapname %s/objectname_progname_%d Is it possible to use directory tree instead? %s/object/mapname %s/object/prog/instance I don't think objects have names, so let's assume an object with two program instances named foo, and one map named bar. A call of bpf_object__pin(obj, "/sys/fs/bpf/myobj") would mount with the following files and directories: /sys/fs/bpf/myobj/foo/1 /sys/fs/bpf/myobj/foo/2 /sys/fs/bpf/myobj/bar Alternatively, if you want to control exactly where you want the progs/maps to be pinned, you can call eg bpf_program__pin_instance(prog, "/sys/fs/bpf/wherever", 0) and that instance will be mounted to /sys/fs/bpf/wherever, or alternatively bpf_program__pin(prog, "/sys/fs/bpf/foo"), and you will end up with /sys/fs/bpf/foo/{0,1}. This looks pretty reasonable to me. It looks good to me. Thank you.
Re: [PATCHv2 perf/core 5/7] tools lib bpf: Add bpf_program__pin()
On 2017/1/25 9:04, Wangnan (F) wrote: On 2017/1/23 9:11, Joe Stringer wrote: Add a new API to pin a BPF program to the filesystem. The user can specify the full path within a BPF filesystem to pin the program. Programs with multiple instances are pinned as 'foo', 'foo_1', 'foo_2', and so on. Signed-off-by: Joe Stringer --- v2: Don't automount BPF filesystem Split program, map, object pinning into separate APIs and separate patches. --- [SNIP] +int bpf_program__pin(struct bpf_program *prog, const char *path) In your next patch please let the caller select one instance: int bpf_program__pin(struct bpf_program *prog, int instance, const char *path) (please choose a better name) Then you can wrap it with another function that pins all instances and implements the naming schema (%s_%d) there. Then implement the naming schema in bpf_object__pin like: %s/objectname_mapname %s/objectname_progname_%d Is it possible to use a directory tree instead? %s/object/mapname %s/object/prog/instance Thank you.
Re: [PATCHv2 perf/core 5/7] tools lib bpf: Add bpf_program__pin()
On 2017/1/23 9:11, Joe Stringer wrote: Add a new API to pin a BPF program to the filesystem. The user can specify the path full path within a BPF filesystem to pin the program. Programs with multiple instances are pinned as 'foo', 'foo_1', 'foo_2', and so on. Signed-off-by: Joe Stringer --- v2: Don't automount BPF filesystem Split program, map, object pinning into separate APIs and separate patches. --- tools/lib/bpf/libbpf.c | 76 ++ tools/lib/bpf/libbpf.h | 1 + 2 files changed, 77 insertions(+) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index e6cd62b1264b..eea5c74808f7 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -4,6 +4,7 @@ * Copyright (C) 2013-2015 Alexei Starovoitov * Copyright (C) 2015 Wang Nan * Copyright (C) 2015 Huawei Inc. + * Copyright (C) 2017 Nicira, Inc. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -22,6 +23,7 @@ #include #include #include +#include #include #include #include @@ -31,7 +33,10 @@ #include #include #include +#include #include +#include +#include #include #include @@ -1237,6 +1242,77 @@ int bpf_object__load(struct bpf_object *obj) return err; } +static int check_path(const char *path) +{ + struct statfs st_fs; + char *dname, *dir; + int err = 0; + + if (path == NULL) + return -EINVAL; + + dname = strdup(path); + dir = dirname(dname); + if (statfs(dir, &st_fs)) { + pr_warning("failed to statfs %s: %s\n", dir, strerror(errno)); + err = -errno; + } + free(dname); + + if (!err && st_fs.f_type != BPF_FS_MAGIC) { + pr_warning("specified path %s is not on BPF FS\n", path); + err = -EINVAL; + } + + return err; +} + +int bpf_program__pin(struct bpf_program *prog, const char *path) In your next patch please let caller select one instance: int bpf_program__pin(struct bpf_program *prog, int instance, const char *path) (please choose a better name) Then your can wrap it with another function to pin all instances, implement 
naming schema (%s_%d) there. Then implement the naming schema in bpf_object__pin like: %s/objectname_mapname %s/objectname_progname_%d and make bpf_{map,program}__pin the raw pinning functions with no naming schema. Thank you.
Re: [PATCHv3 perf/core 5/7] samples/bpf: Switch over to libbpf
On 2016/12/9 13:04, Wangnan (F) wrote: On 2016/12/9 10:46, Joe Stringer wrote: [SNIP] diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile index 62d89d50fcbd..616bd55f3be8 100644 --- a/tools/lib/bpf/Makefile +++ b/tools/lib/bpf/Makefile @@ -149,6 +149,8 @@ CMD_TARGETS = $(LIB_FILE) TARGETS = $(CMD_TARGETS) +libbpf: all + Why do we need this? I tested this patch without it and it seems to work, and this line causes an extra error: $ pwd /home/wn/kernel/tools/lib/bpf $ make libbpf ... gcc -g -Wall -DHAVE_LIBELF_MMAP_SUPPORT -DHAVE_ELF_GETPHDRNUM_SUPPORT -Wbad-function-cast -Wdeclaration-after-statement -Wformat-security -Wformat-y2k -Winit-self -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wno-system-headers -Wold-style-definition -Wpacked -Wredundant-decls -Wshadow -Wstrict-aliasing=3 -Wstrict-prototypes -Wswitch-default -Wswitch-enum -Wundef -Wwrite-strings -Wformat -Werror -Wall -fPIC -I. -I/home/wn/kernel-hydrogen/tools/include -I/home/wn/kernel-hydrogen/tools/arch/x86/include/uapi -I/home/wn/kernel-hydrogen/tools/include/uapi libbpf.c all -o libbpf gcc: error: all: No such file or directory make: *** [libbpf] Error 1 Thank you. It is not 'caused' by your patch. 'make libbpf' fails without your change because it tries to build an executable from libbpf.c, but main() is missing. I think libbpf should never be used as a make target. Your new dependency looks strange. Thank you.
Re: [PATCHv3 perf/core 5/7] samples/bpf: Switch over to libbpf
On 2016/12/9 10:46, Joe Stringer wrote: Now that libbpf under tools/lib/bpf/* is synced with the version from samples/bpf, we can get rid most of the libbpf library here. Signed-off-by: Joe Stringer --- v3: First post. --- samples/bpf/Makefile | 60 +- samples/bpf/README.rst | 4 +- samples/bpf/libbpf.c | 111 - samples/bpf/libbpf.h | 19 + tools/lib/bpf/Makefile | 2 + 5 files changed, 38 insertions(+), 158 deletions(-) diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile index 72c58675973e..c8f7ed37b2de 100644 --- a/samples/bpf/Makefile +++ b/samples/bpf/Makefile @@ -29,35 +29,38 @@ hostprogs-y += trace_event hostprogs-y += sampleip hostprogs-y += tc_l2_redirect -test_verifier-objs := test_verifier.o libbpf.o -test_maps-objs := test_maps.o libbpf.o -sock_example-objs := sock_example.o libbpf.o -fds_example-objs := bpf_load.o libbpf.o fds_example.o -sockex1-objs := bpf_load.o libbpf.o sockex1_user.o -sockex2-objs := bpf_load.o libbpf.o sockex2_user.o -sockex3-objs := bpf_load.o libbpf.o sockex3_user.o -tracex1-objs := bpf_load.o libbpf.o tracex1_user.o -tracex2-objs := bpf_load.o libbpf.o tracex2_user.o -tracex3-objs := bpf_load.o libbpf.o tracex3_user.o -tracex4-objs := bpf_load.o libbpf.o tracex4_user.o -tracex5-objs := bpf_load.o libbpf.o tracex5_user.o -tracex6-objs := bpf_load.o libbpf.o tracex6_user.o -test_probe_write_user-objs := bpf_load.o libbpf.o test_probe_write_user_user.o -trace_output-objs := bpf_load.o libbpf.o trace_output_user.o -lathist-objs := bpf_load.o libbpf.o lathist_user.o -offwaketime-objs := bpf_load.o libbpf.o offwaketime_user.o -spintest-objs := bpf_load.o libbpf.o spintest_user.o -map_perf_test-objs := bpf_load.o libbpf.o map_perf_test_user.o -test_overhead-objs := bpf_load.o libbpf.o test_overhead_user.o -test_cgrp2_array_pin-objs := libbpf.o test_cgrp2_array_pin.o -xdp1-objs := bpf_load.o libbpf.o xdp1_user.o +# Libbpf dependencies +LIBBPF := libbpf.o ../../tools/lib/bpf/bpf.o + +test_verifier-objs := test_verifier.o $(LIBBPF) 
+test_maps-objs := test_maps.o $(LIBBPF) +sock_example-objs := sock_example.o $(LIBBPF) +fds_example-objs := bpf_load.o $(LIBBPF) fds_example.o +sockex1-objs := bpf_load.o $(LIBBPF) sockex1_user.o +sockex2-objs := bpf_load.o $(LIBBPF) sockex2_user.o +sockex3-objs := bpf_load.o $(LIBBPF) sockex3_user.o +tracex1-objs := bpf_load.o $(LIBBPF) tracex1_user.o +tracex2-objs := bpf_load.o $(LIBBPF) tracex2_user.o +tracex3-objs := bpf_load.o $(LIBBPF) tracex3_user.o +tracex4-objs := bpf_load.o $(LIBBPF) tracex4_user.o +tracex5-objs := bpf_load.o $(LIBBPF) tracex5_user.o +tracex6-objs := bpf_load.o $(LIBBPF) tracex6_user.o +test_probe_write_user-objs := bpf_load.o $(LIBBPF) test_probe_write_user_user.o +trace_output-objs := bpf_load.o $(LIBBPF) trace_output_user.o +lathist-objs := bpf_load.o $(LIBBPF) lathist_user.o +offwaketime-objs := bpf_load.o $(LIBBPF) offwaketime_user.o +spintest-objs := bpf_load.o $(LIBBPF) spintest_user.o +map_perf_test-objs := bpf_load.o $(LIBBPF) map_perf_test_user.o +test_overhead-objs := bpf_load.o $(LIBBPF) test_overhead_user.o +test_cgrp2_array_pin-objs := $(LIBBPF) test_cgrp2_array_pin.o +xdp1-objs := bpf_load.o $(LIBBPF) xdp1_user.o # reuse xdp1 source intentionally -xdp2-objs := bpf_load.o libbpf.o xdp1_user.o -test_current_task_under_cgroup-objs := bpf_load.o libbpf.o \ +xdp2-objs := bpf_load.o $(LIBBPF) xdp1_user.o +test_current_task_under_cgroup-objs := bpf_load.o $(LIBBPF) \ test_current_task_under_cgroup_user.o -trace_event-objs := bpf_load.o libbpf.o trace_event_user.o -sampleip-objs := bpf_load.o libbpf.o sampleip_user.o -tc_l2_redirect-objs := bpf_load.o libbpf.o tc_l2_redirect_user.o +trace_event-objs := bpf_load.o $(LIBBPF) trace_event_user.o +sampleip-objs := bpf_load.o $(LIBBPF) sampleip_user.o +tc_l2_redirect-objs := bpf_load.o $(LIBBPF) tc_l2_redirect_user.o # Tell kbuild to always build the programs always := $(hostprogs-y) @@ -89,7 +92,7 @@ always += test_current_task_under_cgroup_kern.o always += trace_event_kern.o always += 
sampleip_kern.o -HOSTCFLAGS += -I$(objtree)/usr/include +HOSTCFLAGS += -I$(objtree)/usr/include -I$(objtree)/tools/lib/ Should use srctree like this: +HOSTCFLAGS += -I$(objtree)/usr/include -I$(srctree)/tools/lib/ Or you will see following failure when doing off-tree build: $ mkdir buildkernel $ cd buildkernel $ make -C ../ O=`pwd` menuconfig $ make -j64 $ make samples/bpf/ CHK include/config/kernel.release Using .. as source for kernel GEN ./Makefile CHK include/generated/uapi/linux/version.h CHK include/generated/utsrelease.h CHK include/generated/bounds.h CHK include/generated/timeconst.h CHK include/generated/asm-offsets.h CALL../scripts/checksyscalls.sh HOSTCC samples/bpf/test_verifier.o In file included from ../samples/bpf/test_verifier.c:20:0: .
Re: [PATCHv3 perf/core 3/7] tools lib bpf: Add flags to bpf_create_map()
On 2016/12/9 10:46, Joe Stringer wrote: The map_flags argument to bpf_create_map() was previously not exposed. By exposing it, users can access flags such as whether or not to preallocate the map. Signed-off-by: Joe Stringer Please mention commit 6c90598174322b029e40dd84a4eb01f56afe in the commit message: Commit 6c905981743 ("bpf: pre-allocate hash map elements") introduced map_flags in bpf_attr for the BPF_MAP_CREATE command. Expose this new parameter in libbpf. Acked-by: Wang Nan --- v3: Split from "tools lib bpf: Sync with samples/bpf/libbpf". --- tools/lib/bpf/bpf.c| 3 ++- tools/lib/bpf/bpf.h| 2 +- tools/lib/bpf/libbpf.c | 3 ++- 3 files changed, 5 insertions(+), 3 deletions(-) diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c index 89e8e8e5b60e..d0afb26c2e0f 100644 --- a/tools/lib/bpf/bpf.c +++ b/tools/lib/bpf/bpf.c @@ -54,7 +54,7 @@ static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr, } int bpf_create_map(enum bpf_map_type map_type, int key_size, - int value_size, int max_entries) + int value_size, int max_entries, __u32 map_flags) { union bpf_attr attr; @@ -64,6 +64,7 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size, attr.key_size = key_size; attr.value_size = value_size; attr.max_entries = max_entries; + attr.map_flags = map_flags; return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr)); } diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index 61130170a6ad..7fcdce16fd62 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -24,7 +24,7 @@ #include int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size, - int max_entries); + int max_entries, __u32 map_flags); /* Recommend log buffer size */ #define BPF_LOG_BUF_SIZE 65536 diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 2e974593f3e8..84e6b35da4bd 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -854,7 +854,8 @@ bpf_object__create_maps(struct bpf_object *obj) *pfd = bpf_create_map(def->type, def->key_size, def->value_size, - 
def->max_entries); + def->max_entries, + 0); if (*pfd < 0) { size_t j; int err = *pfd;
Re: [PATCHv3 perf/core 2/7] tools lib bpf: use __u32 from linux/types.h
On 2016/12/9 10:46, Joe Stringer wrote: Fixes the following issue when building without access to 'u32' type: ./tools/lib/bpf/bpf.h:27:23: error: unknown type name ‘u32’ Signed-off-by: Joe Stringer --- v3: Split from "tools lib bpf: Sync with samples/bpf/libbpf" --- tools/lib/bpf/bpf.c | 4 ++-- tools/lib/bpf/bpf.h | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) Acked-by: Wang Nan diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c index 8143536b462a..89e8e8e5b60e 100644 --- a/tools/lib/bpf/bpf.c +++ b/tools/lib/bpf/bpf.c @@ -70,7 +70,7 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size, int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns, size_t insns_cnt, char *license, -u32 kern_version, char *log_buf, size_t log_buf_sz) +__u32 kern_version, char *log_buf, size_t log_buf_sz) { int fd; union bpf_attr attr; @@ -98,7 +98,7 @@ int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns, } int bpf_map_update_elem(int fd, void *key, void *value, - u64 flags) + __u64 flags) { union bpf_attr attr; diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index 253c3dbb06b4..61130170a6ad 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -30,11 +30,11 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size, #define BPF_LOG_BUF_SIZE 65536 int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns, size_t insns_cnt, char *license, -u32 kern_version, char *log_buf, +__u32 kern_version, char *log_buf, size_t log_buf_sz); int bpf_map_update_elem(int fd, void *key, void *value, - u64 flags); + __u64 flags); int bpf_map_lookup_elem(int fd, void *key, void *value); int bpf_map_delete_elem(int fd, void *key);
Re: [PATCHv2 perf/core 2/2] tools lib bpf: Sync with samples/bpf/libbpf
On 2016/11/17 10:46, Joe Stringer wrote: On 16 November 2016 at 18:10, Wangnan (F) wrote: I'm also working on improving bpf.c. Please have a look at: https://lkml.org/lkml/2016/11/14/1078 Since bpf.c is simple, I think we can add more functions and fixes gradually, instead of a full copy. See my inline comment below. Ah, I missed this, my apologies. It looks like it will provide much of what I need, I can reassess this patch with your series in mind. One comment though for your patch (I don't have the original thread to respond to unfortunately): The map_pin and map_get functions in your patch series can be used to pin progs too, so maybe there is a better name? You'll see that this patch uses bpf_obj_{pin,get}() - although I wouldn't want those to be confused with the libbpf.c objects so maybe there's a clearer name that could be used. I also have some patches to rework the samples/bpf/* code to use libbpf instead of the sample code that is there, is it worth me submitting that? It will need to wait for your patch to go in, plus a merge with davem's tree. I'm happy to see you are trying to replace samples/bpf's own libbpf with tools/lib/bpf. I think you can submit your libbpf work and the samples patches together if they are ready. Arnaldo can pick up the patches that are good for him, and you can improve the others based on his newest branch. Thank you.
Re: [PATCHv2 perf/core 2/2] tools lib bpf: Sync with samples/bpf/libbpf
On 2016/11/17 10:46, Joe Stringer wrote: On 16 November 2016 at 18:10, Wangnan (F) wrote: I'm also working on improving bpf.c. Please have a look at: https://lkml.org/lkml/2016/11/14/1078 Since bpf.c is simple, I think we can add more functions and fixes gradually, instead of a full copy. See my inline comment below. Ah, I missed this, my apologies. It looks like it will provide much of what I need, I can reassess this patch with your series in mind. One comment though for your patch (I don't have the original thread to respond to unfortunately): The map_pin and map_get functions in your patch series can be used to pin progs too, so maybe there is a better name? You'll see that this patch uses bpf_obj_{pin,get}() - although I wouldn't want those to be confused with the libbpf.c objects so maybe there's a clearer name that could be used. The full thread can be found: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1272045.html (lkml.kernel.org is not working for me, sorry) In that patch set, bpf_map_pin/get is linked into perf hooks, so BPF script can pin a map to sysfs. I think this feature would be useful, but I don't have an example to show how to use it. I didn't provide program pinning/get interface because in perf hook they are not useful. After rethinking your suggestion now I think it is okay to provide bpf_obj_{pin,get} in bpf.c and export bpf_map_pin to perf hook. I'll adjust my own patch. I also have some patches to rework the samples/bpf/* code to use libbpf instead of the sample code that is there, is it worth me submitting that? It will need to wait for your patch to go in, plus a merge with davem's tree. 
On 2016/11/17 1:43, Joe Stringer wrote: [SNIP] /* @@ -53,24 +60,71 @@ static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr, return syscall(__NR_bpf, cmd, attr, size); } -int bpf_create_map(enum bpf_map_type map_type, int key_size, - int value_size, int max_entries) +int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size, + int max_entries, int map_flags) { - union bpf_attr attr; + union bpf_attr attr = { + .map_type = map_type, + .key_size = key_size, + .value_size = value_size, + .max_entries = max_entries, + .map_flags = map_flags, + }; - memset(&attr, '\0', sizeof(attr)); + return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr)); +} I missed map_flags in the original bpf.c. Thanks for your patch. map_flags is useful when creating a BPF_MAP_TYPE_PERCPU_HASH: BPF_F_NO_PREALLOC is meaningful in this case. Do you want me to resubmit this piece as a separate patch or will you address this? Please send it. Thank you.
Re: [PATCHv2 perf/core 2/2] tools lib bpf: Sync with samples/bpf/libbpf
I'm also working on improving bpf.c. Please have a look at: https://lkml.org/lkml/2016/11/14/1078 Since bpf.c is simple, I think we can add more functions and fixes gradually, instead of a full copy. See my inline comment below. On 2016/11/17 1:43, Joe Stringer wrote: Extend the tools/ version of libbpf to include all of the functionality provided in the samples/bpf version. Signed-off-by: Joe Stringer --- v2: Don't shift non-bpf changes across. Various type cleanups, removal of extraneous declarations --- tools/lib/bpf/bpf.c| 107 -- tools/lib/bpf/bpf.h| 202 +++-- tools/lib/bpf/libbpf.c | 3 +- 3 files changed, 279 insertions(+), 33 deletions(-) diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c index 4212ed62235b..5e061851ac00 100644 --- a/tools/lib/bpf/bpf.c +++ b/tools/lib/bpf/bpf.c @@ -20,10 +20,17 @@ */ #include -#include +#include #include #include +#include +#include #include +#include +#include +#include +#include +#include #include "bpf.h" Why we need these network related headers? /* @@ -53,24 +60,71 @@ static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr, return syscall(__NR_bpf, cmd, attr, size); } -int bpf_create_map(enum bpf_map_type map_type, int key_size, - int value_size, int max_entries) +int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size, + int max_entries, int map_flags) { - union bpf_attr attr; + union bpf_attr attr = { + .map_type = map_type, + .key_size = key_size, + .value_size = value_size, + .max_entries = max_entries, + .map_flags = map_flags, + }; - memset(&attr, '\0', sizeof(attr)); + return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr)); +} I lost map_flags in original bpf.c. Thanks to your patch. map_flags is useful when creating BPF_MAP_TYPE_PERCPU_HASH: BPF_F_NO_PREALLOC is meanful in this case. Although it is okay in samples, I still prefer a explicit bzero() or memset(), because kernel checks if unused field in this union is zero. 
However, I'll check the C standard to see how unused fields are initialized. diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index e8ba54087497..4dba36995771 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -23,16 +23,202 @@ #include +struct bpf_insn; + int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size, - int max_entries); + int max_entries, int map_flags); +int bpf_update_elem(int fd, void *key, void *value, unsigned long long flags); +int bpf_lookup_elem(int fd, void *key, void *value); +int bpf_delete_elem(int fd, void *key); +int bpf_get_next_key(int fd, void *key, void *next_key); + +int bpf_load_program(enum bpf_prog_type prog_type, +const struct bpf_insn *insns, int insn_len, +const char *license, int kern_version, +char *log_buf, size_t log_buf_sz); + +int bpf_obj_pin(int fd, const char *pathname); +int bpf_obj_get(const char *pathname); -/* Recommend log buffer size */ #define BPF_LOG_BUF_SIZE 65536 -int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns, -size_t insns_cnt, char *license, -u32 kern_version, char *log_buf, -size_t log_buf_sz); -int bpf_map_update_elem(int fd, void *key, void *value, - u64 flags); +/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */ + +#define BPF_ALU64_REG(OP, DST, SRC)\ + ((struct bpf_insn) {\ + .code = BPF_ALU64 | BPF_OP(OP) | BPF_X,\ + .dst_reg = DST, \ + .src_reg = SRC, \ + .off = 0, \ + .imm = 0 }) + Should we define these macros here? They are in include/linux/filter.h and duplicated in tools/include/linux/filter.h. Redefining them here would cause conflicts. Thank you.
Re: [PATCHv2 perf/core 1/2] tools lib bpf: Sync {tools,}/include/uapi/linux/bpf.h
On 2016/11/17 1:43, Joe Stringer wrote: The tools version of this header is out of date; update it to the latest version from the kernel headers. Signed-off-by: Joe Stringer --- v2: No change. --- tools/include/uapi/linux/bpf.h | 51 ++ 1 file changed, 51 insertions(+) Acked-by: Wang Nan diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 9e5fc168c8a3..f09c70b97eca 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -95,6 +95,7 @@ enum bpf_prog_type { BPF_PROG_TYPE_SCHED_ACT, BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_XDP, + BPF_PROG_TYPE_PERF_EVENT, }; #define BPF_PSEUDO_MAP_FD 1 @@ -375,6 +376,56 @@ enum bpf_func_id { */ BPF_FUNC_probe_write_user, + /** +* bpf_current_task_under_cgroup(map, index) - Check cgroup2 membership of current task +* @map: pointer to bpf_map in BPF_MAP_TYPE_CGROUP_ARRAY type +* @index: index of the cgroup in the bpf_map +* Return: +* == 0 current failed the cgroup2 descendant test +* == 1 current succeeded the cgroup2 descendant test +*< 0 error +*/ + BPF_FUNC_current_task_under_cgroup, + + /** +* bpf_skb_change_tail(skb, len, flags) +* The helper will resize the skb to the given new size, +* to be used f.e. with control messages. +* @skb: pointer to skb +* @len: new skb length +* @flags: reserved +* Return: 0 on success or negative error +*/ + BPF_FUNC_skb_change_tail, + + /** +* bpf_skb_pull_data(skb, len) +* The helper will pull in non-linear data in case the +* skb is non-linear and not all of len are part of the +* linear section. Only needed for read/write with direct +* packet access. +* @skb: pointer to skb +* @len: len to make read/writeable +* Return: 0 on success or negative error +*/ + BPF_FUNC_skb_pull_data, + + /** +* bpf_csum_update(skb, csum) +* Adds csum into skb->csum in case of CHECKSUM_COMPLETE. 
+* @skb: pointer to skb +* @csum: csum to add +* Return: csum on success or negative error +*/ + BPF_FUNC_csum_update, + + /** +* bpf_set_hash_invalid(skb) +* Invalidate current skb>hash. +* @skb: pointer to skb +*/ + BPF_FUNC_set_hash_invalid, + __BPF_FUNC_MAX_ID, };
Re: [PATCH 7/8] tools lib bpf: fix maps resolution
Hi Eric, While testing this patch I found a segfault, please see the inline comment. In addition, since both the BPF map array and the map names should be set up after the symbol table is collected, merging bpf_object__init_maps and bpf_object__init_maps_name would be good practice and would make the code simpler. So I have prepared a new patch. Please have a look at: http://lkml.kernel.org/g/20161108215734.28905-1-wangn...@huawei.com The new version ensures it does not crash in any case where the user provides a corrupted maps section, including an array of bpf maps, maps with different definition structures, and a very short map definition. Thank you. On 2016/10/16 14:18, Eric Leblond wrote: It is not correct to assimilate the elf data of the maps section to an array of map definitions. In fact the sizes differ. The offset provided in the symbol section has to be used instead. This patch fixes a bug causing an ELF with two maps not to load correctly. Signed-off-by: Eric Leblond --- tools/lib/bpf/libbpf.c | 50 +++--- 1 file changed, 35 insertions(+), 15 deletions(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 1fe4532..f72628b 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -186,6 +186,7 @@ struct bpf_program { struct bpf_map { int fd; char *name; + size_t offset; struct bpf_map_def def; void *priv; bpf_map_clear_priv_t clear_priv; @@ -529,13 +530,6 @@ bpf_object__init_maps(struct bpf_object *obj, void *data, pr_debug("maps in %s: %zd bytes\n", obj->path, size); - obj->maps = calloc(nr_maps, sizeof(obj->maps[0])); - if (!obj->maps) { - pr_warning("alloc maps for object failed\n"); - return -ENOMEM; - } - obj->nr_maps = nr_maps; - for (i = 0; i < nr_maps; i++) { struct bpf_map_def *def = &obj->maps[i].def; @@ -547,23 +541,42 @@ bpf_object__init_maps(struct bpf_object *obj, void *data, obj->maps[i].fd = -1; /* Save map definition into obj->maps */ - *def = ((struct bpf_map_def *)data)[i]; + *def = *(struct bpf_map_def *)(data + obj->maps[i].offset); } Here, nr_maps is still size / 
sizeof(struct bpf_map_def), so obj->maps[i] can be invalid. return 0; } static int -bpf_object__init_maps_name(struct bpf_object *obj) +bpf_object__init_maps_symbol(struct bpf_object *obj) { int i; + int nr_maps = 0; Elf_Data *symbols = obj->efile.symbols; + size_t map_idx = 0; if (!symbols || obj->efile.maps_shndx < 0) return -EINVAL; + /* get the number of maps */ + for (i = 0; i < symbols->d_size / sizeof(GElf_Sym); i++) { + GElf_Sym sym; + + if (!gelf_getsym(symbols, i, &sym)) + continue; + if (sym.st_shndx != obj->efile.maps_shndx) + continue; + nr_maps++; + } + + obj->maps = calloc(nr_maps, sizeof(obj->maps[0])); + if (!obj->maps) { + pr_warning("alloc maps for object failed\n"); + return -ENOMEM; + } + obj->nr_maps = nr_maps; + for (i = 0; i < symbols->d_size / sizeof(GElf_Sym); i++) { GElf_Sym sym; - size_t map_idx; const char *map_name; if (!gelf_getsym(symbols, i, &sym)) @@ -574,12 +587,12 @@ bpf_object__init_maps_name(struct bpf_object *obj) map_name = elf_strptr(obj->efile.elf, obj->efile.strtabidx, sym.st_name); - map_idx = sym.st_value / sizeof(struct bpf_map_def); if (map_idx >= obj->nr_maps) { pr_warning("index of map \"%s\" is buggy: %zu > %zu\n", map_name, map_idx, obj->nr_maps); continue; } + obj->maps[map_idx].offset = sym.st_value; obj->maps[map_idx].name = strdup(map_name); if (!obj->maps[map_idx].name) { pr_warning("failed to alloc map name\n"); @@ -587,6 +600,7 @@ bpf_object__init_maps_name(struct bpf_object *obj) } pr_debug("map %zu is \"%s\"\n", map_idx, obj->maps[map_idx].name); + map_idx++; } return 0; } @@ -647,8 +661,6 @@ static int bpf_object__elf_collect(struct bpf_object *obj) data->d_buf, data->d_size); else if (strcmp(name, "maps") == 0) { - err = bpf_object__init_maps(obj, data->d_buf, - data->d_size); obj->efile.maps_shndx = idx; } else if (sh.sh_type == S
Re: [PATCH 7/8] tools lib bpf: fix maps resolution
Hi Eric,

Are you still working on this patch set? From a patch set by Joe Stringer I now understand why the maps section is not a simple array:

https://www.mail-archive.com/netdev@vger.kernel.org/msg135088.html

So I think this patch is really useful. Are you going to resend the whole patch set? If not, let me collect this patch 7/8 into my local code base and send it to Arnaldo with my other patches.

Thank you.

On 2016/10/17 5:18, Eric Leblond wrote:
It is not correct to treat the ELF data of the maps section as an array of map definitions; in fact the sizes differ. The offset provided in the symbol section has to be used instead. This patch fixes a bug causing an ELF file with two maps not to load correctly.

Signed-off-by: Eric Leblond
---
 tools/lib/bpf/libbpf.c | 50 +++---
 1 file changed, 35 insertions(+), 15 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 1fe4532..f72628b 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -186,6 +186,7 @@ struct bpf_program {
 struct bpf_map {
 	int fd;
 	char *name;
+	size_t offset;
 	struct bpf_map_def def;
 	void *priv;
 	bpf_map_clear_priv_t clear_priv;
@@ -529,13 +530,6 @@ bpf_object__init_maps(struct bpf_object *obj, void *data,
 	pr_debug("maps in %s: %zd bytes\n", obj->path, size);

-	obj->maps = calloc(nr_maps, sizeof(obj->maps[0]));
-	if (!obj->maps) {
-		pr_warning("alloc maps for object failed\n");
-		return -ENOMEM;
-	}
-	obj->nr_maps = nr_maps;
-
 	for (i = 0; i < nr_maps; i++) {
 		struct bpf_map_def *def = &obj->maps[i].def;
@@ -547,23 +541,42 @@ bpf_object__init_maps(struct bpf_object *obj, void *data,
 		obj->maps[i].fd = -1;

 		/* Save map definition into obj->maps */
-		*def = ((struct bpf_map_def *)data)[i];
+		*def = *(struct bpf_map_def *)(data + obj->maps[i].offset);
 	}
 	return 0;
 }

 static int
-bpf_object__init_maps_name(struct bpf_object *obj)
+bpf_object__init_maps_symbol(struct bpf_object *obj)
 {
 	int i;
+	int nr_maps = 0;
 	Elf_Data *symbols = obj->efile.symbols;
+	size_t map_idx = 0;

 	if (!symbols || obj->efile.maps_shndx < 0)
 		return -EINVAL;

+	/* get the number of maps */
+	for (i = 0; i < symbols->d_size / sizeof(GElf_Sym); i++) {
+		GElf_Sym sym;
+
+		if (!gelf_getsym(symbols, i, &sym))
+			continue;
+		if (sym.st_shndx != obj->efile.maps_shndx)
+			continue;
+		nr_maps++;
+	}
+
+	obj->maps = calloc(nr_maps, sizeof(obj->maps[0]));
+	if (!obj->maps) {
+		pr_warning("alloc maps for object failed\n");
+		return -ENOMEM;
+	}
+	obj->nr_maps = nr_maps;
+
 	for (i = 0; i < symbols->d_size / sizeof(GElf_Sym); i++) {
 		GElf_Sym sym;
-		size_t map_idx;
 		const char *map_name;

 		if (!gelf_getsym(symbols, i, &sym))
@@ -574,12 +587,12 @@ bpf_object__init_maps_name(struct bpf_object *obj)
 		map_name = elf_strptr(obj->efile.elf,
 				      obj->efile.strtabidx,
 				      sym.st_name);
-		map_idx = sym.st_value / sizeof(struct bpf_map_def);
 		if (map_idx >= obj->nr_maps) {
 			pr_warning("index of map \"%s\" is buggy: %zu > %zu\n",
 				   map_name, map_idx, obj->nr_maps);
 			continue;
 		}
+		obj->maps[map_idx].offset = sym.st_value;
 		obj->maps[map_idx].name = strdup(map_name);
 		if (!obj->maps[map_idx].name) {
 			pr_warning("failed to alloc map name\n");
@@ -587,6 +600,7 @@ bpf_object__init_maps_name(struct bpf_object *obj)
 		}
 		pr_debug("map %zu is \"%s\"\n", map_idx,
 			 obj->maps[map_idx].name);
+		map_idx++;
 	}
 	return 0;
 }
@@ -647,8 +661,6 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
 					data->d_buf, data->d_size);
 		else if (strcmp(name, "maps") == 0) {
-			err = bpf_object__init_maps(obj, data->d_buf,
-						    data->d_size);
 			obj->efile.maps_shndx = idx;
 		} else if (sh.sh_type == SHT_SYMTAB) {
 			if (obj->efile.symbols) {
@@ -698,8 +710,16 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
 		pr_warning("Corrupted ELF file: index of strtab invalid\n");
 		return LIBBPF_ERRNO__FORMAT;
 	}
-	if (obj->efile.
Re: [PATCH 1/8] tools lib bpf: add error functions
On 2016/10/19 6:52, Joe Stringer wrote:
On 16 October 2016 at 14:18, Eric Leblond wrote:
The include of err.h is not explicitly needed by the exported functions, and it was causing an include conflict with some existing code due to redefining some macros. To fix this, let's have error handling functions provided by the library. Furthermore this will allow users to have a homogeneous API.

Signed-off-by: Eric Leblond

Does it need to return the error like this, or should we just fix up the bpf_object__open() API to return errors in a simpler form? There's already libbpf_set_print(...) for outputting errors; is it reasonable to just change the library to return NULL in error cases instead?

Returning an error code to the caller lets the caller know what happened. Other subsystems in perf also do this. Perf hides libbpf's error output (it is silent unless -v), so it needs a way to receive libbpf's error code.

I think this patch is good: it decouples libbpf.h from kernel headers.

Thank you.
Re: [PATCH 7/8] tools lib bpf: fix maps resolution
On 2016/10/17 5:18, Eric Leblond wrote:
It is not correct to treat the ELF data of the maps section as an array of map definitions; in fact the sizes differ. The offset provided in the symbol section has to be used instead. This patch fixes a bug causing an ELF file with two maps not to load correctly.

Could you please give an example so we can understand why section 'maps' is not an array?

Thank you.
Re: [PATCH 5/8] tools lib bpf: add missing functions
On 2016/10/17 5:18, Eric Leblond wrote:
Some functions were missing in the library to be able to use it in the case where the userspace is handling the maps in the kernel. The patch also renames functions to have a homogeneous naming convention.

Signed-off-by: Eric Leblond
---
 tools/lib/bpf/bpf.c    | 35 ++-
 tools/lib/bpf/bpf.h    |  2 --
 tools/lib/bpf/libbpf.h |  5 +
 3 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 4212ed6..c0e07bd 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -25,6 +25,7 @@
 #include
 #include
 #include "bpf.h"
+#include "libbpf.h"

 /*
  * When building perf, unistd.h is overrided. __NR_bpf is
@@ -97,7 +98,7 @@ int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
 	return sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
 }

-int bpf_map_update_elem(int fd, void *key, void *value,
+int bpf_map__update_elem(int fd, void *key, void *value,
 		u64 flags)

Please don't use the '__' style API here. It is easily confused with the bpf_map__*() functions in libbpf.h. They are APIs at different levels: bpf_map__*() are APIs for 'struct bpf_map' objects, which are introduced by libbpf and defined in libbpf.h; bpf_map_*() APIs operate on an fd, which is an object defined by the kernel, and they are declared in bpf.h.

In libbpf, bpf.h directly operates on kernel objects (fds), so APIs in it are named bpf_map_*(); libbpf.h operates on 'struct bpf_map' objects, so APIs in it are named bpf_map__*(). libbpf.h and bpf.h are independent of each other.

 {
 	union bpf_attr attr;
@@ -110,3 +111,35 @@ int bpf_map_update_elem(int fd, void *key, void *value,
 	return sys_bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
 }
+
+int bpf_map__lookup_elem(int fd, void *key, void *value)
+{
+	union bpf_attr attr = {
+		.map_fd = fd,
+		.key = ptr_to_u64(key),
+		.value = ptr_to_u64(value),
+	};
+
+	return sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
+}
+
+int bpf_map__delete_elem(int fd, void *key)
+{
+	union bpf_attr attr = {
+		.map_fd = fd,
+		.key = ptr_to_u64(key),
+	};
+
+	return sys_bpf(BPF_MAP_DELETE_ELEM, &attr, sizeof(attr));
+}
+
+int bpf_map__get_next_key(int fd, void *key, void *next_key)
+{
+	union bpf_attr attr = {
+		.map_fd = fd,
+		.key = ptr_to_u64(key),
+		.next_key = ptr_to_u64(next_key),
+	};
+
+	return sys_bpf(BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr));
+}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index e8ba540..5ca834a 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -33,6 +33,4 @@ int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
 		     u32 kern_version, char *log_buf,
 		     size_t log_buf_sz);
-int bpf_map_update_elem(int fd, void *key, void *value,
-			u64 flags);
 #endif
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index a18783b..dfb46d0 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -207,6 +207,11 @@ bpf_map__next(struct bpf_map *map, struct bpf_object *obj);
 int bpf_map__fd(struct bpf_map *map);
 const struct bpf_map_def *bpf_map__def(struct bpf_map *map);
 const char *bpf_map__name(struct bpf_map *map);
+int bpf_map__update_elem(int fd, void *key, void *value,
+			 uint64_t flags);
+int bpf_map__lookup_elem(int fd, void *key, void *value);
+int bpf_map__delete_elem(int fd, void *key);
+int bpf_map__get_next_key(int fd, void *key, void *next_key);

As we have discussed, the newly introduced functions should be added in bpf.h.

Thank you.
Re: [PATCH 6/8] tools lib bpf: improve warning
On 2016/10/17 5:18, Eric Leblond wrote:
Signed-off-by: Eric Leblond

Please add a commit message.

Thank you.

---
 tools/lib/bpf/libbpf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 7cd341e..1fe4532 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -802,7 +802,8 @@ bpf_object__create_maps(struct bpf_object *obj)
 			size_t j;
 			int err = *pfd;

-			pr_warning("failed to create map: %s\n",
+			pr_warning("failed to create map (name: '%s'): %s\n",
+				   obj->maps[i].name,
 				   strerror(errno));
 			for (j = 0; j < i; j++)
 				zclose(obj->maps[j].fd);
Re: [PATCH 4/8] tools lib bpf: export function to set type
On 2016/10/17 5:18, Eric Leblond wrote:
The current API was not allowing the user to set a type like socket filter. To avoid a setter function for each type, the patch simply exports a set function that takes the type as a parameter.

Signed-off-by: Eric Leblond
---
 tools/lib/bpf/libbpf.c | 19 +--
 tools/lib/bpf/libbpf.h |  3 +++
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 90932f1..7cd341e 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1336,26 +1336,25 @@ int bpf_program__nth_fd(struct bpf_program *prog, int n)
 	return fd;
 }

-static void bpf_program__set_type(struct bpf_program *prog,
-				  enum bpf_prog_type type)
+int bpf_program__set_type(struct bpf_program *prog, unsigned int type)
 {
+	if (!prog)
+		return -EINVAL;
+	if (type >= __MAX_BPF_PROG_TYPE)
+		return -EINVAL;
 	prog->type = type;
+	return 0;
 }

 int bpf_program__set_tracepoint(struct bpf_program *prog)
 {
-	if (!prog)
-		return -EINVAL;
-	bpf_program__set_type(prog, BPF_PROG_TYPE_TRACEPOINT);
-	return 0;
+	return bpf_program__set_type(prog, BPF_PROG_TYPE_TRACEPOINT);
 }

 int bpf_program__set_kprobe(struct bpf_program *prog)
 {
-	if (!prog)
-		return -EINVAL;
-	bpf_program__set_type(prog, BPF_PROG_TYPE_KPROBE);
-	return 0;
+	return bpf_program__set_type(prog, BPF_PROG_TYPE_KPROBE);
 }

 static bool bpf_program__is_type(struct bpf_program *prog,
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index e40c8d3..a18783b 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -173,6 +173,9 @@ int bpf_program__set_kprobe(struct bpf_program *prog);
 bool bpf_program__is_tracepoint(struct bpf_program *prog);
 bool bpf_program__is_kprobe(struct bpf_program *prog);

+int bpf_program__set_type(struct bpf_program *prog,
+			  unsigned int type);
+

Although you don't include uapi/linux/bpf.h in this patch, logically you add this dependency.

Please also add bpf_program__set_socket_filter() and bpf_program__is_socket_filter(), like what we do for tracepoint. This way libbpf.h stays independent of the kernel header. We can use a macro in both the .h and the .c.

Thank you.
Re: [PATCH 3/8] tools: Sync tools/include/uapi/linux/bpf.h with the kernel
On 2016/10/17 5:18, Eric Leblond wrote:
Signed-off-by: Eric Leblond

A commit message is required.

Thank you.

---
 tools/include/uapi/linux/bpf.h | 52 ++
 1 file changed, 52 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 9e5fc16..570287f 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -95,6 +95,8 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SCHED_ACT,
 	BPF_PROG_TYPE_TRACEPOINT,
 	BPF_PROG_TYPE_XDP,
+	BPF_PROG_TYPE_PERF_EVENT,
+	__MAX_BPF_PROG_TYPE,
 };

 #define BPF_PSEUDO_MAP_FD	1
@@ -375,6 +377,56 @@ enum bpf_func_id {
 	 */
 	BPF_FUNC_probe_write_user,

+	/**
+	 * bpf_current_task_under_cgroup(map, index) - Check cgroup2 membership of current task
+	 * @map: pointer to bpf_map in BPF_MAP_TYPE_CGROUP_ARRAY type
+	 * @index: index of the cgroup in the bpf_map
+	 * Return:
+	 *   == 0 current failed the cgroup2 descendant test
+	 *   == 1 current succeeded the cgroup2 descendant test
+	 *    < 0 error
+	 */
+	BPF_FUNC_current_task_under_cgroup,
+
+	/**
+	 * bpf_skb_change_tail(skb, len, flags)
+	 * The helper will resize the skb to the given new size,
+	 * to be used f.e. with control messages.
+	 * @skb: pointer to skb
+	 * @len: new skb length
+	 * @flags: reserved
+	 * Return: 0 on success or negative error
+	 */
+	BPF_FUNC_skb_change_tail,
+
+	/**
+	 * bpf_skb_pull_data(skb, len)
+	 * The helper will pull in non-linear data in case the
+	 * skb is non-linear and not all of len are part of the
+	 * linear section. Only needed for read/write with direct
+	 * packet access.
+	 * @skb: pointer to skb
+	 * @len: len to make read/writeable
+	 * Return: 0 on success or negative error
+	 */
+	BPF_FUNC_skb_pull_data,
+
+	/**
+	 * bpf_csum_update(skb, csum)
+	 * Adds csum into skb->csum in case of CHECKSUM_COMPLETE.
+	 * @skb: pointer to skb
+	 * @csum: csum to add
+	 * Return: csum on success or negative error
+	 */
+	BPF_FUNC_csum_update,
+
+	/**
+	 * bpf_set_hash_invalid(skb)
+	 * Invalidate current skb->hash.
+	 * @skb: pointer to skb
+	 */
+	BPF_FUNC_set_hash_invalid,
+
 	__BPF_FUNC_MAX_ID,
 };
Re: [PATCH 1/8] tools lib bpf: add error functions
On 2016/10/17 5:18, Eric Leblond wrote:
The include of err.h is not explicitly needed by the exported functions, and it was causing an include conflict with some existing code due to redefining some macros. To fix this, let's have error handling functions provided by the library. Furthermore this will allow users to have a homogeneous API.

Signed-off-by: Eric Leblond
---
 tools/lib/bpf/libbpf.c | 11 +++
 tools/lib/bpf/libbpf.h |  4 +++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index b699aea..90932f1 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -31,6 +31,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -1447,3 +1448,13 @@ bpf_object__find_map_by_name(struct bpf_object *obj, const char *name)
 	}
 	return NULL;
 }
+
+bool bpf__is_error(const void *ptr)

Please name it libbpf_is_error(), like libbpf_set_print(). We use '__' because we want to use the OO concept, and this utility is not OO.

+{
+	return IS_ERR(ptr);
+}
+
+long bpf__get_error(const void *ptr)

Same here, please call it libbpf_get_error().

Thank you.
Re: [PATCH net-next 2/2] tools lib bpf: export function to set type
On 2016/8/13 20:17, Eric Leblond wrote:
The current API was not allowing the user to set a type like socket filter. To avoid a setter function for each type, the patch simply exports a set function that takes the type as a parameter.

Signed-off-by: Eric Leblond
---
 tools/lib/bpf/libbpf.c | 15 ++-
 tools/lib/bpf/libbpf.h |  3 +++
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 7872ff6..ff2a8c6 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1336,26 +1336,23 @@ int bpf_program__nth_fd(struct bpf_program *prog, int n)
 	return fd;
 }

-static void bpf_program__set_type(struct bpf_program *prog,
+int bpf_program__set_type(struct bpf_program *prog,
 			  enum bpf_prog_type type)
 {
+	if (!prog)
+		return -EINVAL;
 	prog->type = type;
+	return 0;
 }

We'd better add a sanity check here to prevent setting an invalid program type. Please have a look at https://lkml.org/lkml/2015/12/17/13

 int bpf_program__set_tracepoint(struct bpf_program *prog)
 {
-	if (!prog)
-		return -EINVAL;
-	bpf_program__set_type(prog, BPF_PROG_TYPE_TRACEPOINT);
-	return 0;
+	return bpf_program__set_type(prog, BPF_PROG_TYPE_TRACEPOINT);
 }

 int bpf_program__set_kprobe(struct bpf_program *prog)
 {
-	if (!prog)
-		return -EINVAL;
-	bpf_program__set_type(prog, BPF_PROG_TYPE_KPROBE);
-	return 0;
+	return bpf_program__set_type(prog, BPF_PROG_TYPE_KPROBE);
 }

 static bool bpf_program__is_type(struct bpf_program *prog,
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index a6c5cde..6a84d7a 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -173,6 +173,9 @@ int bpf_program__set_kprobe(struct bpf_program *prog);
 bool bpf_program__is_tracepoint(struct bpf_program *prog);
 bool bpf_program__is_kprobe(struct bpf_program *prog);

+int bpf_program__set_type(struct bpf_program *prog,
+			  enum bpf_prog_type type);
+

This makes libbpf.h depend on the kernel's uapi/linux/bpf.h. Although I did this in the above patch, I don't like this dependency. This is the reason why I introduced

 int bpf_program__set_tracepoint(struct bpf_program *prog);
 int bpf_program__set_kprobe(struct bpf_program *prog);
 bool bpf_program__is_tracepoint(struct bpf_program *prog);
 bool bpf_program__is_kprobe(struct bpf_program *prog);

and hid bpf_program__set_type() inside the .c file. Please add type setting functions, or pass an int value to bpf_program__set_type() and add sanity checking.

Thank you.
Re: [PATCH net-next 1/2] tools lib bpf: suppress useless include
On 2016/8/13 20:17, Eric Leblond wrote:
The include of err.h is not explicitly needed by the exported functions, and it was causing an include conflict with some existing code due to redefining some macros.

Signed-off-by: Eric Leblond
---
 tools/lib/bpf/libbpf.c | 1 +
 tools/lib/bpf/libbpf.h | 1 -
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index b699aea..7872ff6 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -31,6 +31,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index dd7a513..a6c5cde 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -23,7 +23,6 @@
 #include
 #include
-#include

Functions declared in this header require PTR_ERR to decode the error number. Logically speaking, this header depends on linux/err.h, so this include is required. If not, there must be another way for callers to decode error numbers.

I know the problem you are trying to solve. Please have a look at this thread: https://lkml.org/lkml/2015/12/17/20

At that time I thought we should create a wrapper in libbpf.h like this:

 #ifdef USE_LINUX_ERR
 #include
 #endif
 ...
 int libbpf_ptr_err(void *ptr);
 ...

It is acceptable but ugly. Can you find a better method?

Thank you.

 enum libbpf_errno {
 	__LIBBPF_ERRNO__START = 4000,
Re: [PATCH net-next 0/2] libbpf: minor fix and API update
I think you should cc linux-ker...@vger.kernel.org for code in tools/.

Thank you.

On 2016/8/13 20:17, Eric Leblond wrote:
Hello,

Here's a small patchset on libbpf fixing two issues I've encountered when adding some eBPF related features to Suricata.

Patchset statistics:
 tools/lib/bpf/libbpf.c | 16 +++-
 tools/lib/bpf/libbpf.h |  4 +++-
 2 files changed, 10 insertions(+), 10 deletions(-)

BR,
--
Eric Leblond
Re: [RFC PATCH net-next 0/4] perf tools: Support receiving output through BPF programs
On 2015/10/28 18:55, Wang Nan wrote:
Alexei provided a patchset to allow BPF programs to output data to the ring buffer using the helper bpf_perf_event_output() [1]; it has been merged into net-next as commit a43eec304259a6c637f4014a6d4767159b6a3aa3 ("bpf: introduce bpf_perf_event_output() helper"). This patchset introduces the perf-side code to utilize that helper.

This patchset only supports outputting data to CTF. That is enough for me, because my workflow is 'perf record' -> 'convert to CTF' -> 'python'. However, I know some people rely heavily on 'perf script' to parse traces in perf.data, so here I'd like to discuss the way to show output data using 'perf script'. Currently the only way to watch the output data using perf script is to use '-D'.

[1] http://lkml.kernel.org/r/1445396556-4854-1-git-send-email-...@kernel.org

This patchset is based on my perf eBPF support patches. Workable code can be found on github:

https://github.com/WangNan0/linux/commits/ebpf

It should be merged after net-next is merged into mainline. I post the patches now for demonstration and discussion.

Thank you.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 net-next] bpf: fix bpf_perf_event_read() helper
On 2015/10/26 20:32, Peter Zijlstra wrote:
On Sun, Oct 25, 2015 at 09:23:36AM -0700, Alexei Starovoitov wrote:
bpf_perf_event_read() muxes -EINVAL into the return value, but it's non-ambiguous to the program whether it got an error or a real counter value.

How can that be? The (u64)-EINVAL value is a valid counter value... unlikely maybe, but still quite possible.

In our real use case we simply treat return values larger than 0x7fff as error results. We can make it even larger, for example, to 0x. Resulting values can be pre-processed by a script to filter potential error results out, so it is not a very big problem for our real use cases.

For a better interface, I suggest:

 u64 bpf_perf_event_read(bool *perror);

which still returns the counter value through its return value but puts the error code on the stack. A BPF program can then pass NULL to the function if it doesn't want to deal with the error code.

Thank you.
Re: [PATCH net-next 2/3] bpf: introduce bpf_perf_event_output() helper
On 2015/10/24 1:25, Alexei Starovoitov wrote:
On 10/23/15 9:42 AM, Peter Zijlstra wrote:
On Fri, Oct 23, 2015 at 08:02:00AM -0700, Alexei Starovoitov wrote:
On 10/23/15 7:39 AM, Peter Zijlstra wrote:
On Tue, Oct 20, 2015 at 08:02:34PM -0700, Alexei Starovoitov wrote:

+static const struct bpf_func_proto bpf_perf_event_output_proto = {
+	.func		= bpf_perf_event_output,
+	.gpl_only	= false,

Oh?

no particular reason. The key helper bpf_probe_read() is GPL, so all bpf tracing progs have to be GPL. If you feel strongly about it, I can change it.

All the perf symbols are export GPL, so I suppose this should be true.

ok. will send a patch.

Can we (or have we already) set up some rules for licensing? Which parts should be GPL? Who has the responsibility to decide?

Thank you.
Re: [PATCH v3 net-next] bpf: fix bpf_perf_event_read() helper
On 2015/10/23 8:10, Alexei Starovoitov wrote:
Fix safety checks for bpf_perf_event_read():
- only non-inherited events can be added to perf_event_array map (do this check statically at map insertion time)
- dynamically check that event is local and !pmu->count
Otherwise a buggy bpf program can cause a kernel splat. Also fix the error path after perf_event_attrs() and remove a redundant 'extern'.

Fixes: 35578d798400 ("bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter")
Signed-off-by: Alexei Starovoitov

Tested-by: Wang Nan
Re: [PATCH v3 net-next] bpf: fix bpf_perf_event_read() helper
On 2015/10/23 8:10, Alexei Starovoitov wrote:
Fix safety checks for bpf_perf_event_read():
- only non-inherited events can be added to perf_event_array map (do this check statically at map insertion time)
- dynamically check that event is local and !pmu->count
Otherwise a buggy bpf program can cause a kernel splat. Also fix the error path after perf_event_attrs() and remove a redundant 'extern'.

Fixes: 35578d798400 ("bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter")
Signed-off-by: Alexei Starovoitov
---
v2->v3:
. refactor checks based on Wangnan's and Peter's feedback
while refactoring realized that these two issues need fixes as well:
. fix perf_event_attrs() error path
. remove redundant extern
v1->v2: fix compile in case of !CONFIG_PERF_EVENTS

Even in the worst case the crash is not possible. Only warn_on_once, so imo net-next is ok.

 include/linux/bpf.h      |  1 -
 kernel/bpf/arraymap.c    | 25 -
 kernel/trace/bpf_trace.c |  7 ++-
 3 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e3a51b74e275..75718fa28260 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -194,7 +194,6 @@
 extern const struct bpf_func_proto bpf_map_lookup_elem_proto;
 extern const struct bpf_func_proto bpf_map_update_elem_proto;
 extern const struct bpf_func_proto bpf_map_delete_elem_proto;
-extern const struct bpf_func_proto bpf_perf_event_read_proto;
 extern const struct bpf_func_proto bpf_get_prandom_u32_proto;
 extern const struct bpf_func_proto bpf_get_smp_processor_id_proto;
 extern const struct bpf_func_proto bpf_tail_call_proto;
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index e3cfe46b074f..3f4c99e06c6b 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -292,16 +292,23 @@ static void *perf_event_fd_array_get_ptr(struct bpf_map *map, int fd)
 	attr = perf_event_attrs(event);
 	if (IS_ERR(attr))
-		return (void *)attr;
+		goto err;

-	if (attr->type != PERF_TYPE_RAW &&
-	    !(attr->type == PERF_TYPE_SOFTWARE &&
-	      attr->config == PERF_COUNT_SW_BPF_OUTPUT) &&
-	    attr->type != PERF_TYPE_HARDWARE) {
-		perf_event_release_kernel(event);
-		return ERR_PTR(-EINVAL);
-	}
-	return event;
+	if (attr->inherit)
+		goto err;
+

Since Peter suggested it is pointless for a system-wide perf_event to have the inherit bit set [1], I think it should be safe to let a system-wide perf_event pass this check? I'll check the code to make sure.

[1] http://lkml.kernel.org/r/20151022124142.gq17...@twins.programming.kicks-ass.net

+	if (attr->type == PERF_TYPE_RAW)
+		return event;
+
+	if (attr->type == PERF_TYPE_HARDWARE)
+		return event;
+
+	if (attr->type == PERF_TYPE_SOFTWARE &&
+	    attr->config == PERF_COUNT_SW_BPF_OUTPUT)
+		return event;
+err:
+	perf_event_release_kernel(event);
+	return ERR_PTR(-EINVAL);
 }

 static void perf_event_fd_array_put_ptr(void *ptr)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 47febbe7998e..003df3887287 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -199,6 +199,11 @@ static u64 bpf_perf_event_read(u64 r1, u64 index, u64 r3, u64 r4, u64 r5)
 	if (!event)
 		return -ENOENT;

+	/* make sure event is local and doesn't have pmu::count */
+	if (event->oncpu != smp_processor_id() ||
+	    event->pmu->count)
+		return -EINVAL;
+
 	/*
 	 * we don't know if the function is run successfully by the
 	 * return value. It can be judged in other places, such as
@@ -207,7 +212,7 @@ static u64 bpf_perf_event_read(u64 r1, u64 index, u64 r3, u64 r4, u64 r5)
 	return perf_event_read_local(event);
 }

-const struct bpf_func_proto bpf_perf_event_read_proto = {
+static const struct bpf_func_proto bpf_perf_event_read_proto = {
 	.func		= bpf_perf_event_read,
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
Re: [PATCH v2 net-next] bpf: fix bpf_perf_event_read() helper
On 2015/10/22 6:58, Alexei Starovoitov wrote:
[SNIP]
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index e3cfe46b074f..75529cc94304 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -294,10 +294,11 @@ static void *perf_event_fd_array_get_ptr(struct bpf_map *map, int fd)
 	if (IS_ERR(attr))
 		return (void *)attr;

-	if (attr->type != PERF_TYPE_RAW &&
-	    !(attr->type == PERF_TYPE_SOFTWARE &&
-	      attr->config == PERF_COUNT_SW_BPF_OUTPUT) &&
-	    attr->type != PERF_TYPE_HARDWARE) {
+	if ((attr->type != PERF_TYPE_RAW &&
+	     !(attr->type == PERF_TYPE_SOFTWARE &&
+	       attr->config == PERF_COUNT_SW_BPF_OUTPUT) &&
+	     attr->type != PERF_TYPE_HARDWARE) ||
+	    attr->inherit) {
 		perf_event_release_kernel(event);
 		return ERR_PTR(-EINVAL);
 	}

I have a question on inherit, not related to this patch: is it safe for perf to disable attr->inherit if the event is system wide? I haven't read the related code completely. To my current knowledge, the behavior of a system-wide perf event should be the same whether inherit is set or not. Is that true?

Thank you.
Re: [PATCH V5 1/1] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling
On 2015/10/22 17:06, Peter Zijlstra wrote:
On Wed, Oct 21, 2015 at 02:19:49PM -0700, Alexei Starovoitov wrote:
Urgh, that's still horridly inconsistent. Can we please come up with a consistent interface to perf?

My suggestion was to do ioctl(enable/disable) of events from userspace after receiving a notification from the kernel via my bpf_perf_event_output() helper. Wangnan's argument was that display refresh happens often and it's fast, so the time taken by user space to enable events on all cpus is too slow, and the ioctl does an IPI and disturbs other cpus even more. So soft_disable done by the program to enable/disable particular events on all cpus kinda makes sense.

And this all makes me think I still have no clue what you're all trying to do here. Who cares about display updates and why? And why should there be an active userspace part to eBPF programs?

So you want the background story? OK, let me describe it. This mail is not short, so please be patient.

On a smartphone, if the time between two frames is longer than 16ms, the user can notice it. This is a display glitch. We want to check those glitches with perf to find what causes them. The basic idea is: use 'perf record' to collect enough information, find the samples generated just before the glitch from perf.data, then analyze them offline.

Much work needs to be done before perf can support such analysis. One important thing is to reduce the overhead of perf, to avoid perf itself becoming the cause of glitches. We could do this by reducing the number of events perf collects, but then we won't have enough information to analyze when a glitch happens. Another way, which we are trying to implement now, is to dynamically turn events on and off, or at least enable/disable sampling dynamically, because the overhead of copying samples is a big part of perf's total overhead. After that we can trace as many events as possible, but only fetch data from them when we detect a glitch.
A BPF program is the best choice to describe our relatively complex glitch detection model. For simplicity, assume our glitch detection model contains three points in user programs: point A means display refreshing begins, point B means display refreshing has lasted for 16ms, and point C means display refreshing finished. They can be found in user programs and probed through uprobes, to which we can attach BPF programs.

Then we want to use perf this way: sample the 'cycles' event to collect callgraphs, so we know what all CPUs are doing during the refreshing, but only start sampling when we detect a frame refresh lasting 16ms. We can translate the above logic into BPF programs: at point B, enable the 'cycles' event to generate samples; at point C, disable the 'cycles' event to avoid generating useless samples. Then make 'perf record -e cycles' collect call graphs and other information through the cycles event. From perf.data, we can use 'perf script' and other tools to analyze the resulting data.

We have considered some potential solutions and found them inappropriate, or requiring too much work:

1. As you may prefer, create BPF functions to call pmu->stop()/pmu->start() for the perf event on the CPU on which the BPF program is triggered. The shortcoming of this method is that we can only turn on the perf event on the CPU that executes point B; we are unable to know what the other CPUs are doing during the glitch, but what we want is system-wide information. In addition, point C and point B are not necessarily executed on the same core, so we may shut down the wrong event if the scheduler decides to run point C on another core.

2. As Alexei suggests, output something through his bpf_perf_event_output() and let perf disable and enable those events using ioctl from userspace. This is a good idea, but it introduces a synchronization problem: bpf_perf_event_output() writes to perf's ring buffer, but perf only notices the message when epoll_wait() returns.
We tested a tracepoint event and found that an event may be seen by perf several seconds after it is generated. One solution is to use --no-buffer, but that still requires perf to be scheduled on time and to parse its ring buffer quickly. Also, please note that point C can appear very shortly after point B, because some apps are optimized to make their refresh time very close to 16ms.

This is the background story. It looks like either we implement some cross-CPU controlling, or we settle for coarse-grained controlling. The best method we can think of is to use an atomic operation to soft enable/disable perf events. We believe it is simple enough and won't cause problems.

Thank you.
Re: [PATCH V5 1/1] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling
On 2015/10/22 15:39, Ingo Molnar wrote:
* Wangnan (F) wrote:
[SNIP]
In summary, your either-or logic doesn't hold in the BPF world. A BPF program can only access a perf event in a highly restricted way. We don't allow it to call perf_event_read_local() across cores, so it can't.

Urgh, that's still horridly inconsistent. Can we please come up with a consistent interface to perf?

BPF programs and kernel modules are two different worlds, as I said before. I don't think making them share a common interface is a good idea, because such sharing would give BPF programs more freedom than they really need, and then it would be hard to prevent them from doing something bad. If we really need a kernel interface, I think what we need is a kernel module, not a BPF program.

What do you mean, as this does not parse for me.

Because I'm not very sure what "inconsistent" means in Peter's words... I think what Peter wants us to do is provide a similar (consistent) interface between the kernel and eBPF: if the kernel reads from a perf_event through perf_event_read_local(struct perf_event *), a BPF program should do this work with similar code, or at least similar logic, so we would need to create a handle for a perf event, provide a BPF function called BPF_FUNC_perf_event_read_local, and pass such a handle to it.

I don't think so, because if we want a kernel interface we'd better use a kernel module, not eBPF; that is why I mentioned kernel modules. Ingo, do you think the BPF interface should be *consistent* with anything?

Thank you.

We obviously can (and very likely should) make certain perf functionality available to BPF programs. It should still be a well defined yet flexible interface, with safe behavior, obviously - all in line with existing BPF sandboxing principles. 'Kernel modules' don't enter this consideration at all, not sure why you mention them - all this functionality is also available if CONFIG_MODULES is turned off completely.
Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 net-next] bpf: fix bpf_perf_event_read() helper
On 2015/10/22 14:21, Alexei Starovoitov wrote: On 10/21/15 10:31 PM, Wangnan (F) wrote: +if ((attr->type != PERF_TYPE_RAW && + !(attr->type == PERF_TYPE_SOFTWARE && + attr->config == PERF_COUNT_SW_BPF_OUTPUT) && + attr->type != PERF_TYPE_HARDWARE) || +attr->inherit) { This 'if' statement is so complex. What about using an inline function instead? hmm. don't see how inline function will help readability. For example (not tested):

static inline bool perf_event_can_insert_to_map(struct perf_event_attr *attr)
{
	/* is inherit? */
	if (attr->inherit)
		return false;

	/* is software event? */
	if (attr->type == PERF_TYPE_SOFTWARE) {
		if (attr->config == PERF_COUNT_SW_BPF_OUTPUT)
			return true;
		else
			return false;
	}

	/* raw and hardware events are allowed */
	if (attr->type == PERF_TYPE_RAW)
		return true;
	if (attr->type == PERF_TYPE_HARDWARE)
		return true;
	return false;
}

...
	if (!perf_event_can_insert_to_map(attr))

Do you think readability is improved? Thank you.
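For what it's worth, the proposed helper is easy to sanity-check in userspace. Below is a compilable sketch of the same truth table; note this is not kernel code: struct mock_event_attr and the constant values are mock stand-ins for the kernel's struct perf_event_attr and uapi constants (PERF_COUNT_SW_BPF_OUTPUT == 10 per the patch in this thread).

```c
#include <stdbool.h>

/* Mock stand-ins for the kernel uapi definitions, for illustration only. */
#define PERF_TYPE_HARDWARE	0
#define PERF_TYPE_SOFTWARE	1
#define PERF_TYPE_RAW		4
#define PERF_COUNT_SW_BPF_OUTPUT 10

struct mock_event_attr {
	unsigned int type;
	unsigned long long config;
	unsigned int inherit;
};

static bool perf_event_can_insert_to_map(const struct mock_event_attr *attr)
{
	/* inherited events cannot be read locally, reject them */
	if (attr->inherit)
		return false;

	/* of the software events, only BPF output is allowed */
	if (attr->type == PERF_TYPE_SOFTWARE)
		return attr->config == PERF_COUNT_SW_BPF_OUTPUT;

	/* raw and hardware events are allowed */
	return attr->type == PERF_TYPE_RAW ||
	       attr->type == PERF_TYPE_HARDWARE;
}
```

Each branch maps one-to-one onto a clause of the patched compound condition, which is the readability argument in a nutshell.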
Re: [PATCH v2 net-next] bpf: fix bpf_perf_event_read() helper
On 2015/10/22 6:58, Alexei Starovoitov wrote:

Fix safety checks for bpf_perf_event_read():
- only non-inherited events can be added to perf_event_array map (do this check statically at map insertion time)
- dynamically check that event is local and !pmu->count
Otherwise buggy bpf program can cause kernel splat.

Fixes: 35578d798400 ("bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter")
Signed-off-by: Alexei Starovoitov
---
v1->v2: fix compile in case of !CONFIG_PERF_EVENTS
This patch is on top of http://patchwork.ozlabs.org/patch/533585/ to avoid conflicts. Even in the worst case the crash is not possible. Only warn_on_once, so imo net-next is ok.

 kernel/bpf/arraymap.c |  9 +
 kernel/events/core.c  | 16 ++--
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index e3cfe46b074f..75529cc94304 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -294,10 +294,11 @@ static void *perf_event_fd_array_get_ptr(struct bpf_map *map, int fd)
 	if (IS_ERR(attr))
 		return (void *)attr;

-	if (attr->type != PERF_TYPE_RAW &&
-	    !(attr->type == PERF_TYPE_SOFTWARE &&
-	      attr->config == PERF_COUNT_SW_BPF_OUTPUT) &&
-	    attr->type != PERF_TYPE_HARDWARE) {
+	if ((attr->type != PERF_TYPE_RAW &&
+	     !(attr->type == PERF_TYPE_SOFTWARE &&
+	       attr->config == PERF_COUNT_SW_BPF_OUTPUT) &&
+	     attr->type != PERF_TYPE_HARDWARE) ||
+	    attr->inherit) {

This 'if' statement is so complex. What about using an inline function instead? Thank you.
Re: [PATCH v2 net-next] bpf: fix bpf_perf_event_read() helper
On 2015/10/22 13:00, Alexei Starovoitov wrote: On 10/21/15 9:49 PM, Wangnan (F) wrote: After applying this patch I'm unable to use perf passing a perf_event again like this: please do not top post and trim your replies.

# perf record -a -e evt=cycles -e ./test_config_map.c/maps.pmu_map.event=evt/ --exclude-perf ls

With -v it outputs:
...
adding perf_bpf_probe:func_write
adding perf_bpf_probe:func_write to 0x367d6a0
add bpf event perf_bpf_probe:func_write_return and attach bpf program 6
adding perf_bpf_probe:func_write_return
adding perf_bpf_probe:func_write_return to 0x3a7fc40
mmap size 528384B
ERROR: failed to insert value to pmu_map[0]
ERROR: Apply config to BPF failed: Invalid option for map, add -v to see detail
Opening /sys/kernel/debug/tracing//kprobe_events write=
...

Looks like perf sets attr.inherit for cycles? I'll look into this problem. yes. that's perf default. How did it even work before?! I was testing with your samples/bpf/tracex6 that sets inherit to zero. Tested the perf record -i option and it works for me:

# echo "" > /sys/kernel/debug/tracing/trace
# perf record -i -a -e evt=cycles -e ./test_config_map.c/maps.pmu_map.event=evt/ --exclude-perf ls
# cat /sys/kernel/debug/tracing/trace | grep ls
ls-8227 [001] dN.. 2526.184611: : pmu inc: 82270
ls-8227 [001] dN.. 2526.184626: : pmu inc: 40951
ls-8227 [001] dN.. 2526.184642: : pmu inc: 50659
ls-8227 [001] dN.. 2526.184657: : pmu inc: 43511
ls-8227 [001] dN.. 2526.184675: : pmu inc: 56921
...

And no warning message is found in dmesg. So I think your fix is good; we should improve perf. Thank you.
Re: [PATCH v2 net-next] bpf: fix bpf_perf_event_read() helper
After applying this patch I'm unable to use perf passing a perf_event again like this:

# perf record -a -e evt=cycles -e ./test_config_map.c/maps.pmu_map.event=evt/ --exclude-perf ls

With -v it outputs:
...
adding perf_bpf_probe:func_write
adding perf_bpf_probe:func_write to 0x367d6a0
add bpf event perf_bpf_probe:func_write_return and attach bpf program 6
adding perf_bpf_probe:func_write_return
adding perf_bpf_probe:func_write_return to 0x3a7fc40
mmap size 528384B
ERROR: failed to insert value to pmu_map[0]
ERROR: Apply config to BPF failed: Invalid option for map, add -v to see detail
Opening /sys/kernel/debug/tracing//kprobe_events write=
...

Looks like perf sets attr.inherit for cycles? I'll look into this problem. Thank you.

On 2015/10/22 6:58, Alexei Starovoitov wrote: Fix safety checks for bpf_perf_event_read(): - only non-inherited events can be added to perf_event_array map (do this check statically at map insertion time) - dynamically check that event is local and !pmu->count Otherwise buggy bpf program can cause kernel splat. Fixes: 35578d798400 ("bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter") Signed-off-by: Alexei Starovoitov --- v1->v2: fix compile in case of !CONFIG_PERF_EVENTS This patch is on top of http://patchwork.ozlabs.org/patch/533585/ to avoid conflicts. Even in the worst case the crash is not possible. Only warn_on_once, so imo net-next is ok.
 kernel/bpf/arraymap.c |  9 +
 kernel/events/core.c  | 16 ++--
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index e3cfe46b074f..75529cc94304 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -294,10 +294,11 @@ static void *perf_event_fd_array_get_ptr(struct bpf_map *map, int fd)
 	if (IS_ERR(attr))
 		return (void *)attr;

-	if (attr->type != PERF_TYPE_RAW &&
-	    !(attr->type == PERF_TYPE_SOFTWARE &&
-	      attr->config == PERF_COUNT_SW_BPF_OUTPUT) &&
-	    attr->type != PERF_TYPE_HARDWARE) {
+	if ((attr->type != PERF_TYPE_RAW &&
+	     !(attr->type == PERF_TYPE_SOFTWARE &&
+	       attr->config == PERF_COUNT_SW_BPF_OUTPUT) &&
+	     attr->type != PERF_TYPE_HARDWARE) ||
+	    attr->inherit) {
 		perf_event_release_kernel(event);
 		return ERR_PTR(-EINVAL);
 	}
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 64754bfecd70..0b6333265872 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3258,7 +3258,7 @@ static inline u64 perf_event_count(struct perf_event *event)
 u64 perf_event_read_local(struct perf_event *event)
 {
 	unsigned long flags;
-	u64 val;
+	u64 val = -EINVAL;

 	/*
 	 * Disabling interrupts avoids all counter scheduling (context
@@ -3267,12 +3267,14 @@ u64 perf_event_read_local(struct perf_event *event)
 	local_irq_save(flags);

 	/* If this is a per-task event, it must be for current */
-	WARN_ON_ONCE((event->attach_state & PERF_ATTACH_TASK) &&
-		     event->hw.target != current);
+	if ((event->attach_state & PERF_ATTACH_TASK) &&
+	    event->hw.target != current)
+		goto out;

 	/* If this is a per-CPU event, it must be for this CPU */
-	WARN_ON_ONCE(!(event->attach_state & PERF_ATTACH_TASK) &&
-		     event->cpu != smp_processor_id());
+	if (!(event->attach_state & PERF_ATTACH_TASK) &&
+	    event->cpu != smp_processor_id())
+		goto out;

 	/*
 	 * It must not be an event with inherit set, we cannot read
@@ -3284,7 +3286,8 @@ u64 perf_event_read_local(struct perf_event *event)
 	 * It must not have a pmu::count method, those are not
 	 * NMI safe.
 	 */
-	WARN_ON_ONCE(event->pmu->count);
+	if (event->pmu->count)
+		goto out;

 	/*
 	 * If the event is currently on this CPU, its either a per-task event,
@@ -3295,6 +3298,7 @@ u64 perf_event_read_local(struct perf_event *event)
 		event->pmu->read(event);

 	val = local64_read(&event->count);
+out:
 	local_irq_restore(flags);

 	return val;
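To see what the kernel/events/core.c change buys, here is a compilable userspace sketch of the same control flow. This is not the kernel's real code: struct mock_event, mock_read_local() and the field names are simplified stand-ins. The point is that each safety check now bails out gracefully via goto out and the caller reads back -EINVAL as the count, instead of triggering a WARN_ON_ONCE splat.

```c
#include <stdint.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel structures, for illustration only. */
struct mock_event {
	bool per_task;        /* PERF_ATTACH_TASK set?             */
	int  target_cpu;      /* event->cpu for per-CPU events     */
	bool has_pmu_count;   /* event->pmu->count method present? */
	uint64_t count;       /* local64_read(&event->count)       */
};

static uint64_t mock_read_local(const struct mock_event *ev,
				int this_cpu, bool on_current_task)
{
	uint64_t val = (uint64_t)-22;	/* -EINVAL, as in the patch */

	/* per-task event: must be reading on behalf of current */
	if (ev->per_task && !on_current_task)
		goto out;
	/* per-CPU event: must be reading on the owning CPU */
	if (!ev->per_task && ev->target_cpu != this_cpu)
		goto out;
	/* pmu->count methods are not NMI safe, refuse them */
	if (ev->has_pmu_count)
		goto out;

	val = ev->count;	/* all checks passed: read the counter */
out:
	return val;
}
```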
Re: [PATCH V5 1/1] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling
On 2015/10/22 11:09, Alexei Starovoitov wrote: On 10/21/15 6:56 PM, Wangnan (F) wrote: One alternative solution I can imagine is to attach a BPF program at sampling like a kprobe, and return 0 if we don't want the sampling to take action. Thought? Do you think attaching BPF programs to sampling is an acceptable idea? If you mean to extend the 'filter' concept to sampling events? So instead of soft_disable of non-local events, you'll attach a bpf program to sampling events and use a map lookup to decide whether to filter out such a sampling event or not? Yes. What pt_regs would be in such case? Sampling is based on interrupts. We can use the pt_regs captured by the IRQ handler, or we can simply pass NULL to those BPF programs. Thank you.
Re: [PATCH V5 1/1] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling
On 2015/10/22 0:57, Peter Zijlstra wrote: On Wed, Oct 21, 2015 at 11:06:47PM +0800, pi3orama wrote: So explain; how does this eBPF stuff work. I think I get your point this time, and let me explain the eBPF stuff to you. You are aware that a BPF programmer can break the system in this way:

A = get_non_local_perf_event()
perf_event_read_local(A)
BOOM!

However the above logic is impossible because a BPF program can't work this way. First of all, it is impossible for a BPF program to directly invoke a kernel function. Unlike a kernel module, a BPF program can only invoke functions designed for it, like what this patch does. So the ability of BPF programs is strictly restricted by the kernel. If we don't allow BPF programs to call perf_event_read_local() across cores, we can check this and return an error from the function we provide for them. Second: there's no way for a BPF program to directly access a perf event. All perf events have to be wrapped by a map and be accessed through the BPF functions described above. We don't allow a BPF program to fetch array elements from that map. So pointers to perf events are safely protected from BPF programs. In summary, your either-or logic doesn't hold in the BPF world. A BPF program can only access perf events in a highly restricted way. We don't allow it to call perf_event_read_local() across cores, so it can't. Urgh, that's still horridly inconsistent. Can we please come up with a consistent interface to perf? BPF programs and kernel modules are two different worlds as I said before. I don't think making them share a common interface is a good idea, because such sharing would give BPF programs more freedom than they really need, and it would then be hard to prevent them from doing something bad. If we really need a kernel interface, I think what we need is a kernel module, not a BPF program. Thank you.
Re: [PATCH V5 1/1] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling
Hi Alexei, On 2015/10/21 21:42, Wangnan (F) wrote: One alternative solution I can imagine is to attach a BPF program at sampling like a kprobe, and return 0 if we don't want the sampling to take action. Thought? Do you think attaching BPF programs to sampling is an acceptable idea? Actually speaking, I don't like it very much because the principle of soft-disable is much simpler and safer, but if you really like it I think we can try. Thank you.
Re: [PATCH V5 1/1] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling
On 2015/10/21 20:17, Peter Zijlstra wrote: On Wed, Oct 21, 2015 at 07:49:34PM +0800, Wangnan (F) wrote: If our task is sampling cycle events while a function is running, and two cores start that function so the calls overlap:

Time:   ... A
Core 0: sys_write
Core 1:    sys_write%return
Core 2:      sys_write

Then without a counter, at time A it is highly possible that the BPF programs on core 1 and core 2 conflict with each other. The final result is that we turn some of those events on and others off. Using an atomic counter can avoid this problem. But but, how and why can an eBPF program access a !local event? I thought we had hard restrictions on that. How can an eBPF program access a !local event: when creating a perf event array we don't care which perf event is for which CPU, so a BPF program can access any perf event in that array. Which is straightforward. And in soft disabling/enabling, what needs to be done is an atomic operation on something in the 'perf_event' structure, which is safe enough. Why we need an eBPF program to access a !local event: I think I have explained why we need an eBPF program to access a !local event. In summary, without this ability we can't speak in the user's language, because users focus on higher level principles (display refreshing, application booting, http processing...) and 'on which CPU' is not in their dictionary most of the time. Without cross-core soft-enabling/disabling it is hard to translate requirements like "start sampling when display refreshing begins" and "stop sampling when the application has booted" into eBPF programs and a perf cmdline. Don't you think it is useful for reducing sampling data and needs to be considered? One alternative solution I can imagine is to attach a BPF program at sampling like a kprobe, and return 0 if we don't want the sampling to take action. Thought? Actually speaking, I don't like it very much because the principle of soft-disable is much simpler and safer, but if you really like it I think we can try.

Do you think soft-disabling/enabling perf events on other cores causes any real problem? Thank you.
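The counter-based soft-enable Wang argues for can be sketched with C11 atomics in userspace (the kernel side would use atomic_t; soft_enable_cnt and the function names here are illustrative, not from the patch set). A plain 0/1 flag breaks when overlapping entry/exit probes on different CPUs toggle it; a counter nests correctly, since sampling stays on whenever the count is positive.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One shared counter stands in for the per-event soft_enable state. */
static atomic_int soft_enable_cnt;

static void soft_enable(void)  { atomic_fetch_add(&soft_enable_cnt, 1); }
static void soft_disable(void) { atomic_fetch_sub(&soft_enable_cnt, 1); }

/* Called from the overflow path: should this sample be emitted? */
static bool sampling_enabled(void)
{
	return atomic_load(&soft_enable_cnt) > 0;
}
```

With the Core 1/Core 2 overlap from the diagram above, the second entry raises the count to 2 and the first exit only drops it back to 1, so sampling stays on until the last in-flight sys_write returns.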
Re: [PATCH V5 1/1] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling
On 2015/10/21 19:56, Peter Zijlstra wrote: On Wed, Oct 21, 2015 at 07:34:28PM +0800, Wangnan (F) wrote: If you want to actually disable the event: pmu->stop() will make it stop, and you can restart using pmu->start(). I also prefer totally disabling the event because our goal is to reduce sampling overhead as much as possible. However, events in perf are CPU bound; one event on the perf cmdline becomes multiple 'perf_event' structures in the kernel on a multi-core system. Disabling/enabling events on all CPUs from a BPF program is a hard task due to racing, NMI, ... But eBPF perf events must already be local afaik. Look at the constraints perf_event_read_local() places on the events. I think soft disabling/enabling is free of this constraint, because technically speaking a soft-disabled perf event is still running. What we want to disable is only the sampling action, to avoid being overwhelmed by sampling and to reduce the overhead of outputting unneeded sampling data to perf.data. I don't care too much whether the PMU counter is stopped or still running. Even if it still generates interrupts I think it should be acceptable, because interrupt handling can be fast. Thank you.
Re: [PATCH V5 1/1] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling
On 2015/10/21 19:33, Peter Zijlstra wrote: On Wed, Oct 21, 2015 at 06:31:04PM +0800, xiakaixu wrote: The RFC patch set contains the necessary commit log [1]. That's of course the wrong place, this should be in the patch's Changelog. It doesn't become less relevant. In some scenarios we don't want to output trace data when perf sampling, in order to reduce overhead. For example, perf can be run as a daemon to dump trace data when necessary, such as when system performance goes down. Just like the example given in the cover letter, we only receive the samples within the sys_write() syscall. The helper bpf_perf_event_control() in this patch set can control the data output process and get the samples we are most interested in. cpu_function_call is probably too much to do from a bpf program, so I chose the current design that is like 'soft_disable'. So, IIRC, we already require eBPF perf events to be CPU-local, which obviates the entire need for IPIs. But soft-disable/enable doesn't require an IPI, because it is only a memory store operation. So calling pmu->stop() seems entirely possible (its even NMI safe). But we need to turn off sampling across CPUs. Please have a look at my other email. This, however, does not explain if you need nesting, your patch seemed to have a counter, which suggests you do. To avoid racing. If our task is sampling cycle events while a function is running, and two cores start that function so the calls overlap:

Time:   ... A
Core 0: sys_write
Core 1:    sys_write%return
Core 2:      sys_write

Then without a counter, at time A it is highly possible that the BPF programs on core 1 and core 2 conflict with each other. The final result is that we turn some of those events on and others off. Using an atomic counter can avoid this problem. Thank you. In any case, you could add perf_event_{stop,start}_local() to mirror the existing perf_event_read_local(), no? That would stop the entire thing and reduce even more overhead than simply skipping the overflow handler.
Re: [PATCH V5 1/1] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling
On 2015/10/21 17:12, Peter Zijlstra wrote: On Tue, Oct 20, 2015 at 03:53:02PM -0700, Alexei Starovoitov wrote: On 10/20/15 12:22 AM, Kaixu Xia wrote:

diff --git a/kernel/events/core.c b/kernel/events/core.c
index b11756f..5219635 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6337,6 +6337,9 @@ static int __perf_event_overflow(struct perf_event *event,
 		irq_work_queue(&event->pending);
 	}

+	if (unlikely(!atomic_read(&event->soft_enable)))
+		return 0;
+
 	if (event->overflow_handler)
 		event->overflow_handler(event, data, regs);
 	else

Peter, does this part look right or should it be moved right after

	if (unlikely(!is_sampling_event(event)))
		return 0;

or even to another function? It feels to me that it should be moved, since we probably don't want to activate throttling, period adjustment and event_limit for events that are in soft_disabled state. Depends on what it's meant to do. As long as you let the interrupt happen, I think we should in fact do those things (maybe not the event_limit), but period adjustment and esp. throttling are important when the event is enabled. If you want to actually disable the event: pmu->stop() will make it stop, and you can restart using pmu->start(). I also prefer totally disabling the event because our goal is to reduce sampling overhead as much as possible. However, events in perf are CPU bound; one event on the perf cmdline becomes multiple 'perf_event' structures in the kernel on a multi-core system. Disabling/enabling events on all CPUs from a BPF program is a hard task due to racing, NMI, ... Think about an example scenario: we want to sample cycles in a system-wide way to see what the whole system does while a smartphone is refreshing its display, and we don't want other samples when the display is frozen. We probe at the entry and exit points of Display.refresh() (a fictional user function), then let two BPF programs enable 'cycles' sampling at the entry point and disable it at the exit point.

In this task, we need to start all 'cycles' perf_events when the display starts refreshing, and disable all of those events when refreshing is finished. Only enabling the event on the core which executes the entry point of Display.refresh() is not enough, because the real workers are running on other cores; we need them to do the computation cooperatively. Also, the scheduler may schedule the exit point of Display.refresh() onto another core, so we can't simply disable the perf_event on that core and let the other cores keep sampling after refreshing finishes. I have thought of a method which can disable sampling in a safe way: we can call pmu->stop() inside the PMU IRQ handler, so we can ensure that pmu->stop() is always called on the core its event resides on. However, I don't know how to re-enable them safely. Maybe we need something in the scheduler? Thank you. And I suppose you can wrap that with a counter if you need nesting. I'm not sure if any of that is a viable solution, because the patch description is somewhat short on the problem statement. As is, I'm not too charmed with the patch, but lacking a better understanding of what exactly we're trying to achieve I'm struggling with proposing alternatives.
Re: [PATCH net-next 2/3] bpf: introduce bpf_perf_event_output() helper
On 2015/10/21 18:01, He Kuang wrote: hi, Alexei I've tested the sample in the next patch and it works well. I think more work on the perf side needs to be done for parsing the PERF_COUNT_SW_BPF_OUTPUT event type; are you working on that? Thank you. We need to add something into parse-event.y/l to let it know the new software event. We can do this simple task. However, as I guessed before, perf is unable to print this new event, so some work needs to be done to let 'perf report' and 'perf script' know it. One thing I can still remember is finding a way to inject format information through this output so 'perf script' and 'perf data convert --to-ctf' can decode the raw data without extra work. Do you remember that we discussed a solution which connects debuginfo and output raw data using a builtin function in LLVM/clang? We already have the clang/LLVM patch for it, but on the perf side I haven't started working on it. I think we can make perf output raw data hexadecimally at first. I'll do the above work and put the related patches at the end of my eBPF patchset, because they shouldn't be merged until this patchset is upstreamed. Thank you. On 2015/10/21 11:02, Alexei Starovoitov wrote: This helper is used to send raw data from eBPF program into special PERF_TYPE_SOFTWARE/PERF_COUNT_SW_BPF_OUTPUT perf_event. User space needs to perf_event_open() it (either for one or all cpus) and store FD into perf_event_array (similar to bpf_perf_event_read() helper) before eBPF program can send data into it. Today the programs triggered by kprobe collect the data and either store it into the maps or print it via bpf_trace_printk() where latter is the debug facility and not suitable to stream the data. This new helper replaces such bpf_trace_printk() usage and allows programs to have dedicated channel into user space for post-processing of the raw data collected.
Signed-off-by: Alexei Starovoitov
---
 include/uapi/linux/bpf.h        | 11 ++
 include/uapi/linux/perf_event.h |  1 +
 kernel/bpf/arraymap.c           |  2 ++
 kernel/bpf/verifier.c           |  3 ++-
 kernel/trace/bpf_trace.c        | 46 +++
 5 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 564f1f091991..2e032426cfb7 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -287,6 +287,17 @@ enum bpf_func_id {
 	 * Return: realm if != 0
 	 */
 	BPF_FUNC_get_route_realm,
+
+	/**
+	 * bpf_perf_event_output(ctx, map, index, data, size) - output perf raw sample
+	 * @ctx: struct pt_regs*
+	 * @map: pointer to perf_event_array map
+	 * @index: index of event in the map
+	 * @data: data on stack to be output as raw data
+	 * @size: size of data
+	 * Return: 0 on success
+	 */
+	BPF_FUNC_perf_event_output,
 	__BPF_FUNC_MAX_ID,
 };

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 2881145cda86..d3c417615361 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -110,6 +110,7 @@ enum perf_sw_ids {
 	PERF_COUNT_SW_ALIGNMENT_FAULTS	= 7,
 	PERF_COUNT_SW_EMULATION_FAULTS	= 8,
 	PERF_COUNT_SW_DUMMY		= 9,
+	PERF_COUNT_SW_BPF_OUTPUT	= 10,
 	PERF_COUNT_SW_MAX,	/* non-ABI */
 };

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index f2d9e698c753..e3cfe46b074f 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -295,6 +295,8 @@ static void *perf_event_fd_array_get_ptr(struct bpf_map *map, int fd)
 		return (void *)attr;

 	if (attr->type != PERF_TYPE_RAW &&
+	    !(attr->type == PERF_TYPE_SOFTWARE &&
+	      attr->config == PERF_COUNT_SW_BPF_OUTPUT) &&
 	    attr->type != PERF_TYPE_HARDWARE) {
 		perf_event_release_kernel(event);
 		return ERR_PTR(-EINVAL);

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1d6b97be79e1..b56cf51f8d42 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -245,6 +245,7 @@ static const struct {
 } func_limit[] = {
 	{BPF_MAP_TYPE_PROG_ARRAY, BPF_FUNC_tail_call},
 	{BPF_MAP_TYPE_PERF_EVENT_ARRAY, BPF_FUNC_perf_event_read},
+	{BPF_MAP_TYPE_PERF_EVENT_ARRAY, BPF_FUNC_perf_event_output},
 };

 static void print_verifier_state(struct verifier_env *env)
@@ -910,7 +911,7 @@ static int check_map_func_compatibility(struct bpf_map *map, int func_id)
 		 * don't allow any other map type to be passed into
 		 * the special func;
 		 */
-		if (bool_map != bool_func)
+		if (bool_func && bool_map != bool_func)
 			return -EINVAL;
 	}

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 0fe96c7c8803..47febbe7998e 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -215,6 +215,50 @@ const struct bpf_func_proto bpf_perf_event_read_prot
Re: [RFC PATCH 2/2] bpf: Implement bpf_perf_event_sample_enable/disable() helpers
On 2015/10/13 18:54, He Kuang wrote: hi, Alexei What about using a similar implementation like PERF_EVENT_IOC_SET_OUTPUT, creating a new ioctl like PERF_EVENT_IOC_SET_ENABLER, then letting perf select an event as 'enabler', so BPF can still control one atomic variable to enable/disable a set of events. you lost me on that last sentence. How will this 'enabler' work? Also I'm still missing what's wrong with perf doing ioctl() on events on all cpus manually when the bpf program tells it to do so. Is it speed you are concerned about or extra work in perf? To avoid having too many wakeups, the perf ringbuffer has a watermark limit to cache events and reduce wakeups, which means the perf userspace tool cannot receive perf events immediately. Here's a simple demo example to prove it: 'sleep_exec' does some writes and prints a timestamp every second, and a label is printed when perf poll gets events.

$ perf record -m 2 -e syscalls:sys_enter_write sleep_exec 1000
userspace sleep time: 0 seconds
userspace sleep time: 1 seconds
userspace sleep time: 2 seconds
userspace sleep time: 3 seconds
perf record wakeup onetime 0
userspace sleep time: 4 seconds
userspace sleep time: 5 seconds
userspace sleep time: 6 seconds
userspace sleep time: 7 seconds
perf record wakeup onetime 1
userspace sleep time: 8 seconds
perf record wakeup onetime 2
..

$ perf record -m 1 -e syscalls:sys_enter_write sleep_exec 1000
userspace sleep time: 0 seconds
userspace sleep time: 1 seconds
perf record wakeup onetime 0
userspace sleep time: 2 seconds
userspace sleep time: 3 seconds
perf record wakeup onetime 1
userspace sleep time: 4 seconds
userspace sleep time: 5 seconds
..
By default, if no mmap_pages is specified, perf tools wake up only when the target executable has finished:

$ perf record -e syscalls:sys_enter_write sleep_exec 5
userspace sleep time: 0 seconds
userspace sleep time: 1 seconds
userspace sleep time: 2 seconds
userspace sleep time: 3 seconds
userspace sleep time: 4 seconds
perf record wakeup onetime 0
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.006 MB perf.data (54 samples) ]

If we want perf to react as soon as our sample event is generated, --no-buffering should be used, but this option has a greater impact on performance.

$ perf record --no-buffering -e syscalls:sys_enter_write sleep_exec 1000
userspace sleep time: 0 seconds
perf record wakeup onetime 0
perf record wakeup onetime 1
perf record wakeup onetime 2
perf record wakeup onetime 3
perf record wakeup onetime 4
perf record wakeup onetime 5
perf record wakeup onetime 6
..

Hi Alexei, Based on He Kuang's test result, if we choose to use perf to control perf events and output the trigger event through bpf_perf_event_output, with the default setting we have to wait several seconds until perf gets the first trigger event if the trigger event's frequency is low. In my display refreshing example, this causes loss of trigger events. From the user's view, random frames would be missed. With --no-buffering, things become faster, but --no-buffering causes perf to be scheduled in more often than normal, which conflicts with the goal of event disabling: reducing recording overhead as much as possible. Thank you.
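The delays He Kuang measured follow from the watermark logic alone, which a toy model makes easy to see (struct ring_model and its numbers are illustrative, not perf's real accounting): with a low-frequency event, many seconds can pass before enough bytes accumulate to cross the watermark and wake the consumer.

```c
/* Toy model of watermark-based wakeups: the consumer is only woken
 * once the bytes pending in the ring buffer cross the threshold. */
struct ring_model {
	unsigned int pending;	/* bytes buffered, not yet consumed */
	unsigned int watermark;	/* wakeup threshold in bytes        */
	unsigned int wakeups;	/* how often the consumer was woken */
};

static void model_emit(struct ring_model *rb, unsigned int bytes)
{
	rb->pending += bytes;
	if (rb->pending >= rb->watermark) {	/* crossed the watermark... */
		rb->wakeups++;			/* ...wake the consumer     */
		rb->pending = 0;		/* consumer drains buffer   */
	}
}
```

A hundred 16-byte records against a 4 KiB watermark produce zero wakeups: exactly the "trigger event sits in the buffer for seconds" behavior above.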
Re: [RFC PATCH 2/2] bpf: Implement bpf_perf_event_sample_enable/disable() helpers
On 2015/10/13 13:15, Alexei Starovoitov wrote: On 10/12/15 9:34 PM, Wangnan (F) wrote: On 2015/10/13 12:16, Alexei Starovoitov wrote: On 10/12/15 8:51 PM, Wangnan (F) wrote: why 'set disable' is needed ? the example given in cover letter shows the use case where you want to receive samples only within sys_write() syscall. The example makes sense, but sys_write() is running on this cpu, so just disabling it on the current one is enough. Our real use case is control of system-wide sampling. For example, we need to sample all CPUs when a smartphone starts refreshing its display. We need all CPUs because in the Android system there are plenty of threads involved in this behavior. We can't achieve this by controlling sampling on only one CPU. This is the reason we need 'set enable' and 'set disable'. ok, but that use case may have different enable/disable pattern. In sys_write example ultra-fast enable/disable is must have, since the whole syscall is fast and overhead should be minimal. but for display refresh? we're talking milliseconds, no? Can you just ioctl() it from user space? If cost of enable/disable is high or the time range between toggling is long, then doing it from the bpf program doesn't make sense. Instead the program can do bpf_perf_event_output() to send a notification to user space that condition is met and the user space can ioctl() events. OK. I think I understand your design principle: everything inside BPF should be as fast as possible. Making userspace control events using ioctl makes things harder. You know that 'perf record' itself doesn't care too much about the events it received.
It only copies data to perf.data, but what we want is to use perf record simply like this:

  # perf record -e evt=cycles -e control.o/pmu=evt/ -a sleep 100

And in control.o we create uprobe points to mark the start and finish of a frame:

  SEC("target=/a/b/c.o\nstartFrame=0x123456")
  int startFrame(void *) {
      bpf_pmu_enable(pmu);
      return 1;
  }

  SEC("target=/a/b/c.o\nfinishFrame=0x234568")
  int finishFrame(void *) {
      bpf_pmu_disable(pmu);
      return 1;
  }

I think it makes sense too.

yes. that looks quite useful, but did you consider re-entrant startFrame()?

  start   << here sampling starts
  start
  finish  << here all samples disabled?!
  finish

and startFrame()/finishFrame() running on all cpus of that user app? If one cpu is entering startFrame() while another cpu is doing finishFrame(), what should the behavior be? Is sampling still enabled on all cpus, or off? Either case doesn't seem to work with a simple enable/disable. A few emails back in this thread I mentioned inc/dec of a flag to solve that.

Correct. What about using a similar implementation to PERF_EVENT_IOC_SET_OUTPUT: create a new ioctl like PERF_EVENT_IOC_SET_ENABLER, then let perf select an event as the 'enabler', so BPF can still control one atomic variable to enable/disable a set of events.

you lost me on that last sentence. How would this 'enabler' work?

Like what we did in this patchset: add an atomic flag to perf_event, and make other perf_events connect to this enabler via PERF_EVENT_IOC_SET_ENABLER. During running, check the enabler's atomic flag. So we use one atomic variable to control a set of perf_events, and finally create a BPF helper function to control that atomic variable.

Also I'm still missing what's wrong with perf doing ioctl() on events on all cpus manually when the bpf program tells it to do so. Is it speed you're concerned about, or extra work in perf?

I think both speed and extra work need to be considered. Say we use perf to enable/disable sampling.
Using the above example: when the smartphone starts refreshing the display, we write something into the ringbuffer, then display refreshing starts. We have to wait for perf to be scheduled in, parse the events it gets (perf record doesn't do this currently), discover the trigger event, and then enable the sampling perf events on all cpus. We make trigger and action asynchronous. I'm not sure how many ns or ms that needs, and I believe the asynchrony itself introduces complexity, which I think should be avoided unless we can explain the advantages it brings. But yes, a perf-based implementation can shut down the PMU completely, which is better than the current lightweight implementation.

In summary:

- In the next version we will use a counter-based flag instead of the current 0/1 switcher, in consideration of the re-entry problem.
- I think we both agree we need a lightweight solution with which we can enable/disable sampling at function level. This lightweight solution can be applied to only one perf event.
- Our disagreement is whether to introduce a heavyweight solution based on perf to enable/disable a group of perf events. For me, the perf-based solution can shut down the PMU completely, which is good. However, it
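The counter-based flag from the first summary point can be sketched as follows. This is a minimal userspace model with illustrative names (not the patchset's code): nested or concurrent startFrame/finishFrame pairs keep sampling enabled until every enable has been matched by a disable, which is exactly what a plain 0/1 switch gets wrong in the re-entrant trace Alexei showed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Counter-based enable flag: with a 0/1 switch, an inner
 * startFrame/finishFrame pair turns sampling off while the outer frame
 * is still active.  Counting nested enables fixes that. */
static atomic_int sample_enable_cnt;

static void frame_enable(void)
{
	atomic_fetch_add(&sample_enable_cnt, 1);
}

static void frame_disable(void)
{
	atomic_fetch_sub(&sample_enable_cnt, 1);
}

/* Sampling stays on as long as any enabler is still active. */
static bool sampling_enabled(void)
{
	return atomic_load(&sample_enable_cnt) > 0;
}
```

The atomic operations also make the cross-CPU case well defined: one CPU entering startFrame() while another runs finishFrame() leaves the count, and therefore the sampling state, consistent.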
Re: [RFC PATCH 2/2] bpf: Implement bpf_perf_event_sample_enable/disable() helpers
On 2015/10/13 12:16, Alexei Starovoitov wrote:

On 10/12/15 8:51 PM, Wangnan (F) wrote:

why 'set disable' is needed? the example given in the cover letter shows the use case where you want to receive samples only within the sys_write() syscall. The example makes sense, but sys_write() is running on this cpu, so just disabling it on the current one is enough.

Our real use case is control of system-wide sampling. For example, we need to sample all CPUs when a smartphone starts refreshing its display. We need all CPUs because in the Android system there are plenty of threads involved in this behavior. We can't achieve this by controlling sampling on only one CPU. This is the reason we need 'set enable' and 'set disable'.

ok, but that use case may have a different enable/disable pattern. In the sys_write example ultra-fast enable/disable is a must-have, since the whole syscall is fast and overhead should be minimal. but for display refresh? we're talking milliseconds, no? Can you just ioctl() it from user space? If the cost of enable/disable is high or the time range between toggling is long, then doing it from the bpf program doesn't make sense. Instead the program can do bpf_perf_event_output() to send a notification to user space that the condition is met, and user space can ioctl() the events.

OK. I think I understand your design principle: everything inside BPF should be as fast as possible. Making userspace control events through ioctl makes things harder. You know that 'perf record' itself doesn't care too much about the events it receives.
It only copies data to perf.data, but what we want is to use perf record simply like this:

  # perf record -e evt=cycles -e control.o/pmu=evt/ -a sleep 100

And in control.o we create uprobe points to mark the start and finish of a frame:

  SEC("target=/a/b/c.o\nstartFrame=0x123456")
  int startFrame(void *) {
      bpf_pmu_enable(pmu);
      return 1;
  }

  SEC("target=/a/b/c.o\nfinishFrame=0x234568")
  int finishFrame(void *) {
      bpf_pmu_disable(pmu);
      return 1;
  }

I think it makes sense too.

I still think perf events are not necessarily independent of each other. You know we have PERF_EVENT_IOC_SET_OUTPUT, which can make multiple events output through one ringbuffer. This way perf events are connected. I think the 'set disable/enable' design in this patchset satisfies the design goal that in a BPF program we only do simple and fast things. The only inconvenience is that we add something into a map, which is ugly.

What about using a similar implementation to PERF_EVENT_IOC_SET_OUTPUT: create a new ioctl like PERF_EVENT_IOC_SET_ENABLER, then let perf select an event as the 'enabler', so BPF can still control one atomic variable to enable/disable a set of events.

Thank you.
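The proposed PERF_EVENT_IOC_SET_ENABLER can be modeled roughly as below. Note this ioctl does not exist; the struct and function names here are illustrative stand-ins for the idea: several events check one atomic flag owned by a designated 'enabler' event, so a single store toggles the whole group.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of an event: each event owns a flag, but the flag it
 * actually checks on overflow may belong to another event (the
 * "enabler"). */
struct model_event {
	atomic_int own_flag;         /* flag this event owns */
	atomic_int *sample_disable;  /* flag checked on overflow, or NULL */
};

/* Model of the proposed SET_ENABLER ioctl: point ev's check at the
 * enabler's flag, so one atomic variable controls a set of events. */
static void set_enabler(struct model_event *ev, struct model_event *enabler)
{
	ev->sample_disable = &enabler->own_flag;
}

/* Model of the overflow path: drop the sample when the shared flag is
 * set; events with no enabler always output. */
static bool should_output(struct model_event *ev)
{
	return !(ev->sample_disable && atomic_load(ev->sample_disable));
}
```

A BPF helper would then only need to store 0 or 1 into the enabler's flag, keeping the in-program work a single atomic write.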
Re: [RFC PATCH 2/2] bpf: Implement bpf_perf_event_sample_enable/disable() helpers
On 2015/10/13 11:39, Alexei Starovoitov wrote:

On 10/12/15 8:27 PM, Wangnan (F) wrote:

Then how to avoid racing? For example, while one core is disabling all events in a map, another core is enabling all of them. This racing may cause several perf events in a map to dump samples while other events don't. To avoid such racing I think some locking must be introduced, and then the cost is even higher. The reason why we introduce an atomic pointer is that each operation should control a set of events, not one event, due to the per-cpu manner of perf events.

why 'set disable' is needed? the example given in the cover letter shows the use case where you want to receive samples only within the sys_write() syscall. The example makes sense, but sys_write() is running on this cpu, so just disabling it on the current one is enough.

Our real use case is control of system-wide sampling. For example, we need to sample all CPUs when a smartphone starts refreshing its display. We need all CPUs because in the Android system there are plenty of threads involved in this behavior. We can't achieve this by controlling sampling on only one CPU. This is the reason we need 'set enable' and 'set disable'.

Thank you.
Re: [RFC PATCH 2/2] bpf: Implement bpf_perf_event_sample_enable/disable() helpers
On 2015/10/13 3:29, Alexei Starovoitov wrote:

On 10/12/15 2:02 AM, Kaixu Xia wrote:

  +extern const struct bpf_func_proto bpf_perf_event_sample_enable_proto;
  +extern const struct bpf_func_proto bpf_perf_event_sample_disable_proto;

externs are unnecessary. Just make them static. Also I prefer a single helper that takes a flag, so we can extend it instead of adding a func_id for every little operation. To avoid conflicts, if you touch kernel/bpf/* or bpf.h please always base your patches on net-next.

  +atomic_set(&map->perf_sample_disable, 0);

a global flag per map is a no-go. events are independent and should be treated as such.

Then how to avoid racing? For example, while one core is disabling all events in a map, another core is enabling all of them. This racing may cause several perf events in a map to dump samples while other events don't. To avoid such racing I think some locking must be introduced, and then the cost is even higher. The reason why we introduce an atomic pointer is that each operation should control a set of events, not one event, due to the per-cpu manner of perf events.

Thank you.

Please squash these two patches, since they're part of one logical feature. Splitting them like this only makes review harder.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 1/2] perf: Add the flag sample_disable not to output data on samples
On 2015/10/12 20:02, Peter Zijlstra wrote:

On Mon, Oct 12, 2015 at 09:02:42AM +, Kaixu Xia wrote:

  --- a/include/linux/perf_event.h
  +++ b/include/linux/perf_event.h
  @@ -483,6 +483,8 @@ struct perf_event {
   	perf_overflow_handler_t		overflow_handler;
   	void				*overflow_handler_context;

  +	atomic_t			*sample_disable;
  +
   #ifdef CONFIG_EVENT_TRACING
   	struct trace_event_call		*tp_event;
   	struct event_filter		*filter;

  diff --git a/kernel/events/core.c b/kernel/events/core.c
  index b11756f..f6ef45c 100644
  --- a/kernel/events/core.c
  +++ b/kernel/events/core.c
  @@ -6337,6 +6337,9 @@ static int __perf_event_overflow(struct perf_event *event,
   		irq_work_queue(&event->pending);
   	}

  +	if ((event->sample_disable) && atomic_read(event->sample_disable))
  +		return ret;
  +
   	if (event->overflow_handler)
   		event->overflow_handler(event, data, regs);
   	else

Try and guarantee sample_disable lives in the same cacheline as overflow_handler.

Could you please explain why we need them to be in the same cacheline?

Thank you.

I think we should at the very least replace the kzalloc() currently used with a cacheline-aligned alloc, and check the structure layout to verify these two do in fact share a cacheline.
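Peter's cacheline point can be checked mechanically: if sample_disable sits in the same 64-byte cacheline as overflow_handler, the extra read in __perf_event_overflow() costs no additional cache miss, because the hot path loads overflow_handler anyway. A toy layout check sketching the idea (the struct is a stand-in, not the real struct perf_event; only the field names mirror the patch):

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE 64

/* Stand-in layout: the pad represents the fields of struct perf_event
 * that precede the overflow-handler members. */
struct toy_event {
	char pad[128];                 /* earlier fields */
	void (*overflow_handler)(void);
	void *overflow_handler_context;
	int *sample_disable;           /* keep adjacent to the handler */
};

/* Two offsets share a cacheline iff they fall in the same 64-byte
 * slot; with an aligned allocation this holds for the real object. */
static int same_cacheline(size_t a, size_t b)
{
	return a / CACHELINE == b / CACHELINE;
}
```

This is also why the cacheline-aligned alloc matters: the offset arithmetic only maps to real cachelines when the base of the structure is itself cacheline-aligned.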
Re: [PATCH v2] bpf: fix a bug in verification logic when SUB operation taken on FRAME_PTR
On 2015/6/19 0:00, Alexei Starovoitov wrote:

On Thu, Jun 18, 2015 at 08:31:45AM +, Wang Nan wrote:

The original code has a problem, causing the following code to fail the verifier:

  r1 <- r10
  r1 -= 8
  r2 = 8
  r3 = unsafe pointer
  call BPF_FUNC_probe_read    <-- R1 type=inv expected=fp

However, by replacing 'r1 -= 8' with 'r1 += -8' the above program can be loaded successfully. This is because the verifier allows only the BPF_ADD instruction on a FRAME_PTR register to forge a PTR_TO_STACK register, but makes BPF_SUB on a FRAME_PTR register produce an UNKNOWN_VALUE register. This patch fixes it by adding BPF_SUB to the stack_relative check.

It's not a bug. It's catching ADD only by design. If we let it recognize SUB then one might argue we should let it recognize multiply, shifts and all other arithmetic on pointers. The verifier would keep getting bigger and bigger. Where do we stop? llvm only emits the canonical ADD. If you've seen llvm emitting SUB, let's fix it there. So what piece of code generated this 'r1 -= 8'?

I hit this problem when writing an automatic parameter generator. The instruction was generated by my own code, which I have now corrected. Thank you.
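On the generator side, the correction amounts to emitting the canonical form the verifier recognizes: rewrite 'r1 -= imm' as 'r1 += -imm' before loading. A minimal sketch of that rewrite (the instruction struct is simplified and the function is illustrative, but the opcode values are the real BPF encodings; a real generator would additionally restrict this to frame-pointer arithmetic):

```c
#include <assert.h>
#include <stdint.h>

/* Real BPF opcode encoding constants (from linux/bpf_common.h). */
#define BPF_ALU64 0x07
#define BPF_ADD   0x00
#define BPF_SUB   0x10
#define BPF_K     0x00

/* Simplified instruction: just the opcode byte and the immediate. */
struct toy_insn {
	uint8_t code;
	int32_t imm;
};

/* Canonicalize an ALU64 SUB-immediate into the equivalent ADD of the
 * negated immediate, the only form the verifier's stack_relative check
 * (and LLVM's codegen) produce for stack pointers. */
static void canonicalize_sub(struct toy_insn *insn)
{
	if (insn->code == (BPF_ALU64 | BPF_SUB | BPF_K)) {
		insn->code = BPF_ALU64 | BPF_ADD | BPF_K;
		insn->imm = -insn->imm;
	}
}
```

The two instructions are arithmetically identical, so the rewrite is safe wherever the immediate fits in 32 bits; it simply moves the program into the subset the verifier is designed to track.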