Re: [iovisor-dev] Documentation on eBPF map types?

2017-02-03 Thread Jesper Dangaard Brouer via iovisor-dev
On Fri, 3 Feb 2017 07:43:33 -0800
Alexei Starovoitov  wrote:

> On Fri, Feb 3, 2017 at 5:52 AM, Jesper Dangaard Brouer
>  wrote:
> > On Thu, 2 Feb 2017 20:27:15 -0800
> > Alexei Starovoitov  wrote:
> >  
> >> On Thu, Feb 02, 2017 at 06:00:09PM +0100, Jesper Dangaard Brouer wrote:  
> > [...]  
> >> Comments below:  
> > [...]
> >  
> >> > +Interacting with maps
> >> > +=====================
> >> > +
> >> > +Interacting with an eBPF maps from **userspace**, happens through the
> >> > +`bpf`_ syscall and a file descriptor.  The kernel
> >> > +`tools/lib/bpf/bpf.h`_ define some ``bpf_map_*()`` helper functions
> >> > +for wrapping the `bpf_cmd`_ relating to manipulating the map elements.
> >> > +
> >> > +.. code-block:: c
> >> > +
> >> > +  enum bpf_cmd {
> >> > +   [...]
> >> > +   BPF_MAP_LOOKUP_ELEM,
> >> > +   BPF_MAP_UPDATE_ELEM,
> >> > +   BPF_MAP_DELETE_ELEM,
> >> > +   BPF_MAP_GET_NEXT_KEY,
> >> > +   [...]
> >> > +  };
> >> > +  /* Corresponding helper functions */
> >> > +  int bpf_map_lookup_elem(int fd, void *key, void *value);
> >> > +  int bpf_map_update_elem(int fd, void *key, void *value, __u64 flags);
> >> > +  int bpf_map_delete_elem(int fd, void *key);
> >> > +  int bpf_map_get_next_key(int fd, void *key, void *next_key);
> >> > +
> >> > +Notice from userspace, there is no call to atomically increment or
> >> > +decrement the value 'in-place'. The bpf_map_update_elem() call will
> >> > +overwrite the existing value.  
> >>
> >> will overwrite in _atomic_ way. Meaning that the whole value will
> >> be replaced atomically regardless whether update() is called from
> >> kernel or from user space.  
> >
> > Looking at the memcpy code in array_map_update_elem(), I don't see
> > anything preventing two processes from data-update racing with
> > each-other, especially when the value_size is large.  
> 
> yes. my comment above applies to hash map only.

Okay, understood.  I did see locks in the hash map code, but I've not
documented that map type yet.

I've completely rewritten the section: "Interacting with maps"
See commit:
 https://github.com/netoptimizer/prototype-kernel/commit/9b6eba65171ce8d83f6
Or html:
 
http://prototype-kernel.readthedocs.io/en/latest/bpf/ebpf_maps.html#interacting-with-maps

Also inlined here:

[PATCH] doc: ebpf rewrite section Interacting with maps

From: Jesper Dangaard Brouer 

Signed-off-by: Jesper Dangaard Brouer 
---
 kernel/Documentation/bpf/ebpf_maps.rst |  145 +---
 1 file changed, 112 insertions(+), 33 deletions(-)

diff --git a/kernel/Documentation/bpf/ebpf_maps.rst b/kernel/Documentation/bpf/ebpf_maps.rst
index 590e1ac72eb3..b549e4716501 100644
--- a/kernel/Documentation/bpf/ebpf_maps.rst
+++ b/kernel/Documentation/bpf/ebpf_maps.rst
@@ -112,13 +112,60 @@ files for defining the maps, but it uses another layout.  See man-page
 Interacting with maps
 =====================
 
+Interacting with eBPF maps happens through some **lookup/update/delete**
+primitives.
+
+When writing eBPF programs using the load helpers and libraries from
+samples/bpf/ and tools/lib/bpf/, a common function-name API has been
+created that hides the details of how kernel vs. userspace access
+these primitives (which is quite different).
+
+The common function names (parameters and return values differ):
+
+.. code-block:: c
+
+  void bpf_map_lookup_elem(map, void *key, ...);
+  void bpf_map_update_elem(map, void *key, ..., __u64 flags);
+  void bpf_map_delete_elem(map, void *key);
+
+The ``flags`` argument in ``bpf_map_update_elem()`` lets the caller
+choose semantics depending on whether the element already exists:
+
+.. code-block:: c
+
+  /* File: include/uapi/linux/bpf.h */
+  /* flags for BPF_MAP_UPDATE_ELEM command */
+  #define BPF_ANY      0 /* create new element or update existing */
+  #define BPF_NOEXIST  1 /* create new element only if it didn't exist */
+  #define BPF_EXIST    2 /* only update existing element */
+
+Userspace
+---------
+The userspace API map helpers are defined in `tools/lib/bpf/bpf.h`_
+and look like this:
+
+.. code-block:: c
+
+  /* Userspace helpers */
+  int bpf_map_lookup_elem(int fd, void *key, void *value);
+  int bpf_map_update_elem(int fd, void *key, void *value, __u64 flags);
+  int bpf_map_delete_elem(int fd, void *key);
+  /* Only userspace: */
+  int bpf_map_get_next_key(int fd, void *key, void *next_key);
+
+
 Interacting with an eBPF map from **userspace** happens through the
-`bpf`_ syscall and a file descriptor.  The kernel
-`tools/lib/bpf/bpf.h`_ defines some ``bpf_map_*()`` helper functions
-for wrapping the `bpf_cmd`_ related to manipulating the map elements.
+`bpf`_ syscall and a file descriptor.  Note that the map handle ``int
+fd`` is a file descriptor.  On success, zero is returned; on
+failure, -1 is returned and errno is set.
+
+Wrappers for the bpf syscall are implemented in `tools/lib/bpf/bpf.c`_,
+and ends up 

Re: [iovisor-dev] Documentation on eBPF map types?

2017-02-03 Thread Alexei Starovoitov via iovisor-dev
On Fri, Feb 3, 2017 at 5:52 AM, Jesper Dangaard Brouer
 wrote:
> On Thu, 2 Feb 2017 20:27:15 -0800
> Alexei Starovoitov  wrote:
>
>> On Thu, Feb 02, 2017 at 06:00:09PM +0100, Jesper Dangaard Brouer wrote:
> [...]
>> Comments below:
> [...]
>
>> > +Interacting with maps
>> > +=====================
>> > +
>> > +Interacting with an eBPF maps from **userspace**, happens through the
>> > +`bpf`_ syscall and a file descriptor.  The kernel
>> > +`tools/lib/bpf/bpf.h`_ define some ``bpf_map_*()`` helper functions
>> > +for wrapping the `bpf_cmd`_ relating to manipulating the map elements.
>> > +
>> > +.. code-block:: c
>> > +
>> > +  enum bpf_cmd {
>> > +   [...]
>> > +   BPF_MAP_LOOKUP_ELEM,
>> > +   BPF_MAP_UPDATE_ELEM,
>> > +   BPF_MAP_DELETE_ELEM,
>> > +   BPF_MAP_GET_NEXT_KEY,
>> > +   [...]
>> > +  };
>> > +  /* Corresponding helper functions */
>> > +  int bpf_map_lookup_elem(int fd, void *key, void *value);
>> > +  int bpf_map_update_elem(int fd, void *key, void *value, __u64 flags);
>> > +  int bpf_map_delete_elem(int fd, void *key);
>> > +  int bpf_map_get_next_key(int fd, void *key, void *next_key);
>> > +
>> > +Notice from userspace, there is no call to atomically increment or
>> > +decrement the value 'in-place'. The bpf_map_update_elem() call will
>> > +overwrite the existing value.
>>
>> will overwrite in _atomic_ way. Meaning that the whole value will
>> be replaced atomically regardless whether update() is called from
>> kernel or from user space.
>
> Looking at the memcpy code in array_map_update_elem(), I don't see
> anything preventing two processes from data-update racing with
> each-other, especially when the value_size is large.

yes. my comment above applies to hash map only.
___
iovisor-dev mailing list
iovisor-dev@lists.iovisor.org
https://lists.iovisor.org/mailman/listinfo/iovisor-dev


Re: [iovisor-dev] Documentation on eBPF map types?

2017-02-03 Thread Jesper Dangaard Brouer via iovisor-dev
On Thu, 2 Feb 2017 20:27:15 -0800
Alexei Starovoitov  wrote:

> On Thu, Feb 02, 2017 at 06:00:09PM +0100, Jesper Dangaard Brouer wrote:
[...]
> Comments below:
[...] 

> > +Interacting with maps
> > +=====================
> > +
> > +Interacting with an eBPF maps from **userspace**, happens through the
> > +`bpf`_ syscall and a file descriptor.  The kernel
> > +`tools/lib/bpf/bpf.h`_ define some ``bpf_map_*()`` helper functions
> > +for wrapping the `bpf_cmd`_ relating to manipulating the map elements.
> > +
> > +.. code-block:: c
> > +
> > +  enum bpf_cmd {
> > +   [...]
> > +   BPF_MAP_LOOKUP_ELEM,
> > +   BPF_MAP_UPDATE_ELEM,
> > +   BPF_MAP_DELETE_ELEM,
> > +   BPF_MAP_GET_NEXT_KEY,
> > +   [...]
> > +  };
> > +  /* Corresponding helper functions */
> > +  int bpf_map_lookup_elem(int fd, void *key, void *value);
> > +  int bpf_map_update_elem(int fd, void *key, void *value, __u64 flags);
> > +  int bpf_map_delete_elem(int fd, void *key);
> > +  int bpf_map_get_next_key(int fd, void *key, void *next_key);
> > +
> > +Notice from userspace, there is no call to atomically increment or
> > +decrement the value 'in-place'. The bpf_map_update_elem() call will
> > +overwrite the existing value.  
> 
> will overwrite in _atomic_ way. Meaning that the whole value will
> be replaced atomically regardless whether update() is called from
> kernel or from user space.

Looking at the memcpy code in array_map_update_elem(), I don't see
anything preventing two processes from racing with each other when
updating the data, especially when the value_size is large.


> Also please mention first that kernel can do atomic increment,
> before saying that user space cannot. Otherwise it's quite confusing.

True, noted, I'll rearrange.

> > The flags argument allows
> > +bpf_map_update_elem() define semantics on weather the element exist:
> > +
> > +.. code-block:: c
> > +
> > +  /* File: include/uapi/linux/bpf.h */
> > +  /* flags for BPF_MAP_UPDATE_ELEM command */
> > +  #define BPF_ANY      0 /* create new element or update existing */
> > +  #define BPF_NOEXIST  1 /* create new element if it didn't exist */
> > +  #define BPF_EXIST    2 /* update existing element */
> > +
> > +The eBPF-program running "kernel-side" have almost the same primitives
> > +(lookup/update/delete) for interacting with the map, but it interact
> > +more directly with the map data structures. For example the call
> > +``bpf_map_lookup_elem()`` returns a direct pointer to the 'value'
> > +memory-element inside the kernel (while userspace gets a copy).  This
> > +allows the eBPF-program to atomically increment or decrement the value
> > +'in-place', by using appropiate compiler primitives like
> > +``__sync_fetch_and_add()``, which is understood by LLVM when
> > +generating eBPF instructions.  
> 
> ahh. so you do it here. may be move this paragraph on top ?

yes.
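
To illustrate the point under review: the kernel-side lookup hands back a
pointer into the map's storage, so an increment can be done in place. The
following is only a plain userspace C model of that idea, not kernel or
eBPF code. The names toy_array, toy_lookup_ptr() and toy_count_event() are
hypothetical and invented for this sketch; only __sync_fetch_and_add()
(a GCC/Clang builtin) comes from the quoted text.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ENTRIES 4

/* Toy array "map"; stands in for kernel-side storage, nothing more. */
struct toy_array {
	long values[TOY_MAX_ENTRIES];
};

/*
 * Models the kernel-side lookup: returns a pointer to the stored
 * value itself (or NULL for an out-of-range index), not a copy.
 */
long *toy_lookup_ptr(struct toy_array *map, unsigned int key)
{
	if (key >= TOY_MAX_ENTRIES)
		return NULL;
	return &map->values[key];
}

/* Counts one event for `key`, the way a packet counter might. */
int toy_count_event(struct toy_array *map, unsigned int key)
{
	long *value = toy_lookup_ptr(map, key);

	if (!value)
		return -1;
	/* Atomic read-modify-write on the stored value; no write-back step. */
	__sync_fetch_and_add(value, 1);
	return 0;
}
```

Because the increment operates on the stored value directly, there is no
separate "write the copy back" step in which another writer's update could
be overwritten.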

> > +
> > +On the kernel side, implementing a map type requires defining some
> > +function (pointers) via `struct bpf_map_ops`_.  And eBPF programs have
> > +access to ``map_lookup_elem``, ``map_update_elem`` and
> > +``map_delete_elem``, which gets invoked from eBPF via bpf-helpers in  
> 
> this paragraph reads a bit out of place.
> I'm not sure what 'kernel implements the map' means.
> You mean 'access the map' ?

True, it is not very clear.  I'm talking about how a new map type is
implemented, and what functions it needs to implement. It is confusing,
I'll try to come up with something better.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [iovisor-dev] Documentation on eBPF map types?

2017-02-03 Thread Quentin Monnet via iovisor-dev
Hi Jesper,

2017-02-03 (11:54 +0100) ~ Jesper Dangaard Brouer via iovisor-dev

> [...]
>
> I didn't know about TC maps.  Can you or Daniel please point me to some
> TC-eBFP-code that setup a map?  So, I can wrap my head around how it
> does it... I think it would be valuable to have a section here
> describing how TC manage maps.
>
> [...]

Some examples are available in the iproute2 package, under examples/bpf/ [1].
File bpf_prog.c, in particular, uses several maps and a user space
program, but if I remember correctly it was created before the map
pinning feature, so it sends the file descriptors for the maps to the
user space agent through a UNIX socket. File bpf_shared.c is more recent
and uses map pinning.

[1] https://git.kernel.org/cgit/linux/kernel/git/shemminger/iproute2.git/tree/examples/bpf

Best regards,
Quentin



Re: [iovisor-dev] Documentation on eBPF map types?

2017-02-03 Thread Jesper Dangaard Brouer via iovisor-dev
On Thu, 2 Feb 2017 20:27:15 -0800
Alexei Starovoitov  wrote:

> On Thu, Feb 02, 2017 at 06:00:09PM +0100, Jesper Dangaard Brouer wrote:
> > 
> > [PATCH] doc: how interacting with eBPF maps works
> > 
> > Documented by reading the code.  
> 
> looks great!
> please submit the patch to net-next and cc Jonathan ?

Good idea, but I'll take this round of review first, before pushing it
to the kernel tree. 

I also want a build-env section.

I'm also considering splitting the "Types of maps"[1] section into a
separate file, as this avoids generating an overly long HTML page.

[1] http://prototype-kernel.readthedocs.io/en/latest/bpf/ebpf_maps.html#types-of-maps

> Comments below:
> 
> > I hope someone more knowledgeable will review this and
> > correct me where I misunderstood things.
> > 
> > Signed-off-by: Jesper Dangaard Brouer 
> > ---
> >  kernel/Documentation/bpf/ebpf_maps.rst |  126 
> > ++--
> >  1 file changed, 120 insertions(+), 6 deletions(-)
> > 
> > diff --git a/kernel/Documentation/bpf/ebpf_maps.rst 
> > b/kernel/Documentation/bpf/ebpf_maps.rst
> > index 562edd566e0b..55068c7f3dab 100644
> > --- a/kernel/Documentation/bpf/ebpf_maps.rst
> > +++ b/kernel/Documentation/bpf/ebpf_maps.rst
> > @@ -23,13 +23,128 @@ and accessed by multiple programs (from man-page 
> > `bpf(2)`_):
> >   up to the user process and eBPF program to decide what they store
> >   inside maps.
> >  
> > +Creating a map
> > +==============
> > +
> > +A maps is created based on a request from userspace, via the `bpf`_
> > +syscall (`bpf_cmd`_ BPF_MAP_CREATE), and returns a new file descriptor
> > +that refers to the map. These are the setup arguments when creating a
> > +map.
> > +
> > +.. code-block:: c
> > +
> > +  struct { /* anonymous struct used by BPF_MAP_CREATE command */
> > + __u32   map_type;   /* one of enum bpf_map_type */
> > + __u32   key_size;   /* size of key in bytes */
> > + __u32   value_size; /* size of value in bytes */
> > + __u32   max_entries;/* max number of entries in a map */
> > + __u32   map_flags;  /* prealloc or not */
> > +  };
> > +
> > +For programs under samples/bpf/ the ``load_bpf_file()`` call (from
> > +`samples/bpf/bpf_load`_) takes care of parsing elf file compiled by
^^^  

> > +LLVM, pickup 'maps' section and creates maps via BPF syscall.  This is
  ^^ ^

> > +done by defining a ``struct bpf_map_def`` with an elf section
> > +__attribute__ ``SEC("maps")``, in the xxx_kern.c file.  The maps file
> > +descriptor is available in the userspace xxx_user.c file, via global
> > +array variable ``map_fd[]``, and the array map index correspons to the
> > +order the maps sections were defined in elf file of xxx_kern.c file.
> > +
> > +.. code-block:: c
> > +
> > +  struct bpf_map_def {
> > +   unsigned int type;
> > +   unsigned int key_size;
> > +   unsigned int value_size;
> > +   unsigned int max_entries;
> > +   unsigned int map_flags;
> > +  };  
> 
> As Daniel said please mention that this is C program <-> elf loader
> convention. It's not a kernel abi.

I feel I did mention the ELF part above, but I guess I should try to make
it clearer, since you missed it.
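
For contrast with the loader convention, here is a minimal sketch of what
the kernel ABI side looks like, assuming a Linux system with the uapi
header linux/bpf.h installed. It fills in union bpf_attr and issues the
BPF_MAP_CREATE command through the raw syscall, following the pattern
shown in the bpf(2) man-page. The helper name create_map_raw() is made up
for this example. Nothing here involves struct bpf_map_def, which only the
ELF loader interprets.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/*
 * Hypothetical helper: asks the kernel for a new map using the raw
 * bpf(2) syscall. Returns a file descriptor, or -1 with errno set.
 */
int create_map_raw(unsigned int type, unsigned int key_size,
		   unsigned int value_size, unsigned int max_entries)
{
	union bpf_attr attr;

	/* Fields that are not used by this command must be zero. */
	memset(&attr, 0, sizeof(attr));
	attr.map_type    = type;
	attr.key_size    = key_size;
	attr.value_size  = value_size;
	attr.max_entries = max_entries;

	return (int)syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}
```

Called as create_map_raw(BPF_MAP_TYPE_ARRAY, sizeof(__u32), sizeof(__u64), 42),
this returns a new map fd on success. Without sufficient privileges the
call is expected to fail with -1 and errno set (for example EPERM).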


> perf map defitions are different from tc and from samples/bpf/

I didn't know about TC maps.  Can you or Daniel please point me to some
TC eBPF code that sets up a map?  So, I can wrap my head around how it
does it... I think it would be valuable to have a section here
describing how TC manages maps.

See what I figured out myself in this commit:
 https://github.com/netoptimizer/prototype-kernel/commit/3640a48df52c

And the new TC section:
 
http://prototype-kernel.readthedocs.io/en/latest/bpf/ebpf_maps.html#qdisc-traffic-control-convention

> iovisor/bcc is not using elf at all.

True (I do know that bcc doesn't use elf).
 
> > +
> > +  struct bpf_map_def SEC("maps") my_map = {
> > +   .type= BPF_MAP_TYPE_XXX,
> > +   .key_size= sizeof(u32),
> > +   .value_size  = sizeof(u64),
> > +   .max_entries = 42,
> > +   .map_flags   = 0
> > +  };
> > +
> > +.. _samples/bpf/bpf_load:
> > +   
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/samples/bpf/bpf_load.c
> > +
> > +
> > +Interacting with maps
> > +=====================

I'll use another email and commit change to address this "Interacting
with maps" section...


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [iovisor-dev] Documentation on eBPF map types?

2017-02-02 Thread Daniel Borkmann via iovisor-dev

On 02/02/2017 06:00 PM, Jesper Dangaard Brouer via iovisor-dev wrote:

On Thu, 2 Feb 2017 11:56:19 +0100
Jesper Dangaard Brouer  wrote:


On Tue, 31 Jan 2017 20:54:10 -0800
Alexei Starovoitov  wrote:


On Tue, Jan 31, 2017 at 9:54 AM, Jesper Dangaard Brouer via
iovisor-dev  wrote:


On Sat, 28 Jan 2017 10:14:58 +1100 Brendan Gregg  
wrote:


I did some in the bcc ref guide, but it's incomplete, and the bcc versions:
https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md


Thanks! - this seem rather BCC specific syntax, and I'm looking for
documentation close for the kernel (samples/bpf/).

The best documentation I found was the man-page for the syscall bpf(2):
  http://man7.org/linux/man-pages/man2/bpf.2.html

In lack of a better place, I've started documenting eBPF here:
  https://prototype-kernel.readthedocs.io/en/latest/bpf/index.html

This doc is compatible with the kernels doc format, and I hope we can
push this into the kernel tree, if it turns out to be valuable?
(https://www.kernel.org/doc/html/latest/)


yeah. definitely would be great to add map descriptions to the kernel docs.
So far most of it is in commit logs.
git log kernel/bpf/arraymap.c|tail -33
git log kernel/bpf/hashtab.c|tail -33
will give an overview of key hash and array map principles.


Thanks, I'm using that to write some doc.
  http://prototype-kernel.readthedocs.io/en/latest/bpf/ebpf_maps.html
Gotten to BPF_MAP_TYPE_ARRAY
  
http://prototype-kernel.readthedocs.io/en/latest/bpf/ebpf_maps.html#bpf-map-type-array

Can you explain the difference between the kernel and userspace side of
the call bpf_map_lookup_elem() ?

Kernel side:
   long *value = bpf_map_lookup_elem(_map, );

Userspace side:
   long long value;
   bpf_map_lookup_elem(map_fd[0], , )

Looks like userspace gets a copy of the memory...
If so, how can userspace then increment the value safely?


I documented this myself, please correct me.
  
http://prototype-kernel.readthedocs.io/en/latest/bpf/ebpf_maps.html#creating-a-map
  
http://prototype-kernel.readthedocs.io/en/latest/bpf/ebpf_maps.html#interacting-with-maps

Commit:
  https://github.com/netoptimizer/prototype-kernel/commit/ffbcdc453f3c

Inlined below:
  -

[PATCH] doc: how interacting with eBPF maps works

Documented by reading the code.

I hope someone more knowledgeable will review this and
correct me where I misunderstood things.

Signed-off-by: Jesper Dangaard Brouer 


Would be great if you could submit some of the missing pieces to
linux-...@vger.kernel.org for bpf(2) man-page inclusion while at
it, so this doesn't need to be done twice. Thanks for documenting.

Note that struct bpf_map_def can vary based on the bpf loader
implementation; e.g. iproute2 has a different layout and also makes
use of map/prog pinning (see iproute2's lib/bpf.c).

Thanks,
Daniel


Re: [iovisor-dev] Documentation on eBPF map types?

2017-02-02 Thread Jesper Dangaard Brouer via iovisor-dev
On Sat, 28 Jan 2017 10:14:58 +1100
Brendan Gregg  wrote:

> I did some in the bcc ref guide, but it's incomplete, and the bcc versions:
> https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md

Hi Brendan,

I also included BCC in my documentation, please correct me:
 http://prototype-kernel.readthedocs.io/en/latest/bpf/bcc_tool_chain.html
 https://github.com/netoptimizer/prototype-kernel/commit/b61a7ecfe08d2cc0

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [iovisor-dev] Documentation on eBPF map types?

2017-02-02 Thread Jesper Dangaard Brouer via iovisor-dev
On Thu, 2 Feb 2017 11:56:19 +0100
Jesper Dangaard Brouer  wrote:

> On Tue, 31 Jan 2017 20:54:10 -0800
> Alexei Starovoitov  wrote:
> 
> > On Tue, Jan 31, 2017 at 9:54 AM, Jesper Dangaard Brouer via
> > iovisor-dev  wrote:  
> > >
> > > On Sat, 28 Jan 2017 10:14:58 +1100 Brendan Gregg 
> > >  wrote:
> > >
> > >> I did some in the bcc ref guide, but it's incomplete, and the bcc 
> > >> versions:
> > >> https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md
> > >
> > > Thanks! - this seem rather BCC specific syntax, and I'm looking for
> > > documentation close for the kernel (samples/bpf/).
> > >
> > > The best documentation I found was the man-page for the syscall bpf(2):
> > >  http://man7.org/linux/man-pages/man2/bpf.2.html
> > >
> > > In lack of a better place, I've started documenting eBPF here:
> > >  https://prototype-kernel.readthedocs.io/en/latest/bpf/index.html
> > >
> > > This doc is compatible with the kernels doc format, and I hope we can
> > > push this into the kernel tree, if it turns out to be valuable?
> > > (https://www.kernel.org/doc/html/latest/)
> > 
> > yeah. definitely would be great to add map descriptions to the kernel docs.
> > So far most of it is in commit logs.
> > git log kernel/bpf/arraymap.c|tail -33
> > git log kernel/bpf/hashtab.c|tail -33
> > will give an overview of key hash and array map principles.  
> 
> Thanks, I'm using that to write some doc.
>  http://prototype-kernel.readthedocs.io/en/latest/bpf/ebpf_maps.html
> Gotten to BPF_MAP_TYPE_ARRAY
>  
> http://prototype-kernel.readthedocs.io/en/latest/bpf/ebpf_maps.html#bpf-map-type-array
>  
> 
> Can you explain the difference between the kernel and userspace side of
> the call bpf_map_lookup_elem() ?
> 
> Kernel side:
>   long *value = bpf_map_lookup_elem(_map, );
> 
> Userspace side:
>   long long value;
>   bpf_map_lookup_elem(map_fd[0], , )
> 
> Looks like userspace gets a copy of the memory...
> If so, how can userspace then increment the value safely?

I documented this myself, please correct me. 
 
http://prototype-kernel.readthedocs.io/en/latest/bpf/ebpf_maps.html#creating-a-map
 
http://prototype-kernel.readthedocs.io/en/latest/bpf/ebpf_maps.html#interacting-with-maps

Commit:
 https://github.com/netoptimizer/prototype-kernel/commit/ffbcdc453f3c

Inlined below:
 -

[PATCH] doc: how interacting with eBPF maps works

Documented by reading the code.

I hope someone more knowledgeable will review this and
correct me where I misunderstood things.

Signed-off-by: Jesper Dangaard Brouer 
---
 kernel/Documentation/bpf/ebpf_maps.rst |  126 ++--
 1 file changed, 120 insertions(+), 6 deletions(-)

diff --git a/kernel/Documentation/bpf/ebpf_maps.rst b/kernel/Documentation/bpf/ebpf_maps.rst
index 562edd566e0b..55068c7f3dab 100644
--- a/kernel/Documentation/bpf/ebpf_maps.rst
+++ b/kernel/Documentation/bpf/ebpf_maps.rst
@@ -23,13 +23,128 @@ and accessed by multiple programs (from man-page `bpf(2)`_):
  up to the user process and eBPF program to decide what they store
  inside maps.
 
+Creating a map
+==============
+
+A map is created based on a request from userspace, via the `bpf`_
+syscall (`bpf_cmd`_ BPF_MAP_CREATE), which returns a new file descriptor
+that refers to the map.  These are the setup arguments when creating a
+map:
+
+.. code-block:: c
+
+  struct { /* anonymous struct used by BPF_MAP_CREATE command */
+         __u32   map_type;       /* one of enum bpf_map_type */
+         __u32   key_size;       /* size of key in bytes */
+         __u32   value_size;     /* size of value in bytes */
+         __u32   max_entries;    /* max number of entries in a map */
+         __u32   map_flags;      /* prealloc or not */
+  };
+
+For programs under samples/bpf/, the ``load_bpf_file()`` call (from
+`samples/bpf/bpf_load`_) takes care of parsing the ELF file compiled by
+LLVM, picking up the 'maps' section and creating the maps via the BPF
+syscall.  This is done by defining a ``struct bpf_map_def`` with an ELF
+section __attribute__ ``SEC("maps")`` in the xxx_kern.c file.  Each
+map's file descriptor is available in the userspace xxx_user.c file via
+the global array variable ``map_fd[]``, and the array index corresponds
+to the order in which the maps sections were defined in the ELF file of
+xxx_kern.c.
+
+.. code-block:: c
+
+  struct bpf_map_def {
+   unsigned int type;
+   unsigned int key_size;
+   unsigned int value_size;
+   unsigned int max_entries;
+   unsigned int map_flags;
+  };
+
+  struct bpf_map_def SEC("maps") my_map = {
+   .type        = BPF_MAP_TYPE_XXX,
+   .key_size    = sizeof(u32),
+   .value_size  = sizeof(u64),
+   .max_entries = 42,
+   .map_flags   = 0
+  };
+
+.. _samples/bpf/bpf_load:
+   https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/samples/bpf/bpf_load.c
+
+

Re: [iovisor-dev] Documentation on eBPF map types?

2017-02-02 Thread Jesper Dangaard Brouer via iovisor-dev
On Tue, 31 Jan 2017 20:54:10 -0800
Alexei Starovoitov  wrote:

> On Tue, Jan 31, 2017 at 9:54 AM, Jesper Dangaard Brouer via
> iovisor-dev  wrote:
> >
> > On Sat, 28 Jan 2017 10:14:58 +1100 Brendan Gregg 
> >  wrote:
> >  
> >> I did some in the bcc ref guide, but it's incomplete, and the bcc versions:
> >> https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md  
> >
> > Thanks! - this seem rather BCC specific syntax, and I'm looking for
> > documentation close for the kernel (samples/bpf/).
> >
> > The best documentation I found was the man-page for the syscall bpf(2):
> >  http://man7.org/linux/man-pages/man2/bpf.2.html
> >
> > In lack of a better place, I've started documenting eBPF here:
> >  https://prototype-kernel.readthedocs.io/en/latest/bpf/index.html
> >
> > This doc is compatible with the kernels doc format, and I hope we can
> > push this into the kernel tree, if it turns out to be valuable?
> > (https://www.kernel.org/doc/html/latest/)  
> 
> yeah. definitely would be great to add map descriptions to the kernel docs.
> So far most of it is in commit logs.
> git log kernel/bpf/arraymap.c|tail -33
> git log kernel/bpf/hashtab.c|tail -33
> will give an overview of key hash and array map principles.

Thanks, I'm using that to write some doc.
 http://prototype-kernel.readthedocs.io/en/latest/bpf/ebpf_maps.html
Gotten to BPF_MAP_TYPE_ARRAY
 
http://prototype-kernel.readthedocs.io/en/latest/bpf/ebpf_maps.html#bpf-map-type-array
 

Can you explain the difference between the kernel and userspace side of
the call bpf_map_lookup_elem() ?

Kernel side:
  long *value = bpf_map_lookup_elem(_map, );

Userspace side:
  long long value;
  bpf_map_lookup_elem(map_fd[0], , )

Looks like userspace gets a copy of the memory...
If so, how can userspace then increment the value safely?
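
The concern can be made concrete with a small userspace-only model. This
is a sketch, not the real API: toy_map, toy_lookup() and toy_update() are
hypothetical stand-ins that only mimic the copy-out and overwrite
behaviour described above, and the two writers are interleaved by hand so
the outcome is deterministic.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-in for a one-slot array map; NOT the kernel implementation. */
struct toy_map {
	long long slot;
};

/* Models the userspace lookup: the value is copied out to the caller. */
int toy_lookup(const struct toy_map *map, long long *value_out)
{
	memcpy(value_out, &map->slot, sizeof(*value_out));
	return 0;
}

/* Models the userspace update: the whole stored value is overwritten. */
int toy_update(struct toy_map *map, const long long *value)
{
	memcpy(&map->slot, value, sizeof(map->slot));
	return 0;
}

/*
 * Two hypothetical userspace writers, A and B, each try to add 1.
 * Their lookup and update steps are interleaved by hand.
 * Returns the final value stored in the map.
 */
long long toy_interleaved_increment(void)
{
	struct toy_map map = { .slot = 0 };
	long long a, b;

	toy_lookup(&map, &a);   /* A reads 0 */
	toy_lookup(&map, &b);   /* B reads 0, before A has written back */
	a += 1;
	b += 1;
	toy_update(&map, &a);   /* A writes 1 */
	toy_update(&map, &b);   /* B also writes 1; A's increment is lost */

	return map.slot;
}
```

Two increments were attempted but the stored value ends at 1, which is the
lost-update problem behind the question: with only a copy in hand,
lookup followed by update is a read-modify-write with a window in between.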

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [iovisor-dev] Documentation on eBPF map types?

2017-01-31 Thread Alexei Starovoitov via iovisor-dev
On Tue, Jan 31, 2017 at 9:54 AM, Jesper Dangaard Brouer via
iovisor-dev  wrote:
>
> On Sat, 28 Jan 2017 10:14:58 +1100 Brendan Gregg  
> wrote:
>
>> I did some in the bcc ref guide, but it's incomplete, and the bcc versions:
>> https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md
>
> Thanks! - this seem rather BCC specific syntax, and I'm looking for
> documentation close for the kernel (samples/bpf/).
>
> The best documentation I found was the man-page for the syscall bpf(2):
>  http://man7.org/linux/man-pages/man2/bpf.2.html
>
> In lack of a better place, I've started documenting eBPF here:
>  https://prototype-kernel.readthedocs.io/en/latest/bpf/index.html
>
> This doc is compatible with the kernels doc format, and I hope we can
> push this into the kernel tree, if it turns out to be valuable?
> (https://www.kernel.org/doc/html/latest/)

yeah. definitely would be great to add map descriptions to the kernel docs.
So far most of it is in commit logs.
git log kernel/bpf/arraymap.c|tail -33
git log kernel/bpf/hashtab.c|tail -33
will give an overview of key hash and array map principles.


Re: [iovisor-dev] Documentation on eBPF map types?

2017-01-31 Thread Jesper Dangaard Brouer via iovisor-dev

On Sat, 28 Jan 2017 10:14:58 +1100 Brendan Gregg  
wrote:

> I did some in the bcc ref guide, but it's incomplete, and the bcc versions:
> https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md

Thanks! - this seems rather BCC-specific syntax, and I'm looking for
documentation closer to the kernel (samples/bpf/).

The best documentation I found was the man-page for the syscall bpf(2):
 http://man7.org/linux/man-pages/man2/bpf.2.html

For lack of a better place, I've started documenting eBPF here:
 https://prototype-kernel.readthedocs.io/en/latest/bpf/index.html

This doc is compatible with the kernel's doc format, and I hope we can
push this into the kernel tree, if it turns out to be valuable?
(https://www.kernel.org/doc/html/latest/)

--Jesper


> On Fri, Jan 27, 2017 at 9:54 PM, Jesper Dangaard Brouer via iovisor-dev <
> iovisor-dev@lists.iovisor.org> wrote:
> 
> > Hi IOvisor/eBPF people,
> >
> > Do we have some documentation on eBPF maps?
> >
> > Like that map types are available, and what they are useful for?
> >
> > Notice, just list listing[1] the enum is not so useful:
> >
> >  enum bpf_map_type {
> > BPF_MAP_TYPE_UNSPEC,
> > BPF_MAP_TYPE_HASH,
> > BPF_MAP_TYPE_ARRAY,
> > BPF_MAP_TYPE_PROG_ARRAY,
> > BPF_MAP_TYPE_PERF_EVENT_ARRAY,
> > BPF_MAP_TYPE_PERCPU_HASH,
> > BPF_MAP_TYPE_PERCPU_ARRAY,
> > BPF_MAP_TYPE_STACK_TRACE,
> > BPF_MAP_TYPE_CGROUP_ARRAY,
> > BPF_MAP_TYPE_LRU_HASH,
> > BPF_MAP_TYPE_LRU_PERCPU_HASH,
> >  };
> >
> > [1] http://lxr.free-electrons.com/source/tools/include/uapi/
> > linux/bpf.h?v=4.9#L78
> >
> > I also lack some documentation about how I interact with these maps...


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [iovisor-dev] Documentation on eBPF map types?

2017-01-27 Thread Brendan Gregg via iovisor-dev
I did some in the bcc ref guide, but it's incomplete, and it covers the bcc versions:
https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md

Brendan

On Fri, Jan 27, 2017 at 9:54 PM, Jesper Dangaard Brouer via iovisor-dev <
iovisor-dev@lists.iovisor.org> wrote:

> Hi IOvisor/eBPF people,
>
> Do we have some documentation on eBPF maps?
>
> Like that map types are available, and what they are useful for?
>
> Notice, just list listing[1] the enum is not so useful:
>
>  enum bpf_map_type {
> BPF_MAP_TYPE_UNSPEC,
> BPF_MAP_TYPE_HASH,
> BPF_MAP_TYPE_ARRAY,
> BPF_MAP_TYPE_PROG_ARRAY,
> BPF_MAP_TYPE_PERF_EVENT_ARRAY,
> BPF_MAP_TYPE_PERCPU_HASH,
> BPF_MAP_TYPE_PERCPU_ARRAY,
> BPF_MAP_TYPE_STACK_TRACE,
> BPF_MAP_TYPE_CGROUP_ARRAY,
> BPF_MAP_TYPE_LRU_HASH,
> BPF_MAP_TYPE_LRU_PERCPU_HASH,
>  };
>
> [1] http://lxr.free-electrons.com/source/tools/include/uapi/
> linux/bpf.h?v=4.9#L78
>
> I also lack some documentation about how I interact with these maps...
>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
> ___
> iovisor-dev mailing list
> iovisor-dev@lists.iovisor.org
> https://lists.iovisor.org/mailman/listinfo/iovisor-dev
>


[iovisor-dev] Documentation on eBPF map types?

2017-01-27 Thread Jesper Dangaard Brouer via iovisor-dev
Hi IOvisor/eBPF people,

Do we have some documentation on eBPF maps?

Like what map types are available, and what they are useful for?

Notice, just listing[1] the enum is not so useful:

 enum bpf_map_type {
        BPF_MAP_TYPE_UNSPEC,
        BPF_MAP_TYPE_HASH,
        BPF_MAP_TYPE_ARRAY,
        BPF_MAP_TYPE_PROG_ARRAY,
        BPF_MAP_TYPE_PERF_EVENT_ARRAY,
        BPF_MAP_TYPE_PERCPU_HASH,
        BPF_MAP_TYPE_PERCPU_ARRAY,
        BPF_MAP_TYPE_STACK_TRACE,
        BPF_MAP_TYPE_CGROUP_ARRAY,
        BPF_MAP_TYPE_LRU_HASH,
        BPF_MAP_TYPE_LRU_PERCPU_HASH,
 };

[1] http://lxr.free-electrons.com/source/tools/include/uapi/linux/bpf.h?v=4.9#L78

I also lack some documentation about how I interact with these maps...

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer