Re: [PATCH v5 net-next 06/29] bpf: add lookup/update/delete/iterate methods to BPF maps

2014-08-25 Thread Alexei Starovoitov
On Mon, Aug 25, 2014 at 2:33 PM, Cong Wang  wrote:
> On Sun, Aug 24, 2014 at 1:21 PM, Alexei Starovoitov  wrote:
>> 'maps' is a generic storage of different types for sharing data between 
>> kernel
>> and userspace.
>>
>> The maps are accessed from user space via BPF syscall, which has commands:
>>
>> - create a map with given type and attributes
>>   fd = bpf_map_create(map_type, struct nlattr *attr, int len)
>>   returns fd or negative error
>>
>> - lookup key in a given map referenced by fd
>>   err = bpf_map_lookup_elem(int fd, void *key, void *value)
>>   returns zero and stores found elem into value or negative error
>>
>> - create or update key/value pair in a given map
>>   err = bpf_map_update_elem(int fd, void *key, void *value)
>>   returns zero or negative error
>>
>> - find and delete element by key in a given map
>>   err = bpf_map_delete_elem(int fd, void *key)
>>
>> - iterate map elements (based on input key return next_key)
>>   err = bpf_map_get_next_key(int fd, void *key, void *next_key)
>
>
> I think you need to document the bpf() syscall instead of wrappers on it,
> from a developer's point of view. You will anyway need to document a new
> syscall with a man page as a general rule.

yep. I've mentioned before that man page is on todo list. I'm delaying
writing it, because it's the most difficult part and I don't want to keep
rewriting it when interface changes (like it did from global id to fd).
Once implementation lands, manpage will be the highest priority.

> In the changelog I mean something like:
>
> err = bpf(BPF_MAP_LOOKUP_ELEM, ...);

Are you saying instead of:
err = bpf_map_lookup_elem(int fd, void *key, void *value)
write
err = bpf(BPF_MAP_LOOKUP_ELEM, fd, key, value)
in commit log?
I think that style carries less information per line.

For man page I'll document the syscall in a traditional way, but
for commit log I like to have maximum info in the fewest lines.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 net-next 06/29] bpf: add lookup/update/delete/iterate methods to BPF maps

2014-08-25 Thread Cong Wang
On Sun, Aug 24, 2014 at 1:21 PM, Alexei Starovoitov  wrote:
> 'maps' is a generic storage of different types for sharing data between kernel
> and userspace.
>
> The maps are accessed from user space via BPF syscall, which has commands:
>
> - create a map with given type and attributes
>   fd = bpf_map_create(map_type, struct nlattr *attr, int len)
>   returns fd or negative error
>
> - lookup key in a given map referenced by fd
>   err = bpf_map_lookup_elem(int fd, void *key, void *value)
>   returns zero and stores found elem into value or negative error
>
> - create or update key/value pair in a given map
>   err = bpf_map_update_elem(int fd, void *key, void *value)
>   returns zero or negative error
>
> - find and delete element by key in a given map
>   err = bpf_map_delete_elem(int fd, void *key)
>
> - iterate map elements (based on input key return next_key)
>   err = bpf_map_get_next_key(int fd, void *key, void *next_key)


I think you need to document the bpf() syscall instead of wrappers on it,
from a developer's point of view. You will anyway need to document a new
syscall with a man page as a general rule.

In the changelog I mean something like:

err = bpf(BPF_MAP_LOOKUP_ELEM, ...);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 net-next 06/29] bpf: add lookup/update/delete/iterate methods to BPF maps

2014-08-25 Thread Cong Wang
On Sun, Aug 24, 2014 at 1:21 PM, Alexei Starovoitov a...@plumgrid.com wrote:
 'maps' is a generic storage of different types for sharing data between kernel
 and userspace.

 The maps are accessed from user space via BPF syscall, which has commands:

 - create a map with given type and attributes
   fd = bpf_map_create(map_type, struct nlattr *attr, int len)
   returns fd or negative error

 - lookup key in a given map referenced by fd
   err = bpf_map_lookup_elem(int fd, void *key, void *value)
   returns zero and stores found elem into value or negative error

 - create or update key/value pair in a given map
   err = bpf_map_update_elem(int fd, void *key, void *value)
   returns zero or negative error

 - find and delete element by key in a given map
   err = bpf_map_delete_elem(int fd, void *key)

 - iterate map elements (based on input key return next_key)
   err = bpf_map_get_next_key(int fd, void *key, void *next_key)


I think you need to document the bpf() syscall instead of wrappers on it,
from a developer's point of view. You will anyway need to document a new
syscall with a man page as a general rule.

In the changelog I mean something like:

err = bpf(BPF_MAP_LOOKUP_ELEM, ...);
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 net-next 06/29] bpf: add lookup/update/delete/iterate methods to BPF maps

2014-08-25 Thread Alexei Starovoitov
On Mon, Aug 25, 2014 at 2:33 PM, Cong Wang cw...@twopensource.com wrote:
 On Sun, Aug 24, 2014 at 1:21 PM, Alexei Starovoitov a...@plumgrid.com wrote:
 'maps' is a generic storage of different types for sharing data between 
 kernel
 and userspace.

 The maps are accessed from user space via BPF syscall, which has commands:

 - create a map with given type and attributes
   fd = bpf_map_create(map_type, struct nlattr *attr, int len)
   returns fd or negative error

 - lookup key in a given map referenced by fd
   err = bpf_map_lookup_elem(int fd, void *key, void *value)
   returns zero and stores found elem into value or negative error

 - create or update key/value pair in a given map
   err = bpf_map_update_elem(int fd, void *key, void *value)
   returns zero or negative error

 - find and delete element by key in a given map
   err = bpf_map_delete_elem(int fd, void *key)

 - iterate map elements (based on input key return next_key)
   err = bpf_map_get_next_key(int fd, void *key, void *next_key)


 I think you need to document the bpf() syscall instead of wrappers on it,
 from a developer's point of view. You will anyway need to document a new
 syscall with a man page as a general rule.

yep. I've mentioned before that man page is on todo list. I'm delaying
writing it, because it's the most difficult part and I don't want to keep
rewriting it when interface changes (like it did from global id to fd).
Once implementation lands, manpage will be the highest priority.

 In the changelog I mean something like:

 err = bpf(BPF_MAP_LOOKUP_ELEM, ...);

Are you saying instead of:
err = bpf_map_lookup_elem(int fd, void *key, void *value)
write
err = bpf(BPF_MAP_LOOKUP_ELEM, fd, key, value)
in commit log?
I think that style carries less information per line.

For man page I'll document the syscall in a traditional way, but
for commit log I like to have maximum info in the fewest lines.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v5 net-next 06/29] bpf: add lookup/update/delete/iterate methods to BPF maps

2014-08-24 Thread Alexei Starovoitov
'maps' is a generic storage of different types for sharing data between kernel
and userspace.

The maps are accessed from user space via BPF syscall, which has commands:

- create a map with given type and attributes
  fd = bpf_map_create(map_type, struct nlattr *attr, int len)
  returns fd or negative error

- lookup key in a given map referenced by fd
  err = bpf_map_lookup_elem(int fd, void *key, void *value)
  returns zero and stores found elem into value or negative error

- create or update key/value pair in a given map
  err = bpf_map_update_elem(int fd, void *key, void *value)
  returns zero or negative error

- find and delete element by key in a given map
  err = bpf_map_delete_elem(int fd, void *key)

- iterate map elements (based on input key return next_key)
  err = bpf_map_get_next_key(int fd, void *key, void *next_key)

- close(fd) deletes the map

Signed-off-by: Alexei Starovoitov 
---
 include/linux/bpf.h  |8 ++
 include/uapi/linux/bpf.h |   25 ++
 kernel/bpf/syscall.c |  198 ++
 3 files changed, 231 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 607ca53fe2af..fd1ac4b5ba8b 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -9,6 +9,7 @@
 
 #include 
 #include 
+#include 
 
 struct bpf_map;
 struct nlattr;
@@ -18,6 +19,12 @@ struct bpf_map_ops {
/* funcs callable from userspace (via syscall) */
struct bpf_map *(*map_alloc)(struct nlattr *attrs[BPF_MAP_ATTR_MAX + 
1]);
void (*map_free)(struct bpf_map *);
+   int (*map_get_next_key)(struct bpf_map *map, void *key, void *next_key);
+
+   /* funcs callable from userspace and from eBPF programs */
+   void *(*map_lookup_elem)(struct bpf_map *map, void *key);
+   int (*map_update_elem)(struct bpf_map *map, void *key, void *value);
+   int (*map_delete_elem)(struct bpf_map *map, void *key);
 };
 
 struct bpf_map {
@@ -38,5 +45,6 @@ struct bpf_map_type_list {
 
 void bpf_register_map_type(struct bpf_map_type_list *tl);
 void bpf_map_put(struct bpf_map *map);
+struct bpf_map *bpf_map_get(struct fd f);
 
 #endif /* _LINUX_BPF_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index de21c8ecf0bb..f68edb2681f8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -326,6 +326,31 @@ enum bpf_cmd {
 * map is deleted when fd is closed
 */
BPF_MAP_CREATE,
+
+   /* lookup key in a given map
+* err = bpf_map_lookup_elem(int fd, void *key, void *value)
+* returns zero and stores found elem into value
+* or negative error
+*/
+   BPF_MAP_LOOKUP_ELEM,
+
+   /* create or update key/value pair in a given map
+* err = bpf_map_update_elem(int fd, void *key, void *value)
+* returns zero or negative error
+*/
+   BPF_MAP_UPDATE_ELEM,
+
+   /* find and delete elem by key in a given map
+* err = bpf_map_delete_elem(int fd, void *key)
+* returns zero or negative error
+*/
+   BPF_MAP_DELETE_ELEM,
+
+   /* lookup key in a given map and return next key
+* err = bpf_map_get_elem(int fd, void *key, void *next_key)
+* returns zero and stores next key or negative error
+*/
+   BPF_MAP_GET_NEXT_KEY,
 };
 
 enum bpf_map_attributes {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 04cdf7948f8f..45e100ece1b7 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static LIST_HEAD(bpf_map_types);
 
@@ -131,6 +132,189 @@ free_attr:
return err;
 }
 
+/* if error is returned, fd is released.
+ * On success caller should complete fd access with matching fdput()
+ */
+struct bpf_map *bpf_map_get(struct fd f)
+{
+   struct bpf_map *map;
+
+   if (!f.file)
+   return ERR_PTR(-EBADF);
+
+   if (f.file->f_op != _map_fops) {
+   fdput(f);
+   return ERR_PTR(-EINVAL);
+   }
+
+   map = f.file->private_data;
+
+   return map;
+}
+
+static int map_lookup_elem(int ufd, void __user *ukey, void __user *uvalue)
+{
+   struct fd f = fdget(ufd);
+   struct bpf_map *map;
+   void *key, *value;
+   int err;
+
+   map = bpf_map_get(f);
+   if (IS_ERR(map))
+   return PTR_ERR(map);
+
+   err = -ENOMEM;
+   key = kmalloc(map->key_size, GFP_USER);
+   if (!key)
+   goto err_put;
+
+   err = -EFAULT;
+   if (copy_from_user(key, ukey, map->key_size) != 0)
+   goto free_key;
+
+   err = -ESRCH;
+   rcu_read_lock();
+   value = map->ops->map_lookup_elem(map, key);
+   if (!value)
+   goto err_unlock;
+
+   err = -EFAULT;
+   if (copy_to_user(uvalue, value, map->value_size) != 0)
+   goto err_unlock;
+
+   err = 0;
+
+err_unlock:
+   rcu_read_unlock();
+free_key:
+   kfree(key);
+err_put:

[PATCH v5 net-next 06/29] bpf: add lookup/update/delete/iterate methods to BPF maps

2014-08-24 Thread Alexei Starovoitov
'maps' is a generic storage of different types for sharing data between kernel
and userspace.

The maps are accessed from user space via BPF syscall, which has commands:

- create a map with given type and attributes
  fd = bpf_map_create(map_type, struct nlattr *attr, int len)
  returns fd or negative error

- lookup key in a given map referenced by fd
  err = bpf_map_lookup_elem(int fd, void *key, void *value)
  returns zero and stores found elem into value or negative error

- create or update key/value pair in a given map
  err = bpf_map_update_elem(int fd, void *key, void *value)
  returns zero or negative error

- find and delete element by key in a given map
  err = bpf_map_delete_elem(int fd, void *key)

- iterate map elements (based on input key return next_key)
  err = bpf_map_get_next_key(int fd, void *key, void *next_key)

- close(fd) deletes the map

Signed-off-by: Alexei Starovoitov a...@plumgrid.com
---
 include/linux/bpf.h  |8 ++
 include/uapi/linux/bpf.h |   25 ++
 kernel/bpf/syscall.c |  198 ++
 3 files changed, 231 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 607ca53fe2af..fd1ac4b5ba8b 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -9,6 +9,7 @@
 
 #include uapi/linux/bpf.h
 #include linux/workqueue.h
+#include linux/file.h
 
 struct bpf_map;
 struct nlattr;
@@ -18,6 +19,12 @@ struct bpf_map_ops {
/* funcs callable from userspace (via syscall) */
struct bpf_map *(*map_alloc)(struct nlattr *attrs[BPF_MAP_ATTR_MAX + 
1]);
void (*map_free)(struct bpf_map *);
+   int (*map_get_next_key)(struct bpf_map *map, void *key, void *next_key);
+
+   /* funcs callable from userspace and from eBPF programs */
+   void *(*map_lookup_elem)(struct bpf_map *map, void *key);
+   int (*map_update_elem)(struct bpf_map *map, void *key, void *value);
+   int (*map_delete_elem)(struct bpf_map *map, void *key);
 };
 
 struct bpf_map {
@@ -38,5 +45,6 @@ struct bpf_map_type_list {
 
 void bpf_register_map_type(struct bpf_map_type_list *tl);
 void bpf_map_put(struct bpf_map *map);
+struct bpf_map *bpf_map_get(struct fd f);
 
 #endif /* _LINUX_BPF_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index de21c8ecf0bb..f68edb2681f8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -326,6 +326,31 @@ enum bpf_cmd {
 * map is deleted when fd is closed
 */
BPF_MAP_CREATE,
+
+   /* lookup key in a given map
+* err = bpf_map_lookup_elem(int fd, void *key, void *value)
+* returns zero and stores found elem into value
+* or negative error
+*/
+   BPF_MAP_LOOKUP_ELEM,
+
+   /* create or update key/value pair in a given map
+* err = bpf_map_update_elem(int fd, void *key, void *value)
+* returns zero or negative error
+*/
+   BPF_MAP_UPDATE_ELEM,
+
+   /* find and delete elem by key in a given map
+* err = bpf_map_delete_elem(int fd, void *key)
+* returns zero or negative error
+*/
+   BPF_MAP_DELETE_ELEM,
+
+   /* lookup key in a given map and return next key
+* err = bpf_map_get_elem(int fd, void *key, void *next_key)
+* returns zero and stores next key or negative error
+*/
+   BPF_MAP_GET_NEXT_KEY,
 };
 
 enum bpf_map_attributes {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 04cdf7948f8f..45e100ece1b7 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -13,6 +13,7 @@
 #include linux/syscalls.h
 #include net/netlink.h
 #include linux/anon_inodes.h
+#include linux/file.h
 
 static LIST_HEAD(bpf_map_types);
 
@@ -131,6 +132,189 @@ free_attr:
return err;
 }
 
+/* if error is returned, fd is released.
+ * On success caller should complete fd access with matching fdput()
+ */
+struct bpf_map *bpf_map_get(struct fd f)
+{
+   struct bpf_map *map;
+
+   if (!f.file)
+   return ERR_PTR(-EBADF);
+
+   if (f.file-f_op != bpf_map_fops) {
+   fdput(f);
+   return ERR_PTR(-EINVAL);
+   }
+
+   map = f.file-private_data;
+
+   return map;
+}
+
+static int map_lookup_elem(int ufd, void __user *ukey, void __user *uvalue)
+{
+   struct fd f = fdget(ufd);
+   struct bpf_map *map;
+   void *key, *value;
+   int err;
+
+   map = bpf_map_get(f);
+   if (IS_ERR(map))
+   return PTR_ERR(map);
+
+   err = -ENOMEM;
+   key = kmalloc(map-key_size, GFP_USER);
+   if (!key)
+   goto err_put;
+
+   err = -EFAULT;
+   if (copy_from_user(key, ukey, map-key_size) != 0)
+   goto free_key;
+
+   err = -ESRCH;
+   rcu_read_lock();
+   value = map-ops-map_lookup_elem(map, key);
+   if (!value)
+   goto err_unlock;
+
+   err = -EFAULT;
+   if (copy_to_user(uvalue, value, map-value_size) != 0)
+