Re: [PATCH RFC net-next 13/14] samples: bpf: example of stateful socket filtering

2014-06-28 Thread Alexei Starovoitov
On Fri, Jun 27, 2014 at 5:21 PM, Andy Lutomirski  wrote:
> On Fri, Jun 27, 2014 at 5:06 PM, Alexei Starovoitov  wrote:
>> this socket filter example does:
>>
>> - creates a hashtable in kernel with key 4 bytes and value 8 bytes
>>
>> - populates map[6] = 0; map[17] = 0;  // 6 - tcp_proto, 17 - udp_proto
>>
>> - loads eBPF program:
>>   r0 = skb[14 + 9]; // load one byte of ip->proto
>>   *(u32*)(fp - 4) = r0;
>>   value = bpf_map_lookup_elem(map_id, fp - 4);
>>   if (value)
>>(*(u64*)value) += 1;
>
> In the code below, this is XADD.  Is there anything that validates
> that shared things like this can only be poked at by atomic
> operations?

Correct. The asm code uses xadd to increment packet stats.
It's up to the program itself to decide what it's doing.
Some programs may prefer speed vs accuracy when counting
and they will be using regular "ld, add, st", instead of xadd.
Verifier checks that programs can only access a valid memory
region. The program itself needs to do something sensible with it.
Theoretically I can add a check to verifier that shared map elements
are read-only and xadd-only, but that limits usability and unnecessary.
We actually do have a use case when we do a regular add, since
'lock add' is too costly at high event rates.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC net-next 13/14] samples: bpf: example of stateful socket filtering

2014-06-28 Thread Alexei Starovoitov
On Fri, Jun 27, 2014 at 5:21 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Fri, Jun 27, 2014 at 5:06 PM, Alexei Starovoitov a...@plumgrid.com wrote:
 this socket filter example does:

 - creates a hashtable in kernel with key 4 bytes and value 8 bytes

 - populates map[6] = 0; map[17] = 0;  // 6 - tcp_proto, 17 - udp_proto

 - loads eBPF program:
   r0 = skb[14 + 9]; // load one byte of ip-proto
   *(u32*)(fp - 4) = r0;
   value = bpf_map_lookup_elem(map_id, fp - 4);
   if (value)
(*(u64*)value) += 1;

 In the code below, this is XADD.  Is there anything that validates
 that shared things like this can only be poked at by atomic
 operations?

Correct. The asm code uses xadd to increment packet stats.
It's up to the program itself to decide what it's doing.
Some programs may prefer speed vs accuracy when counting
and they will be using regular ld, add, st, instead of xadd.
Verifier checks that programs can only access a valid memory
region. The program itself needs to do something sensible with it.
Theoretically I can add a check to verifier that shared map elements
are read-only and xadd-only, but that limits usability and unnecessary.
We actually do have a use case when we do a regular add, since
'lock add' is too costly at high event rates.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC net-next 13/14] samples: bpf: example of stateful socket filtering

2014-06-27 Thread Andy Lutomirski
On Fri, Jun 27, 2014 at 5:06 PM, Alexei Starovoitov  wrote:
> this socket filter example does:
>
> - creates a hashtable in kernel with key 4 bytes and value 8 bytes
>
> - populates map[6] = 0; map[17] = 0;  // 6 - tcp_proto, 17 - udp_proto
>
> - loads eBPF program:
>   r0 = skb[14 + 9]; // load one byte of ip->proto
>   *(u32*)(fp - 4) = r0;
>   value = bpf_map_lookup_elem(map_id, fp - 4);
>   if (value)
>(*(u64*)value) += 1;

In the code below, this is XADD.  Is there anything that validates
that shared things like this can only be poked at by atomic
operations?

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH RFC net-next 13/14] samples: bpf: example of stateful socket filtering

2014-06-27 Thread Alexei Starovoitov
this socket filter example does:

- creates a hashtable in kernel with key 4 bytes and value 8 bytes

- populates map[6] = 0; map[17] = 0;  // 6 - tcp_proto, 17 - udp_proto

- loads eBPF program:
  r0 = skb[14 + 9]; // load one byte of ip->proto
  *(u32*)(fp - 4) = r0;
  value = bpf_map_lookup_elem(map_id, fp - 4);
  if (value)
   (*(u64*)value) += 1;

- attaches this program to eth0 raw socket

- every second user space reads map[6] and map[17] to see how many
  TCP and UDP packets were seen on eth0

Signed-off-by: Alexei Starovoitov 
---
 samples/bpf/.gitignore |1 +
 samples/bpf/Makefile   |   13 
 samples/bpf/sock_example.c |  160 
 3 files changed, 174 insertions(+)
 create mode 100644 samples/bpf/.gitignore
 create mode 100644 samples/bpf/Makefile
 create mode 100644 samples/bpf/sock_example.c

diff --git a/samples/bpf/.gitignore b/samples/bpf/.gitignore
new file mode 100644
index ..5465c6e92a00
--- /dev/null
+++ b/samples/bpf/.gitignore
@@ -0,0 +1 @@
+sock_example
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
new file mode 100644
index ..95c990151644
--- /dev/null
+++ b/samples/bpf/Makefile
@@ -0,0 +1,13 @@
+# kbuild trick to avoid linker error. Can be omitted if a module is built.
+obj- := dummy.o
+
+# List of programs to build
+hostprogs-y := sock_example
+
+sock_example-objs := sock_example.o libbpf.o
+
+# Tell kbuild to always build the programs
+always := $(hostprogs-y)
+
+HOSTCFLAGS_libbpf.o += -I$(objtree)/usr/include
+HOSTCFLAGS_sock_example.o += -I$(objtree)/usr/include
diff --git a/samples/bpf/sock_example.c b/samples/bpf/sock_example.c
new file mode 100644
index ..5cf091571d4f
--- /dev/null
+++ b/samples/bpf/sock_example.c
@@ -0,0 +1,160 @@
+/* eBPF example program:
+ * - creates a hashtable in kernel with key 4 bytes and value 8 bytes
+ *
+ * - populates map[6] = 0; map[17] = 0;  // 6 - tcp_proto, 17 - udp_proto
+ *
+ * - loads eBPF program:
+ *   r0 = skb[14 + 9]; // load one byte of ip->proto
+ *   *(u32*)(fp - 4) = r0;
+ *   value = bpf_map_lookup_elem(map_id, fp - 4);
+ *   if (value)
+ *(*(u64*)value) += 1;
+ *
+ * - attaches this program to eth0 raw socket
+ *
+ * - every second user space reads map[6] and map[17] to see how many
+ *   TCP and UDP packets were seen on eth0
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "libbpf.h"
+
+static int open_raw_sock(const char *name)
+{
+   struct sockaddr_ll sll;
+   struct packet_mreq mr;
+   struct ifreq ifr;
+   int sock;
+
+   sock = socket(PF_PACKET, SOCK_RAW | SOCK_NONBLOCK | SOCK_CLOEXEC, 
htons(ETH_P_ALL));
+   if (sock < 0) {
+   printf("cannot open socket!\n");
+   return -1;
+   }
+
+   memset(, 0, sizeof(ifr));
+   strncpy((char *)ifr.ifr_name, name, IFNAMSIZ);
+   if (ioctl(sock, SIOCGIFINDEX, ) < 0) {
+   printf("ioctl: %s\n", strerror(errno));
+   close(sock);
+   return -1;
+   }
+
+   memset(, 0, sizeof(sll));
+   sll.sll_family = AF_PACKET;
+   sll.sll_ifindex = ifr.ifr_ifindex;
+   sll.sll_protocol = htons(ETH_P_ALL);
+   if (bind(sock, (struct sockaddr *), sizeof(sll)) < 0) {
+   printf("bind: %s\n", strerror(errno));
+   close(sock);
+   return -1;
+   }
+
+   memset(, 0, sizeof(mr));
+   mr.mr_ifindex = ifr.ifr_ifindex;
+   mr.mr_type = PACKET_MR_PROMISC;
+   if (setsockopt(sock, SOL_PACKET, PACKET_ADD_MEMBERSHIP, , 
sizeof(mr)) < 0) {
+   printf("set_promisc: %s\n", strerror(errno));
+   close(sock);
+   return -1;
+   }
+   return sock;
+}
+
+#define MAP_ID 1
+
+static int test_sock(void)
+{
+   static struct sock_filter_int prog[] = {
+   BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_1),
+   BPF_LD_ABS(BPF_B, 14 + 9 /* R0 = ip->proto */),
+   BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(u32 *)(fp - 
4) = r0 */
+   BPF_ALU64_REG(BPF_MOV, BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r2 = fp - 4 */
+   BPF_ALU64_IMM(BPF_MOV, BPF_REG_1, MAP_ID), /* r1 = MAP_ID */
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, 
BPF_FUNC_map_lookup_elem),
+   BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
+   BPF_ALU64_IMM(BPF_MOV, BPF_REG_1, 1), /* r1 = 1 */
+   BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 
0, 0), /* xadd r0 += r1 */
+   BPF_ALU64_IMM(BPF_MOV, BPF_REG_0, 0), /* r0 = 0 */
+   BPF_EXIT_INSN(),
+   };
+
+   int sock = -1, prog_id = 1, i, key;
+   long long value = 0, tcp_cnt, udp_cnt;
+
+   if (bpf_create_map(MAP_ID, sizeof(key), 

[PATCH RFC net-next 13/14] samples: bpf: example of stateful socket filtering

2014-06-27 Thread Alexei Starovoitov
this socket filter example does:

- creates a hashtable in kernel with key 4 bytes and value 8 bytes

- populates map[6] = 0; map[17] = 0;  // 6 - tcp_proto, 17 - udp_proto

- loads eBPF program:
  r0 = skb[14 + 9]; // load one byte of ip-proto
  *(u32*)(fp - 4) = r0;
  value = bpf_map_lookup_elem(map_id, fp - 4);
  if (value)
   (*(u64*)value) += 1;

- attaches this program to eth0 raw socket

- every second user space reads map[6] and map[17] to see how many
  TCP and UDP packets were seen on eth0

Signed-off-by: Alexei Starovoitov a...@plumgrid.com
---
 samples/bpf/.gitignore |1 +
 samples/bpf/Makefile   |   13 
 samples/bpf/sock_example.c |  160 
 3 files changed, 174 insertions(+)
 create mode 100644 samples/bpf/.gitignore
 create mode 100644 samples/bpf/Makefile
 create mode 100644 samples/bpf/sock_example.c

diff --git a/samples/bpf/.gitignore b/samples/bpf/.gitignore
new file mode 100644
index ..5465c6e92a00
--- /dev/null
+++ b/samples/bpf/.gitignore
@@ -0,0 +1 @@
+sock_example
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
new file mode 100644
index ..95c990151644
--- /dev/null
+++ b/samples/bpf/Makefile
@@ -0,0 +1,13 @@
+# kbuild trick to avoid linker error. Can be omitted if a module is built.
+obj- := dummy.o
+
+# List of programs to build
+hostprogs-y := sock_example
+
+sock_example-objs := sock_example.o libbpf.o
+
+# Tell kbuild to always build the programs
+always := $(hostprogs-y)
+
+HOSTCFLAGS_libbpf.o += -I$(objtree)/usr/include
+HOSTCFLAGS_sock_example.o += -I$(objtree)/usr/include
diff --git a/samples/bpf/sock_example.c b/samples/bpf/sock_example.c
new file mode 100644
index ..5cf091571d4f
--- /dev/null
+++ b/samples/bpf/sock_example.c
@@ -0,0 +1,160 @@
+/* eBPF example program:
+ * - creates a hashtable in kernel with key 4 bytes and value 8 bytes
+ *
+ * - populates map[6] = 0; map[17] = 0;  // 6 - tcp_proto, 17 - udp_proto
+ *
+ * - loads eBPF program:
+ *   r0 = skb[14 + 9]; // load one byte of ip-proto
+ *   *(u32*)(fp - 4) = r0;
+ *   value = bpf_map_lookup_elem(map_id, fp - 4);
+ *   if (value)
+ *(*(u64*)value) += 1;
+ *
+ * - attaches this program to eth0 raw socket
+ *
+ * - every second user space reads map[6] and map[17] to see how many
+ *   TCP and UDP packets were seen on eth0
+ */
+#include stdio.h
+#include unistd.h
+#include asm-generic/socket.h
+#include linux/netlink.h
+#include net/ethernet.h
+#include net/if.h
+#include linux/sockios.h
+#include linux/if_packet.h
+#include linux/bpf.h
+#include errno.h
+#include sys/socket.h
+#include sys/ioctl.h
+#include linux/unistd.h
+#include string.h
+#include linux/filter.h
+#include stdlib.h
+#include arpa/inet.h
+#include libbpf.h
+
+static int open_raw_sock(const char *name)
+{
+   struct sockaddr_ll sll;
+   struct packet_mreq mr;
+   struct ifreq ifr;
+   int sock;
+
+   sock = socket(PF_PACKET, SOCK_RAW | SOCK_NONBLOCK | SOCK_CLOEXEC, 
htons(ETH_P_ALL));
+   if (sock  0) {
+   printf(cannot open socket!\n);
+   return -1;
+   }
+
+   memset(ifr, 0, sizeof(ifr));
+   strncpy((char *)ifr.ifr_name, name, IFNAMSIZ);
+   if (ioctl(sock, SIOCGIFINDEX, ifr)  0) {
+   printf(ioctl: %s\n, strerror(errno));
+   close(sock);
+   return -1;
+   }
+
+   memset(sll, 0, sizeof(sll));
+   sll.sll_family = AF_PACKET;
+   sll.sll_ifindex = ifr.ifr_ifindex;
+   sll.sll_protocol = htons(ETH_P_ALL);
+   if (bind(sock, (struct sockaddr *)sll, sizeof(sll))  0) {
+   printf(bind: %s\n, strerror(errno));
+   close(sock);
+   return -1;
+   }
+
+   memset(mr, 0, sizeof(mr));
+   mr.mr_ifindex = ifr.ifr_ifindex;
+   mr.mr_type = PACKET_MR_PROMISC;
+   if (setsockopt(sock, SOL_PACKET, PACKET_ADD_MEMBERSHIP, mr, 
sizeof(mr))  0) {
+   printf(set_promisc: %s\n, strerror(errno));
+   close(sock);
+   return -1;
+   }
+   return sock;
+}
+
+#define MAP_ID 1
+
+static int test_sock(void)
+{
+   static struct sock_filter_int prog[] = {
+   BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_1),
+   BPF_LD_ABS(BPF_B, 14 + 9 /* R0 = ip-proto */),
+   BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(u32 *)(fp - 
4) = r0 */
+   BPF_ALU64_REG(BPF_MOV, BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r2 = fp - 4 */
+   BPF_ALU64_IMM(BPF_MOV, BPF_REG_1, MAP_ID), /* r1 = MAP_ID */
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, 
BPF_FUNC_map_lookup_elem),
+   BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
+   BPF_ALU64_IMM(BPF_MOV, BPF_REG_1, 1), /* r1 = 1 */
+   BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 
0, 0), /* xadd r0 += r1 */
+   BPF_ALU64_IMM(BPF_MOV, 

Re: [PATCH RFC net-next 13/14] samples: bpf: example of stateful socket filtering

2014-06-27 Thread Andy Lutomirski
On Fri, Jun 27, 2014 at 5:06 PM, Alexei Starovoitov a...@plumgrid.com wrote:
 this socket filter example does:

 - creates a hashtable in kernel with key 4 bytes and value 8 bytes

 - populates map[6] = 0; map[17] = 0;  // 6 - tcp_proto, 17 - udp_proto

 - loads eBPF program:
   r0 = skb[14 + 9]; // load one byte of ip-proto
   *(u32*)(fp - 4) = r0;
   value = bpf_map_lookup_elem(map_id, fp - 4);
   if (value)
(*(u64*)value) += 1;

In the code below, this is XADD.  Is there anything that validates
that shared things like this can only be poked at by atomic
operations?

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/