Re: [PATCH 2/3] bpf powerpc: implement support for tail calls

2016-09-26 Thread Naveen N. Rao
On 2016/09/26 11:00AM, Daniel Borkmann wrote:
> On 09/26/2016 10:56 AM, Naveen N. Rao wrote:
> > On 2016/09/24 03:30AM, Alexei Starovoitov wrote:
> > > On Sat, Sep 24, 2016 at 12:33:54AM +0200, Daniel Borkmann wrote:
> > > > On 09/23/2016 10:35 PM, Naveen N. Rao wrote:
> > > > > Tail calls allow JIT'ed eBPF programs to call into other JIT'ed eBPF
> > > > > programs. This can be achieved either by:
> > > > > (1) retaining the stack setup by the first eBPF program and having all
> > > > > subsequent eBPF programs re-using it, or,
> > > > > (2) by unwinding/tearing down the stack and having each eBPF program
> > > > > deal with its own stack as it sees fit.
> > > > > 
> > > > > To ensure that this does not create loops, there is a limit to how 
> > > > > many
> > > > > tail calls can be done (currently 32). This requires the JIT'ed code 
> > > > > to
> > > > > maintain a count of the number of tail calls done so far.
> > > > > 
> > > > > Approach (1) is simple, but requires every eBPF program to have 
> > > > > (almost)
> > > > > the same prologue/epilogue, regardless of whether they need it. This 
> > > > > is
> > > > > inefficient for small eBPF programs which may not sometimes need a
> > > > > prologue at all. As such, to minimize impact of tail call
> > > > > implementation, we use approach (2) here which needs each eBPF program
> > > > > in the chain to use its own prologue/epilogue. This is not ideal when
> > > > > many tail calls are involved and when all the eBPF programs in the 
> > > > > chain
> > > > > have similar prologue/epilogue. However, the impact is restricted to
> > > > > programs that do tail calls. Individual eBPF programs are not 
> > > > > affected.
> > > > > 
> > > > > We maintain the tail call count in a fixed location on the stack and
> > > > > updated tail call count values are passed in through this. The very
> > > > > first eBPF program in a chain sets this up to 0 (the first 2
> > > > > instructions). Subsequent tail calls skip the first two eBPF JIT
> > > > > instructions to maintain the count. For programs that don't do tail
> > > > > calls themselves, the first two instructions are NOPs.
> > > > > 
> > > > > Signed-off-by: Naveen N. Rao 
> > > > 
> > > > Thanks for adding support, Naveen, that's really great! I think 2) seems
> > > > fine as well in this context as prologue size can vary quite a bit here,
> > > > and depending on program types likelihood of tail call usage as well 
> > > > (but
> > > > I wouldn't expect deep nesting). Thanks a lot!
> > > 
> > > Great stuff. In this circumstances approach 2 makes sense to me as well.
> > 
> > Alexie, Daniel,
> > Thanks for the quick review!
> 
> The patches would go via Michael's tree (same way as with the JIT itself
> in the past), right?

Yes, this set is contained within arch/powerpc, so Michael can take this 
through his tree.

The other set with updates to samples/bpf can probably go through 
David's tree.

- Naveen



Re: [PATCH 2/3] bpf powerpc: implement support for tail calls

2016-09-26 Thread Daniel Borkmann

On 09/26/2016 10:56 AM, Naveen N. Rao wrote:

On 2016/09/24 03:30AM, Alexei Starovoitov wrote:

On Sat, Sep 24, 2016 at 12:33:54AM +0200, Daniel Borkmann wrote:

On 09/23/2016 10:35 PM, Naveen N. Rao wrote:

Tail calls allow JIT'ed eBPF programs to call into other JIT'ed eBPF
programs. This can be achieved either by:
(1) retaining the stack setup by the first eBPF program and having all
subsequent eBPF programs re-using it, or,
(2) by unwinding/tearing down the stack and having each eBPF program
deal with its own stack as it sees fit.

To ensure that this does not create loops, there is a limit to how many
tail calls can be done (currently 32). This requires the JIT'ed code to
maintain a count of the number of tail calls done so far.

Approach (1) is simple, but requires every eBPF program to have (almost)
the same prologue/epilogue, regardless of whether they need it. This is
inefficient for small eBPF programs which may not sometimes need a
prologue at all. As such, to minimize impact of tail call
implementation, we use approach (2) here which needs each eBPF program
in the chain to use its own prologue/epilogue. This is not ideal when
many tail calls are involved and when all the eBPF programs in the chain
have similar prologue/epilogue. However, the impact is restricted to
programs that do tail calls. Individual eBPF programs are not affected.

We maintain the tail call count in a fixed location on the stack and
updated tail call count values are passed in through this. The very
first eBPF program in a chain sets this up to 0 (the first 2
instructions). Subsequent tail calls skip the first two eBPF JIT
instructions to maintain the count. For programs that don't do tail
calls themselves, the first two instructions are NOPs.

Signed-off-by: Naveen N. Rao 


Thanks for adding support, Naveen, that's really great! I think 2) seems
fine as well in this context as prologue size can vary quite a bit here,
and depending on program types likelihood of tail call usage as well (but
I wouldn't expect deep nesting). Thanks a lot!


Great stuff. In this circumstances approach 2 makes sense to me as well.


Alexie, Daniel,
Thanks for the quick review!


The patches would go via Michael's tree (same way as with the JIT itself
in the past), right?


Re: [PATCH 2/3] bpf powerpc: implement support for tail calls

2016-09-26 Thread Naveen N. Rao
On 2016/09/24 03:30AM, Alexei Starovoitov wrote:
> On Sat, Sep 24, 2016 at 12:33:54AM +0200, Daniel Borkmann wrote:
> > On 09/23/2016 10:35 PM, Naveen N. Rao wrote:
> > >Tail calls allow JIT'ed eBPF programs to call into other JIT'ed eBPF
> > >programs. This can be achieved either by:
> > >(1) retaining the stack setup by the first eBPF program and having all
> > >subsequent eBPF programs re-using it, or,
> > >(2) by unwinding/tearing down the stack and having each eBPF program
> > >deal with its own stack as it sees fit.
> > >
> > >To ensure that this does not create loops, there is a limit to how many
> > >tail calls can be done (currently 32). This requires the JIT'ed code to
> > >maintain a count of the number of tail calls done so far.
> > >
> > >Approach (1) is simple, but requires every eBPF program to have (almost)
> > >the same prologue/epilogue, regardless of whether they need it. This is
> > >inefficient for small eBPF programs which may not sometimes need a
> > >prologue at all. As such, to minimize impact of tail call
> > >implementation, we use approach (2) here which needs each eBPF program
> > >in the chain to use its own prologue/epilogue. This is not ideal when
> > >many tail calls are involved and when all the eBPF programs in the chain
> > >have similar prologue/epilogue. However, the impact is restricted to
> > >programs that do tail calls. Individual eBPF programs are not affected.
> > >
> > >We maintain the tail call count in a fixed location on the stack and
> > >updated tail call count values are passed in through this. The very
> > >first eBPF program in a chain sets this up to 0 (the first 2
> > >instructions). Subsequent tail calls skip the first two eBPF JIT
> > >instructions to maintain the count. For programs that don't do tail
> > >calls themselves, the first two instructions are NOPs.
> > >
> > >Signed-off-by: Naveen N. Rao 
> > 
> > Thanks for adding support, Naveen, that's really great! I think 2) seems
> > fine as well in this context as prologue size can vary quite a bit here,
> > and depending on program types likelihood of tail call usage as well (but
> > I wouldn't expect deep nesting). Thanks a lot!
> 
> Great stuff. In this circumstances approach 2 makes sense to me as well.

Alexie, Daniel,
Thanks for the quick review!

- Naveen



Re: [PATCH 2/3] bpf powerpc: implement support for tail calls

2016-09-24 Thread Alexei Starovoitov
On Sat, Sep 24, 2016 at 12:33:54AM +0200, Daniel Borkmann wrote:
> On 09/23/2016 10:35 PM, Naveen N. Rao wrote:
> >Tail calls allow JIT'ed eBPF programs to call into other JIT'ed eBPF
> >programs. This can be achieved either by:
> >(1) retaining the stack setup by the first eBPF program and having all
> >subsequent eBPF programs re-using it, or,
> >(2) by unwinding/tearing down the stack and having each eBPF program
> >deal with its own stack as it sees fit.
> >
> >To ensure that this does not create loops, there is a limit to how many
> >tail calls can be done (currently 32). This requires the JIT'ed code to
> >maintain a count of the number of tail calls done so far.
> >
> >Approach (1) is simple, but requires every eBPF program to have (almost)
> >the same prologue/epilogue, regardless of whether they need it. This is
> >inefficient for small eBPF programs which may not sometimes need a
> >prologue at all. As such, to minimize impact of tail call
> >implementation, we use approach (2) here which needs each eBPF program
> >in the chain to use its own prologue/epilogue. This is not ideal when
> >many tail calls are involved and when all the eBPF programs in the chain
> >have similar prologue/epilogue. However, the impact is restricted to
> >programs that do tail calls. Individual eBPF programs are not affected.
> >
> >We maintain the tail call count in a fixed location on the stack and
> >updated tail call count values are passed in through this. The very
> >first eBPF program in a chain sets this up to 0 (the first 2
> >instructions). Subsequent tail calls skip the first two eBPF JIT
> >instructions to maintain the count. For programs that don't do tail
> >calls themselves, the first two instructions are NOPs.
> >
> >Signed-off-by: Naveen N. Rao 
> 
> Thanks for adding support, Naveen, that's really great! I think 2) seems
> fine as well in this context as prologue size can vary quite a bit here,
> and depending on program types likelihood of tail call usage as well (but
> I wouldn't expect deep nesting). Thanks a lot!

Great stuff. In this circumstances approach 2 makes sense to me as well.



Re: [PATCH 2/3] bpf powerpc: implement support for tail calls

2016-09-23 Thread Daniel Borkmann

On 09/23/2016 10:35 PM, Naveen N. Rao wrote:

Tail calls allow JIT'ed eBPF programs to call into other JIT'ed eBPF
programs. This can be achieved either by:
(1) retaining the stack setup by the first eBPF program and having all
subsequent eBPF programs re-using it, or,
(2) by unwinding/tearing down the stack and having each eBPF program
deal with its own stack as it sees fit.

To ensure that this does not create loops, there is a limit to how many
tail calls can be done (currently 32). This requires the JIT'ed code to
maintain a count of the number of tail calls done so far.

Approach (1) is simple, but requires every eBPF program to have (almost)
the same prologue/epilogue, regardless of whether they need it. This is
inefficient for small eBPF programs which may not sometimes need a
prologue at all. As such, to minimize impact of tail call
implementation, we use approach (2) here which needs each eBPF program
in the chain to use its own prologue/epilogue. This is not ideal when
many tail calls are involved and when all the eBPF programs in the chain
have similar prologue/epilogue. However, the impact is restricted to
programs that do tail calls. Individual eBPF programs are not affected.

We maintain the tail call count in a fixed location on the stack and
updated tail call count values are passed in through this. The very
first eBPF program in a chain sets this up to 0 (the first 2
instructions). Subsequent tail calls skip the first two eBPF JIT
instructions to maintain the count. For programs that don't do tail
calls themselves, the first two instructions are NOPs.

Signed-off-by: Naveen N. Rao 


Thanks for adding support, Naveen, that's really great! I think 2) seems
fine as well in this context as prologue size can vary quite a bit here,
and depending on program types likelihood of tail call usage as well (but
I wouldn't expect deep nesting). Thanks a lot!


[PATCH 2/3] bpf powerpc: implement support for tail calls

2016-09-23 Thread Naveen N. Rao
Tail calls allow JIT'ed eBPF programs to call into other JIT'ed eBPF
programs. This can be achieved either by:
(1) retaining the stack setup by the first eBPF program and having all
subsequent eBPF programs re-using it, or,
(2) by unwinding/tearing down the stack and having each eBPF program
deal with its own stack as it sees fit.

To ensure that this does not create loops, there is a limit to how many
tail calls can be done (currently 32). This requires the JIT'ed code to
maintain a count of the number of tail calls done so far.

Approach (1) is simple, but requires every eBPF program to have (almost)
the same prologue/epilogue, regardless of whether they need it. This is
inefficient for small eBPF programs which may not sometimes need a
prologue at all. As such, to minimize impact of tail call
implementation, we use approach (2) here which needs each eBPF program
in the chain to use its own prologue/epilogue. This is not ideal when
many tail calls are involved and when all the eBPF programs in the chain
have similar prologue/epilogue. However, the impact is restricted to
programs that do tail calls. Individual eBPF programs are not affected.

We maintain the tail call count in a fixed location on the stack and
updated tail call count values are passed in through this. The very
first eBPF program in a chain sets this up to 0 (the first 2
instructions). Subsequent tail calls skip the first two eBPF JIT
instructions to maintain the count. For programs that don't do tail
calls themselves, the first two instructions are NOPs.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/include/asm/ppc-opcode.h |   2 +
 arch/powerpc/net/bpf_jit.h|   2 +
 arch/powerpc/net/bpf_jit64.h  |   1 +
 arch/powerpc/net/bpf_jit_comp64.c | 149 +++---
 4 files changed, 126 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 127ebf5..54ff8ce 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -236,6 +236,7 @@
 #define PPC_INST_STWU  0x9400
 #define PPC_INST_MFLR  0x7c0802a6
 #define PPC_INST_MTLR  0x7c0803a6
+#define PPC_INST_MTCTR 0x7c0903a6
 #define PPC_INST_CMPWI 0x2c00
 #define PPC_INST_CMPDI 0x2c20
 #define PPC_INST_CMPW  0x7c00
@@ -250,6 +251,7 @@
 #define PPC_INST_SUB   0x7c50
 #define PPC_INST_BLR   0x4e800020
 #define PPC_INST_BLRL  0x4e800021
+#define PPC_INST_BCTR  0x4e800420
 #define PPC_INST_MULLD 0x7c0001d2
 #define PPC_INST_MULLW 0x7c0001d6
 #define PPC_INST_MULHWU0x7c16
diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index d5301b6..89f7007 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -40,6 +40,8 @@
 #define PPC_BLR()  EMIT(PPC_INST_BLR)
 #define PPC_BLRL() EMIT(PPC_INST_BLRL)
 #define PPC_MTLR(r)EMIT(PPC_INST_MTLR | ___PPC_RT(r))
+#define PPC_BCTR() EMIT(PPC_INST_BCTR)
+#define PPC_MTCTR(r)   EMIT(PPC_INST_MTCTR | ___PPC_RT(r))
 #define PPC_ADDI(d, a, i)  EMIT(PPC_INST_ADDI | ___PPC_RT(d) |   \
 ___PPC_RA(a) | IMM_L(i))
 #define PPC_MR(d, a)   PPC_OR(d, a, a)
diff --git a/arch/powerpc/net/bpf_jit64.h b/arch/powerpc/net/bpf_jit64.h
index a1645d7..038e00b 100644
--- a/arch/powerpc/net/bpf_jit64.h
+++ b/arch/powerpc/net/bpf_jit64.h
@@ -88,6 +88,7 @@ DECLARE_LOAD_FUNC(sk_load_byte);
 #define SEEN_FUNC  0x1000 /* might call external helpers */
 #define SEEN_STACK 0x2000 /* uses BPF stack */
 #define SEEN_SKB   0x4000 /* uses sk_buff */
+#define SEEN_TAILCALL  0x8000 /* uses tail calls */
 
 struct codegen_context {
/*
diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 5f8c91f..3ec29d6 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "bpf_jit64.h"
 
@@ -77,6 +78,11 @@ static int bpf_jit_stack_local(struct codegen_context *ctx)
return -(BPF_PPC_STACK_SAVE + 16);
 }
 
+static int bpf_jit_stack_tailcallcnt(struct codegen_context *ctx)
+{
+   return bpf_jit_stack_local(ctx) + 8;
+}
+
 static int bpf_jit_stack_offsetof(struct codegen_context *ctx, int reg)
 {
if (reg >= BPF_PPC_NVR_MIN && reg < 32)
@@ -102,33 +108,25 @@ static void bpf_jit_emit_skb_loads(u32 *image, struct 
codegen_context *ctx)
PPC_BPF_LL(b2p[SKB_DATA_REG], 3, offsetof(struct sk_buff, data));
 }
 
-static void bpf_jit_emit_func_call(u32 *image, struct codegen_context *ctx, 
u64 func)
+static void bpf_jit_build_prologue(u32 *image, struct