Re: [PATCH net] x86: bpf_jit: fix compilation of large bpf programs

2015-05-26 Thread Eric Dumazet
On Tue, 2015-05-26 at 13:40 +, David Laight wrote:

 If the JIT compiler is only changing the encoding of the constants
 in the x86 instructions (rather than changing the instructions themselves)
 then there is likely to me an unmeasurable change in the execution time.
 For instance I don't remember there being a difference in execution time
 between long and short branches - the only difference is the amount of
 cache they use.

icache is precisely the matter here. In the end, it makes a difference.

You could check this interesting study Ingo did recently :

https://lkml.org/lkml/2015/5/19/1009


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH net] x86: bpf_jit: fix compilation of large bpf programs

2015-05-26 Thread David Laight
From: Alexei Starovoitov
 Sent: 22 May 2015 23:43
 x86 has variable length encoding. x86 JIT compiler is trying
 to pick the shortest encoding for given bpf instruction.
 While doing so the jump targets are changing, so JIT is doing
 multiple passes over the program. Typical program needs 3 passes.
 Some very short programs converge with 2 passes. Large programs
 may need 4 or 5. But specially crafted bpf programs may hit the
 pass limit and if the program converges on the last iteration
 the JIT compiler will be producing an image full of 'int 3' insns.
 Fix this corner case by doing final iteration over bpf program.

If the JIT compiler is only changing the encoding of the constants
in the x86 instructions (rather than changing the instructions themselves)
then there is likely to me an unmeasurable change in the execution time.
For instance I don't remember there being a difference in execution time
between long and short branches - the only difference is the amount of
cache they use.

David

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH net] x86: bpf_jit: fix compilation of large bpf programs

2015-05-26 Thread David Laight
From: Eric Dumazet
 Sent: 26 May 2015 15:35
 On Tue, 2015-05-26 at 13:40 +, David Laight wrote:
 
  If the JIT compiler is only changing the encoding of the constants
  in the x86 instructions (rather than changing the instructions themselves)
  then there is likely to me an unmeasurable change in the execution time.
  For instance I don't remember there being a difference in execution time
  between long and short branches - the only difference is the amount of
  cache they use.
 
 icache is precisely the matter here. In the end, it makes a difference.
 
 You could check this interesting study Ingo did recently :
 
 https://lkml.org/lkml/2015/5/19/1009

Yes, interesting, a benchmark that manages to run a lot of code 'cold cache'.
Possibly dominated by having a full cache line of code to execute at the
beginning of each function.

My guess is that aligning function bodies helps, aligning branch targets
(as some old versions of gcc used to do aggressively) is more likely to
have a negative effect (increases the icache footprint too much).
Unrolling loops further than needed to avoid data stalls needs very
careful study.

For JIT, once the obviously short branches have been reduced, further code
size reduction might not be worth while.

David

N�r��yb�X��ǧv�^�)޺{.n�+���z�^�)w*jg����ݢj/���z�ޖ��2�ޙ�)ߡ�a�����G���h��j:+v���w��٥

Re: [PATCH net] x86: bpf_jit: fix compilation of large bpf programs

2015-05-26 Thread Eric Dumazet
On Tue, 2015-05-26 at 15:13 +, David Laight wrote:

 Yes, interesting, a benchmark that manages to run a lot of code 'cold cache'.

We have binaries here at Google with 400 or 500 MBytes of text.

Not benchmark, super real workloads you know.


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH net] x86: bpf_jit: fix compilation of large bpf programs

2015-05-26 Thread David Laight
From: Eric Dumazet 
 Sent: 26 May 2015 16:30
 
  Yes, interesting, a benchmark that manages to run a lot of code 'cold 
  cache'.
 
 We have binaries here at Google with 400 or 500 MBytes of text.
 
 Not benchmark, super real workloads you know.

Indeed, and a lot of the code is likely to be running 'cold cache'.

I was alluding to the problem where people will benchmark a small function
by running in 1000s of times in a tight loop with exactly the same data.
Not only is it 'hot cache' but any dynamic branch prediction is 'trained'
to the specific data.

David


 



Re: [PATCH net] x86: bpf_jit: fix compilation of large bpf programs

2015-05-24 Thread David Miller
From: Alexei Starovoitov a...@plumgrid.com
Date: Fri, 22 May 2015 15:42:55 -0700

 x86 has variable length encoding. x86 JIT compiler is trying
 to pick the shortest encoding for given bpf instruction.
 While doing so the jump targets are changing, so JIT is doing
 multiple passes over the program. Typical program needs 3 passes.
 Some very short programs converge with 2 passes. Large programs
 may need 4 or 5. But specially crafted bpf programs may hit the
 pass limit and if the program converges on the last iteration
 the JIT compiler will be producing an image full of 'int 3' insns.
 Fix this corner case by doing final iteration over bpf program.
 
 Fixes: 0a14842f5a3c (net: filter: Just In Time compiler for x86-64)
 Reported-by: Daniel Borkmann dan...@iogearbox.net
 Signed-off-by: Alexei Starovoitov a...@plumgrid.com

Applied and queued up for -stable, thanks.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net] x86: bpf_jit: fix compilation of large bpf programs

2015-05-22 Thread Daniel Borkmann

On 05/23/2015 12:42 AM, Alexei Starovoitov wrote:

x86 has variable length encoding. x86 JIT compiler is trying
to pick the shortest encoding for given bpf instruction.
While doing so the jump targets are changing, so JIT is doing
multiple passes over the program. Typical program needs 3 passes.
Some very short programs converge with 2 passes. Large programs
may need 4 or 5. But specially crafted bpf programs may hit the
pass limit and if the program converges on the last iteration
the JIT compiler will be producing an image full of 'int 3' insns.
Fix this corner case by doing final iteration over bpf program.

Fixes: 0a14842f5a3c (net: filter: Just In Time compiler for x86-64)
Reported-by: Daniel Borkmann dan...@iogearbox.net
Signed-off-by: Alexei Starovoitov a...@plumgrid.com


LGTM, thanks!

Tested-by: Daniel Borkmann dan...@iogearbox.net
Acked-by: Daniel Borkmann dan...@iogearbox.net
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html