Re: [PATCH stable 4.9 1/8] x86: bpf_jit: small optimization in emit_bpf_tail_call()

2018-01-29 Willy Tarreau
Hi Eric,

On Mon, Jan 29, 2018 at 06:04:30AM -0800, Eric Dumazet wrote:
> > If these 4 bytes matter, why not use
> > cmpq with an immediate value instead? It saves 2 extra bytes:
> >
> >   - the mov + test above is 11 bytes in total:
> >
> >    0:   48 8b 84 d6 78 56 34    mov    0x12345678(%rsi,%rdx,8),%rax
> >    7:   12
> >    8:   48 85 c0                test   %rax,%rax
> >
> >   - the equivalent cmpq is only 9 bytes:
> >
> >    0:   48 83 bc d6 78 56 34    cmpq   $0x0,0x12345678(%rsi,%rdx,8)
> >    7:   12 00
> >
> > And as a bonus, it doesn't even clobber rax.
> >
> > Just my two cents,
> 
> 
> Hi Willy
> 
> Please look more closely at the following instructions.
> 
> We need the value later, not only to test whether it is zero :)

Ah OK that makes total sense then ;-)

Thanks,
willy


Re: [PATCH stable 4.9 1/8] x86: bpf_jit: small optimization in emit_bpf_tail_call()

2018-01-29 Eric Dumazet
On Sun, Jan 28, 2018 at 10:39 PM, Willy Tarreau  wrote:
> Hi,
>
> [ replaced stable@ and greg@ by netdev@ as my question below is not
>   relevant to stable ]
>
> On Mon, Jan 29, 2018 at 02:48:54AM +0100, Daniel Borkmann wrote:
>> From: Eric Dumazet 
>>
>> [ upstream commit 84ccac6e7854ebbfb56d2fc6d5bef9be49bb304c ]
>>
>> Saves 4 bytes by replacing the following instructions:
>>
>> lea rax, [rsi + rdx * 8 + offsetof(...)]
>> mov rax, qword ptr [rax]
>> cmp rax, 0
>>
>> with:
>>
>> mov rax, [rsi + rdx * 8 + offsetof(...)]
>> test rax, rax
>
> I've just noticed this on stable@. If these 4 bytes matter, why not use
> cmpq with an immediate value instead? It saves 2 extra bytes:
>
>   - the mov + test above is 11 bytes in total:
>
>    0:   48 8b 84 d6 78 56 34    mov    0x12345678(%rsi,%rdx,8),%rax
>    7:   12
>    8:   48 85 c0                test   %rax,%rax
>
>   - the equivalent cmpq is only 9 bytes:
>
>    0:   48 83 bc d6 78 56 34    cmpq   $0x0,0x12345678(%rsi,%rdx,8)
>    7:   12 00
>
> And as a bonus, it doesn't even clobber rax.
>
> Just my two cents,


Hi Willy

Please look more closely at the following instructions.

We need the value later, not only to test whether it is zero :)
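Eric's point can be put in numbers. If the pointer read from the array slot is needed afterwards (as far as I can tell, the JIT goes on to dereference it to reach the target program's entry point), a cmpq against memory can only replace the test. The pointer still has to be loaded into a register, which costs a second memory access. The rough sketch below counts only encoded bytes and leaves out the conditional jump that would sit between the two instructions. The helper and function names are hypothetical, the register roles (rsi = array pointer, rdx = index) follow the listings above, and 0x12345678 is the placeholder displacement from Willy's listing.

```python
import struct

RAX, RDX, RSI = 0, 2, 6  # x86-64 register numbers used in ModRM/SIB fields
REX_W = 0x48             # REX prefix selecting a 64-bit operand size
CMP_EXT = 7              # /7: opcode extension selecting CMP in the 0x83 group
DISP = 0x12345678        # placeholder displacement from the listings

def sib_mem(reg_field, disp):
    """ModRM + SIB + disp32 for the operand [rsi + rdx*8 + disp]."""
    modrm = 0x80 | (reg_field << 3) | 0x04  # mod=10 (disp32), rm=100 (SIB follows)
    sib = 0xC0 | (RDX << 3) | RSI           # scale=8, index=rdx, base=rsi
    return bytes([modrm, sib]) + struct.pack("<i", disp)

def load_and_test():
    # One load leaves the pointer in rax for later use, then checks it for NULL.
    return bytes([REX_W, 0x8B]) + sib_mem(RAX, DISP) + bytes([REX_W, 0x85, 0xC0])

def cmp_then_reload():
    # Hypothetical alternative: compare the slot in memory, then load it anyway.
    return (bytes([REX_W, 0x83]) + sib_mem(CMP_EXT, DISP) + bytes([0x00])
            + bytes([REX_W, 0x8B]) + sib_mem(RAX, DISP))

print(len(load_and_test()), len(cmp_then_reload()))  # 11 17
```

Once the value has to end up in rax anyway, the single mov + test is both shorter and reads memory only once.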


Re: [PATCH stable 4.9 1/8] x86: bpf_jit: small optimization in emit_bpf_tail_call()

2018-01-28 Willy Tarreau
Hi,

[ replaced stable@ and greg@ by netdev@ as my question below is not
  relevant to stable ]

On Mon, Jan 29, 2018 at 02:48:54AM +0100, Daniel Borkmann wrote:
> From: Eric Dumazet 
> 
> [ upstream commit 84ccac6e7854ebbfb56d2fc6d5bef9be49bb304c ]
> 
> Saves 4 bytes by replacing the following instructions:
> 
> lea rax, [rsi + rdx * 8 + offsetof(...)]
> mov rax, qword ptr [rax]
> cmp rax, 0
> 
> with:
> 
> mov rax, [rsi + rdx * 8 + offsetof(...)]
> test rax, rax
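The 4-byte saving claimed in the commit message can be checked by hand-encoding both sequences. The sketch below is minimal and rests on a few assumptions. It assumes the old lea used the same SIB + 32-bit displacement form as the new mov, as in Willy's listing. It assumes `cmp rax, 0` is the imm8 form (`83 /7 ib`). The displacement 0x12345678 stands in for `offsetof(...)`. `sib_mem`, `old_seq` and `new_seq` are hypothetical names. The sketch is not taken from the JIT source.

```python
import struct

RAX, RDX, RSI = 0, 2, 6  # x86-64 register numbers used in ModRM/SIB fields
REX_W = 0x48             # REX prefix selecting a 64-bit operand size
DISP = 0x12345678        # placeholder for offsetof(...)

def sib_mem(reg_field, disp):
    """ModRM + SIB + disp32 for the operand [rsi + rdx*8 + disp]."""
    modrm = (0b10 << 6) | (reg_field << 3) | 0b100  # mod=10: disp32, rm=100: SIB follows
    sib = (0b11 << 6) | (RDX << 3) | RSI            # scale=8, index=rdx, base=rsi
    return bytes([modrm, sib]) + struct.pack("<i", disp)

old_seq = (
    bytes([REX_W, 0x8D]) + sib_mem(RAX, DISP)  # lea rax, [rsi + rdx*8 + disp]
    + bytes([REX_W, 0x8B, 0x00])               # mov rax, qword ptr [rax]
    + bytes([REX_W, 0x83, 0xF8, 0x00])         # cmp rax, 0
)
new_seq = (
    bytes([REX_W, 0x8B]) + sib_mem(RAX, DISP)  # mov rax, [rsi + rdx*8 + disp]
    + bytes([REX_W, 0x85, 0xC0])               # test rax, rax
)

print(len(old_seq), len(new_seq), len(old_seq) - len(new_seq))  # 15 11 4
```

Folding the address computation into the load removes the separate 3-byte dereference, and `test rax, rax` is one byte shorter than `cmp rax, 0`.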

I've just noticed this on stable@. If these 4 bytes matter, why not use
cmpq with an immediate value instead? It saves 2 extra bytes:

  - the mov + test above is 11 bytes in total:

   0:   48 8b 84 d6 78 56 34    mov    0x12345678(%rsi,%rdx,8),%rax
   7:   12
   8:   48 85 c0                test   %rax,%rax

  - the equivalent cmpq is only 9 bytes:

   0:   48 83 bc d6 78 56 34    cmpq   $0x0,0x12345678(%rsi,%rdx,8)
   7:   12 00

And as a bonus, it doesn't even clobber rax.

Just my two cents,
Willy
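The byte counts in the two listings above can be reproduced by encoding both forms directly. The sketch below is minimal and assumes the same operands as the listings: rsi is the array pointer, rdx is the index, and the placeholder displacement is 0x12345678. The names `sib_mem`, `mov_test` and `cmpq_imm` are hypothetical. Note that the cmpq form only sets flags, which is why it leaves rax untouched.

```python
import struct

RAX, RDX, RSI = 0, 2, 6  # x86-64 register numbers used in ModRM/SIB fields
REX_W = 0x48             # REX prefix selecting a 64-bit operand size
CMP_EXT = 7              # /7: opcode extension selecting CMP in the 0x83 group
DISP = 0x12345678        # placeholder displacement from the listings

def sib_mem(reg_field, disp):
    """ModRM + SIB + disp32 for the operand [rsi + rdx*8 + disp]."""
    modrm = 0x80 | (reg_field << 3) | 0x04  # mod=10 (disp32), rm=100 (SIB follows)
    sib = 0xC0 | (RDX << 3) | RSI           # scale=8, index=rdx, base=rsi
    return bytes([modrm, sib]) + struct.pack("<i", disp)

# mov rax, [rsi + rdx*8 + disp] ; test rax, rax
mov_test = bytes([REX_W, 0x8B]) + sib_mem(RAX, DISP) + bytes([REX_W, 0x85, 0xC0])
# cmpq $0x0, [rsi + rdx*8 + disp]  (83 /7 ib, imm8 = 0)
cmpq_imm = bytes([REX_W, 0x83]) + sib_mem(CMP_EXT, DISP) + bytes([0x00])

for name, code in (("mov+test", mov_test), ("cmpq", cmpq_imm)):
    print(name, len(code), " ".join(f"{b:02x}" for b in code))
```

The encoded bytes match the objdump output above: 11 bytes for mov + test and 9 bytes for cmpq.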