Thanks for all the feedbacks from the iovisor summit. Since then,
we've tried a couple of workarounds for this issue, the following
shares the observations and code.

=== Original code, xdp9.c ===
xdp9.c is to parse IPv4, TCP, UDP, ICMP and update checksum.
https://gist.github.com/williamtu/759eedbefa335c7dbc5700acc0c43ee9#file-xdp9-c
which generated from xdp9.p4
https://github.com/vmware/p4c-xdp/blob/master/tests/xdp9.p4

its objdump, with max stack size = 600
https://gist.github.com/williamtu/759eedbefa335c7dbc5700acc0c43ee9#file-xdp9-objdump-L551

[Result]: fail to load into kernel
The culprit is "struct Headers hd" which is < 200 byte, but due to
register spill, it uses lots of stack.

=== Work around 1 ===
Reduce the live range of the registers by declaring another "struct
Headers hd__", and use "hd__" at the lower half of the program.

https://gist.github.com/williamtu/759eedbefa335c7dbc5700acc0c43ee9#file-workaround1-xdp9-c
https://gist.github.com/williamtu/759eedbefa335c7dbc5700acc0c43ee9#file-workaround1-xdp9-objdump

[Result]: max stack size reduces to 384, runs OK, but this introduce extra copy.
https://gist.github.com/williamtu/759eedbefa335c7dbc5700acc0c43ee9#file-workaround1-xdp9-objdump-L404

=== Work around 2 ===
Save the "struct Headers hd" in a per cpu array map.

https://gist.github.com/williamtu/759eedbefa335c7dbc5700acc0c43ee9#file-workaround2-xdp9-c

[Result]: since it's not using stack, max stack size < 200, program
runs OK. Not sure what's the performance overhead until we stress test
it.

=== Work around 3 ===
(experimental) declare 'struct Headers hd' as 'volatile'
In this case, access to the 'hd' will be read/write from memory,
ideally should use less registers.

https://gist.github.com/williamtu/759eedbefa335c7dbc5700acc0c43ee9#file-volatile-xdp9-c
https://gist.github.com/williamtu/759eedbefa335c7dbc5700acc0c43ee9#file-volatile-xdp9-objdump

[Result:] the generated code is always in order, the verifier hit the
max insn it can processed:
BPF program is too large. Processed 65537 insn
Error fetching program/map!

Regards,
William

On Fri, Feb 10, 2017 at 10:59 AM, Alexei Starovoitov
<alexei.starovoi...@gmail.com> wrote:
> On Fri, Feb 10, 2017 at 9:08 AM, William Tu <u9012...@gmail.com> wrote:
>> my program is too huge so I start with simple example using xdp1_kern.c
>> I tried adding an ethernet header as local variable:
>>
>> --- a/samples/bpf/xdp1_kern.c
>> +++ b/samples/bpf/xdp1_kern.c
>> @@ -51,10 +51,19 @@ int xdp_prog1(struct xdp_md *ctx)
>>         u64 nh_off;
>>         u32 ipproto;
>>
>> +       struct ethhdr myeth;
>> +
>>         nh_off = sizeof(*eth);
>>         if (data + nh_off > data_end)
>>                 return rc;
>>
>> +       myeth.h_dest[0] = eth->h_source[0];
>> +       myeth.h_dest[1] = eth->h_source[1];
>> +       myeth.h_dest[2] = eth->h_source[2];
>> +       myeth.h_dest[3] = eth->h_source[3];
>> +       myeth.h_dest[4] = eth->h_source[4];
>> +       myeth.h_dest[5] = eth->h_source[5];
>> +
>>         h_proto = eth->h_proto;
>>
>>         if (h_proto == htons(ETH_P_8021Q) || h_proto == htons(ETH_P_8021AD)) 
>> {
>> @@ -87,6 +96,7 @@ int xdp_prog1(struct xdp_md *ctx)
>>         if (value)
>>                 *value += 1;
>>
>> +       memcpy(eth, &myeth, sizeof(myeth));
>>         return rc;
>>  }
>> ---------
>> Then using llvm-objdump, it shows:
>>
>> ; myeth.h_dest[5] = eth->h_source[5];
>>       24: r4 = *(u8 *)(r1 + 11)
>> ; myeth.h_dest[4] = eth->h_source[4];
>>       25: r5 = *(u8 *)(r1 + 10)
>> ; myeth.h_dest[3] = eth->h_source[3];
>>       26: r0 = *(u8 *)(r1 + 9)
>> ; myeth.h_dest[2] = eth->h_source[2];
>>       27: r6 = *(u8 *)(r1 + 8)
>> ; myeth.h_dest[1] = eth->h_source[1];
>>       28: r7 = *(u8 *)(r1 + 7)
>> ; myeth.h_dest[0] = eth->h_source[0];
>>       29: r8 = *(u8 *)(r1 + 6)
>>       30: r1 += 6
>> ; if (h_proto == htons(ETH_P_8021Q) || h_proto == htons(ETH_P_8021AD)) {
>>       31: *(u64 *)(r10 - 72) = r1
>>       32: *(u64 *)(r10 - 80) = r2
>>       33: *(u64 *)(r10 - 88) = r3
>>       34: *(u64 *)(r10 - 96) = r4
>>       35: *(u64 *)(r10 - 104) = r5
>>       36: *(u64 *)(r10 - 112) = r0
>>       37: *(u64 *)(r10 - 120) = r6
>>       38: *(u64 *)(r10 - 128) = r7
>>       39: *(u64 *)(r10 - 136) = r8
>> ----
>> by observing that it's always 8 byte offset, r10 - 136, 128, 120, 112....
>> the local variable myeth.h_dest[6] seems to take 6 * 8 = 48 Byte on stack.
>> and my ethernet header will take 8 * (6+6) + 2 = 50 Byte
>
> interesting.
> in this case 'struct ethhdr myeth' doesn't actually exist on stack.
> Compiler kept it in regsiters, but due to function call it had to spill
> those registers to stack. Every register is 8-byte, hence you see so much
> stack usage. With llvm trunk I see better code, but it still spills.
> I don't have a workaround yet.
_______________________________________________
iovisor-dev mailing list
iovisor-dev@lists.iovisor.org
https://lists.iovisor.org/mailman/listinfo/iovisor-dev

Reply via email to