Re: [PATCH bpf-next v1] ARC: Add eBPF JIT support

2024-04-30 Thread Shahab Vahedi
Shahab Vahedi writes:
> 
> Björn Töpel  writes:
>>
>> Please try to avoid static inline in the C-files. The compiler usually
>> knows better.
> 
> I will replace them with "static" then.

I have tried this [1], and the test execution time took a performance hit
of 35%. Therefore, I have not included it in the rework [2].


Cheers,
Shahab

[1] GCC configuration used for building the Linux image
"GCC 12.2.1 20230306" with 3 different optimisations: "-O{2,3,g}"

[2] [PATCH bpf-next v2] ARC: Add eBPF JIT support
https://lore.kernel.org/bpf/20240430145604.38592-1-list+...@vahedi.org/

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH bpf-next v1] ARC: Add eBPF JIT support

2024-03-06 Thread Björn Töpel
Shahab Vahedi  writes:

> Hi Björn,
>
> Thank you very much for your inputs. Please find my remarks below.
>
> Björn Töpel  writes:
>
>> Shahab Vahedi  writes:
>> 
>> What's the easiest way to test this w/o ARC HW? Is there a qemu
>> port available?
>
> Yes, there is a (downstream) port available on GitHub [1]. If one is
> interested, there are also guides about building QEMU for ARC targets [2]
> and how to run eBPF tests for ARC Linux [3].
>
> [1] ARC QEMU port
> https://github.com/foss-for-synopsys-dwc-arc-processors/qemu
>
> [2] Building ARC QEMU
> https://foss-for-synopsys-dwc-arc-processors.github.io/experimental-documentation/2023.09/simulators/qemu/
>
> [3] Running eBPF tests for ARC Linux
> https://foss-for-synopsys-dwc-arc-processors.github.io/experimental-documentation/2023.09/linux/ebpf/build/

Cool, TY.

>> I don't know much about ARC -- Is v2 compatible with v3?
>
> No, they're not. For what it's worth, ARCv3 comes in {32,64}-bit
> flavours which are not compatible with each other either.
>
>> I'm curious about the missing support; tailcall/atomic/division/extable
>> support. Would it require a lot of work to add that support in the
>> initial change set?
>
> If you're asking whether it is possible that I add those features now,
> my answer unfortunately would be "no". However, the way that things
> are implemented, it will be a straightforward addition.

Ok! Did you try building the kselftest/bpf suite? Would be interesting
to see the pass/fail rate of test_progs.

>> There are a lot of checkpatch/kernel style issues. Run, e.g.,
>> "checkpatch --strict -g HEAD" and you'll get a bunch of issues. Most of
>> them are just basic style issues. Please try to fix most of them for the
>> next rev.
>
> I did run the "checkpatch" before submitting. I've fixed all the "errors"
> and most of the "warnings". But now that you brought it up, I will try to
> fix as many "warnings"/"checks" as make sense.

Ok. I noticed a lot of non-kernel style in your patch (checking against
NULL, e.g.).

>> You should add yourself to the MAINTAINERS file.
>
> I will. Thanks!
>
>> Please try to avoid static inline in the C-files. The compiler usually
>> knows better.
>
> I will replace them with "static" then.
>
>> > +/* Sane initial values for the globals */
>> > +bool emit = true;
>> > +bool zext_thyself = true;
>> 
>> Hmm, this is racy. Can we move this into the jit context? Also, is
>> zext_thyself even used?
>
> I will get rid of those. For the record, "zext_thyself" is used by
> calling "zext()" after handling "BPF_ALU" operations.

Ah, indeed!

>> > +#define CHECK_RET(cmd)\
>> > +  do {\
>> > +  ret = (cmd);\
>> > +  if (ret < 0)\
>> > +  return ret; \
>> > +  } while (0)
>> > +
>> 
>> Nit/personal taste, but I prefer not having this kind of macro. I
>> think it makes it harder to read the code.
>
> At some point, I found myself distracted from seeing the bigger picture
> while the code was interspersed by the menial "return checking"s. If
> you don't mind, I'd rather keep it as is, unless you feel strongly about
> it or Vineet also agrees with you.
>
>> Care to elaborate a bit more on ARC_BPF_JIT_DEBUG. This smells of
>> duplicated functionality with bpf_jit_dump(), and the BUG()s are scary.
>
> ARC_BPF_JIT_DEBUG is supposed to be enabled for development purposes.
> It enables:
>
> 1. A set of assert-like condition checking which makes the code
> slow and can lead to ungraceful terminations.
>
> 2. Use of a custom version of hex dumps. The most important difference
> with bpf_jit_dump() is that bpf_jit_dump() cannot be used for dumping
> the input BPF byte stream. Rest, I can live with. An example follows:
>
> Using only "bpf_jit_dump" (ARC_BPF_JIT_DEBUG is not defined)
>
>   flen=2 proglen=20 pass=1 image=2e8c6fb9 from=hello pid=127
>   JIT code: 00000000: 8a 20 00 10 8a 21 00 10 0a 20 00 02 0a 21 40 02
>   JIT code: 00000010: e0 20 c0 07
>
> vs.
>
> Using the custom version (ARC_BPF_JIT_DEBUG is defined)
>   -[  VM   ]-
>   0xb7, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
>   0x95, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
>   -[ JIT:1 ]-
>   0x8a, 0x20, 0x00, 0x10, 0x8a, 0x21, 0x00, 0x10
>   0x0a, 0x20, 0x00, 0x02, 0x0a, 0x21, 0x40, 0x02
>   0xe0, 0x20, 0xc0, 0x07
>
>> > +static int jit_ctx_init(struct jit_context *ctx, struct bpf_prog *prog)
>> > +{
>> > +   ...
>> 
>> I'd just make sure that ctx is zeroed, and init the non-zero members here.
>
> Very good point! I will implement it that way.
>
> If you have read this far, I'd like to thank you again for spending time
> on reviewing this patch. It is much appreciated.

Looking forward to the next revision!


Björn



Re: [PATCH bpf-next v1] ARC: Add eBPF JIT support

2024-03-05 Thread Shahab Vahedi
Hi Björn,

Thank you very much for your inputs. Please find my remarks below.

Björn Töpel  writes:

> Shahab Vahedi  writes:
> 
> What's the easiest way to test this w/o ARC HW? Is there a qemu
> port available?

Yes, there is a (downstream) port available on GitHub [1]. If one is
interested, there are also guides about building QEMU for ARC targets [2]
and how to run eBPF tests for ARC Linux [3].

[1] ARC QEMU port
https://github.com/foss-for-synopsys-dwc-arc-processors/qemu

[2] Building ARC QEMU
https://foss-for-synopsys-dwc-arc-processors.github.io/experimental-documentation/2023.09/simulators/qemu/

[3] Running eBPF tests for ARC Linux
https://foss-for-synopsys-dwc-arc-processors.github.io/experimental-documentation/2023.09/linux/ebpf/build/

> I don't know much about ARC -- Is v2 compatible with v3?

No, they're not. For what it's worth, ARCv3 comes in {32,64}-bit
flavours which are not compatible with each other either.

> I'm curious about the missing support; tailcall/atomic/division/extable
> support. Would it require a lot of work to add that support in the
> initial change set?

If you're asking whether it is possible that I add those features now,
my answer unfortunately would be "no". However, the way that things
are implemented, it will be a straightforward addition.

> There are a lot of checkpatch/kernel style issues. Run, e.g.,
> "checkpatch --strict -g HEAD" and you'll get a bunch of issues. Most of
> them are just basic style issues. Please try to fix most of them for the
> next rev.

I did run the "checkpatch" before submitting. I've fixed all the "errors"
and most of the "warnings". But now that you brought it up, I will try to
fix as many "warnings"/"checks" as make sense.

> You should add yourself to the MAINTAINERS file.

I will. Thanks!

> Please try to avoid static inline in the C-files. The compiler usually
> knows better.

I will replace them with "static" then.

> > +/* Sane initial values for the globals */
> > +bool emit = true;
> > +bool zext_thyself = true;
> 
> Hmm, this is racy. Can we move this into the jit context? Also, is
> zext_thyself even used?

I will get rid of those. For the record, "zext_thyself" is used by
calling "zext()" after handling "BPF_ALU" operations.
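
For readers unfamiliar with the term: BPF 32-bit ALU operations leave the
upper 32 bits of the 64-bit destination register zeroed. The sketch below
models that in plain user-space C. It is not taken from the patch; the
"struct reg_pair" layout and the function names are assumptions made for
illustration only.

```c
#include <stdint.h>

/* Hypothetical model of one 64-bit BPF register held as two 32-bit halves. */
struct reg_pair {
	uint32_t lo;
	uint32_t hi;
};

/* Zero-extension: clear the upper half after a 32-bit result. */
static void zext(struct reg_pair *r)
{
	r->hi = 0;
}

/* 32-bit add (BPF_ALU class): operate on the low half, then zero-extend. */
void alu32_add(struct reg_pair *dst, uint32_t imm)
{
	dst->lo += imm;
	zext(dst);
}

/* Reassemble the 64-bit value for inspection. */
uint64_t reg_value(const struct reg_pair *r)
{
	return ((uint64_t)r->hi << 32) | r->lo;
}
```

Whatever was in the upper half before the operation is gone afterwards,
which is the behaviour the zext() call is there to guarantee.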

> > +#define CHECK_RET(cmd) \
> > +   do {\
> > +   ret = (cmd);\
> > +   if (ret < 0)\
> > +   return ret; \
> > +   } while (0)
> > +
> 
> Nit/personal taste, but I prefer not having this kind of macro. I
> think it makes it harder to read the code.

At some point, I found myself distracted from seeing the bigger picture
while the code was interspersed by the menial "return checking"s. If
you don't mind, I'd rather keep it as is, unless you feel strongly about
it or Vineet also agrees with you.

> Care to elaborate a bit more on ARC_BPF_JIT_DEBUG. This smells of
> duplicated functionality with bpf_jit_dump(), and the BUG()s are scary.

ARC_BPF_JIT_DEBUG is supposed to be enabled for development purposes.
It enables:

1. A set of assert-like condition checking which makes the code
slow and can lead to ungraceful terminations.

2. Use of a custom version of hex dumps. The most important difference
with bpf_jit_dump() is that bpf_jit_dump() cannot be used for dumping
the input BPF byte stream. Rest, I can live with. An example follows:

Using only "bpf_jit_dump" (ARC_BPF_JIT_DEBUG is not defined)

  flen=2 proglen=20 pass=1 image=2e8c6fb9 from=hello pid=127
  JIT code: 00000000: 8a 20 00 10 8a 21 00 10 0a 20 00 02 0a 21 40 02
  JIT code: 00000010: e0 20 c0 07

vs.

Using the custom version (ARC_BPF_JIT_DEBUG is defined)
  -[  VM   ]-
  0xb7, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
  0x95, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
  -[ JIT:1 ]-
  0x8a, 0x20, 0x00, 0x10, 0x8a, 0x21, 0x00, 0x10
  0x0a, 0x20, 0x00, 0x02, 0x0a, 0x21, 0x40, 0x02
  0xe0, 0x20, 0xc0, 0x07
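
The layout of the custom dump (eight comma-separated "0x.." bytes per line,
no trailing comma) can be reproduced in user space roughly as follows. This
is only a sketch of the output format: format_bytes() is a hypothetical
name, and it writes into a caller-supplied string instead of the kernel log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format "len" bytes of "buf" into "out" (capacity "cap"), eight per line.
 * Returns the number of characters written, excluding the terminating NUL.
 */
size_t format_bytes(const uint8_t *buf, size_t len, char *out, size_t cap)
{
	size_t pos = 0;

	assert(buf && out);
	if (cap == 0)
		return 0;
	out[0] = '\0';

	for (size_t i = 0; i < len; i++) {
		/* A line ends after every 8th byte or at the last byte. */
		int eol = (i % 8 == 7) || (i == len - 1);
		int n = snprintf(out + pos, cap - pos, "0x%02x%s",
				 (unsigned int)buf[i], eol ? "\n" : ", ");

		if (n < 0 || (size_t)n >= cap - pos) {
			out[pos] = '\0';	/* drop the partial entry */
			break;
		}
		pos += (size_t)n;
	}
	return pos;
}
```

Passing the 20 JIT bytes from the example above yields two full lines of
eight bytes and a final line of four.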

> > +static int jit_ctx_init(struct jit_context *ctx, struct bpf_prog *prog)
> > +{
> > +   ...
> 
> I'd just make sure that ctx is zeroed, and init the non-zero members here.

Very good point! I will implement it that way.
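
A minimal sketch of that pattern, assuming a trimmed-down context: the
"struct init_ctx" members and the function name below are hypothetical and
do not mirror the real "struct jit_context".

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down context; the real structure has more members. */
struct init_ctx {
	const void *prog;		/* program being translated */
	uint32_t    frame_size;		/* stays 0 until computed */
	uint32_t    epilogue_offset;	/* stays 0 until computed */
	int         emit;		/* initial value is not zero */
};

int init_ctx_sketch(struct init_ctx *ctx, const void *prog)
{
	if (!ctx || !prog)
		return -1;

	/* Zero everything first, so no member is left uninitialised. */
	memset(ctx, 0, sizeof(*ctx));

	/* Then set only the members whose initial value is not zero. */
	ctx->prog = prog;
	ctx->emit = 1;
	return 0;
}
```

The benefit is that members added later start out as zero without anyone
having to remember to touch the init function.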

If you have read this far, I'd like to thank you again for spending time
on reviewing this patch. It is much appreciated.


Cheers,
Shahab




Re: [PATCH bpf-next v1] ARC: Add eBPF JIT support

2024-03-03 Thread Björn Töpel
Shahab,

Shahab Vahedi  writes:

> From: Shahab Vahedi 
>
> This will add eBPF JIT support to the 32-bit ARCv2 processors. The
> implementation is qualified by running the BPF tests on a Synopsys HSDK
> board with "ARC HS38 v2.1c at 500 MHz" as the 4-core CPU.

Cool!

I did a quick review, mostly focusing on style, and not function. Some
general input/Qs:

What's the easiest way to test this w/o ARC HW? Is there a qemu
port available?

I don't know much about ARC -- Is v2 compatible with v3?

I'm curious about the missing support; tailcall/atomic/division/extable
support. Would it require a lot of work to add that support in the
initial change set?

There are a lot of checkpatch/kernel style issues. Run, e.g.,
"checkpatch --strict -g HEAD" and you'll get a bunch of issues. Most of
them are just basic style issues. Please try to fix most of them for the
next rev.

You should add yourself to the MAINTAINERS file.

Please try to avoid static inline in the C-files. The compiler usually
knows better.
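
To illustrate the request: the change amounts to dropping the "inline"
keyword from file-local helpers. The helper names below are hypothetical
and not taken from the patch.

```c
#include <stdint.h>

/* Before: inlining is explicitly hinted by the author. */
static inline uint32_t lo32_hinted(uint64_t v)
{
	return (uint32_t)v;
}

/* After: plain "static"; the compiler decides whether to inline. */
static uint32_t lo32_plain(uint64_t v)
{
	return (uint32_t)v;
}

/* Both forms compute the same thing; only the inlining hint differs. */
uint32_t lo32_sum(uint64_t a, uint64_t b)
{
	return lo32_hinted(a) + lo32_plain(b);
}
```

Whether the generated code differs depends on the compiler version and
the optimisation level.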


[...]

> diff --git a/arch/arc/net/bpf_jit_core.c b/arch/arc/net/bpf_jit_core.c
> new file mode 100644
> index 000000000000..730a715d324e
> --- /dev/null
> +++ b/arch/arc/net/bpf_jit_core.c
> @@ -0,0 +1,1425 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * The back-end-agnostic part of Just-In-Time compiler for eBPF bytecode.
> + *
> + * Copyright (c) 2024 Synopsys Inc.
> + * Author: Shahab Vahedi 
> + */
> +#include 
> +#include "bpf_jit.h"
> +
> +/* Sane initial values for the globals */
> +bool emit = true;
> +bool zext_thyself = true;

Hmm, this is racy. Can we move this into the jit context? Also, is
zext_thyself even used?
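
The concern is that two programs being JIT-compiled at the same time would
share file-scope flags. A rough sketch of the alternative, where the flag
lives in a per-compilation context passed to every helper, could look like
this. "struct emit_ctx" and both helpers are hypothetical names, and the
little-endian byte order is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down context; field names are illustrative. */
struct emit_ctx {
	bool     emit;	/* false during a dry run, true when writing code */
	uint8_t *buf;	/* destination buffer (may be NULL when !emit) */
	uint32_t index;	/* offset of the next byte to write */
};

/* Each helper receives the context instead of reading a global flag. */
static void emit_u8(struct emit_ctx *ctx, uint8_t byte)
{
	if (ctx->emit)
		ctx->buf[ctx->index] = byte;
	ctx->index++;
}

/* Emit a 4-byte little-endian word and return its length. */
uint32_t emit_u32(struct emit_ctx *ctx, uint32_t word)
{
	for (int i = 0; i < 4; i++)
		emit_u8(ctx, (uint8_t)(word >> (8 * i)));
	return 4;
}
```

With this shape, a dry run and a real run differ only in how the context
is set up, and concurrent compilations never touch each other's state.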

> +
> +/*
> + * Check for the return value. A pattern used oftenly in this file.
> + * There must be a "ret" variable of type "int" in the scope.
> + */
> +#define CHECK_RET(cmd)   \
> + do {\
> + ret = (cmd);\
> + if (ret < 0)\
> + return ret; \
> + } while (0)
> +

Nit/personal taste, but I prefer not having this kind of macro. I
think it makes it harder to read the code.
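
For comparison, here is the same control flow written both ways. The macro
is the one quoted above; step() and the two run_*() functions are
hypothetical stand-ins for the real call sites.

```c
/* The macro from the patch, reproduced for comparison. */
#define CHECK_RET(cmd)			\
	do {				\
		ret = (cmd);		\
		if (ret < 0)		\
			return ret;	\
	} while (0)

/* Hypothetical step; a negative value signals an error. */
static int step(int v)
{
	return v;
}

/* Macro style: the early return is hidden inside CHECK_RET(). */
int run_with_macro(int a, int b)
{
	int ret;

	CHECK_RET(step(a));
	CHECK_RET(step(b));
	return 0;
}

/* Open-coded style: every exit point is visible at the call site. */
int run_open_coded(int a, int b)
{
	int ret;

	ret = step(a);
	if (ret < 0)
		return ret;

	ret = step(b);
	if (ret < 0)
		return ret;

	return 0;
}
```

The two functions behave identically; the trade-off is brevity against
having the "return" visible where it happens.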

> +#ifdef ARC_BPF_JIT_DEBUG
> +/* Dumps bytes in /var/log/messages at KERN_INFO level (4). */
> +static void dump_bytes(const u8 *buf, u32 len, const char *header)
> +{
> + u8 line[64];
> + size_t i, j;
> +
> + pr_info("-[ %s ]-\n", header);
> +
> + for (i = 0, j = 0; i < len; i++) {
> + /* Last input byte? */
> + if (i == len-1) {
> + j += scnprintf(line+j, 64-j, "0x%02x", buf[i]);
> + pr_info("%s\n", line);
> + break;
> + }
> + /* End of line? */
> + else if (i % 8 == 7) {
> + j += scnprintf(line+j, 64-j, "0x%02x", buf[i]);
> + pr_info("%s\n", line);
> + j = 0;
> + } else {
> + j += scnprintf(line+j, 64-j, "0x%02x, ", buf[i]);
> + }
> + }
> +}
> +#endif /* ARC_BPF_JIT_DEBUG */
> +
> +/* JIT context ***/
> +
> +/*
> + * buf:    Translated instructions end up here.
> + * len:    The length of whole block in bytes.
> + * index:  The offset at which the _next_ instruction may be put.
> + */
> +struct jit_buffer {
> + u8  *buf;
> + u32 len;
> + u32 index;
> +};
> +
> +/*
> + * This is a subset of "struct jit_context" that its information is deemed
> + * necessary for the next extra pass to come.
> + *
> + * bpf_header:  Needed to finally lock the region.
> + * bpf2insn:    Used to find the translation for instructions of interest.
> + *
> + * Things like "jit.buf" and "jit.len" can be retrieved respectively from
> + * "prog->bpf_func" and "prog->jited_len".
> + */
> +struct arc_jit_data {
> + struct bpf_binary_header *bpf_header;
> + u32  *bpf2insn;
> +};
> +
> +/*
> + * The JIT pertinent context that is used by different functions.
> + *
> + * prog:                The current eBPF program being handled.
> + * orig_prog:           The original eBPF program before any possible change.
> + * jit:                 The JIT buffer and its length.
> + * bpf_header:          The JITed program header. "jit.buf" points inside it.
> + * bpf2insn:            Maps BPF insn indices to their counterparts in jit.buf.
> + * bpf2insn_valid:      Indicates if "bpf2ins" is populated with the mappings.
> + * jit_data:            A piece of memory to transfer data to the next pass.
> + * arc_regs_clobbered:  Each bit status determines if that arc reg is clobbered.
> + * save_blink:          Whether ARC's "blink" register needs to be saved.
> + * frame_size:          Derived from "prog->aux->stack_depth".
> + * epilogue_offset:     Used by early "return"s in the code to jump here.
> + * 

Re: [PATCH bpf-next v1] ARC: Add eBPF JIT support

2024-02-26 Thread Shahab Vahedi
Hello list,

I know this is not a small patch, but could someone skim over it?
If there's anything that I can do to make the review process easier,
please let me know.

I already intend to change the "commit message" in the following ways:

- Fix a typo: interpretor -> interpreter
- Mention the version of BPF CPU support: cpu=v4 (the latest)
- Say a little bit about the performance improvement: 2-10 fold


Thank you in advance,
Shahab


Re: [PATCH bpf-next v1] ARC: Add eBPF JIT support

2024-02-14 Thread Shahab Vahedi
On 2/14/24 03:39, Alexei Starovoitov wrote:
> On Tue, Feb 13, 2024 at 5:20 AM Shahab Vahedi  wrote:
> 
> Could you share performance numbers interpreter vs JITed ?

I see noticeable improvements on every selected test. To list a few:

---8<--

test_bpf: #0 TAX jited:0 862 857 857 PASS
test_bpf: #0 TAX jited:1 102 101 101 PASS

test_bpf: #29 JGE (jt 0), test 1 jited:0 750 620 625 PASS
test_bpf: #29 JGE (jt 0), test 1 jited:1 124  72  72 PASS

test_bpf: #33 tcpdump port 22 jited:0 704 1766 2104 PASS
test_bpf: #33 tcpdump port 22 jited:1 120  224  260 PASS

test_bpf: #128 ALU64_MUL_X: 64x64 multiply, high word jited:0 267 PASS
test_bpf: #128 ALU64_MUL_X: 64x64 multiply, high word jited:1  29 PASS

test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:0 238 PASS
test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:1  23 PASS

test_bpf: #288 ALU_END_FROM_LE 32: ... -> 0xefcdab89 jited:0 297 PASS
test_bpf: #288 ALU_END_FROM_LE 32: ... -> 0xefcdab89 jited:1  24 PASS

test_bpf: #319 BPF_LDX_MEM | BPF_W, unaligned positive offset jited:0 313 PASS
test_bpf: #319 BPF_LDX_MEM | BPF_W, unaligned positive offset jited:1  26 PASS

test_bpf: #444 BPF_MAXINSNS: Run/add until end jited:0 82358 PASS
test_bpf: #444 BPF_MAXINSNS: Run/add until end jited:1  8328 PASS

test_bpf: #697 ALU64_ARSH_K: all shift values jited:0 48240 PASS
test_bpf: #697 ALU64_ARSH_K: all shift values jited:1  3417 PASS

test_bpf: #776 JMP32_JGE_K: all immediate value magnitudes jited:0 2034681 PASS
test_bpf: #776 JMP32_JGE_K: all immediate value magnitudes jited:1 1020022 PASS

---8<--
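
As a reading aid, the speed-up for each pair is the "jited:0" figure
divided by the "jited:1" figure. The snippet below is a small sketch that
does this for three of the pairs listed above (first run only where
several runs are shown). The struct and function names are made up for
the example.

```c
#include <stdio.h>

/* Ratio of interpreter time to JIT time, in whatever unit test_bpf prints. */
double speedup(unsigned long interp, unsigned long jited)
{
	return jited ? (double)interp / (double)jited : 0.0;
}

struct sample {
	const char   *name;
	unsigned long interp;	/* figure from the "jited:0" line */
	unsigned long jited;	/* figure from the "jited:1" line */
};

/* Figures copied from the results quoted above. */
static const struct sample samples[] = {
	{ "#0 TAX (first run)",                    862,     102 },
	{ "#444 BPF_MAXINSNS: Run/add until end",  82358,   8328 },
	{ "#776 JMP32_JGE_K: all immediate ...",   2034681, 1020022 },
};

/* Print one line per sample with its speed-up factor. */
void print_speedups(void)
{
	for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		printf("%-40s %.2fx\n", samples[i].name,
		       speedup(samples[i].interp, samples[i].jited));
}
```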

The complete results are attached for reference. They can be retrieved by
uudecode(1).

begin 644 arc_bpf_jit_res.tgz
[...]

Re: [PATCH bpf-next v1] ARC: Add eBPF JIT support

2024-02-13 Thread Vineet Gupta
+CC Bjorn

On 2/13/24 18:39, Alexei Starovoitov wrote:
> On Tue, Feb 13, 2024 at 5:20 AM Shahab Vahedi  wrote:
>> From: Shahab Vahedi 
>>
>> This will add eBPF JIT support to the 32-bit ARCv2 processors. The
>> implementation is qualified by running the BPF tests on a Synopsys HSDK
>> board with "ARC HS38 v2.1c at 500 MHz" as the 4-core CPU.
> ...
>> Signed-off-by: Shahab Vahedi 
>> ---
>>  Documentation/admin-guide/sysctl/net.rst |1 +
>>  Documentation/networking/filter.rst  |4 +-
>>  arch/arc/Kbuild  |1 +
>>  arch/arc/Kconfig |1 +
>>  arch/arc/net/Makefile|6 +
>>  arch/arc/net/bpf_jit.h   |  161 ++
>>  arch/arc/net/bpf_jit_arcv2.c | 3001 ++
>>  arch/arc/net/bpf_jit_core.c  | 1425 ++
>>  8 files changed, 4598 insertions(+), 2 deletions(-)
> This is pretty cool to see.
> I'm assuming this will get reviewed and will go through arc.git tree.

I'd be happy to take it via ARC tree and can review some of the arch
specific bits, but I'd hope BPF folks also review it critically.

Thx,
-Vineet

> Could you share performance numbers interpreter vs JITed ?




Re: [PATCH bpf-next v1] ARC: Add eBPF JIT support

2024-02-13 Thread Alexei Starovoitov
On Tue, Feb 13, 2024 at 5:20 AM Shahab Vahedi  wrote:
>
> From: Shahab Vahedi 
>
> This will add eBPF JIT support to the 32-bit ARCv2 processors. The
> implementation is qualified by running the BPF tests on a Synopsys HSDK
> board with "ARC HS38 v2.1c at 500 MHz" as the 4-core CPU.
...
> Signed-off-by: Shahab Vahedi 
> ---
>  Documentation/admin-guide/sysctl/net.rst |1 +
>  Documentation/networking/filter.rst  |4 +-
>  arch/arc/Kbuild  |1 +
>  arch/arc/Kconfig |1 +
>  arch/arc/net/Makefile|6 +
>  arch/arc/net/bpf_jit.h   |  161 ++
>  arch/arc/net/bpf_jit_arcv2.c | 3001 ++
>  arch/arc/net/bpf_jit_core.c  | 1425 ++
>  8 files changed, 4598 insertions(+), 2 deletions(-)

This is pretty cool to see.
I'm assuming this will get reviewed and will go through arc.git tree.

Could you share performance numbers interpreter vs JITed ?



[PATCH bpf-next v1] ARC: Add eBPF JIT support

2024-02-13 Thread Shahab Vahedi
From: Shahab Vahedi 

This will add eBPF JIT support to the 32-bit ARCv2 processors. The
implementation is qualified by running the BPF tests on a Synopsys HSDK
board with "ARC HS38 v2.1c at 500 MHz" as the 4-core CPU.

Deployment and structure

The related code is added to "arch/arc/net":

- bpf_jit.h       -- The interface that a back-end translator must provide
- bpf_jit_core.c  -- Knows how to handle the input eBPF byte stream
- bpf_jit_arcv2.c -- The back-end code that knows the translation logic

bpf_int_jit_compile(), at the end of bpf_jit_core.c, is the entry point
to the whole process. Normally, the translation is done in one pass,
namely the "normal pass". In case some relocations are not known during
this pass, some data (arc_jit_data) is allocated for the next pass to
come. This possible next (and last) pass is called the "extra pass".

1. Normal pass            # The necessary pass
   1a. Dry run            # Get the whole JIT length, epilogue offset, etc.
   1b. Emit phase         # Allocate memory and start emitting instructions
2. Extra pass             # Only needed if there are relocations to be fixed
   2a. Patch relocations
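
The normal pass can be pictured with the toy user-space sketch below. It
is not the code from bpf_jit_core.c: the "translation" rule, the function
names and the use of malloc() are all assumptions chosen to keep the
example short, and the extra pass is not modelled.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy back-end: input byte v is translated to (v % 3) + 1 output bytes. */
static uint32_t gen_insn(uint8_t *out, uint8_t v)
{
	uint32_t len = (uint32_t)(v % 3) + 1;

	if (out)		/* NULL means dry run: only report the length */
		memset(out, v, len);
	return len;
}

/* One walk over the input; "buf" is NULL during the dry run. */
static uint32_t walk(const uint8_t *in, uint32_t n, uint8_t *buf,
		     uint32_t *offsets)
{
	uint32_t index = 0;

	for (uint32_t i = 0; i < n; i++) {
		offsets[i] = index;	/* where input i lands in the output */
		index += gen_insn(buf ? buf + index : NULL, in[i]);
	}
	return index;
}

/*
 * Normal pass: 1a. dry run to learn the size, 1b. allocate and emit.
 * Returns a malloc()ed buffer (caller frees) or NULL; "*len" gets its size.
 */
uint8_t *translate(const uint8_t *in, uint32_t n, uint32_t *offsets,
		   uint32_t *len)
{
	uint8_t *buf;

	*len = walk(in, n, NULL, offsets);	/* 1a. dry run */
	buf = malloc(*len ? *len : 1);
	if (!buf)
		return NULL;
	walk(in, n, buf, offsets);		/* 1b. emit phase */
	return buf;
}
```

Running the same walk twice keeps the length computation and the emitted
bytes in sync by construction, since both come from the same code path.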

Support status

This JIT compiler does NOT yet provide support for:

- Tail calls
- Atomic operations
- 64-bit division/remainder
- BPF_PROBE_MEM* (exception table)

The result of "test_bpf" test suite on an HSDK board is:

hsdk-lnx# insmod test_bpf.ko test_suite=test_bpf

  test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]

All the failing test cases are ones that were not JIT'ed.
Categorically, they can be represented as:

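  .-----------.------------.-------------.
  | test type |   opcodes  | # of cases  |
  |-----------+------------+-------------|
  | atomic    | 0xC3, 0xDB |         149 |
  | div64     | 0x37, 0x3F |          22 |
  | mod64     | 0x97, 0x9F |          15 |
  `-----------^------------+-------------|
                           | (total) 186 |
                           `-------------'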

Setup: build config
-------------------
The following configs must be set to have a working JIT test:

  CONFIG_BPF_JIT=y
  CONFIG_BPF_JIT_ALWAYS_ON=y
  CONFIG_TEST_BPF=m

The following options are not necessary for the tests module,
but are good to have:

  CONFIG_DEBUG_INFO=y      # prerequisite for below
  CONFIG_DEBUG_INFO_BTF=y  # so bpftool can generate vmlinux.h

  CONFIG_FTRACE=y          #
  CONFIG_BPF_SYSCALL=y     # all these options lead to
  CONFIG_KPROBE_EVENTS=y   # having CONFIG_BPF_EVENTS=y
  CONFIG_PERF_EVENTS=y     #

Some BPF programs provide data through /sys/kernel/debug:
  CONFIG_DEBUG_FS=y
arc# mount -t debugfs debugfs /sys/kernel/debug

Setup: elfutils
---------------
The libdw.{so,a} library that is used by pahole for processing
the final binary must come from elfutils 0.189 or newer. The
support for ARCv2 [1] has been added since that version.

[1]
https://sourceware.org/git/?p=elfutils.git;a=commit;h=de3d46b3e7

Setup: pahole
-------------
The line below in linux/scripts/Makefile.btf must be commented out:

pahole-flags-$(call test-ge, $(pahole-ver), 121) += --btf_gen_floats

Or else, the build will fail:

$ make V=1
  ...
  BTF .btf.vmlinux.bin.o
pahole -J --btf_gen_floats\
   -j --lang_exclude=rust \
   --skip_encoding_btf_inconsistent_proto \
   --btf_gen_optimized .tmp_vmlinux.btf
Complex, interval and imaginary float types are not supported
Encountered error while encoding BTF.
  ...
  BTFIDS  vmlinux
./tools/bpf/resolve_btfids/resolve_btfids vmlinux
libbpf: failed to find '.BTF' ELF section in vmlinux
FAILED: load BTF from vmlinux: No data available

This is due to the fact that the ARC toolchains generate
"complex float" DIE entries in libgcc and at the moment, pahole
can't handle such entries.

Running the tests
-----------------
host$ scp /bld/linux/lib/test_bpf.ko arc:
arc # sysctl net.core.bpf_jit_enable=1
arc # insmod test_bpf.ko test_suite=test_bpf
  ...
  test_bpf: #1048 Staggered jumps: JMP32_JSLE_X jited:1 697811 PASS
  test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]

Acknowledgments
---------------
- Claudiu Zissulescu for his unwavering support
- Yuriy Kolerov for testing and troubleshooting
- Vladimir Isaev for the pahole workaround
- Sergey Matyukevich for paving the road by adding the interpretor support

Signed-off-by: Shahab Vahedi 
---
 Documentation/admin-guide/sysctl/net.rst |    1 +
 Documentation/networking/filter.rst      |    4 +-
 arch/arc/Kbuild                          |    1 +
 arch/arc/Kconfig                         |    1 +
 arch/arc/net/Makefile                    |    6 +
 arch/arc/net/bpf_jit.h                   |  161 ++
 arch/arc/net/bpf_jit_arcv2.c             | 3001 ++
 arch/arc/net/bpf_jit_core.c              | 1425 ++
 8 files changed, 4598 insertions(+), 2 deletions(-)
 create mode 100644