Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On 26/09/17 02:33, Alexei Starovoitov wrote: > On Mon, Sep 25, 2017 at 11:44:02PM +0200, Daniel Borkmann wrote: >> But above cast to be16 also doesn't seem quite C-like in terms >> of what we're actually doing... 3rd option would be my personal >> preference even if it doesn't look C-like, but otoh we also have >> 'call' etc which is neither. > In that sense (be16) cast is pretty much self explanatory. > So I'd like to continue bikesheding in hopes to convince you > to accept either 1 or 2 above ;) I agree with Daniel. 3rd option `r1 = be16 r1` is best, as it's an actual ALU operation, not just a cast. And since it looks like we're drifting vaguely near a consensus on that (even if Alexei still isn't convinced ;-) I'll spin v2 patches with that and `r1 = (u32) -r1`, so we have something concrete to argue about... -Ed
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On Mon, Sep 25, 2017 at 11:44:02PM +0200, Daniel Borkmann wrote: > On 09/24/2017 07:50 AM, Alexei Starovoitov wrote: > > On Fri, Sep 22, 2017 at 09:49:10PM -0700, Y Song wrote: > > > On Fri, Sep 22, 2017 at 9:23 AM, Edward Creewrote: > > > > On 22/09/17 16:16, Alexei Starovoitov wrote: > > > > > looks like we're converging on > > > > > "be16/be32/be64/le16/le32/le64 #register" for BPF_END. > > > > > I guess it can live with that. I would prefer more C like syntax > > > > > to match the rest, but llvm parsing point is a strong one. > > > > Yep, agreed. I'll post a v2 once we've settled BPF_NEG. > > > > > For BPG_NEG I prefer to do it in C syntax like interpreter does: > > > > > ALU_NEG: > > > > > DST = (u32) -DST; > > > > > ALU64_NEG: > > > > > DST = -DST; > > > > > Yonghong, does it mean that asmparser will equally suffer? > > > > Correction to my earlier statements: verifier will currently disassemble > > > > neg as: > > > > (87) r0 neg 0 > > > > (84) (u32) r0 neg (u32) 0 > > > > because it pretends 'neg' is a compound-assignment operator like +=. > > > > The analogy with be16 and friends would be to use > > > > neg64 r0 > > > > neg32 r0 > > > > whereas the analogy with everything else would be > > > > r0 = -r0 > > > > r0 = (u32) -r0 > > > > as Alexei says. > > > > I'm happy to go with Alexei's version if it doesn't cause problems for > > > > llvm. > > > > > > I got some time to do some prototyping in llvm and it looks like that > > > I am able to > > > resolve the issue and we are able to use more C-like syntax. That is: > > > for bswap: > > > r1 = (be16) (u16) r1 > > > or > > > r1 = (be16) r1 > > > or > > > r1 = be16 r1 > > > for neg: > > > r0 = -r0 > > > (for 32bit support, llvm may output "w0 = -w0" in the future. But > > > since it is not > > >enabled yet, you can continue to output "r0 = (u32) -r0".) > > > > > > Not sure which syntax is best for bswap. The "r1 = (be16) (u16) r1" is > > > most > > > explicit in its intention. > > > > > > Attaching my llvm patch as well and cc'ing Jiong and Jakub so they can > > > see my > > > implementation and the relative discussion here. (In this patch, I did > > > not implement > > > bswap for little endian yet.) Maybe they can provide additional comments. > > > > This is awesome. In such case I'd like to swing back to the C syntax for > > bpf_end :) > > Any of these > >r1 = (be16) (u16) r1 > >or > >r1 = (be16) r1 > >or > >r1 = be16 r1 > > are better than just > >be16 r1 > > I like 1st the most, since it's explicit in terms of what happens with > > upper bits, > > but 2nd is also ok. 3rd is not quite C-like. > > But above cast to be16 also doesn't seem quite C-like in terms > of what we're actually doing... 3rd option would be my personal > preference even if it doesn't look C-like, but otoh we also have > 'call' etc which is neither. well, we have not quite C, but C-like syntax already (u32) r0 += (u32) r1; Clearly first cast isn't quite C, but it shows that src reg is only using 32-bit in the alu and imo that's better than using w0 += w1; syntax. It's not quite C, but it's intent is way more clear for folks that know C and don't know assembler than 'w0 = w1'. And imo it's better than r0 = (u32) r0 + (u32) r1; that why I did this format long ago and not because of laziness. The 'endian' part, have to confess, was me being lazy, but I'd rather fix it to match the rest of the 'not quite C, but C-like' style. In that sense (be16) cast is pretty much self explanatory. So I'd like to continue bikesheding in hopes to convince you to accept either 1 or 2 above ;)
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On 09/24/2017 07:50 AM, Alexei Starovoitov wrote: On Fri, Sep 22, 2017 at 09:49:10PM -0700, Y Song wrote: On Fri, Sep 22, 2017 at 9:23 AM, Edward Creewrote: On 22/09/17 16:16, Alexei Starovoitov wrote: looks like we're converging on "be16/be32/be64/le16/le32/le64 #register" for BPF_END. I guess it can live with that. I would prefer more C like syntax to match the rest, but llvm parsing point is a strong one. Yep, agreed. I'll post a v2 once we've settled BPF_NEG. For BPG_NEG I prefer to do it in C syntax like interpreter does: ALU_NEG: DST = (u32) -DST; ALU64_NEG: DST = -DST; Yonghong, does it mean that asmparser will equally suffer? Correction to my earlier statements: verifier will currently disassemble neg as: (87) r0 neg 0 (84) (u32) r0 neg (u32) 0 because it pretends 'neg' is a compound-assignment operator like +=. The analogy with be16 and friends would be to use neg64 r0 neg32 r0 whereas the analogy with everything else would be r0 = -r0 r0 = (u32) -r0 as Alexei says. I'm happy to go with Alexei's version if it doesn't cause problems for llvm. I got some time to do some prototyping in llvm and it looks like that I am able to resolve the issue and we are able to use more C-like syntax. That is: for bswap: r1 = (be16) (u16) r1 or r1 = (be16) r1 or r1 = be16 r1 for neg: r0 = -r0 (for 32bit support, llvm may output "w0 = -w0" in the future. But since it is not enabled yet, you can continue to output "r0 = (u32) -r0".) Not sure which syntax is best for bswap. The "r1 = (be16) (u16) r1" is most explicit in its intention. Attaching my llvm patch as well and cc'ing Jiong and Jakub so they can see my implementation and the relative discussion here. (In this patch, I did not implement bswap for little endian yet.) Maybe they can provide additional comments. This is awesome. In such case I'd like to swing back to the C syntax for bpf_end :) Any of these r1 = (be16) (u16) r1 or r1 = (be16) r1 or r1 = be16 r1 are better than just be16 r1 I like 1st the most, since it's explicit in terms of what happens with upper bits, but 2nd is also ok. 3rd is not quite C-like. But above cast to be16 also doesn't seem quite C-like in terms of what we're actually doing... 3rd option would be my personal preference even if it doesn't look C-like, but otoh we also have 'call' etc which is neither.
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On Fri, Sep 22, 2017 at 09:49:10PM -0700, Y Song wrote: > On Fri, Sep 22, 2017 at 9:23 AM, Edward Creewrote: > > On 22/09/17 16:16, Alexei Starovoitov wrote: > >> looks like we're converging on > >> "be16/be32/be64/le16/le32/le64 #register" for BPF_END. > >> I guess it can live with that. I would prefer more C like syntax > >> to match the rest, but llvm parsing point is a strong one. > > Yep, agreed. I'll post a v2 once we've settled BPF_NEG. > >> For BPG_NEG I prefer to do it in C syntax like interpreter does: > >> ALU_NEG: > >> DST = (u32) -DST; > >> ALU64_NEG: > >> DST = -DST; > >> Yonghong, does it mean that asmparser will equally suffer? > > Correction to my earlier statements: verifier will currently disassemble > > neg as: > > (87) r0 neg 0 > > (84) (u32) r0 neg (u32) 0 > > because it pretends 'neg' is a compound-assignment operator like +=. > > The analogy with be16 and friends would be to use > > neg64 r0 > > neg32 r0 > > whereas the analogy with everything else would be > > r0 = -r0 > > r0 = (u32) -r0 > > as Alexei says. > > I'm happy to go with Alexei's version if it doesn't cause problems for llvm. > > I got some time to do some prototyping in llvm and it looks like that > I am able to > resolve the issue and we are able to use more C-like syntax. That is: > for bswap: > r1 = (be16) (u16) r1 > or > r1 = (be16) r1 > or > r1 = be16 r1 > for neg: > r0 = -r0 > (for 32bit support, llvm may output "w0 = -w0" in the future. But > since it is not > enabled yet, you can continue to output "r0 = (u32) -r0".) > > Not sure which syntax is best for bswap. The "r1 = (be16) (u16) r1" is most > explicit in its intention. > > Attaching my llvm patch as well and cc'ing Jiong and Jakub so they can see my > implementation and the relative discussion here. (In this patch, I did > not implement > bswap for little endian yet.) Maybe they can provide additional comments. This is awesome. In such case I'd like to swing back to the C syntax for bpf_end :) Any of these r1 = (be16) (u16) r1 or r1 = (be16) r1 or r1 = be16 r1 are better than just be16 r1 I like 1st the most, since it's explicit in terms of what happens with upper bits, but 2nd is also ok. 3rd is not quite C-like.
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On Fri, Sep 22, 2017 at 9:23 AM, Edward Creewrote: > On 22/09/17 16:16, Alexei Starovoitov wrote: >> looks like we're converging on >> "be16/be32/be64/le16/le32/le64 #register" for BPF_END. >> I guess it can live with that. I would prefer more C like syntax >> to match the rest, but llvm parsing point is a strong one. > Yep, agreed. I'll post a v2 once we've settled BPF_NEG. >> For BPG_NEG I prefer to do it in C syntax like interpreter does: >> ALU_NEG: >> DST = (u32) -DST; >> ALU64_NEG: >> DST = -DST; >> Yonghong, does it mean that asmparser will equally suffer? > Correction to my earlier statements: verifier will currently disassemble > neg as: > (87) r0 neg 0 > (84) (u32) r0 neg (u32) 0 > because it pretends 'neg' is a compound-assignment operator like +=. > The analogy with be16 and friends would be to use > neg64 r0 > neg32 r0 > whereas the analogy with everything else would be > r0 = -r0 > r0 = (u32) -r0 > as Alexei says. > I'm happy to go with Alexei's version if it doesn't cause problems for llvm. I got some time to do some prototyping in llvm and it looks like that I am able to resolve the issue and we are able to use more C-like syntax. That is: for bswap: r1 = (be16) (u16) r1 or r1 = (be16) r1 or r1 = be16 r1 for neg: r0 = -r0 (for 32bit support, llvm may output "w0 = -w0" in the future. But since it is not enabled yet, you can continue to output "r0 = (u32) -r0".) Not sure which syntax is best for bswap. The "r1 = (be16) (u16) r1" is most explicit in its intention. Attaching my llvm patch as well and cc'ing Jiong and Jakub so they can see my implementation and the relative discussion here. (In this patch, I did not implement bswap for little endian yet.) Maybe they can provide additional comments. 0001-bpf-add-support-for-neg-insn-and-change-format-of-bs.patch Description: Binary data
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On 22/09/17 16:16, Alexei Starovoitov wrote: > looks like we're converging on > "be16/be32/be64/le16/le32/le64 #register" for BPF_END. > I guess it can live with that. I would prefer more C like syntax > to match the rest, but llvm parsing point is a strong one. Yep, agreed. I'll post a v2 once we've settled BPF_NEG. > For BPG_NEG I prefer to do it in C syntax like interpreter does: > ALU_NEG: > DST = (u32) -DST; > ALU64_NEG: > DST = -DST; > Yonghong, does it mean that asmparser will equally suffer? Correction to my earlier statements: verifier will currently disassemble neg as: (87) r0 neg 0 (84) (u32) r0 neg (u32) 0 because it pretends 'neg' is a compound-assignment operator like +=. The analogy with be16 and friends would be to use neg64 r0 neg32 r0 whereas the analogy with everything else would be r0 = -r0 r0 = (u32) -r0 as Alexei says. I'm happy to go with Alexei's version if it doesn't cause problems for llvm.
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On Fri, Sep 22, 2017 at 07:27:29AM -0700, Y Song wrote: > On Fri, Sep 22, 2017 at 7:11 AM, Y Songwrote: > > On Fri, Sep 22, 2017 at 6:46 AM, Edward Cree wrote: > >> On 22/09/17 00:11, Y Song wrote: > >>> On Thu, Sep 21, 2017 at 12:58 PM, Edward Cree > >>> wrote: > On 21/09/17 20:44, Alexei Starovoitov wrote: > > On Thu, Sep 21, 2017 at 09:29:33PM +0200, Daniel Borkmann wrote: > >> More intuitive, but agree on the from_be/le. Maybe we should > >> just drop the "to_" prefix altogether, and leave the rest as is since > >> it's not surrounded by braces, it's also not a cast but rather an op. > That works for me. > > 'be16 r4' is ambiguous regarding upper bits. > > > > what about my earlier suggestion: > > r4 = (be16) (u16) r4 > > r4 = (le64) (u64) r4 > > > > It will be pretty clear what instruction is doing (that upper bits > > become zero). > Trouble with that is that's very *not* what C will do with those casts > and it doesn't really capture the bidirectional/symmetry thing. The > closest I could see with that is something like `r4 = (be16/u16) r4`, > but that's quite an ugly mongrel. > I think Daniel's idea of `be16`, `le32` etc one-arg opcodes is the > cleanest and clearest. Should it be > r4 = be16 r4 > or just > be16 r4 > ? Personally I incline towards the latter, but admit it doesn't really > match the syntax of other opcodes. > >>> I did some quick prototyping in llvm to make sure we have a syntax > >>> llvm is happy. Apparently, llvm does not like the syntax > >>>r4 = be16 r4orr4 = (be16) (u16) r4. > >>> > >>> In llvm:utils/TableGen/AsmMatcherEmitter.cpp: > >>> > >>> // Verify that any operand is only mentioned once. > >> Wait, how do you deal with (totally legal) r4 += r4? > >> Or r4 = *(r4 +0)? > >> Even jumps can have src_reg == dst_reg, though it doesn't seem useful. > > > > We are talking about dag node here. The above "r4", although using the same > > register, will be different dag nodes. So it will be okay. > > > > The "r4 = be16 r4" tries to use the *same* dag node as both source and > > destination > > in the asm output which is prohibited. > > With second thought, we may allow "r4 = be16 r4" by using different dag nodes. > (I need to do experiment for this.) But we do have constraints that > the two "r4" must > be the same register. "r5 = be16 r4" is not allowed. So from that > perspective, referencing > "r4" only once is a good idea and less confusing. looks like we're converging on "be16/be32/be64/le16/le32/le64 #register" for BPF_END. I guess it can live with that. I would prefer more C like syntax to match the rest, but llvm parsing point is a strong one. For BPG_NEG I prefer to do it in C syntax like interpreter does: ALU_NEG: DST = (u32) -DST; ALU64_NEG: DST = -DST; Yonghong, does it mean that asmparser will equally suffer?
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On Fri, Sep 22, 2017 at 7:11 AM, Y Songwrote: > On Fri, Sep 22, 2017 at 6:46 AM, Edward Cree wrote: >> On 22/09/17 00:11, Y Song wrote: >>> On Thu, Sep 21, 2017 at 12:58 PM, Edward Cree wrote: On 21/09/17 20:44, Alexei Starovoitov wrote: > On Thu, Sep 21, 2017 at 09:29:33PM +0200, Daniel Borkmann wrote: >> More intuitive, but agree on the from_be/le. Maybe we should >> just drop the "to_" prefix altogether, and leave the rest as is since >> it's not surrounded by braces, it's also not a cast but rather an op. That works for me. > 'be16 r4' is ambiguous regarding upper bits. > > what about my earlier suggestion: > r4 = (be16) (u16) r4 > r4 = (le64) (u64) r4 > > It will be pretty clear what instruction is doing (that upper bits become > zero). Trouble with that is that's very *not* what C will do with those casts and it doesn't really capture the bidirectional/symmetry thing. The closest I could see with that is something like `r4 = (be16/u16) r4`, but that's quite an ugly mongrel. I think Daniel's idea of `be16`, `le32` etc one-arg opcodes is the cleanest and clearest. Should it be r4 = be16 r4 or just be16 r4 ? Personally I incline towards the latter, but admit it doesn't really match the syntax of other opcodes. >>> I did some quick prototyping in llvm to make sure we have a syntax >>> llvm is happy. Apparently, llvm does not like the syntax >>>r4 = be16 r4orr4 = (be16) (u16) r4. >>> >>> In llvm:utils/TableGen/AsmMatcherEmitter.cpp: >>> >>> // Verify that any operand is only mentioned once. >> Wait, how do you deal with (totally legal) r4 += r4? >> Or r4 = *(r4 +0)? >> Even jumps can have src_reg == dst_reg, though it doesn't seem useful. > > We are talking about dag node here. The above "r4", although using the same > register, will be different dag nodes. So it will be okay. > > The "r4 = be16 r4" tries to use the *same* dag node as both source and > destination > in the asm output which is prohibited. With second thought, we may allow "r4 = be16 r4" by using different dag nodes. (I need to do experiment for this.) But we do have constraints that the two "r4" must be the same register. "r5 = be16 r4" is not allowed. So from that perspective, referencing "r4" only once is a good idea and less confusing.
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On Fri, Sep 22, 2017 at 6:46 AM, Edward Creewrote: > On 22/09/17 00:11, Y Song wrote: >> On Thu, Sep 21, 2017 at 12:58 PM, Edward Cree wrote: >>> On 21/09/17 20:44, Alexei Starovoitov wrote: On Thu, Sep 21, 2017 at 09:29:33PM +0200, Daniel Borkmann wrote: > More intuitive, but agree on the from_be/le. Maybe we should > just drop the "to_" prefix altogether, and leave the rest as is since > it's not surrounded by braces, it's also not a cast but rather an op. >>> That works for me. 'be16 r4' is ambiguous regarding upper bits. what about my earlier suggestion: r4 = (be16) (u16) r4 r4 = (le64) (u64) r4 It will be pretty clear what instruction is doing (that upper bits become zero). >>> Trouble with that is that's very *not* what C will do with those casts >>> and it doesn't really capture the bidirectional/symmetry thing. The >>> closest I could see with that is something like `r4 = (be16/u16) r4`, >>> but that's quite an ugly mongrel. >>> I think Daniel's idea of `be16`, `le32` etc one-arg opcodes is the >>> cleanest and clearest. Should it be >>> r4 = be16 r4 >>> or just >>> be16 r4 >>> ? Personally I incline towards the latter, but admit it doesn't really >>> match the syntax of other opcodes. >> I did some quick prototyping in llvm to make sure we have a syntax >> llvm is happy. Apparently, llvm does not like the syntax >>r4 = be16 r4orr4 = (be16) (u16) r4. >> >> In llvm:utils/TableGen/AsmMatcherEmitter.cpp: >> >> // Verify that any operand is only mentioned once. > Wait, how do you deal with (totally legal) r4 += r4? > Or r4 = *(r4 +0)? > Even jumps can have src_reg == dst_reg, though it doesn't seem useful. We are talking about dag node here. The above "r4", although using the same register, will be different dag nodes. So it will be okay. The "r4 = be16 r4" tries to use the *same* dag node as both source and destination in the asm output which is prohibited.
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On 22/09/17 00:11, Y Song wrote: > On Thu, Sep 21, 2017 at 12:58 PM, Edward Creewrote: >> On 21/09/17 20:44, Alexei Starovoitov wrote: >>> On Thu, Sep 21, 2017 at 09:29:33PM +0200, Daniel Borkmann wrote: More intuitive, but agree on the from_be/le. Maybe we should just drop the "to_" prefix altogether, and leave the rest as is since it's not surrounded by braces, it's also not a cast but rather an op. >> That works for me. >>> 'be16 r4' is ambiguous regarding upper bits. >>> >>> what about my earlier suggestion: >>> r4 = (be16) (u16) r4 >>> r4 = (le64) (u64) r4 >>> >>> It will be pretty clear what instruction is doing (that upper bits become >>> zero). >> Trouble with that is that's very *not* what C will do with those casts >> and it doesn't really capture the bidirectional/symmetry thing. The >> closest I could see with that is something like `r4 = (be16/u16) r4`, >> but that's quite an ugly mongrel. >> I think Daniel's idea of `be16`, `le32` etc one-arg opcodes is the >> cleanest and clearest. Should it be >> r4 = be16 r4 >> or just >> be16 r4 >> ? Personally I incline towards the latter, but admit it doesn't really >> match the syntax of other opcodes. > I did some quick prototyping in llvm to make sure we have a syntax > llvm is happy. Apparently, llvm does not like the syntax >r4 = be16 r4orr4 = (be16) (u16) r4. > > In llvm:utils/TableGen/AsmMatcherEmitter.cpp: > > // Verify that any operand is only mentioned once. Wait, how do you deal with (totally legal) r4 += r4? Or r4 = *(r4 +0)? Even jumps can have src_reg == dst_reg, though it doesn't seem useful.
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On Thu, Sep 21, 2017 at 12:58 PM, Edward Creewrote: > On 21/09/17 20:44, Alexei Starovoitov wrote: >> On Thu, Sep 21, 2017 at 09:29:33PM +0200, Daniel Borkmann wrote: >>> More intuitive, but agree on the from_be/le. Maybe we should >>> just drop the "to_" prefix altogether, and leave the rest as is since >>> it's not surrounded by braces, it's also not a cast but rather an op. > That works for me. >> 'be16 r4' is ambiguous regarding upper bits. >> >> what about my earlier suggestion: >> r4 = (be16) (u16) r4 >> r4 = (le64) (u64) r4 >> >> It will be pretty clear what instruction is doing (that upper bits become >> zero). > Trouble with that is that's very *not* what C will do with those casts > and it doesn't really capture the bidirectional/symmetry thing. The > closest I could see with that is something like `r4 = (be16/u16) r4`, > but that's quite an ugly mongrel. > I think Daniel's idea of `be16`, `le32` etc one-arg opcodes is the > cleanest and clearest. Should it be > r4 = be16 r4 > or just > be16 r4 > ? Personally I incline towards the latter, but admit it doesn't really > match the syntax of other opcodes. I did some quick prototyping in llvm to make sure we have a syntax llvm is happy. Apparently, llvm does not like the syntax r4 = be16 r4orr4 = (be16) (u16) r4. In llvm:utils/TableGen/AsmMatcherEmitter.cpp: // Verify that any operand is only mentioned once. // We reject aliases and ignore instructions for now. if (Tok[0] == '$' && !OperandNames.insert(Tok).second) { if (!Hack) PrintFatalError(TheDef->getLoc(), "ERROR: matchable with tied operand '" + Tok + "' can never be matched!"); // FIXME: Should reject these. The ARM backend hits this with $lane in a // bunch of instructions. It is unclear what the right answer is. DEBUG({ errs() << "warning: '" << TheDef->getName() << "': " << "ignoring instruction with tied operand '" << Tok << "'\n"; }); return false; } Later on, such insn will be ignored in table matching and assember will not work any more. Note that here bswap16/32/64 require source and destination register must be the same. So it looks like "be16/be32/be64/le16/le32/le64 #register" is a good idea. We could use "be16 (u16)#register", but not sure whether extra u16 conversion adds value or rather adds more confusion. > > To shed a few more bikes, I did also wonder about the BPF_NEG opcode, > which (if I'm reading the code correctly) currently renders as > r4 = neg r4 0 > (u32) r4 = neg (u32) r4 0 > That printing of the insn->imm, while harmless, is needless and > potentially confusing. Should we get rid of it? Currently, llvm does not issue "neg" insn yet. Maybe you can issue r3 = -r4 // 64bit r3 = (u32) -r4 // 32bit This matches what interpreter does. This will be similar to other ALU operations.
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On 21/09/17 20:44, Alexei Starovoitov wrote: > On Thu, Sep 21, 2017 at 09:29:33PM +0200, Daniel Borkmann wrote: >> More intuitive, but agree on the from_be/le. Maybe we should >> just drop the "to_" prefix altogether, and leave the rest as is since >> it's not surrounded by braces, it's also not a cast but rather an op. That works for me. > 'be16 r4' is ambiguous regarding upper bits. > > what about my earlier suggestion: > r4 = (be16) (u16) r4 > r4 = (le64) (u64) r4 > > It will be pretty clear what instruction is doing (that upper bits become > zero). Trouble with that is that's very *not* what C will do with those casts and it doesn't really capture the bidirectional/symmetry thing. The closest I could see with that is something like `r4 = (be16/u16) r4`, but that's quite an ugly mongrel. I think Daniel's idea of `be16`, `le32` etc one-arg opcodes is the cleanest and clearest. Should it be r4 = be16 r4 or just be16 r4 ? Personally I incline towards the latter, but admit it doesn't really match the syntax of other opcodes. To shed a few more bikes, I did also wonder about the BPF_NEG opcode, which (if I'm reading the code correctly) currently renders as r4 = neg r4 0 (u32) r4 = neg (u32) r4 0 That printing of the insn->imm, while harmless, is needless and potentially confusing. Should we get rid of it?
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On Thu, Sep 21, 2017 at 09:29:33PM +0200, Daniel Borkmann wrote: > On 09/21/2017 06:58 PM, Edward Cree wrote: > > On 21/09/17 17:40, Y Song wrote: > > > On Thu, Sep 21, 2017 at 9:24 AM, Edward Creewrote: > > > > On 21/09/17 16:52, Alexei Starovoitov wrote: > > > > > imo > > > > > (u16) r4 endian be > > > > > isn't intuitive. > > > > > Can we come up with some better syntax? > > > > > Like > > > > > bswap16be r4 > > > > > bswap32le r4 > > > > Hmm, I don't like these, since bswapbe is a swap on *le* and a nop on > > > > be. > > Agree, a bit too much 'swap' semantics in the name that could be > confusing perhaps, at least the be/le could be missed easily. > > > > > > or > > > > > > > > > > to_be16 r4 > > > > > to_le32 r4 > > > > And the problem here is that it's not just to_be, it's also from_be. > > More intuitive, but agree on the from_be/le. Maybe we should > just drop the "to_" prefix altogether, and leave the rest as is since > it's not surrounded by braces, it's also not a cast but rather an op. 'be16 r4' is ambiguous regarding upper bits. what about my earlier suggestion: r4 = (be16) (u16) r4 r4 = (le64) (u64) r4 It will be pretty clear what instruction is doing (that upper bits become zero).
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On 09/21/2017 06:58 PM, Edward Cree wrote: On 21/09/17 17:40, Y Song wrote: On Thu, Sep 21, 2017 at 9:24 AM, Edward Creewrote: On 21/09/17 16:52, Alexei Starovoitov wrote: imo (u16) r4 endian be isn't intuitive. Can we come up with some better syntax? Like bswap16be r4 bswap32le r4 Hmm, I don't like these, since bswapbe is a swap on *le* and a nop on be. Agree, a bit too much 'swap' semantics in the name that could be confusing perhaps, at least the be/le could be missed easily. or to_be16 r4 to_le32 r4 And the problem here is that it's not just to_be, it's also from_be. More intuitive, but agree on the from_be/le. Maybe we should just drop the "to_" prefix altogether, and leave the rest as is since it's not surrounded by braces, it's also not a cast but rather an op. Could you explain what is "from_be" here? Do not quite understand. Taking the example of a little-endian processor: cpu_to_be16() is a byte-swap, converting a u16 (cpu-endian) to a __be16. be16_to_cpu(), to convert a __be16 to a u16, is *also* a byte-swap. Meanwhile, cpu_to_le16() and le16_to_cpu() are both no-ops. More generally, the conversions between cpu-endian and fixed-endian for any given size are self-inverses. eBPF takes advantage of this by only having a single opcode for both the "to" and "from" direction. So to specify an endianness conversion, you need only the size and the fixed endianness (le or be), not the to/from direction. Conversely, when disassembling one of these instructions, you don't know whether it's a cpu_to_be16() or a be16_to_cpu(), because they both look the same at an instruction level (they only differ in what types the programmer thought of the register as holding before and after). Yeah, exactly to the point. :)
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On 21/09/17 17:40, Y Song wrote: > On Thu, Sep 21, 2017 at 9:24 AM, Edward Creewrote: >> On 21/09/17 16:52, Alexei Starovoitov wrote: >>> imo >>> (u16) r4 endian be >>> isn't intuitive. >>> Can we come up with some better syntax? >>> Like >>> bswap16be r4 >>> bswap32le r4 >> Hmm, I don't like these, since bswapbe is a swap on *le* and a nop on be. >>> or >>> >>> to_be16 r4 >>> to_le32 r4 >> And the problem here is that it's not just to_be, it's also from_be. > Could you explain what is "from_be" here? Do not quite understand. Taking the example of a little-endian processor: cpu_to_be16() is a byte-swap, converting a u16 (cpu-endian) to a __be16. be16_to_cpu(), to convert a __be16 to a u16, is *also* a byte-swap. Meanwhile, cpu_to_le16() and le16_to_cpu() are both no-ops. More generally, the conversions between cpu-endian and fixed-endian for any given size are self-inverses. eBPF takes advantage of this by only having a single opcode for both the "to" and "from" direction. So to specify an endianness conversion, you need only the size and the fixed endianness (le or be), not the to/from direction. Conversely, when disassembling one of these instructions, you don't know whether it's a cpu_to_be16() or a be16_to_cpu(), because they both look the same at an instruction level (they only differ in what types the programmer thought of the register as holding before and after).
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On Thu, Sep 21, 2017 at 9:24 AM, Edward Creewrote: > On 21/09/17 16:52, Alexei Starovoitov wrote: >> On Thu, Sep 21, 2017 at 04:09:34PM +0100, Edward Cree wrote: >>> print_bpf_insn() was treating all BPF_ALU[64] the same, but BPF_END has a >>> different structure: it has a size in insn->imm (even if it's BPF_X) and >>> uses the BPF_SRC (X or K) to indicate which endianness to use. So it >>> needs different code to print it. >>> >>> Signed-off-by: Edward Cree >>> --- >>> It's not the same format as the new LLVM asm uses, does that matter? >>> AFAIK the LLVM format doesn't comprehend BPF_TO_LE, just assumes that all >>> endian ops are necessarily swaps (rather than sometimes nops). >> that is being fixed and we will fix asm format too. >> Let's pick good format first. > Agreed. >>> kernel/bpf/verifier.c | 13 +++-- >>> 1 file changed, 11 insertions(+), 2 deletions(-) >>> >>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c >>> index 799b245..e7657a4 100644 >>> --- a/kernel/bpf/verifier.c >>> +++ b/kernel/bpf/verifier.c >>> @@ -331,20 +331,29 @@ static void print_bpf_insn(const struct >>> bpf_verifier_env *env, >>> u8 class = BPF_CLASS(insn->code); >>> >>> if (class == BPF_ALU || class == BPF_ALU64) { >>> -if (BPF_SRC(insn->code) == BPF_X) >>> +if (BPF_OP(insn->code) == BPF_END) { >>> +if (class == BPF_ALU64) >>> +verbose("BUG_alu64_%02x\n", insn->code); >>> +else >>> +verbose("(%02x) (u%d) r%d %s %s\n", >>> +insn->code, insn->imm, insn->dst_reg, >>> +bpf_alu_string[BPF_END >> 4], >>> +BPF_SRC(insn->code) == BPF_X ? "be" : >>> "le"); >> yes the bit the same, but please use BPF_SRC(insn->code) == BPF_TO_BE. > Good point. >> imo >> (u16) r4 endian be >> isn't intuitive. >> Can we come up with some better syntax? >> Like >> bswap16be r4 >> bswap32le r4 > Hmm, I don't like these, since bswapbe is a swap on *le* and a nop on be. >> or >> >> to_be16 r4 >> to_le32 r4 > And the problem here is that it's not just to_be, it's also from_be. Could you explain what is "from_be" here? Do not quite understand. > Otherwise we could write `(be16) r4 = endian (u16) r4` and be much more > explicit about what's happening. > Really the operation is something like `cpu_tofrom_be16 r4`, but that also > seems a bit clumsy and longwinded. Also it's inconsistent with the other > ops that all indicate sizes with these (u16) etc casts. > `endian (be16) r4`, perhaps?
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On Thu, Sep 21, 2017 at 05:24:10PM +0100, Edward Cree wrote: > On 21/09/17 16:52, Alexei Starovoitov wrote: > > On Thu, Sep 21, 2017 at 04:09:34PM +0100, Edward Cree wrote: > >> print_bpf_insn() was treating all BPF_ALU[64] the same, but BPF_END has a > >> different structure: it has a size in insn->imm (even if it's BPF_X) and > >> uses the BPF_SRC (X or K) to indicate which endianness to use. So it > >> needs different code to print it. > >> > >> Signed-off-by: Edward Cree> >> --- > >> It's not the same format as the new LLVM asm uses, does that matter? > >> AFAIK the LLVM format doesn't comprehend BPF_TO_LE, just assumes that all > >> endian ops are necessarily swaps (rather than sometimes nops). > > that is being fixed and we will fix asm format too. > > Let's pick good format first. > Agreed. > >> kernel/bpf/verifier.c | 13 +++-- > >> 1 file changed, 11 insertions(+), 2 deletions(-) > >> > >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > >> index 799b245..e7657a4 100644 > >> --- a/kernel/bpf/verifier.c > >> +++ b/kernel/bpf/verifier.c > >> @@ -331,20 +331,29 @@ static void print_bpf_insn(const struct > >> bpf_verifier_env *env, > >>u8 class = BPF_CLASS(insn->code); > >> > >>if (class == BPF_ALU || class == BPF_ALU64) { > >> - if (BPF_SRC(insn->code) == BPF_X) > >> + if (BPF_OP(insn->code) == BPF_END) { > >> + if (class == BPF_ALU64) > >> + verbose("BUG_alu64_%02x\n", insn->code); > >> + else > >> + verbose("(%02x) (u%d) r%d %s %s\n", > >> + insn->code, insn->imm, insn->dst_reg, > >> + bpf_alu_string[BPF_END >> 4], > >> + BPF_SRC(insn->code) == BPF_X ? "be" : > >> "le"); > > yes the bit the same, but please use BPF_SRC(insn->code) == BPF_TO_BE. > Good point. > > imo > > (u16) r4 endian be > > isn't intuitive. > > Can we come up with some better syntax? > > Like > > bswap16be r4 > > bswap32le r4 > Hmm, I don't like these, since bswapbe is a swap on *le* and a nop on be. > > or > > > > to_be16 r4 > > to_le32 r4 > And the problem here is that it's not just to_be, it's also from_be. > Otherwise we could write `(be16) r4 = endian (u16) r4` and be much more > explicit about what's happening. > Really the operation is something like `cpu_tofrom_be16 r4`, but that also > seems a bit clumsy and longwinded. Also it's inconsistent with the other > ops that all indicate sizes with these (u16) etc casts. > `endian (be16) r4`, perhaps? How about: r4 = (be16) (u16) r4 r4 = (le64) (u64) r4 most C like and most explicit ? it should be easy to grasp that (be16) cast on big-endian arch is a nop and swap on little.
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On Thu, Sep 21, 2017 at 8:52 AM, Alexei Starovoitovwrote: > On Thu, Sep 21, 2017 at 04:09:34PM +0100, Edward Cree wrote: >> print_bpf_insn() was treating all BPF_ALU[64] the same, but BPF_END has a >> different structure: it has a size in insn->imm (even if it's BPF_X) and >> uses the BPF_SRC (X or K) to indicate which endianness to use. So it >> needs different code to print it. >> >> Signed-off-by: Edward Cree >> --- >> It's not the same format as the new LLVM asm uses, does that matter? >> AFAIK the LLVM format doesn't comprehend BPF_TO_LE, just assumes that all >> endian ops are necessarily swaps (rather than sometimes nops). > > that is being fixed and we will fix asm format too. > Let's pick good format first. > >> kernel/bpf/verifier.c | 13 +++-- >> 1 file changed, 11 insertions(+), 2 deletions(-) >> >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c >> index 799b245..e7657a4 100644 >> --- a/kernel/bpf/verifier.c >> +++ b/kernel/bpf/verifier.c >> @@ -331,20 +331,29 @@ static void print_bpf_insn(const struct >> bpf_verifier_env *env, >> u8 class = BPF_CLASS(insn->code); >> >> if (class == BPF_ALU || class == BPF_ALU64) { >> - if (BPF_SRC(insn->code) == BPF_X) >> + if (BPF_OP(insn->code) == BPF_END) { >> + if (class == BPF_ALU64) >> + verbose("BUG_alu64_%02x\n", insn->code); >> + else >> + verbose("(%02x) (u%d) r%d %s %s\n", >> + insn->code, insn->imm, insn->dst_reg, >> + bpf_alu_string[BPF_END >> 4], >> + BPF_SRC(insn->code) == BPF_X ? "be" : >> "le"); > > yes the bit the same, but please use BPF_SRC(insn->code) == BPF_TO_BE. > imo > (u16) r4 endian be > isn't intuitive. > Can we come up with some better syntax? > Like > bswap16be r4 > bswap32le r4 > > or > > to_be16 r4 > to_le32 r4 Currently, llvm bpf backend uses "bswap16 r4" "bswap32 r2" "bswap64 r2" syntax. I prefer something like "to_be16 r4" "to_le32 r4", or "bswap2be16" "bswap2le32" or something similar. This captures what the operation really is. Right the support to bswap2le will be added to LLVM soon. > > It will be more obvious what's happening. >
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On 21/09/17 16:52, Alexei Starovoitov wrote: > On Thu, Sep 21, 2017 at 04:09:34PM +0100, Edward Cree wrote: >> print_bpf_insn() was treating all BPF_ALU[64] the same, but BPF_END has a >> different structure: it has a size in insn->imm (even if it's BPF_X) and >> uses the BPF_SRC (X or K) to indicate which endianness to use. So it >> needs different code to print it. >> >> Signed-off-by: Edward Cree>> --- >> It's not the same format as the new LLVM asm uses, does that matter? >> AFAIK the LLVM format doesn't comprehend BPF_TO_LE, just assumes that all >> endian ops are necessarily swaps (rather than sometimes nops). > that is being fixed and we will fix asm format too. > Let's pick good format first. Agreed. >> kernel/bpf/verifier.c | 13 +++-- >> 1 file changed, 11 insertions(+), 2 deletions(-) >> >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c >> index 799b245..e7657a4 100644 >> --- a/kernel/bpf/verifier.c >> +++ b/kernel/bpf/verifier.c >> @@ -331,20 +331,29 @@ static void print_bpf_insn(const struct >> bpf_verifier_env *env, >> u8 class = BPF_CLASS(insn->code); >> >> if (class == BPF_ALU || class == BPF_ALU64) { >> -if (BPF_SRC(insn->code) == BPF_X) >> +if (BPF_OP(insn->code) == BPF_END) { >> +if (class == BPF_ALU64) >> +verbose("BUG_alu64_%02x\n", insn->code); >> +else >> +verbose("(%02x) (u%d) r%d %s %s\n", >> +insn->code, insn->imm, insn->dst_reg, >> +bpf_alu_string[BPF_END >> 4], >> +BPF_SRC(insn->code) == BPF_X ? "be" : >> "le"); > yes the bit the same, but please use BPF_SRC(insn->code) == BPF_TO_BE. Good point. > imo > (u16) r4 endian be > isn't intuitive. > Can we come up with some better syntax? > Like > bswap16be r4 > bswap32le r4 Hmm, I don't like these, since bswapbe is a swap on *le* and a nop on be. > or > > to_be16 r4 > to_le32 r4 And the problem here is that it's not just to_be, it's also from_be. Otherwise we could write `(be16) r4 = endian (u16) r4` and be much more explicit about what's happening. Really the operation is something like `cpu_tofrom_be16 r4`, but that also seems a bit clumsy and longwinded. Also it's inconsistent with the other ops that all indicate sizes with these (u16) etc casts. `endian (be16) r4`, perhaps?
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On Thu, Sep 21, 2017 at 04:09:34PM +0100, Edward Cree wrote: > print_bpf_insn() was treating all BPF_ALU[64] the same, but BPF_END has a > different structure: it has a size in insn->imm (even if it's BPF_X) and > uses the BPF_SRC (X or K) to indicate which endianness to use. So it > needs different code to print it. > > Signed-off-by: Edward Cree> --- > It's not the same format as the new LLVM asm uses, does that matter? > AFAIK the LLVM format doesn't comprehend BPF_TO_LE, just assumes that all > endian ops are necessarily swaps (rather than sometimes nops). that is being fixed and we will fix asm format too. Let's pick good format first. > kernel/bpf/verifier.c | 13 +++-- > 1 file changed, 11 insertions(+), 2 deletions(-) > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index 799b245..e7657a4 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -331,20 +331,29 @@ static void print_bpf_insn(const struct > bpf_verifier_env *env, > u8 class = BPF_CLASS(insn->code); > > if (class == BPF_ALU || class == BPF_ALU64) { > - if (BPF_SRC(insn->code) == BPF_X) > + if (BPF_OP(insn->code) == BPF_END) { > + if (class == BPF_ALU64) > + verbose("BUG_alu64_%02x\n", insn->code); > + else > + verbose("(%02x) (u%d) r%d %s %s\n", > + insn->code, insn->imm, insn->dst_reg, > + bpf_alu_string[BPF_END >> 4], > + BPF_SRC(insn->code) == BPF_X ? "be" : > "le"); yes the bit the same, but please use BPF_SRC(insn->code) == BPF_TO_BE. imo (u16) r4 endian be isn't intuitive. Can we come up with some better syntax? Like bswap16be r4 bswap32le r4 or to_be16 r4 to_le32 r4 It will be more obvious what's happening.
[PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
print_bpf_insn() was treating all BPF_ALU[64] the same, but BPF_END has a different structure: it has a size in insn->imm (even if it's BPF_X) and uses the BPF_SRC (X or K) to indicate which endianness to use. So it needs different code to print it. Signed-off-by: Edward Cree--- It's not the same format as the new LLVM asm uses, does that matter? AFAIK the LLVM format doesn't comprehend BPF_TO_LE, just assumes that all endian ops are necessarily swaps (rather than sometimes nops). kernel/bpf/verifier.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 799b245..e7657a4 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -331,20 +331,29 @@ static void print_bpf_insn(const struct bpf_verifier_env *env, u8 class = BPF_CLASS(insn->code); if (class == BPF_ALU || class == BPF_ALU64) { - if (BPF_SRC(insn->code) == BPF_X) + if (BPF_OP(insn->code) == BPF_END) { + if (class == BPF_ALU64) + verbose("BUG_alu64_%02x\n", insn->code); + else + verbose("(%02x) (u%d) r%d %s %s\n", + insn->code, insn->imm, insn->dst_reg, + bpf_alu_string[BPF_END >> 4], + BPF_SRC(insn->code) == BPF_X ? "be" : "le"); + } else if (BPF_SRC(insn->code) == BPF_X) { verbose("(%02x) %sr%d %s %sr%d\n", insn->code, class == BPF_ALU ? "(u32) " : "", insn->dst_reg, bpf_alu_string[BPF_OP(insn->code) >> 4], class == BPF_ALU ? "(u32) " : "", insn->src_reg); - else + } else { verbose("(%02x) %sr%d %s %s%d\n", insn->code, class == BPF_ALU ? "(u32) " : "", insn->dst_reg, bpf_alu_string[BPF_OP(insn->code) >> 4], class == BPF_ALU ? "(u32) " : "", insn->imm); + } } else if (class == BPF_STX) { if (BPF_MODE(insn->code) == BPF_MEM) verbose("(%02x) *(%s *)(r%d %+d) = r%d\n",