Hi,
We're trying to launch bcc on an ARM 32-bit machine. We've managed to
cook a version that works, but with a few quirks.
Besides adding general setup for compiling on ARM (defining registers,
calling convention, syscall number), we made some changes and
corrections to type sizes (mostly ulonglong -> size_t in libbcc.py),
disabled some parts of kernel headers and disabled 128-bit integer support.
(https://github.com/aszyndela/bcc/commit/bed992f2842007ec5563ec75caffe6e03b6bb7c4)
Additionally, we encountered the following quirks:
- 'memset', which comes from rewriting in libbcc, is redefined in ARM
kernel headers
(https://elixir.free-electrons.com/linux/latest/source/arch/arm/include/asm/string.h#L29).
Redefined code calls __memzero, which is not allowed in bpf code because
it is an external function call, so we changed memset to
__builtin_memset when the code is rewritten.
(https://github.com/aszyndela/bcc/commit/6cc9adc4e6e8cc775b1d08641d0bf9c9f48ce3e2)
- return type of bpf_pseudo_fd is 'u64', and it is passed to functions
that take pointers; the compiler generates 'trunc' in IR for this, which
becomes shift-left + shift-right in final bpf code; and bpf verifier
does not like it when pointers are shifted. We worked it around by
changing return type of bpf_pseudo_fd/llvm.bpf.pseudo from 'u64' to
'void *' - both in bcc
(https://github.com/aszyndela/bcc/commit/8eb8fb2e265b93d38c1ad53efcb7da8883a3d643)
and llvm
(include/llvm/IR/IntrinsicsBPF.td):
def int_bpf_pseudo : GCCBuiltin<"__builtin_bpf_pseudo">,
- Intrinsic<[llvm_i64_ty], [llvm_i64_ty, llvm_i64_ty]>;
+ Intrinsic<[llvm_ptr_ty], [llvm_i64_ty, llvm_i64_ty]>;
Obviously we don't know what we're doing, but at least it works for us.
- when compiling to IR, the compiler considers pointers as 32-bit (as
the target triple is set to something like
armv7l-unknown-linux-gnueabi). But, when compiling from IR to bpf
pointers present in IR code are considered 64-bit. This needs more
explanation probably. Here is an example - hand-written code, which
looks (suspiciously) similar to the code generated by bcc for dereferences:
struct teststruct
{
const char *field;
};
int test_ko(struct pt_regs *ctx) {
void *conn = ctx->uregs[0];
struct teststruct *t = ctx->uregs[1];
const char *c = ({
typeof(void*) _val;
__builtin_memset(&_val, 0, sizeof(_val));
bpf_probe_read(&_val, sizeof(_val), (char *)t
+ offsetof(struct teststruct, field));
_val; });
return ({
typeof(const char) _val;
__builtin_memset(&_val, 0, sizeof(_val));
bpf_probe_read(&_val, sizeof(_val), (char *)c);
_val; });
}
The interesting part is the first __builtin_memset. After compiling to
IR it becomes this:
%2 = alloca i8*, align 4
[...]
%8 = bitcast i8** %2 to i32*
store i32 0, i32* %8, align 4
It explicitly sets 4 bytes on the stack to 0, because sizeof(void*) on
ARM is 4. Thus, 'void * val' should be zeroed, right? No. Because after
compiling to final bpf code, i8* allocated at %2 becomes 64-bit and we
got the following code, rejected by the verifier:
5: R6=imm0,min_value=0,max_value=0,min_align=2147483648 R10=fp
5: (63) *(u32 *)(r10 -8) = r6
[...]
10: (79) r3 = *(u64 *)(r10 -8)
invalid read from stack off -8+4 size 8
But, if we switch __builtin_memset to explicit initialization, things
get better:
int test_ok(struct pt_regs *ctx) {
void *conn = ctx->uregs[0];
struct teststruct *t = ctx->uregs[1];
const char *c = ({
typeof(void*) _val = 0;
// instead of __builtin_memset(&_val, 0, sizeof(_val));
bpf_probe_read(&_val, sizeof(_val), (char *)t
+ offsetof(struct teststruct, field));
_val; });
return ({
typeof(const char) _val;
__builtin_memset(&_val, 0, sizeof(_val));
bpf_probe_read(&_val, sizeof(_val), (char *)c);
_val; });
}
IR is:
%2 = alloca i8*, align 4
[...]
store i8* null, i8** %2, align 4, !tbaa !14
And final bpf code is:
5: R6=imm0,min_value=0,max_value=0,min_align=2147483648 R10=fp
5: (7b) *(u64 *)(r10 -8) = r6
[...]
10: (79) r3 = *(u64 *)(r10 -8)
Of course, as a workaround for this we explicitly use bpf_probe_read
with explicit initialization in our tools. Not pretty at all...
With those changes and workarounds applied we are able to run selected
bcc tools on 32-bit ARM. It's very hacky, so our motivation is to fix it
properly, and possibly have it integrated upstream. There is also a
couple of issues with uprobes on 32-bit ARM, but that's a different story.
What do you guys think about all of this? I would be grateful for comments.
The changes we've made are available at https://github.com/aszyndela/bcc
Thanks,
Adrian
___
iovisor-dev mailing list
iovisor-dev@lists.iovisor.org