This is partially an FYI for Google-ing, but I'd love some guidance from the
ATS devs on how to fix this issue in a 'better' way, as well as whether I
should reopen an 8.x GitHub issue (see next paragraph).

Background:
I run 8.1.1 in a fairly traffic-heavy environment with ATS (sitting behind
nginx) doing reverse-proxying to backend webservers. It's Ubuntu Bionic,
running a grsecurity-enabled kernel (I haven't been able to test the below
scenario on a non-grsec kernel). I've been seeing some crashes lately with
8.1.1 that look like https://github.com/apache/trafficserver/issues/4921,
i.e. "Fatal: HttpSM.cc:2533: failed assertion `magic ==
HTTP_SM_MAGIC_ALIVE`" (and yes, I see the comment in the ticket that
someone should reopen the ticket if they see that on 8.1.x, but I haven't
gotten around to it yet -- see the first line of this email).

Today:
So due to those 8.1.1 crashes, I was trying out 9.0.1 and ran into the same
issue that I initially ran into with 9.0.0: traffic_manager starts up and
then starts traffic_server, and traffic_server crashes with:

[May 11 18:36:41.343] traffic_server ERROR: [ReverseProxy] failed to add
remap rule at /etc/trafficserver/remap.config line 198:
/run/trafficserver/37882f60-2d8b-45e0-b8d8-ed2208c4c221/usr/lib/trafficserver/modules/tslua.so:
failed to map segment from shared objectfailed to remove runtime copy:
Success

and traffic_manager dutifully tries again and again forever.

And we use Lua to do all of our remapping (basically, to set the correct
DNS name for ATS to look up), so it's a must-have for us.

After some strace'ing, I figured out that Bionic by default has /run
mounted with 'noexec'. If I remount /run with 'exec', I don't get this
error anymore and everything loads up just fine.
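For anyone who wants to check their own boxes for the same condition, here
is a minimal Python sketch that reads /proc/mounts-style text and reports
whether a mount point carries 'noexec'. This is not anything ATS ships; the
function name has_noexec is made up for illustration.

```python
def has_noexec(mounts_text, mount_point):
    """Report whether mount_point is mounted with the 'noexec' option.

    mounts_text is expected in /proc/mounts format:
        device mountpoint fstype options dump pass
    Returns True or False if the mount point is listed, None if it is not.
    """
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        # A later entry for the same mount point shadows an earlier one,
        # so keep scanning and remember the last match.
        result = "noexec" in fields[3].split(",")
    return result
```

Usage would be something like has_noexec(open("/proc/mounts").read(),
"/run"), which on my Bionic boxes should come back True before the remount.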

In the straces I was looking at, it looks like traffic_server is copying
the tslua.so module into that temp directory under /run/trafficserver and
then trying to mmap() the newly copied tslua.so file in /run and failing:

openat(AT_FDCWD,
"/run/trafficserver/e667e0ea-fc3b-43fd-945d-63fc4d9dc6e4/usr/lib/trafficserver/modules/tslua.so",
O_RDONLY|O_CLOEXEC) = 38
read(38, "\177ELF\2\1\1\0\0\0<deleted for brevity sake>0\0\22\0\0\0", 832)
= 832
fstat(38, {st_mode=S_IFREG|0644, st_size=152304, ...}) = 0
mmap(NULL, 2251784, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 38, 0)
= -1 EPERM (Operation not permitted)
close(38)                               = 0

With /run mounted with 'exec', that mmap() succeeds and life is great.
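To reproduce the failing call outside of ATS, a rough Python sketch along
these lines attempts the same kind of private, read+exec mapping on an
arbitrary file. The helper name can_map_exec is hypothetical, and it only
approximates what the dynamic loader does (it maps the whole file rather
than the loader's computed segment size).

```python
import mmap
import os


def can_map_exec(path):
    """Try a private PROT_READ|PROT_EXEC mapping of the file at path.

    Returns True if the mapping succeeds, False if the kernel refuses it
    (for example EPERM when the file lives on a noexec mount) or if the
    file is empty and there is nothing to map.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        if size == 0:
            return False  # mmap cannot map an empty file
        try:
            mapping = mmap.mmap(
                fd,
                size,
                flags=mmap.MAP_PRIVATE,
                prot=mmap.PROT_READ | mmap.PROT_EXEC,
            )
        except OSError:
            # PermissionError (EPERM/EACCES) is a subclass of OSError.
            return False
        mapping.close()
        return True
    finally:
        os.close(fd)
```

Pointing it at a copy of tslua.so under /run versus the original under
/usr/lib/trafficserver/modules should show whether the mount flags alone
account for the difference.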

So for the ATS devs:

* Is this just me (probably due to my somewhat obscure kernel)? Typically
I'd see system log entries and/or dmesg in the case of grsecurity blocking
it, but nothing here.

* Did I miss something in the upgrade notes? I can't find anything about
this behavior (copying modules to a dir under /run/trafficserver, instead
of mmap'ing them directly from /usr/lib/). The configs I'm using were
untouched from 8.1.1 (haven't yet removed the deprecated stuff). Normally,
I'd chalk stuff up to operator error (and this still likely is), but this
one seemed kind of weird.

* Is there something I can set to prevent that behavior? I'd rather not
remount boxes' /run (not to mention, it's done by systemd automatically, so
it's not clear where to modify mount flags for /run; everything has a
*.mount unit file except for /run itself). Ideally, traffic_server would
just mmap /usr/lib/trafficserver/modules/tslua.so directly. But if that's
not possible, is there a setting to override the location? I tried
setting proxy.config.local_state_dir to something outside of /run; lock
files and sockets get created, but traffic_server immediately dies. I
created /tmp/trafficserver and used that for local_state_dir, but
traffic_manager complains that:

Fatal: failed to connect management socket
'/run/trafficserver/processerver.sock': No such file or directory

so it's still using /run/trafficserver for some things. I guess there's
more to it than just the proxy.config.local_state_dir setting. And once
local_state_dir is overridden away from /run/trafficserver,
"traffic_ctl config match . | grep run" returns only the
"proxy.config.alarm.script_runtime" setting.

* For the 8.1.1 "failed assertion `magic == HTTP_SM_MAGIC_ALIVE`" issue,
should I reopen that GitHub issue or start a new one?

Thanks!
