** Description changed:
- Hi,
- we recorded more than 30% performance regression on Ubuntu Focal for AWS
Graviton instances since Nginx package is not compiled with "-moutline-atomics"
cflag for arm64 architecture (337548 rps with default package and 484453 rps
using the proposed flag).
+ [ Impact ]
- As far as I know, only Ubuntu 20.04 is affected by it and we are
- proposing a fix for it (https://code.launchpad.net/~dipietro-
- salvatore/ubuntu/+source/nginx/+git/nginx/+merge/434825).
+ When nginx runs on arm64 under heavy load, the CPU can saturate and the
+ server may drop packets.
+
+ Building nginx with -moutline-atomics lets it use hardware atomic
+ instructions on systems that have them, which helps reduce CPU
+ utilization.
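
Whether a given arm64 host has these atomic instructions can be checked before testing. The following is a minimal sketch, assuming a Linux system where the kernel lists LSE support as the "atomics" entry on the Features line of /proc/cpuinfo; `has_lse_atomics` is a hypothetical helper name, not part of any package.

```shell
#!/usr/bin/env bash
# Sketch: report whether a cpuinfo-style file advertises LSE atomics.
# has_lse_atomics is a hypothetical helper; the optional argument lets it
# read a saved copy of /proc/cpuinfo instead of the live file.
has_lse_atomics() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    # arm64 Linux reports LSE support as the "atomics" feature flag.
    grep -E '^Features[[:space:]]*:' "$cpuinfo" 2>/dev/null | grep -qw 'atomics'
}

if has_lse_atomics; then
    echo "LSE atomics available"
else
    echo "LSE atomics not reported"
fi
```

On hosts without the flag, a binary built with -moutline-atomics is still expected to run, since the outlined helpers fall back to load/store-exclusive sequences at runtime.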
+
+ [ Test Plan ]
+
+ The test plan requires a decent amount of setup. This setup is covered
+ in full at https://github.com/mitchdz/aws-nginx-testbed.
+
+ In short, the setup involves a Focal-based arm64 nginx server, 4 amd64
+ Focal endpoints running a nodejs application, and 1 amd64 Focal server
+ to initiate the workloads.
+
+ This test is an E2E test which tests both functionality and performance
+ improvements.
+
+ 1) Set up the system as described in
+ https://github.com/mitchdz/aws-nginx-testbed
+
+ 2) Initiate the workload on the DRV instance
+ $ ./wrk/wrk -t36 -c512 -d60s http://172.31.33.24:80/ # Private IP of SUT
+
+ 3) While the workload is running, capture results on the SUT
+ $ sudo perf record -a -e r6e sleep 20s
+
+ 4) After the test has run, you should have metrics on both the DRV and
+ SUT systems.
+
+ 5) Install the new nginx built with -moutline-atomics
+ $ sudo add-apt-repository ppa:mitchdz/lp2024019-moutline-atomic
+ $ sudo apt update
+ $ sudo apt install -y nginx
+ $ dpkg -s nginx | grep Version
+ Version: 1.18.0-0ubuntu1.5~focal1
+
+ 6) Verify that nginx was built with -moutline-atomics (the output is
+ long, so search it for the flag)
+ $ nginx -V |& grep moutline
+ configure arguments: --with-cc-opt='-g -O2
-fdebug-prefix-map=/build/nginx-v8ZFDO/nginx-1.18.0=. -fstack-protector-strong
-Wformat -Werror=format-security -fPIC -Wdate-time -D_FORTIFY_SOURCE=2
-moutline-atomics' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro
-Wl,-z,now -fPIC' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf
--http-log-path=/var/log/nginx/access.log
--error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock
--pid-path=/run/nginx.pid --modules-path=/usr/lib/nginx/modules
--http-client-body-temp-path=/var/lib/nginx/body
--http-fastcgi-temp-path=/var/lib/nginx/fastcgi
--http-proxy-temp-path=/var/lib/nginx/proxy
--http-scgi-temp-path=/var/lib/nginx/scgi
--http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-compat
--with-pcre-jit --with-http_ssl_module --with-http_stub_status_module
--with-http_realip_module --with-http_auth_request_module --with-http_v2_module
--with-http_dav_module --with-http_slice_module --with-threads
--with-http_addition_module --with-http_gunzip_module
--with-http_gzip_static_module --with-http_image_filter_module=dynamic
--with-http_sub_module --with-http_xslt_module=dynamic --with-stream=dynamic
--with-stream_ssl_module --with-mail=dynamic --with-mail_ssl_module
+
+ 7) Re-run the wrk test as shown in steps 2-3
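
The check in step 6 can also be reduced to a pass/fail result instead of reading the full configure line. This is a minimal sketch; `check_outline_atomics` is a hypothetical helper name, and it only assumes that `nginx -V` prints its configure arguments as shown above.

```shell
#!/usr/bin/env bash
# check_outline_atomics: hypothetical helper that reads `nginx -V` text on
# stdin and reports whether -moutline-atomics appears in it.
check_outline_atomics() {
    # -F matches the flag literally; `--` stops grep from parsing it as an option.
    if grep -qF -- '-moutline-atomics'; then
        echo "built with -moutline-atomics"
    else
        echo "flag missing"
        return 1
    fi
}

# Usage on the SUT (nginx -V writes to stderr, hence 2>&1):
#   nginx -V 2>&1 | check_outline_atomics
```

The non-zero return status on a missing flag makes the helper usable in a scripted test run.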
+
+ Results:
+ - without patch:
+ - wrk output:
+ ```
+ ubuntu@ip-172-31-43-203:~/wrk$ ./wrk -t36 -c512 -d60s
http://172.31.40.247:80/
+ Running 1m test @ http://172.31.40.247:80/
+ 36 threads and 512 connections
+ Thread Stats   Avg      Stdev     Max   +/- Stdev
+   Latency     3.14ms    2.03ms  211.63ms   72.12%
+   Req/Sec     4.67k     0.89k    23.92k    76.44%
+ 10056815 requests in 1.00m, 5.20GB read
+ Requests/sec: 167336.42
+ Transfer/sec: 88.57MB
+ ```
+ - STREX count: Samples: 4M of event 'r6e', Event count (approx.): 364457784
+ - SUT CPU utilization: 100%
+
+ - with patch:
+ - wrk output:
+ ```
+ ubuntu@ip-172-31-43-203:~/wrk$ ./wrk -t36 -c512 -d60s
http://172.31.40.247:80/
+ Running 1m test @ http://172.31.40.247:80/
+ 36 threads and 512 connections
+ Thread Stats   Avg      Stdev     Max   +/- Stdev
+   Latency     1.02ms    1.04ms  210.87ms   99.52%
+   Req/Sec    13.94k   327.01     16.95k    79.69%
+ 30006090 requests in 1.00m, 15.51GB read
+ Requests/sec: 499273.74
+ Transfer/sec: 264.26MB
+ ```
+ - STREX count: No Samples
+ - SUT CPU utilization: 20%
+ The important metrics here are the perf metrics. Pre-patch, the SUT CPU
+ is saturated (100% utilization) and perf records about 4M samples of the
+ STREX event (r6e), whereas post-patch utilization drops to about 20% and
+ no STREX samples are recorded.
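
The throughput change can be computed directly from the two Requests/sec values reported by wrk. A minimal sketch follows; `rps_from_wrk` and `speedup` are hypothetical helper names, and the two numbers are the ones from the results above.

```shell
#!/usr/bin/env bash
# rps_from_wrk: hypothetical helper that extracts the Requests/sec value
# from wrk output on stdin (tolerates a leading "+ " diff prefix).
rps_from_wrk() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "Requests/sec:") { print $(i + 1); exit } }'
}

# speedup BEFORE AFTER: prints AFTER / BEFORE rounded to two decimals.
speedup() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", after / before }'
}

# Values taken from the "without patch" and "with patch" runs.
speedup 167336.42 499273.74   # prints 2.98
```

With these figures the patched build serves roughly 2.98 times as many requests per second as the unpatched one on this testbed.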
- Set up to reproduce the issue:
- - AWS instance: m6g.metal
- - AWS ami: ami-0aa916c7b0be51092
+ [ Where Problems Could Occur ]
+ Performance Trade-offs:
+ * While this will decrease the CPU load, it will