** Description changed:

- Hi, 
- we recorded more than 30% performance regression on Ubuntu Focal for AWS 
Graviton instances since Nginx package is not compiled with "-moutline-atomics" 
cflag for arm64 architecture (337548 rps with default package and 484453 rps 
using the proposed flag).
+ [ Impact ]
  
- As far as I know, only Ubuntu 20.04 is affected by it and we are
- proposing a fix for it (https://code.launchpad.net/~dipietro-
- salvatore/ubuntu/+source/nginx/+git/nginx/+merge/434825).
+ When using nginx on arm64, the CPU may go under heavy load and drop
+ packets when being stressed.
+ 
+ A compile-time optimization to use hardware acceleration could be
+ included to help alleviate the CPU utilization for systems that have
+ atomic instructions available.
+ 
+ [ Test Plan ]
+ 
+ The test plan requires a decent amount of setup. This setup is covered
+ in full at https://github.com/mitchdz/aws-nginx-testbed.
+ 
+ In short, the setup involves a focal-based arm64 nginx server, 4 amd64
+ focal endpoints running a Node.js application, and 1 amd64 focal server
+ to initiate the workloads.
+ 
+ This test is an E2E test which tests both functionality and performance
+ improvements.
+ 
+ 1) Set up the system as described in https://github.com/mitchdz/aws-
+ nginx-testbed
+ 
+ 2) Initiate the workload on the DRV instance
+ $ ./wrk/wrk -t36 -c512 -d60s http://172.31.33.24:80/ # Private IP of SUT
+ 
+ 3) While workload is running, capture results on SUT
+ $ sudo perf record -a -e r6e sleep 20s
+ 
+ 4) After the test has run, you should have metrics on both the DRV and
+ SUT systems.
+ 
+ 5) Install new nginx with -moutline-atomics
+ $ sudo add-apt-repository ppa:mitchdz/lp2024019-moutline-atomic
+ $ sudo apt update
+ $ sudo apt install -y nginx
+ $ dpkg -s nginx | grep Version
+ Version: 1.18.0-0ubuntu1.5~focal1
+ 
+ 6) Ensure -moutline-atomics is among the configure arguments (search for
+ it, since the output is a long, unfriendly wall of text)
+ $ nginx -V |& grep moutline
+ configure arguments: --with-cc-opt='-g -O2 
-fdebug-prefix-map=/build/nginx-v8ZFDO/nginx-1.18.0=. -fstack-protector-strong 
-Wformat -Werror=format-security -fPIC -Wdate-time -D_FORTIFY_SOURCE=2 
-moutline-atomics' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro 
-Wl,-z,now -fPIC' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf 
--http-log-path=/var/log/nginx/access.log 
--error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock 
--pid-path=/run/nginx.pid --modules-path=/usr/lib/nginx/modules 
--http-client-body-temp-path=/var/lib/nginx/body 
--http-fastcgi-temp-path=/var/lib/nginx/fastcgi 
--http-proxy-temp-path=/var/lib/nginx/proxy 
--http-scgi-temp-path=/var/lib/nginx/scgi 
--http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-compat 
--with-pcre-jit --with-http_ssl_module --with-http_stub_status_module 
--with-http_realip_module --with-http_auth_request_module --with-http_v2_module 
--with-http_dav_module --with-http_slice_module --with-threads 
--with-http_addition_module --with-http_gunzip_module 
--with-http_gzip_static_module --with-http_image_filter_module=dynamic 
--with-http_sub_module --with-http_xslt_module=dynamic --with-stream=dynamic 
--with-stream_ssl_module --with-mail=dynamic --with-mail_ssl_module
+ 
+ 7) Re-run the wrk test as shown in steps 2-3
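The check in step 6 can be made easier to read by letting grep report only whether the flag is present, instead of printing every configure argument. This is a minimal sketch, assuming a POSIX shell; has_outline_atomics is a hypothetical helper name, and the sample string is a shortened copy of the output shown in step 6.

```shell
# Reads `nginx -V` output on stdin and succeeds only if the
# -moutline-atomics flag appears among the configure arguments.
# (has_outline_atomics is a hypothetical helper, not part of nginx.)
has_outline_atomics() {
    grep -q -e '-moutline-atomics'
}

# Demonstration on a shortened sample of the step 6 output.
sample="configure arguments: --with-cc-opt='-g -O2 -fPIC -moutline-atomics' --with-ld-opt='-Wl,-z,relro'"
if printf '%s\n' "$sample" | has_outline_atomics; then
    echo "outline atomics: enabled"
else
    echo "outline atomics: missing"
fi
```

On the SUT this would be run as `nginx -V 2>&1 | has_outline_atomics && echo enabled`. The `2>&1` is needed because nginx prints its build information on stderr.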
+ 
+ Results:
+ - without patch:
+    - wrk output:
+     ```
+       ubuntu@ip-172-31-43-203:~/wrk$ ./wrk -t36 -c512  -d60s 
http://172.31.40.247:80/
+       Running 1m test @ http://172.31.40.247:80/
+         36 threads and 512 connections
+         Thread Stats   Avg      Stdev     Max   +/- Stdev
+           Latency     3.14ms    2.03ms 211.63ms   72.12%
+           Req/Sec     4.67k     0.89k   23.92k    76.44%
+         10056815 requests in 1.00m, 5.20GB read
+       Requests/sec: 167336.42
+       Transfer/sec:     88.57MB
+     ```
+   - STREX count: Samples: 4M of event 'r6e', Event count (approx.): 364457784
+   - SUT CPU utiliz: 100%
+ 
+ - with patch:
+   -  wrk output:
+     ```
+       ubuntu@ip-172-31-43-203:~/wrk$ ./wrk -t36 -c512  -d60s 
http://172.31.40.247:80/
+       Running 1m test @ http://172.31.40.247:80/
+         36 threads and 512 connections
+         Thread Stats   Avg      Stdev     Max   +/- Stdev
+           Latency     1.02ms    1.04ms 210.87ms   99.52%
+           Req/Sec    13.94k   327.01    16.95k    79.69%
+         30006090 requests in 1.00m, 15.51GB read
+       Requests/sec: 499273.74
+       Transfer/sec:    264.26MB
+     ```
+   - STREX count: No Samples
+   - SUT CPU utiliz: 20%
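For a quick comparison of the two runs, the Requests/sec figures can be reduced to a single ratio. This is a small sketch using awk; speedup is a hypothetical helper name, and the two inputs are copied from the wrk output above.

```shell
# Prints the throughput ratio new/old, rounded to two decimal places.
# (speedup is a hypothetical helper used only for this illustration.)
speedup() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.2f\n", new / old }'
}

# Requests/sec without the patch, then with the patch.
speedup 167336.42 499273.74   # prints 2.98
```

By this measure the patched build serves roughly three times as many requests per second in this test.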
  
  
+ The important metrics here are the perf metrics: pre-patch, the CPU is
+ under heavy load and 864K STREX events are seen, whereas post-patch the
+ CPU load is much lighter, with only 7 cycle events.
  
- Set up to reproduce the issue:
  
- - AWS instance: m6g.metal  
- - AWS ami: ami-0aa916c7b0be51092
+ [ Where Problems Could Occur ]
+ Performance Trade-offs:
+ * While this will decrease the CPU load, it will increase the use of
+ atomic operations.
+ * Outlining atomics can make debugging more complicated, especially
+ concurrency debugging.
+ * This improves application performance by offloading work to atomic
+ instructions. The resulting change in timing can reveal bugs that did not
+ surface before, such as race conditions, deadlocks, or incorrect
+ synchronization.
+ * This optimization adds a run-time check for the availability of atomic
+ instructions. If atomic instructions are not found, ARMv8.0-compatible
+ code is executed, so some overhead is added on systems that do not have
+ atomic instructions.
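To see which side of that run-time check a given arm64 host will land on, the CPU feature list can be inspected. This is a rough sketch, assuming an arm64 Linux host where /proc/cpuinfo lists the "atomics" feature when the atomic instructions are present; has_lse_atomics is a hypothetical helper name.

```shell
# Succeeds if the given cpuinfo file (default: /proc/cpuinfo) lists the
# "atomics" feature as a whole word.
# (has_lse_atomics is a hypothetical helper for illustration only.)
has_lse_atomics() {
    grep -qw 'atomics' "${1:-/proc/cpuinfo}"
}

if has_lse_atomics; then
    echo "atomic instructions reported: the fast path is available"
else
    echo "atomic instructions not reported: ARMv8.0-compatible code is used"
fi
```

The optional argument only exists so that the helper can be pointed at a saved copy of /proc/cpuinfo from another machine.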
  
- - lsb_release -rd:
- Description:    Ubuntu 20.04.6 LTS
- Release:        20.04
+ [ Other Info ]
+ * Why is -moutline-atomics not enabled already?
+ Focal uses gcc-9, which does not enable -moutline-atomics by default. It
+ became enabled by default in gcc-10.
  
- - apt-cache policy nginx:
- nginx:
-   Installed: (none)
-   Candidate: 1.18.0-0ubuntu1.4
-   Version table:
-      1.18.0-0ubuntu1.4 500
-         500 http://us-west-2.ec2.ports.ubuntu.com/ubuntu-ports 
focal-updates/main arm64 Packages
-         500 http://ports.ubuntu.com/ubuntu-ports focal-security/main arm64 
Packages
-      1.17.10-0ubuntu1 500
-         500 http://us-west-2.ec2.ports.ubuntu.com/ubuntu-ports focal/main 
arm64 Packages
+ https://devdocs.io/gcc~9/aarch64-options
+ vs
+ https://devdocs.io/gcc~10/aarch64-options
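The gcc-9 versus gcc-10 difference described above can be turned into a simple version test for build scripts. This is only a sketch based on the statement in this description that the flag became the default in gcc-10; outline_atomics_on_by_default is a hypothetical helper name, and the two version strings are sample inputs rather than versions taken from this bug.

```shell
# Succeeds when the major version of the given GCC version string is 10
# or later. Accepts both "9" and "9.4.0" style strings.
# (outline_atomics_on_by_default is a hypothetical helper.)
outline_atomics_on_by_default() {
    major=${1%%.*}
    [ "$major" -ge 10 ]
}

# Sample version strings, chosen only for illustration.
for v in 9.4.0 10.2.1; do
    if outline_atomics_on_by_default "$v"; then
        echo "gcc $v: -moutline-atomics is on by default"
    else
        echo "gcc $v: -moutline-atomics must be passed explicitly"
    fi
done
```

On a build host the helper could be fed the local compiler version with `outline_atomics_on_by_default "$(gcc -dumpversion)"`.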
+ 
+ gcc-9 changelog showing the addition of -moutline-atomics:
+ https://gcc.gnu.org/gcc-9/changes.html
+ 
+ Here is the thread with the discussion to enable by default:
+ https://gcc.gnu.org/pipermail/gcc/2020-April/000490.html
+ 
+ A lengthy discussion also happened in Debian to include this flag by
+ default - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=956418

** Changed in: haproxy (Ubuntu)
       Status: New => Invalid

** Changed in: haproxy (Ubuntu Focal)
       Status: New => In Progress

** Changed in: haproxy (Ubuntu Focal)
       Status: In Progress => Triaged

** Changed in: nginx (Ubuntu Focal)
       Status: New => In Progress

** Changed in: postgresql-12 (Ubuntu Focal)
       Status: New => Triaged

** Changed in: postgresql-12 (Ubuntu)
       Status: New => Invalid

** Changed in: nginx (Ubuntu)
       Status: New => Invalid

** Description changed:

  [ Impact ]
  
  When using nginx on arm64, the CPU may go under heavy load and drop
  packets when being stressed.
  
  A compile-time optimization to use hardware acceleration could be
  included to help alleviate the CPU utilization for systems that have
  atomic instructions available.
  
  [ Test Plan ]
  
  The test plan requires a decent amount of setup. This setup is covered
  in full at https://github.com/mitchdz/aws-nginx-testbed.
  
  In short, the setup involves a focal-based arm64 nginx server, 4 amd64
  focal endpoints running a Node.js application, and 1 amd64 focal server
  to initiate the workloads.
  
  This test is an E2E test which tests both functionality and performance
  improvements.
  
  1) Set up the system as described in https://github.com/mitchdz/aws-
  nginx-testbed
  
  2) Initiate the workload on the DRV instance
  $ ./wrk/wrk -t36 -c512 -d60s http://172.31.33.24:80/ # Private IP of SUT
  
  3) While workload is running, capture results on SUT
  $ sudo perf record -a -e r6e sleep 20s
  
  4) After the test has run, you should have metrics on both the DRV and
  SUT systems.
  
  5) Install new nginx with -moutline-atomics
  $ sudo add-apt-repository ppa:mitchdz/lp2024019-moutline-atomic
  $ sudo apt update
  $ sudo apt install -y nginx
  $ dpkg -s nginx | grep Version
  Version: 1.18.0-0ubuntu1.5~focal1
  
  6) Ensure -moutline-atomics is among the configure arguments (search for
  it, since the output is a long, unfriendly wall of text)
  $ nginx -V |& grep moutline
  configure arguments: --with-cc-opt='-g -O2 
-fdebug-prefix-map=/build/nginx-v8ZFDO/nginx-1.18.0=. -fstack-protector-strong 
-Wformat -Werror=format-security -fPIC -Wdate-time -D_FORTIFY_SOURCE=2 
-moutline-atomics' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro 
-Wl,-z,now -fPIC' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf 
--http-log-path=/var/log/nginx/access.log 
--error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock 
--pid-path=/run/nginx.pid --modules-path=/usr/lib/nginx/modules 
--http-client-body-temp-path=/var/lib/nginx/body 
--http-fastcgi-temp-path=/var/lib/nginx/fastcgi 
--http-proxy-temp-path=/var/lib/nginx/proxy 
--http-scgi-temp-path=/var/lib/nginx/scgi 
--http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-compat 
--with-pcre-jit --with-http_ssl_module --with-http_stub_status_module 
--with-http_realip_module --with-http_auth_request_module --with-http_v2_module 
--with-http_dav_module --with-http_slice_module --with-threads 
--with-http_addition_module --with-http_gunzip_module 
--with-http_gzip_static_module --with-http_image_filter_module=dynamic 
--with-http_sub_module --with-http_xslt_module=dynamic --with-stream=dynamic 
--with-stream_ssl_module --with-mail=dynamic --with-mail_ssl_module
  
  7) Re-run the wrk test as shown in steps 2-3
  
  Results:
  - without patch:
-    - wrk output:
-     ```
-       ubuntu@ip-172-31-43-203:~/wrk$ ./wrk -t36 -c512  -d60s 
http://172.31.40.247:80/
-       Running 1m test @ http://172.31.40.247:80/
-         36 threads and 512 connections
-         Thread Stats   Avg      Stdev     Max   +/- Stdev
-           Latency     3.14ms    2.03ms 211.63ms   72.12%
-           Req/Sec     4.67k     0.89k   23.92k    76.44%
-         10056815 requests in 1.00m, 5.20GB read
-       Requests/sec: 167336.42
-       Transfer/sec:     88.57MB
-     ```
-   - STREX count: Samples: 4M of event 'r6e', Event count (approx.): 364457784
-   - SUT CPU utiliz: 100%
+    - wrk output:
+     ```
+       ubuntu@ip-172-31-43-203:~/wrk$ ./wrk -t36 -c512  -d60s 
http://172.31.40.247:80/
+       Running 1m test @ http://172.31.40.247:80/
+         36 threads and 512 connections
+         Thread Stats   Avg      Stdev     Max   +/- Stdev
+           Latency     3.14ms    2.03ms 211.63ms   72.12%
+           Req/Sec     4.67k     0.89k   23.92k    76.44%
+         10056815 requests in 1.00m, 5.20GB read
+       Requests/sec: 167336.42
+       Transfer/sec:     88.57MB
+     ```
+   - STREX count: Samples: 4M of event 'r6e', Event count (approx.): 364457784
+   - SUT CPU utiliz: 100%
  
  - with patch:
-   -  wrk output:
-     ```
-       ubuntu@ip-172-31-43-203:~/wrk$ ./wrk -t36 -c512  -d60s 
http://172.31.40.247:80/
-       Running 1m test @ http://172.31.40.247:80/
-         36 threads and 512 connections
-         Thread Stats   Avg      Stdev     Max   +/- Stdev
-           Latency     1.02ms    1.04ms 210.87ms   99.52%
-           Req/Sec    13.94k   327.01    16.95k    79.69%
-         30006090 requests in 1.00m, 15.51GB read
-       Requests/sec: 499273.74
-       Transfer/sec:    264.26MB
-     ```
-   - STREX count: No Samples
-   - SUT CPU utiliz: 20%
- 
- 
- The important metrics here are the perf metrics, where we can see pre-patch 
the CPU is under heavy load, and 864K STREX events are seen, whereas post-patch 
the CPU is not under as heavy of a load with only 7 cycle events.
+   -  wrk output:
+     ```
+       ubuntu@ip-172-31-43-203:~/wrk$ ./wrk -t36 -c512  -d60s 
http://172.31.40.247:80/
+       Running 1m test @ http://172.31.40.247:80/
+         36 threads and 512 connections
+         Thread Stats   Avg      Stdev     Max   +/- Stdev
+           Latency     1.02ms    1.04ms 210.87ms   99.52%
+           Req/Sec    13.94k   327.01    16.95k    79.69%
+         30006090 requests in 1.00m, 15.51GB read
+       Requests/sec: 499273.74
+       Transfer/sec:    264.26MB
+     ```
+   - STREX count: No Samples
+   - SUT CPU utiliz: 20%
  
  
  [ Where Problems Could Occur ]
  Performance Trade-offs:
  * While this will decrease the CPU load, it will increase the use of
  atomic operations.
  * Outlining atomics can make debugging more complicated, especially
  concurrency debugging.
  * This improves application performance by offloading work to atomic
  instructions. The resulting change in timing can reveal bugs that did not
  surface before, such as race conditions, deadlocks, or incorrect
  synchronization.
  * This optimization adds a run-time check for the availability of atomic
  instructions. If atomic instructions are not found, ARMv8.0-compatible
  code is executed, so some overhead is added on systems that do not have
  atomic instructions.
  
  [ Other Info ]
  * Why is -moutline-atomics not enabled already?
  Focal uses gcc-9, which does not enable -moutline-atomics by default. It
  became enabled by default in gcc-10.
  
  https://devdocs.io/gcc~9/aarch64-options
  vs
  https://devdocs.io/gcc~10/aarch64-options
  
  gcc-9 changelog showing the addition of -moutline-atomics:
  https://gcc.gnu.org/gcc-9/changes.html
  
  Here is the thread with the discussion to enable by default:
  https://gcc.gnu.org/pipermail/gcc/2020-April/000490.html
  
  A lengthy discussion also happened in Debian to include this flag by
  default - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=956418

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2024019

Title:
  Add GCC atomic support (-moutline-atomics) for arm64 on Focal


