[Sts-sponsors] [Bug 1876230] Re: liburcu: Enable MEMBARRIER_CMD_PRIVATE_EXPEDITED to address performance problems with MEMBARRIER_CMD_SHARED
Answering question 2.

I have done a comprehensive performance analysis based on the benchmark application.

Note: The SRU changes how the sys_membarrier syscall is used. The implementation that we want to change to in this SRU never blocks, while the previous implementation does. This makes performance analysis entirely workload dependent. On busy servers with lots of background processes, sys_membarrier will block more often than on quiet servers with no background processes.

The following is based on a quiet server with no background processes.

Test parameters
===============

Ubuntu 18.04.4
KVM, 2 vcpus
0.10.1 liburcu
4.15.0-99-generic

Test program "test_urcu[_bp]": http://paste.ubuntu.com/p/5vXVycQjYk/
(the only difference is which urcu header is #included)

No changes to source code
=========================

ubuntu@ubuntu:~/userspace-rcu/tests/benchmark$ ./test_urcu 6 2 10
nr_reads 6065490002 nr_writes 237 nr_ops 6065490239
nr_reads 6476219475 nr_writes 186 nr_ops 6476219661
nr_reads 6474789528 nr_writes 183 nr_ops 6474789711
nr_reads 6476326433 nr_writes 188 nr_ops 6476326621
nr_reads 6479298142 nr_writes 179 nr_ops 6479298321
nr_reads 6476429569 nr_writes 186 nr_ops 6476429755
nr_reads 6478019994 nr_writes 191 nr_ops 6478020185
nr_reads 6479117595 nr_writes 183 nr_ops 6479117778
nr_reads 6478302181 nr_writes 185 nr_ops 6478302366
nr_reads 6481003399 nr_writes 191 nr_ops 6481003590

ubuntu@ubuntu:~/userspace-rcu/tests/benchmark$ ./test_urcu_bp 6 2 10
nr_reads 644339902 nr_writes 485 nr_ops 644340387
nr_reads 644092800 nr_writes 1101 nr_ops 644093901
nr_reads 644676446 nr_writes 494 nr_ops 644676940
nr_reads 643845915 nr_writes 500 nr_ops 643846415
nr_reads 645156053 nr_writes 502 nr_ops 645156555
nr_reads 644626421 nr_writes 497 nr_ops 644626918
nr_reads 644710679 nr_writes 495 nr_ops 644711174
nr_reads 65530 nr_writes 503 nr_ops 66033
nr_reads 645150707 nr_writes 497 nr_ops 645151204
nr_reads 643681268 nr_writes 496 nr_ops 643681764

Commits c0bb9f and 374530 patched in
====================================

ubuntu@ubuntu:~/userspace-rcu/tests/benchmark$ ./test_urcu 6 2 10
nr_reads 4097663510 nr_writes 6516 nr_ops 4097670026
nr_reads 4177088332 nr_writes 4183 nr_ops 4177092515
nr_reads 4153780077 nr_writes 1907 nr_ops 4153781984
nr_reads 4150954044 nr_writes 3942 nr_ops 4150957986
nr_reads 4267855073 nr_writes 2102 nr_ops 4267857175
nr_reads 4131310825 nr_writes 7119 nr_ops 4131317944
nr_reads 4183771431 nr_writes 1919 nr_ops 4183773350
nr_reads 4270944170 nr_writes 4958 nr_ops 4270949128
nr_reads 4123277225 nr_writes 4228 nr_ops 4123281453
nr_reads 4266997284 nr_writes 1723 nr_ops 4266999007

ubuntu@ubuntu:~/userspace-rcu/tests/benchmark$ ./test_urcu_bp 6 2 10
nr_reads 6530208343 nr_writes 8860 nr_ops 6530217203
nr_reads 6514357222 nr_writes 10568 nr_ops 6514367790
nr_reads 6517420660 nr_writes 9534 nr_ops 6517430194
nr_reads 6510005433 nr_writes 11799 nr_ops 6510017232
nr_reads 6492226563 nr_writes 12517 nr_ops 6492239080
nr_reads 6532405460 nr_writes 6548 nr_ops 6532412008
nr_reads 6514205150 nr_writes 9686 nr_ops 6514214836
nr_reads 6481643486 nr_writes 16167 nr_ops 6481659653
nr_reads 6509268022 nr_writes 10582 nr_ops 6509278604
nr_reads 6523168701 nr_writes 9066 nr_ops 6523177767

Comparing and contrasting with 20.04
====================================

Test parameters:

Ubuntu 20.04 LTS
KVM, 2 vcpus
0.11.1 liburcu
5.4.0-29-generic

ubuntu@ubuntu:~/userspace-rcu/tests/benchmark$ ./test_urcu 6 2 10
nr_reads 4270089636 nr_writes 1638 nr_ops 4270091274
nr_reads 4281598850 nr_writes 3008 nr_ops 4281601858
nr_reads 4241230576 nr_writes 3612 nr_ops 4241234188
nr_reads 4230643208 nr_writes 5367 nr_ops 4230648575
nr_reads 4333495124 nr_writes 1354 nr_ops 4333496478
nr_reads 4291295097 nr_writes 3545 nr_ops 4291298642
nr_reads 4232582737 nr_writes 1983 nr_ops 4232584720
nr_reads 4268926719 nr_writes 3363 nr_ops 4268930082
nr_reads 4266736459 nr_writes 4881 nr_ops 4266741340
nr_reads 4313525276 nr_writes 4549 nr_ops 4313529825

ubuntu@ubuntu:~/userspace-rcu/tests/benchmark$ ./test_urcu_bp 6 2 10
nr_reads 6848011482 nr_writes 3171 nr_ops 6848014653
nr_reads 6842990129 nr_writes 4577 nr_ops 6842994706
nr_reads 6862298832 nr_writes 2875 nr_ops 6862301707
nr_reads 6849848255 nr_writes 4292 nr_ops 6849852547
nr_reads
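The per-run counters in this message are easier to compare once averaged. The following is a minimal sketch of how that could be done, not part of the liburcu benchmark suite: the helper names `parse_runs` and `summarise` are hypothetical, and the sample text is two lines copied from the unpatched test_urcu run above.

```python
import re
from statistics import mean

# Matches one benchmark result line; whitespace after each label is optional
# because some archived lines lost it (e.g. "nr_reads644339902").
LINE_RE = re.compile(r"nr_reads\s*(\d+)\s+nr_writes\s*(\d+)\s+nr_ops\s*(\d+)")


def parse_runs(text):
    """Return a list of (reads, writes, ops) tuples found in benchmark output."""
    return [tuple(int(g) for g in m.groups()) for m in LINE_RE.finditer(text)]


def summarise(runs):
    """Average the read and write counters over all runs."""
    return {
        "runs": len(runs),
        "mean_reads": mean(r for r, _, _ in runs),
        "mean_writes": mean(w for _, w, _ in runs),
    }


# Two lines copied from the unpatched test_urcu run (hypothetical usage).
sample = """
nr_reads 6065490002 nr_writes 237 nr_ops 6065490239
nr_reads 6476219475 nr_writes 186 nr_ops 6476219661
"""

runs = parse_runs(sample)
print(summarise(runs))
```

Averaging hides outliers such as the single test_urcu_bp run with only 65530 reads, so it may be worth inspecting the parsed tuples directly as well.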
[Sts-sponsors] [Bug 1871685] Re: [SRU] vagrant spits out ruby deprecation warnings on every call
Hello Hartmut, or anyone else affected,

Accepted vagrant into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/vagrant/2.2.6+dfsg-2ubuntu2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

** Changed in: vagrant (Ubuntu Focal)
       Status: Triaged => Fix Committed

** Tags removed: verification-done-focal
** Tags added: verification-needed verification-needed-focal

-- 
You received this bug notification because you are a member of STS Sponsors, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1871685

Title:
  [SRU] vagrant spits out ruby deprecation warnings on every call

Status in vagrant package in Ubuntu:
  Fix Released
Status in vagrant source package in Focal:
  Fix Committed

Bug description:
  [Impact]

  This issue is not critical, but I believe it considerably degrades the user experience in a brand new LTS release. Every time one calls vagrant via the CLI, Ruby 2.7 throws a bunch of warnings about deprecated features, which some (inexperienced?) users might mistake for a failure at first glance.

  This was reported not just here as a bug report but also in Discourse:

  https://discourse.ubuntu.com/t/workarounds-for-applications-which-are-broken-in-20-04lts/15474/5

  [Test Case]

  In a Focal LXD container:

  $ apt install vagrant
  $ vagrant
  NOTE: Gem::Specification.default_specifications_dir is deprecated; use Gem.default_specifications_dir instead. It will be removed on or after 2020-02-01.
  Gem::Specification.default_specifications_dir called from /usr/share/rubygems-integration/all/gems/vagrant-2.2.6/lib/vagrant/bundler.rb:428.
  /usr/share/rubygems-integration/all/gems/vagrant-2.2.6/lib/vagrant/errors.rb:103: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
  /usr/share/rubygems-integration/all/gems/i18n-1.8.2/lib/i18n.rb:195: warning: The called method `t' is defined here
  (eval):3: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
  /usr/share/rubygems-integration/all/gems/vagrant-2.2.6/lib/vagrant/ui.rb:223: warning: The called method `say' is defined here
  Usage: vagrant [options] []

      -v, --version                    Print the version and exit.
      -h, --help                       Print this help.

  Common commands:
       box             manages boxes: installation, removal, etc.
       cloud           manages everything related to Vagrant Cloud
       destroy         stops and deletes all traces of the vagrant machine
       global-status   outputs status Vagrant environments for this user
       halt            stops the vagrant machine
       help            shows the help for a subcommand
       init            initializes a new Vagrant environment by creating a Vagrantfile
       login
       package         packages a running vagrant environment into a box
       plugin          manages plugins: install, uninstall, update, etc.
       port            displays information about guest port mappings
       powershell      connects to machine via powershell remoting
       provision       provisions the vagrant machine
       push            deploys code in this environment to a configured destination
       rdp             connects to machine via RDP
       reload          restarts vagrant machine, loads new Vagrantfile configuration
       resume          resume a suspended vagrant machine
       snapshot        manages snapshots: saving, restoring, etc.
       ssh             connects to machine via SSH
       ssh-config      outputs OpenSSH valid configuration to connect to the machine
       status          outputs status of the vagrant machine
       suspend         suspends the machine
       up              starts and provisions the vagrant environment
       upload          upload to machine via communicator
       validate        validat
[Sts-sponsors] [Bug 1876230] Re: liburcu: Enable MEMBARRIER_CMD_PRIVATE_EXPEDITED to address performance problems with MEMBARRIER_CMD_SHARED
To answer question 1, I went and checked every rdepends package:

gdnsd: dynamically links to -lurcu-qsbr

$ ldd /usr/sbin/gdnsd
    liburcu-qsbr.so.6 => /usr/lib/x86_64-linux-gnu/liburcu-qsbr.so.6 (0x7f33bfde5000)

glusterfs: The only package I am not entirely sure about. Only glusterd uses urcu:

ubuntu@ubuntu:~/glusterfs-3.13.2$ grep -Rin " header file. */
./config.h.in:203:/* Define to 1 if you have the header file. */
./xlators/mgmt/glusterd/src/glusterd-rcu.h:14:#include
./xlators/mgmt/glusterd/src/glusterd-rcu.h:15:#include
./xlators/mgmt/glusterd/src/glusterd-rcu.h:16:#include
./xlators/mgmt/glusterd/src/glusterd-rcu.h:17:#include
./xlators/mgmt/glusterd/src/glusterd-rcu.h:18:#include
./xlators/mgmt/glusterd/src/glusterd-conn-helper.c:15:#include

$ ldd /usr/sbin/glusterd | grep urcu

No mention of static linking either:

ubuntu@ubuntu:~/glusterfs-3.13.2$ grep -Rin "\.a" . | grep urcu

The library linker settings are in ./configure.ac:

dnl Check for userspace-rcu
PKG_CHECK_MODULES([URCU], [liburcu-bp], [],
  [AC_CHECK_HEADERS([urcu-bp.h],
     [URCU_LIBS='-lurcu-bp'],
     AC_MSG_ERROR([liburcu-bp not found]))])
PKG_CHECK_MODULES([URCU_CDS], [liburcu-cds >= 0.8], [],
  [PKG_CHECK_MODULES([URCU_CDS], [liburcu-cds >= 0.7],
    [AC_DEFINE(URCU_OLD, 1, [Define if liburcu 0.6 or 0.7 is found])],
    [AC_CHECK_HEADERS([urcu/cds.h],
      [AC_DEFINE(URCU_OLD, 1, [Define if liburcu 0.6 or 0.7 is found])
      URCU_CDS_LIBS='-lurcu-cds'],
      [AC_MSG_ERROR([liburcu-cds not found])])])])

I ran ldd over all glusterfs binaries which are listed by dpkg -L, but they all came back negative for urcu. From what I can tell, glusterd from the xlators directory is either not built, or does not link against urcu in Ubuntu.

knot: dynamically links to -lurcu

ubuntu@ubuntu:~$ ldd /usr/sbin/knotd
    liburcu.so.6 => /usr/lib/x86_64-linux-gnu/liburcu.so.6 (0x7fc9d1e8b000)

lttng: dynamically links to various urcu libraries

$ ldd /usr/bin/lttng
    liburcu-common.so.6 => /usr/lib/x86_64-linux-gnu/liburcu-common.so.6 (0x7f6711a2f000)
    liburcu.so.6 => /usr/lib/x86_64-linux-gnu/liburcu.so.6 (0x7f6711827000)
    liburcu-cds.so.6 => /usr/lib/x86_64-linux-gnu/liburcu-cds.so.6 (0x7f671161d000)

multipath-tools: dynamically links to -lurcu

$ ldd /sbin/multipath
    liburcu.so.6 => /usr/lib/x86_64-linux-gnu/liburcu.so.6 (0x7fe63ace1000)

netsniff-ng: dynamically links to -lurcu

~/netsniff-ng-0.6.4$ grep -Rin "urcu"
flowtop/Makefile:1:flowtop-libs = -lurcu \
ui.h:7:#include
flowtop.c:28:#include
flowtop.c:29:#include
flowtop.c:30:#include
INSTALL:28: - liburcu:flowtop

$ ldd /usr/sbin/flowtop
    liburcu.so.6 => /usr/lib/x86_64-linux-gnu/liburcu.so.6 (0x7f7561a5a000)

sheepdog: this program doesn't link against urcu at all! It only uses the header file and nothing more.

search for urcu in sheepdog: https://paste.ubuntu.com/p/VPKr4pWtQg/
explanation found in d/changelog: https://paste.ubuntu.com/p/K8XwjK2czD/

ust: dynamically links to various urcu libraries

$ ldd /usr/lib/x86_64-linux-gnu/liblttng-ust-cyg-profile-fast.so.0.0.0
    liburcu-bp.so.6 => /usr/lib/x86_64-linux-gnu/liburcu-bp.so.6 (0x7f91b0c13000)
    liburcu-cds.so.6 => /usr/lib/x86_64-linux-gnu/liburcu-cds.so.6 (0x7f91b01f4000)

From what I can tell, no rdepends package statically links to liburcu. Those that link against it at all do so dynamically, meaning no packages will need to be rebuilt for this SRU.
https://bugs.launchpad.net/bugs/1876230

Title:
  liburcu: Enable MEMBARRIER_CMD_PRIVATE_EXPEDITED to address performance problems with MEMBARRIER_CMD_SHARED

Status in liburcu package in Ubuntu:
  Fix Released
Status in liburcu source package in Bionic:
  In Progress
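The per-package ldd checks in this reply could be scripted along the following lines. This is a sketch under stated assumptions rather than the procedure actually used: the helper name `urcu_libs` is hypothetical, and the sample text is the `ldd /usr/bin/lttng` output quoted in the message. In practice the text would come from running ldd on each binary listed by dpkg -L.

```python
import re

# One dynamic dependency line of ldd output whose soname starts with "liburcu".
LDD_RE = re.compile(r"^\s*(liburcu[\w-]*\.so\.\d+)\s+=>\s+(\S+)", re.MULTILINE)


def urcu_libs(ldd_output):
    """Return the liburcu sonames that appear in the given ldd output."""
    return [m.group(1) for m in LDD_RE.finditer(ldd_output)]


# Output of `ldd /usr/bin/lttng` as quoted above.
lttng_ldd = """
    liburcu-common.so.6 => /usr/lib/x86_64-linux-gnu/liburcu-common.so.6 (0x7f6711a2f000)
    liburcu.so.6 => /usr/lib/x86_64-linux-gnu/liburcu.so.6 (0x7f6711827000)
    liburcu-cds.so.6 => /usr/lib/x86_64-linux-gnu/liburcu-cds.so.6 (0x7f671161d000)
"""

print(urcu_libs(lttng_ldd))
```

An empty result only shows that the binary has no dynamic dependency on liburcu. It does not by itself distinguish a statically linked binary from one that does not use the library at all, which is why the message also greps the build files.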
[Sts-sponsors] [Bug 1876230] Re: liburcu: Enable MEMBARRIER_CMD_PRIVATE_EXPEDITED to address performance problems with MEMBARRIER_CMD_SHARED
Hi @mruffell, two questions for this sru:

1) it looks like static libs are built/provided by this package:

$ pull-lp-debs liburcu bionic ; for p in *.deb ; do echo "$p:" ; dpkg-deb -c $p | grep -E '*\.a' ; done
Found liburcu 0.10.1-1 in bionic
Using existing file liburcu-dev_0.10.1-1_amd64.deb
Using existing file liburcu6_0.10.1-1_amd64.deb
liburcu6_0.10.1-1_amd64.deb:
liburcu-dev_0.10.1-1_amd64.deb:
-rw-r--r-- root/root 47956 2018-01-23 15:46 ./usr/lib/x86_64-linux-gnu/liburcu-bp.a
-rw-r--r-- root/root 69844 2018-01-23 15:46 ./usr/lib/x86_64-linux-gnu/liburcu-cds.a
-rw-r--r-- root/root 23912 2018-01-23 15:46 ./usr/lib/x86_64-linux-gnu/liburcu-common.a
-rw-r--r-- root/root 43750 2018-01-23 15:46 ./usr/lib/x86_64-linux-gnu/liburcu-mb.a
-rw-r--r-- root/root 45642 2018-01-23 15:46 ./usr/lib/x86_64-linux-gnu/liburcu-qsbr.a
-rw-r--r-- root/root 45716 2018-01-23 15:46 ./usr/lib/x86_64-linux-gnu/liburcu-signal.a
-rw-r--r-- root/root 45148 2018-01-23 15:46 ./usr/lib/x86_64-linux-gnu/liburcu.a

and several other pkgs have build-dep for that:

$ reverse-depends -b -r bionic liburcu-dev
Reverse-Build-Depends
* gdnsd
* glusterfs
* knot
* ltt-control
* multipath-tools
* netsniff-ng
* sheepdog
* ust

Can you check those packages to see if any use static linking (and thus should be recompiled with the updated static liburcu libs)?

2) In your testing results comparison:

> ./test_urcu 6 2 10
> 0.10.1-1: 17612527667 reads, 268 writes, 17612527935 ops
> 0.10.1-1ubuntu1: 14988437247 reads, 810069 writes, 14989247316 ops

The number of writes is obviously much, much better; however the number of reads actually goes down with the patched code.

> $ ./test_urcu_bp 6 2 10
> 0.10.1-1: 1177891079 reads, 1699523 writes, 1179590602 ops
> 0.10.1-1ubuntu1: 13230354737 reads, 575314 writes, 13230930051 ops

Similarly, while the number of reads increases significantly, the number of writes goes down.
I may be misreading the results, but it seems like this change is not an across-the-board improvement, but more of a performance trade-off. If that's the case, I think it will be hard to make the case this should be included as an SRU.

Can you clarify the results comparison in more detail please?
[Sts-sponsors] [Bug 1876230] Re: liburcu: Enable MEMBARRIER_CMD_PRIVATE_EXPEDITED to address performance problems with MEMBARRIER_CMD_SHARED
** Tags added: sts-sponsor-ddstreet
[Sts-sponsors] [Bug 1876230] Re: liburcu: Enable MEMBARRIER_CMD_PRIVATE_EXPEDITED to address performance problems with MEMBARRIER_CMD_SHARED
** Description changed:
[Sts-sponsors] [Bug 1876230] [NEW] liburcu: Enable MEMBARRIER_CMD_PRIVATE_EXPEDITED to address performance problems with MEMBARRIER_CMD_SHARED
You have been subscribed to a public bug by Dan Streetman (ddstreet):

[Impact]

In Linux 4.3, a new syscall was defined, called "membarrier". This system call was defined specifically for use in userspace-rcu (liburcu) to speed up the fast path / reader side of the library. The original implementation in Linux 4.3 only supported the MEMBARRIER_CMD_SHARED subcommand of the membarrier syscall.

MEMBARRIER_CMD_SHARED executes a memory barrier on all threads from all processes running on the system. When it returns, the userspace thread which called it is guaranteed that all running threads share the same world view with regard to userspace addresses which are consumed by readers and writers.

The problem with MEMBARRIER_CMD_SHARED is that system calls made in this fashion can block: the barrier is deployed across all threads in the system, and some of those threads can be waiting on blocking operations and take time to reach the barrier.

In Linux 4.14, this was addressed by adding the MEMBARRIER_CMD_PRIVATE_EXPEDITED command to the membarrier syscall. It only targets threads which share the same mm as the thread calling the membarrier syscall, i.e. threads in the current process, and not all threads / processes in the system.

Calls to membarrier with the MEMBARRIER_CMD_PRIVATE_EXPEDITED command are guaranteed non-blocking, because the memory barriers are implemented with inter-processor interrupts. Because of this, membarrier calls that use MEMBARRIER_CMD_PRIVATE_EXPEDITED are much faster than those that use MEMBARRIER_CMD_SHARED.

Since Bionic uses a 4.15 kernel, all kernel requirements are met, and this SRU is to enable support for MEMBARRIER_CMD_PRIVATE_EXPEDITED in the liburcu package.

This brings the performance of the liburcu library back in line with where it was in Trusty, as this particular user has performance problems upon upgrading from Trusty to Bionic.
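The choice between the two commands described above can be illustrated with a small sketch. It is not liburcu's code. The constants are the values from the kernel's linux/membarrier.h header, the function name `choose_command` is hypothetical, and the mask is assumed to be the value returned by a membarrier call with MEMBARRIER_CMD_QUERY, which reports the supported commands as a bitmask.

```python
# Command values as defined in the kernel's <linux/membarrier.h>.
MEMBARRIER_CMD_QUERY = 0
MEMBARRIER_CMD_SHARED = 1 << 0
MEMBARRIER_CMD_PRIVATE_EXPEDITED = 1 << 3


def choose_command(supported_mask):
    """Pick the preferred barrier command from a MEMBARRIER_CMD_QUERY bitmask.

    Prefers the non-blocking, per-process command and falls back to the
    system-wide one; returns None when neither is available.
    """
    if supported_mask & MEMBARRIER_CMD_PRIVATE_EXPEDITED:
        return "MEMBARRIER_CMD_PRIVATE_EXPEDITED"
    if supported_mask & MEMBARRIER_CMD_SHARED:
        return "MEMBARRIER_CMD_SHARED"
    return None


# Hypothetical masks: an older kernel offering only the shared command, and
# a newer kernel that also offers the private expedited command.
print(choose_command(MEMBARRIER_CMD_SHARED))
print(choose_command(MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED))
```

Checking the query mask before choosing a command keeps the same binary usable on kernels that predate the newer command.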
[Test]

Testing performance is heavily dependent on the application which links against liburcu, and the workload which it executes.

A test package is available in the following ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/sf276198-test

For the sake of testing, we can use the benchmarks provided in the liburcu source code. Download a copy of the source code for liburcu either from the repos or from github:

$ pull-lp-source liburcu bionic
# OR
$ git clone https://github.com/urcu/userspace-rcu.git
$ cd userspace-rcu
$ git checkout v0.10.1 # version in bionic

Build the code:

$ ./bootstrap
$ ./configure
$ make

Go into the tests/benchmark directory:

$ cd tests/benchmark

From there, you can run benchmarks for the four main usages of liburcu: urcu, urcu-bp, urcu-signal and urcu-mb.

On an 8 core machine, with 6 threads for readers and 2 threads for writers, and a 10 second runtime, execute:

$ ./test_urcu 6 2 10
$ ./test_urcu_bp 6 2 10
$ ./test_urcu_signal 6 2 10
$ ./test_urcu_mb 6 2 10

Results:

$ ./test_urcu 6 2 10
0.10.1-1: 17612527667 reads, 268 writes, 17612527935 ops
0.10.1-1ubuntu1: 14988437247 reads, 810069 writes, 14989247316 ops

$ ./test_urcu_bp 6 2 10
0.10.1-1: 1177891079 reads, 1699523 writes, 1179590602 ops
0.10.1-1ubuntu1: 13230354737 reads, 575314 writes, 13230930051 ops

$ ./test_urcu_signal 6 2 10
0.10.1-1: 20128392417 reads, 6859 writes, 20128399276 ops
0.10.1-1ubuntu1: 20501430707 reads, 6890 writes, 20501437597 ops

$ ./test_urcu_mb 6 2 10
0.10.1-1: 627996563 reads, 5409563 writes, 633406126 ops
0.10.1-1ubuntu1: 653194752 reads, 4590020 writes, 657784772 ops

The SRU only changes behaviour for urcu and urcu-bp, since they are the only "flavours" of liburcu which the patches change.
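The results above can be cross-checked by computing the ratio of patched to unpatched counters for the two flavours the SRU touches. This is an illustrative sketch only: the dictionary layout and the helper name `change` are made up here, and the numbers are copied from the results listed above.

```python
# (reads, writes, ops) per package version, copied from the reported results.
RESULTS = {
    "test_urcu": {
        "0.10.1-1": (17612527667, 268, 17612527935),
        "0.10.1-1ubuntu1": (14988437247, 810069, 14989247316),
    },
    "test_urcu_bp": {
        "0.10.1-1": (1177891079, 1699523, 1179590602),
        "0.10.1-1ubuntu1": (13230354737, 575314, 13230930051),
    },
}


def change(test, field):
    """Ratio of patched to unpatched; field 0 is reads, 1 is writes, 2 is ops."""
    old = RESULTS[test]["0.10.1-1"][field]
    new = RESULTS[test]["0.10.1-1ubuntu1"][field]
    return new / old


for test in RESULTS:
    reads, writes, ops = (change(test, i) for i in range(3))
    print(f"{test}: reads x{reads:.2f}, writes x{writes:.2f}, ops x{ops:.2f}")
```

Because ops is the sum of reads and writes, and reads outnumber writes by several orders of magnitude in every run, the ops ratio is driven almost entirely by the read counter.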
From a pure ops standpoint:

$ ./test_urcu 6 2 10
0.10.1-1: 17612527935 ops
0.10.1-1ubuntu1: 14989247316 ops

$ ./test_urcu_bp 6 2 10
0.10.1-1: 1179590602 ops
0.10.1-1ubuntu1: 13230930051 ops

We see that in this particular benchmark workload, test_urcu sees extra performance overhead with MEMBARRIER_CMD_PRIVATE_EXPEDITED, which is explained by the extra impact that it has on the slowpath, and the extra amount of writes it did during my benchmark.

The real winner in this benchmark workload is test_urcu_bp, which sees a more than 10x performance increase with MEMBARRIER_CMD_PRIVATE_EXPEDITED. Some of this may be down to the 3x fewer writes it did during my benchmark.

Again, these benchmarks are indicative only and are very "random". Performance is really dependent on the application which links against liburcu and its workload.

[Regression Potential]

This SRU changes the behaviour of the following libraries which applications link against: -lurcu and -lurcu-bp. Behaviour is not changed in the rest: -lurcu-qsbr, -lurcu-signal and -lurcu-mb.

On Bionic, liburcu will call the membarrier syscall in urcu and urcu-bp. This does not change. What is changing is the semantics of that syscall, from MEMBARRIER_CMD_SHARED to MEMBARRIER_CMD_PRIVATE_EXPEDITED. The changed code is all run in kernel space and resides in the kernel. These commits simply change the parameters which are supplied to the membarrier syscall from liburcu.

I have run the testsuite tha