[Sts-sponsors] [Bug 1876230] Re: liburcu: Enable MEMBARRIER_CMD_PRIVATE_EXPEDITED to address performance problems with MEMBARRIER_CMD_SHARED

2020-05-05 Thread Matthew Ruffell
Answering question 2. I have done a comprehensive performance analysis
based on the benchmark application.

Note: The SRU changes how the sys_membarrier syscall is used. The
implementation that this SRU switches to never blocks, while the
previous implementation can. This makes performance analysis entirely
workload dependent: on busy servers with lots of background processes,
sys_membarrier will block more often than on quiet servers with no
background processes.

The following is based on a quiet server with no background processes.

Test parameters
===
Ubuntu 18.04.4
KVM, 2 vcpus
0.10.1 liburcu
4.15.0-99-generic
Test program "test_urcu[_bp]": http://paste.ubuntu.com/p/5vXVycQjYk/
(only difference is #include  or #include )

No changes to source code
=

ubuntu@ubuntu:~/userspace-rcu/tests/benchmark$ ./test_urcu 6 2 10
nr_reads   6065490002 nr_writes  237 nr_ops   6065490239
nr_reads   6476219475 nr_writes  186 nr_ops   6476219661
nr_reads   6474789528 nr_writes  183 nr_ops   6474789711
nr_reads   6476326433 nr_writes  188 nr_ops   6476326621
nr_reads   6479298142 nr_writes  179 nr_ops   6479298321
nr_reads   6476429569 nr_writes  186 nr_ops   6476429755
nr_reads   6478019994 nr_writes  191 nr_ops   6478020185
nr_reads   6479117595 nr_writes  183 nr_ops   6479117778
nr_reads   6478302181 nr_writes  185 nr_ops   6478302366
nr_reads   6481003399 nr_writes  191 nr_ops   6481003590

ubuntu@ubuntu:~/userspace-rcu/tests/benchmark$ ./test_urcu_bp 6 2 10
nr_reads    644339902 nr_writes  485 nr_ops    644340387
nr_reads    644092800 nr_writes 1101 nr_ops    644093901
nr_reads    644676446 nr_writes  494 nr_ops    644676940
nr_reads    643845915 nr_writes  500 nr_ops    643846415
nr_reads    645156053 nr_writes  502 nr_ops    645156555
nr_reads    644626421 nr_writes  497 nr_ops    644626918
nr_reads    644710679 nr_writes  495 nr_ops    644711174
nr_reads        65530 nr_writes  503 nr_ops        66033
nr_reads    645150707 nr_writes  497 nr_ops    645151204
nr_reads    643681268 nr_writes  496 nr_ops    643681764

Commits c0bb9f and 374530 patched in
=

ubuntu@ubuntu:~/userspace-rcu/tests/benchmark$ ./test_urcu 6 2 10
nr_reads   4097663510 nr_writes 6516 nr_ops   4097670026
nr_reads   4177088332 nr_writes 4183 nr_ops   4177092515
nr_reads   4153780077 nr_writes 1907 nr_ops   4153781984
nr_reads   4150954044 nr_writes 3942 nr_ops   4150957986
nr_reads   4267855073 nr_writes 2102 nr_ops   4267857175
nr_reads   4131310825 nr_writes 7119 nr_ops   4131317944
nr_reads   4183771431 nr_writes 1919 nr_ops   4183773350
nr_reads   4270944170 nr_writes 4958 nr_ops   4270949128
nr_reads   4123277225 nr_writes 4228 nr_ops   4123281453
nr_reads   4266997284 nr_writes 1723 nr_ops   4266999007


ubuntu@ubuntu:~/userspace-rcu/tests/benchmark$ ./test_urcu_bp 6 2 10
nr_reads   6530208343 nr_writes  8860 nr_ops   6530217203
nr_reads   6514357222 nr_writes 10568 nr_ops   6514367790
nr_reads   6517420660 nr_writes  9534 nr_ops   6517430194
nr_reads   6510005433 nr_writes 11799 nr_ops   6510017232
nr_reads   6492226563 nr_writes 12517 nr_ops   6492239080
nr_reads   6532405460 nr_writes  6548 nr_ops   6532412008
nr_reads   6514205150 nr_writes  9686 nr_ops   6514214836
nr_reads   6481643486 nr_writes 16167 nr_ops   6481659653
nr_reads   6509268022 nr_writes 10582 nr_ops   6509278604
nr_reads   6523168701 nr_writes  9066 nr_ops   6523177767
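
For anyone wanting to compare runs like these numerically, here is a
minimal sketch in Python that averages the per-run lines. It assumes the
"nr_reads ... nr_writes ... nr_ops ..." output format shown above; the
names LINE_RE and summarise are made up for illustration and are not
part of the liburcu test suite.

```python
import re
from statistics import mean

# Matches one benchmark result line. \s* tolerates output where the
# column padding has been lost (e.g. "nr_reads644339902").
LINE_RE = re.compile(r"nr_reads\s*(\d+)\s+nr_writes\s*(\d+)\s+nr_ops\s*(\d+)")


def summarise(output):
    """Return the run count and mean reads/writes/ops over all result lines."""
    rows = [tuple(map(int, m.groups())) for m in LINE_RE.finditer(output)]
    if not rows:
        raise ValueError("no benchmark result lines found")
    reads, writes, ops = zip(*rows)
    return {"runs": len(rows), "reads": mean(reads),
            "writes": mean(writes), "ops": mean(ops)}


# Two lines copied from the unpatched test_urcu run, as sample input.
sample = """
nr_reads   6065490002 nr_writes  237 nr_ops   6065490239
nr_reads   6476219475 nr_writes  186 nr_ops   6476219661
"""
print(summarise(sample))
```

Feeding each full block of ten lines through summarise() gives one
averaged row per configuration, which is easier to compare than the raw
per-run output.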


Comparing and contrasting with 20.04:
=

Test Parameters:

Ubuntu 20.04 LTS
KVM, 2 vcpus
0.11.1 liburcu
5.4.0-29-generic

ubuntu@ubuntu:~/userspace-rcu/tests/benchmark$ ./test_urcu 6 2 10
nr_reads   4270089636 nr_writes 1638 nr_ops   4270091274
nr_reads   4281598850 nr_writes 3008 nr_ops   4281601858
nr_reads   4241230576 nr_writes 3612 nr_ops   4241234188
nr_reads   4230643208 nr_writes 5367 nr_ops   4230648575
nr_reads   4333495124 nr_writes 1354 nr_ops   4333496478
nr_reads   4291295097 nr_writes 3545 nr_ops   4291298642
nr_reads   4232582737 nr_writes 1983 nr_ops   4232584720
nr_reads   4268926719 nr_writes 3363 nr_ops   4268930082
nr_reads   4266736459 nr_writes 4881 nr_ops   4266741340
nr_reads   4313525276 nr_writes 4549 nr_ops   4313529825

ubuntu@ubuntu:~/userspace-rcu/tests/benchmark$ ./test_urcu_bp 6 2 10
nr_reads   6848011482 nr_writes 3171 nr_ops   6848014653
nr_reads   6842990129 nr_writes 4577 nr_ops   6842994706
nr_reads   6862298832 nr_writes 2875 nr_ops   6862301707
nr_reads   6849848255 nr_writes 4292 nr_ops   6849852547
nr_reads

[Sts-sponsors] [Bug 1871685] Re: [SRU] vagrant spits out ruby deprecation warnings on every call

2020-05-05 Thread Chris Halse Rogers
Hello Hartmut, or anyone else affected,

Accepted vagrant into focal-proposed. The package will build now and be
available at
https://launchpad.net/ubuntu/+source/vagrant/2.2.6+dfsg-2ubuntu2
in a few hours, and then in the -proposed repository.

Please help us by testing this new package.  See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed.  Your feedback will aid us in getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested and what testing has
been performed on the package, and change the tag from
verification-needed-focal to verification-done-focal. If it does not fix
the bug for you, please add a comment stating that, and change the tag
to verification-failed-focal. In either case, without details of your
testing we will not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.

** Changed in: vagrant (Ubuntu Focal)
   Status: Triaged => Fix Committed

** Tags removed: verification-done-focal
** Tags added: verification-needed verification-needed-focal

-- 
You received this bug notification because you are a member of STS
Sponsors, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1871685

Title:
  [SRU] vagrant spits out ruby deprecation warnings on every call

Status in vagrant package in Ubuntu:
  Fix Released
Status in vagrant source package in Focal:
  Fix Committed

Bug description:
  [Impact]

  This issue is not critical, but I believe it significantly degrades
  the user experience in a brand new LTS release. Every time one calls
  vagrant via the CLI, Ruby 2.7 throws a bunch of warnings about
  deprecated features, which to some (inexperienced?) users might look
  like a failure at first glance.

  This was reported not just here as a bug report but also in Discourse:

  https://discourse.ubuntu.com/t/workarounds-for-applications-which-are-broken-in-20-04lts/15474/5

  [Test Case]

  In a Focal LXD container:

  $ apt install vagrant
  $ vagrant
  NOTE: Gem::Specification.default_specifications_dir is deprecated; use Gem.default_specifications_dir instead. It will be removed on or after 2020-02-01.
  Gem::Specification.default_specifications_dir called from /usr/share/rubygems-integration/all/gems/vagrant-2.2.6/lib/vagrant/bundler.rb:428.
  /usr/share/rubygems-integration/all/gems/vagrant-2.2.6/lib/vagrant/errors.rb:103: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
  /usr/share/rubygems-integration/all/gems/i18n-1.8.2/lib/i18n.rb:195: warning: The called method `t' is defined here
  (eval):3: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
  /usr/share/rubygems-integration/all/gems/vagrant-2.2.6/lib/vagrant/ui.rb:223: warning: The called method `say' is defined here
  Usage: vagrant [options]  []

  -v, --version                    Print the version and exit.
  -h, --help                       Print this help.

  Common commands:
       box             manages boxes: installation, removal, etc.
       cloud           manages everything related to Vagrant Cloud
       destroy         stops and deletes all traces of the vagrant machine
       global-status   outputs status Vagrant environments for this user
       halt            stops the vagrant machine
       help            shows the help for a subcommand
       init            initializes a new Vagrant environment by creating a Vagrantfile
       login
       package         packages a running vagrant environment into a box
       plugin          manages plugins: install, uninstall, update, etc.
       port            displays information about guest port mappings
       powershell      connects to machine via powershell remoting
       provision       provisions the vagrant machine
       push            deploys code in this environment to a configured destination
       rdp             connects to machine via RDP
       reload          restarts vagrant machine, loads new Vagrantfile configuration
       resume          resume a suspended vagrant machine
       snapshot        manages snapshots: saving, restoring, etc.
       ssh             connects to machine via SSH
       ssh-config      outputs OpenSSH valid configuration to connect to the machine
       status          outputs status of the vagrant machine
       suspend         suspends the machine
       up              starts and provisions the vagrant environment
       upload          upload to machine via communicator
       validate        validat

[Sts-sponsors] [Bug 1876230] Re: liburcu: Enable MEMBARRIER_CMD_PRIVATE_EXPEDITED to address performance problems with MEMBARRIER_CMD_SHARED

2020-05-05 Thread Matthew Ruffell
To answer question 1, I went and checked every rdepends package:

gdnsd: dynamically links to -lurcu-qsbr
$ ldd /usr/sbin/gdnsd
liburcu-qsbr.so.6 => /usr/lib/x86_64-linux-gnu/liburcu-qsbr.so.6 (0x7f33bfde5000)

glusterfs: The only package I am not entirely sure about. Only glusterd
uses urcu:
ubuntu@ubuntu:~/glusterfs-3.13.2$ grep -Rin " header file. */
./config.h.in:203:/* Define to 1 if you have the  header file. */
./xlators/mgmt/glusterd/src/glusterd-rcu.h:14:#include 
./xlators/mgmt/glusterd/src/glusterd-rcu.h:15:#include 
./xlators/mgmt/glusterd/src/glusterd-rcu.h:16:#include 
./xlators/mgmt/glusterd/src/glusterd-rcu.h:17:#include 
./xlators/mgmt/glusterd/src/glusterd-rcu.h:18:#include 
./xlators/mgmt/glusterd/src/glusterd-conn-helper.c:15:#include 
$ ldd /usr/sbin/glusterd | grep urcu

No mention of static linking either:
ubuntu@ubuntu:~/glusterfs-3.13.2$ grep -Rin "\.a" . | grep urcu

The library linker settings are in ./configure.ac:
dnl Check for userspace-rcu
PKG_CHECK_MODULES([URCU], [liburcu-bp], [],
  [AC_CHECK_HEADERS([urcu-bp.h],
 [URCU_LIBS='-lurcu-bp'],
 AC_MSG_ERROR([liburcu-bp not found]))])
PKG_CHECK_MODULES([URCU_CDS], [liburcu-cds >= 0.8], [],
  [PKG_CHECK_MODULES([URCU_CDS], [liburcu-cds >= 0.7],
[AC_DEFINE(URCU_OLD, 1, [Define if liburcu 0.6 or 0.7 is found])],
[AC_CHECK_HEADERS([urcu/cds.h],
  [AC_DEFINE(URCU_OLD, 1, [Define if liburcu 0.6 or 0.7 is found])
   URCU_CDS_LIBS='-lurcu-cds'],
  [AC_MSG_ERROR([liburcu-cds not found])])])])

I ran ldd over all glusterfs binaries which are listed by dpkg -L, but
they all came back negative for urcu. From what I can tell, glusterd
from the xlators directory is either not built, or does not link against
urcu in Ubuntu.

knot: dynamically links to -lurcu
ubuntu@ubuntu:~$ ldd /usr/sbin/knotd
liburcu.so.6 => /usr/lib/x86_64-linux-gnu/liburcu.so.6 (0x7fc9d1e8b000)

lttng: dynamically links to various urcu libraries
$ ldd /usr/bin/lttng
liburcu-common.so.6 => /usr/lib/x86_64-linux-gnu/liburcu-common.so.6 (0x7f6711a2f000)
liburcu.so.6 => /usr/lib/x86_64-linux-gnu/liburcu.so.6 (0x7f6711827000)
liburcu-cds.so.6 => /usr/lib/x86_64-linux-gnu/liburcu-cds.so.6 (0x7f671161d000)

multipath-tools: dynamically links to -lurcu
$ ldd /sbin/multipath
liburcu.so.6 => /usr/lib/x86_64-linux-gnu/liburcu.so.6 (0x7fe63ace1000)

netsniff-ng: dynamically links to -lurcu
~/netsniff-ng-0.6.4$ grep -Rin "urcu"
flowtop/Makefile:1:flowtop-libs =   -lurcu \
ui.h:7:#include 
flowtop.c:28:#include 
flowtop.c:29:#include 
flowtop.c:30:#include 
INSTALL:28: - liburcu:flowtop
$ ldd /usr/sbin/flowtop
liburcu.so.6 => /usr/lib/x86_64-linux-gnu/liburcu.so.6 (0x7f7561a5a000)

sheepdog: this program doesn't link against urcu at all! It only uses a
urcu header file and nothing more.
search for urcu in sheepdog: https://paste.ubuntu.com/p/VPKr4pWtQg/
explanation found in d/changelog: https://paste.ubuntu.com/p/K8XwjK2czD/

ust: dynamically links to various urcu libraries
$ ldd /usr/lib/x86_64-linux-gnu/liblttng-ust-cyg-profile-fast.so.0.0.0
liburcu-bp.so.6 => /usr/lib/x86_64-linux-gnu/liburcu-bp.so.6 (0x7f91b0c13000)
liburcu-cds.so.6 => /usr/lib/x86_64-linux-gnu/liburcu-cds.so.6 (0x7f91b01f4000)

From what I can tell, no rdepends package statically links to liburcu,
and they all use dynamic linking, meaning no packages will need to be
rebuilt for this SRU.
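
The per-binary ldd checks above could be scripted along the following
lines. This is only a sketch: the helper names urcu_libs and
urcu_libs_of are hypothetical, and it assumes the usual "name => path
(address)" layout of ldd output. Note that ldd only reports dynamic
dependencies, so an empty result does not by itself rule out static
linking.

```python
import re
import subprocess

# Captures the soname of any liburcu flavour on the left of "=>".
URCU_RE = re.compile(r"^\s*(liburcu[\w-]*\.so[\d.]*)\s+=>", re.MULTILINE)


def urcu_libs(ldd_output):
    """Return the sorted liburcu sonames mentioned in ldd output."""
    return sorted(set(URCU_RE.findall(ldd_output)))


def urcu_libs_of(path):
    """Run ldd on a binary and return the liburcu sonames it links to."""
    result = subprocess.run(["ldd", path], capture_output=True,
                            text=True, check=True)
    return urcu_libs(result.stdout)


# Sample input copied from the lttng check above.
sample = """\
liburcu-common.so.6 => /usr/lib/x86_64-linux-gnu/liburcu-common.so.6 (0x7f6711a2f000)
liburcu.so.6 => /usr/lib/x86_64-linux-gnu/liburcu.so.6 (0x7f6711827000)
liburcu-cds.so.6 => /usr/lib/x86_64-linux-gnu/liburcu-cds.so.6 (0x7f671161d000)
"""
print(urcu_libs(sample))
```

Calling urcu_libs_of() on each binary listed by dpkg -L for a package
would reproduce the manual checks in one pass.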

-- 
You received this bug notification because you are a member of STS
Sponsors, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1876230

Title:
  liburcu: Enable MEMBARRIER_CMD_PRIVATE_EXPEDITED to address
  performance problems with MEMBARRIER_CMD_SHARED

Status in liburcu package in Ubuntu:
  Fix Released
Status in liburcu source package in Bionic:
  In Progress

Bug description:
  [Impact]

  In Linux 4.3, a new syscall was defined, called "membarrier". This
  system call was defined specifically for use in userspace-rcu (liburcu)
  to speed up the fast path / reader side of the library. The original
  implementation in Linux 4.3 only supported the MEMBARRIER_CMD_SHARED
  subcommand of the membarrier syscall.

  MEMBARRIER_CMD_SHARED executes a memory barrier on all threads from
  all processes running on the system. When it exits, the userspace
  thread which called it is guaranteed that all running threads share
  the same world view with regard to userspace addresses which are
  consumed by readers and writers.

  The problem with MEMBARRIER_CMD_SHARED is that system calls made in
  this fashion can block, since it deploys a barrier across all threads
  in a system, and some other threads can be waiting on blocking
  operations, and take time to reach the barrier.

  In Linux 4.14, this was addressed by adding the
  MEMBARRIER_CMD_PRIVATE_EXPEDITED command to the membarrier syscall. It
  only targets threads which share the same mm as the thread calling the
  membarrier syscall, i.e. threads in the current process,

[Sts-sponsors] [Bug 1876230] Re: liburcu: Enable MEMBARRIER_CMD_PRIVATE_EXPEDITED to address performance problems with MEMBARRIER_CMD_SHARED

2020-05-05 Thread Dan Streetman
Hi @mruffell,

Two questions for this SRU:

1) It looks like static libs are built/provided by this package:

$ pull-lp-debs liburcu bionic ; for p in *.deb ; do echo "$p:" ; dpkg-deb -c "$p" | grep -E '\.a$' ; done
Found liburcu 0.10.1-1 in bionic
Using existing file liburcu-dev_0.10.1-1_amd64.deb
Using existing file liburcu6_0.10.1-1_amd64.deb
liburcu6_0.10.1-1_amd64.deb:
liburcu-dev_0.10.1-1_amd64.deb:
-rw-r--r-- root/root 47956 2018-01-23 15:46 ./usr/lib/x86_64-linux-gnu/liburcu-bp.a
-rw-r--r-- root/root 69844 2018-01-23 15:46 ./usr/lib/x86_64-linux-gnu/liburcu-cds.a
-rw-r--r-- root/root 23912 2018-01-23 15:46 ./usr/lib/x86_64-linux-gnu/liburcu-common.a
-rw-r--r-- root/root 43750 2018-01-23 15:46 ./usr/lib/x86_64-linux-gnu/liburcu-mb.a
-rw-r--r-- root/root 45642 2018-01-23 15:46 ./usr/lib/x86_64-linux-gnu/liburcu-qsbr.a
-rw-r--r-- root/root 45716 2018-01-23 15:46 ./usr/lib/x86_64-linux-gnu/liburcu-signal.a
-rw-r--r-- root/root 45148 2018-01-23 15:46 ./usr/lib/x86_64-linux-gnu/liburcu.a
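
The same filter can be expressed in Python if the listing needs further
processing. A minimal sketch, assuming "dpkg-deb -c" style lines where
the member path is the last whitespace-separated field; static_archives
is a hypothetical helper name.

```python
def static_archives(listing):
    """Return the paths of .a members found in a package content listing."""
    paths = []
    for line in listing.splitlines():
        fields = line.split()
        # The member path is the final field of each listing line.
        if fields and fields[-1].endswith(".a"):
            paths.append(fields[-1])
    return paths


# Sample input copied from the liburcu-dev listing above.
sample = """\
liburcu-dev_0.10.1-1_amd64.deb:
-rw-r--r-- root/root 47956 2018-01-23 15:46 ./usr/lib/x86_64-linux-gnu/liburcu-bp.a
-rw-r--r-- root/root 45148 2018-01-23 15:46 ./usr/lib/x86_64-linux-gnu/liburcu.a
"""
print(static_archives(sample))
```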

and several other packages build-depend on it:

$ reverse-depends -b -r bionic liburcu-dev
Reverse-Build-Depends
* gdnsd
* glusterfs
* knot
* ltt-control
* multipath-tools
* netsniff-ng
* sheepdog
* ust

Can you check those packages to see if any use static linking (and thus
should be recompiled with the updated static liburcu libs)?


2) In your testing results comparison:

> ./test_urcu 6 2 10
> 0.10.1-1: 17612527667 reads, 268 writes, 17612527935 ops
> 0.10.1-1ubuntu1: 14988437247 reads, 810069 writes, 14989247316 ops

The number of writes is obviously much, much better; however the number
of reads actually goes down with the patched code.

> $ ./test_urcu_bp 6 2 10
> 0.10.1-1: 1177891079 reads, 1699523 writes, 1179590602 ops
> 0.10.1-1ubuntu1: 13230354737 reads, 575314 writes, 13230930051 ops

Similarly, while the number of reads increases significantly, the number
of writes goes down.

I may be misreading the results, but it seems like this change is not an
across-the-board improvement, but more of a performance trade-off.  If
that's the case, I think it will be hard to make the case this should be
included as an SRU.  Can you clarify the results comparison in more
detail please?


[Sts-sponsors] [Bug 1876230] Re: liburcu: Enable MEMBARRIER_CMD_PRIVATE_EXPEDITED to address performance problems with MEMBARRIER_CMD_SHARED

2020-05-05 Thread Dan Streetman
** Tags added: sts-sponsor-ddstreet


[Sts-sponsors] [Bug 1876230] Re: liburcu: Enable MEMBARRIER_CMD_PRIVATE_EXPEDITED to address performance problems with MEMBARRIER_CMD_SHARED

2020-05-05 Thread Dan Streetman
** Description changed:

  [Impact]
  
  In Linux 4.3, a new syscall was defined, called "membarrier". This
  system call was defined specifically for use in userspace-rcu (liburcu)
  to speed up the fast path / reader side of the library. The original
  implementation in Linux 4.3 only supported the MEMBARRIER_CMD_SHARED
  subcommand of the membarrier syscall.

  MEMBARRIER_CMD_SHARED executes a memory barrier on all threads from all
  processes running on the system. When it exits, the userspace thread
  which called it is guaranteed that all running threads share the same
  world view with regard to userspace addresses which are consumed by
  readers and writers.

  The problem with MEMBARRIER_CMD_SHARED is that system calls made in
  this fashion can block, since it deploys a barrier across all threads
  in a system, and some other threads can be waiting on blocking
  operations, and take time to reach the barrier.

  In Linux 4.14, this was addressed by adding the
  MEMBARRIER_CMD_PRIVATE_EXPEDITED command to the membarrier syscall. It
  only targets threads which share the same mm as the thread calling the
  membarrier syscall, i.e. threads in the current process, and not all
  threads / processes in the system.
  
  Calls to membarrier with the MEMBARRIER_CMD_PRIVATE_EXPEDITED command
  are guaranteed non-blocking, due to using inter-processor interrupts to
  implement memory barriers.
  
  Because of this, membarrier calls that use
  MEMBARRIER_CMD_PRIVATE_EXPEDITED are much faster than those that use
  MEMBARRIER_CMD_SHARED.
  
  Since Bionic uses a 4.15 kernel, all kernel requirements are met, and
  this SRU is to enable support for MEMBARRIER_CMD_PRIVATE_EXPEDITED in
  the liburcu package.
  
  This brings the performance of the liburcu library back in line to where
  it was in Trusty, as this particular user has performance problems upon
  upgrading from Trusty to Bionic.
  
  [Test]
  
  Testing performance is heavily dependent on the application which links
  against liburcu, and the workload which it executes.
  
  A test package is available in the following ppa:
  https://launchpad.net/~mruffell/+archive/ubuntu/sf276198-test
  
  For the sake of testing, we can use the benchmarks provided in the
  liburcu source code. Download a copy of the source code for liburcu
  either from the repos or from github:
  
  $ pull-lp-source liburcu bionic
  # OR
  $ git clone https://github.com/urcu/userspace-rcu.git
  $ git checkout v0.10.1 # version in bionic
  
  Build the code:
  
  $ ./bootstrap
  $ ./configure
  $ make
  
  Go into the tests/benchmark directory
  
  $ cd tests/benchmark
  
  From there, you can run benchmarks for the four main usages of liburcu:
  urcu, urcu-bp, urcu-signal and urcu-mb.
  
  On an 8-core machine, with 6 reader threads, 2 writer threads and a
  10 second runtime, execute:
  
  $ ./test_urcu 6 2 10
  $ ./test_urcu_bp 6 2 10
  $ ./test_urcu_signal 6 2 10
  $ ./test_urcu_mb 6 2 10
  
  Results:
  
  $ ./test_urcu 6 2 10
  0.10.1-1: 17612527667 reads, 268 writes, 17612527935 ops
  0.10.1-1ubuntu1: 14988437247 reads, 810069 writes, 14989247316 ops
  
  $ ./test_urcu_bp 6 2 10
  0.10.1-1: 1177891079 reads, 1699523 writes, 1179590602 ops
  0.10.1-1ubuntu1: 13230354737 reads, 575314 writes, 13230930051 ops
  
  $ ./test_urcu_signal 6 2 10
  0.10.1-1: 20128392417 reads, 6859 writes, 20128399276 ops
  0.10.1-1ubuntu1: 20501430707 reads, 6890 writes, 20501437597 ops
  
  $ ./test_urcu_mb 6 2 10
  0.10.1-1: 627996563 reads, 5409563 writes, 633406126 ops
  0.10.1-1ubuntu1: 653194752 reads, 4590020 writes, 657784772 ops
  
  The SRU only changes behaviour for urcu and urcu-bp, since they are the
  only "flavours" of liburcu which the patches change. From a pure ops
  standpoint:
  
  $ ./test_urcu 6 2 10
  17612527935 ops
  14989247316 ops
  
  $ ./test_urcu_bp 6 2 10
  1179590602 ops
  13230930051 ops
  
  We see that in this particular benchmark workload, test_urcu sees extra
  performance overhead with MEMBARRIER_CMD_PRIVATE_EXPEDITED (about 15%
  fewer total ops), which is explained by the extra impact that it has on
  the slowpath, and the much larger number of writes it did during my
  benchmark.

  The real winner in this benchmark workload is test_urcu_bp, which sees
  roughly an 11x increase in total ops with
  MEMBARRIER_CMD_PRIVATE_EXPEDITED. Some of this may be down to the 3x
  fewer writes it did during my benchmark.

  Again, these benchmarks are indicative only and are very "random".
  Performance is really dependent on the application which links against
  liburcu and its workload.
  
  [Regression Potential]
  
  This SRU changes the behaviour of the following libraries which
  applications link against: -lurcu and -lurcu-bp. Behaviour is not
  changed in the rest: -lurcu-qsbr, -lurcu-signal and -lurcu-mb.
  
  On Bionic, liburcu will call the membarrier syscall in urcu and urcu-bp.
  This does not change. What is changing is the semantics of that syscall,
  from MEMBARRIER_CMD_SHARED to MEMBARRIER_CMD_PRIVAT

[Sts-sponsors] [Bug 1876230] [NEW] liburcu: Enable MEMBARRIER_CMD_PRIVATE_EXPEDITED to address performance problems with MEMBARRIER_CMD_SHARED

2020-05-05 Thread Launchpad Bug Tracker
You have been subscribed to a public bug by Dan Streetman (ddstreet):

[Impact]

In Linux 4.3, a new syscall was defined, called "membarrier". This
system call was defined specifically for use in userspace-rcu (liburcu)
to speed up the fast path / reader side of the library. The original
implementation in Linux 4.3 only supported the MEMBARRIER_CMD_SHARED
subcommand of the membarrier syscall.

MEMBARRIER_CMD_SHARED executes a memory barrier on all threads from all
processes running on the system. When it exits, the userspace thread
which called it is guaranteed that all running threads share the same
world view with regard to userspace addresses which are consumed by
readers and writers.

The problem with MEMBARRIER_CMD_SHARED is that system calls made in this
fashion can block, since it deploys a barrier across all threads in a
system, and some other threads can be waiting on blocking operations,
and take time to reach the barrier.

In Linux 4.14, this was addressed by adding the
MEMBARRIER_CMD_PRIVATE_EXPEDITED command to the membarrier syscall. It
only targets threads which share the same mm as the thread calling the
membarrier syscall, i.e. threads in the current process, and not all
threads / processes in the system.

Calls to membarrier with the MEMBARRIER_CMD_PRIVATE_EXPEDITED command
are guaranteed non-blocking, due to using inter-processor interrupts to
implement memory barriers.

Because of this, membarrier calls that use
MEMBARRIER_CMD_PRIVATE_EXPEDITED are much faster than those that use
MEMBARRIER_CMD_SHARED.

Since Bionic uses a 4.15 kernel, all kernel requirements are met, and
this SRU is to enable support for MEMBARRIER_CMD_PRIVATE_EXPEDITED in
the liburcu package.
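
As a rough illustration of the choice between the two commands, here is
a sketch in Python. It is not liburcu's actual code. The command values
are taken from the kernel's linux/membarrier.h header; the syscall
number 324 is an assumption that holds for x86_64 only; and the function
names choose_command and query_supported_commands are hypothetical.

```python
import ctypes
import os

# Command values from the Linux UAPI header <linux/membarrier.h>.
MEMBARRIER_CMD_QUERY = 0
MEMBARRIER_CMD_SHARED = 1 << 0
MEMBARRIER_CMD_PRIVATE_EXPEDITED = 1 << 3
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED = 1 << 4

# Assumption: syscall number for x86_64 only; other architectures differ.
NR_MEMBARRIER_X86_64 = 324


def choose_command(supported_mask):
    """Pick a barrier command from the bitmask returned by MEMBARRIER_CMD_QUERY."""
    needed = (MEMBARRIER_CMD_PRIVATE_EXPEDITED
              | MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED)
    if (supported_mask & needed) == needed:
        return MEMBARRIER_CMD_PRIVATE_EXPEDITED
    if supported_mask & MEMBARRIER_CMD_SHARED:
        return MEMBARRIER_CMD_SHARED
    return None  # the caller would have to fall back to plain memory barriers


def query_supported_commands():
    """Ask the running kernel which membarrier commands it supports (x86_64 Linux)."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    ret = libc.syscall(ctypes.c_long(NR_MEMBARRIER_X86_64),
                       ctypes.c_long(MEMBARRIER_CMD_QUERY),
                       ctypes.c_long(0))
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret
```

On a kernel that supports it, choose_command(query_supported_commands())
should select the private expedited command. A process is also expected
to issue MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED once before relying
on MEMBARRIER_CMD_PRIVATE_EXPEDITED, which is why the sketch checks for
both bits.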

This brings the performance of the liburcu library back in line to where
it was in Trusty, as this particular user has performance problems upon
upgrading from Trusty to Bionic.

[Test]

Testing performance is heavily dependent on the application which links
against liburcu, and the workload which it executes.

A test package is available in the following ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/sf276198-test

For the sake of testing, we can use the benchmarks provided in the
liburcu source code. Download a copy of the source code for liburcu
either from the repos or from github:

$ pull-lp-source liburcu bionic
# OR
$ git clone https://github.com/urcu/userspace-rcu.git
$ git checkout v0.10.1 # version in bionic

Build the code:

$ ./bootstrap
$ ./configure
$ make

Go into the tests/benchmark directory

$ cd tests/benchmark

From there, you can run benchmarks for the four main usages of liburcu:
urcu, urcu-bp, urcu-signal and urcu-mb.

On an 8-core machine, with 6 reader threads, 2 writer threads and a
10 second runtime, execute:

$ ./test_urcu 6 2 10
$ ./test_urcu_bp 6 2 10
$ ./test_urcu_signal 6 2 10
$ ./test_urcu_mb 6 2 10

Results:

$ ./test_urcu 6 2 10
0.10.1-1: 17612527667 reads, 268 writes, 17612527935 ops
0.10.1-1ubuntu1: 14988437247 reads, 810069 writes, 14989247316 ops

$ ./test_urcu_bp 6 2 10
0.10.1-1: 1177891079 reads, 1699523 writes, 1179590602 ops
0.10.1-1ubuntu1: 13230354737 reads, 575314 writes, 13230930051 ops

$ ./test_urcu_signal 6 2 10
0.10.1-1: 20128392417 reads, 6859 writes, 20128399276 ops
0.10.1-1ubuntu1: 20501430707 reads, 6890 writes, 20501437597 ops

$ ./test_urcu_mb 6 2 10
0.10.1-1: 627996563 reads, 5409563 writes, 633406126 ops
0.10.1-1ubuntu1: 653194752 reads, 4590020 writes, 657784772 ops

The SRU only changes behaviour for urcu and urcu-bp, since they are the
only "flavours" of liburcu which the patches change. From a pure ops
standpoint:

$ ./test_urcu 6 2 10
17612527935 ops
14989247316 ops

$ ./test_urcu_bp 6 2 10
1179590602 ops
13230930051 ops

We see that in this particular benchmark workload, test_urcu sees extra
performance overhead with MEMBARRIER_CMD_PRIVATE_EXPEDITED (about 15%
fewer total ops), which is explained by the extra impact that it has on
the slowpath, and the much larger number of writes it did during my
benchmark.

The real winner in this benchmark workload is test_urcu_bp, which sees
roughly an 11x increase in total ops with
MEMBARRIER_CMD_PRIVATE_EXPEDITED. Some of this may be down to the 3x
fewer writes it did during my benchmark.

Again, these benchmarks are indicative only and are very "random".
Performance is really dependent on the application which links against
liburcu and its workload.
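
The ratios behind these statements can be recomputed from the Results
figures with a few lines of Python. This is only a sketch; the RESULTS
table is copied from the figures above and the names RESULTS and ratios
are illustrative.

```python
# (reads, writes, ops) per package version, copied from the Results section.
RESULTS = {
    "test_urcu": {
        "0.10.1-1": (17612527667, 268, 17612527935),
        "0.10.1-1ubuntu1": (14988437247, 810069, 14989247316),
    },
    "test_urcu_bp": {
        "0.10.1-1": (1177891079, 1699523, 1179590602),
        "0.10.1-1ubuntu1": (13230354737, 575314, 13230930051),
    },
}


def ratios(before, after):
    """Return patched/unpatched ratios for reads, writes and ops."""
    names = ("reads", "writes", "ops")
    return {n: a / b for n, b, a in zip(names, before, after)}


for test, runs in RESULTS.items():
    r = ratios(runs["0.10.1-1"], runs["0.10.1-1ubuntu1"])
    print(test, {name: round(value, 2) for name, value in r.items()})
```

A ratio above 1 means the patched package did more of that operation in
the same 10 second window, so reads and writes can be compared
separately rather than only through total ops.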

[Regression Potential]

This SRU changes the behaviour of the following libraries which
applications link against: -lurcu and -lurcu-bp. Behaviour is not
changed in the rest: -lurcu-qsbr, -lurcu-signal and -lurcu-mb.

On Bionic, liburcu will call the membarrier syscall in urcu and urcu-bp.
This does not change. What changes is the command passed to that
syscall, from MEMBARRIER_CMD_SHARED to MEMBARRIER_CMD_PRIVATE_EXPEDITED.
The code implementing both commands runs in kernel space and resides in
the kernel; these commits simply change the parameters which liburcu
supplies to the membarrier syscall.

I have run the testsuite tha