Re: [PATCH net-next 2/2] netns: isolate seqnums to use per-netns locks
On Mon, Apr 23, 2018 at 10:39:50AM +0800, kbuild test robot wrote:
> Hi Christian,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on net-next/master]
>
> url: https://github.com/0day-ci/linux/commits/Christian-Brauner/netns-uevent-performance-tweaks/20180420-013717
> config: alpha-alldefconfig (attached as .config)
> compiler: alpha-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
> reproduce:
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         make.cross ARCH=alpha
>
> All errors (new ones prefixed by >>):
>
>    kernel/ksysfs.o: In function `uevent_seqnum_show':
> >> (.text+0x18c): undefined reference to `get_ns_uevent_seqnum_by_vpid'
>    (.text+0x19c): undefined reference to `get_ns_uevent_seqnum_by_vpid'

Right, this happens when CONFIG_NET=n. I am about to send out the second
version of the patch that includes all of the test results in the commit
message and also accounts for kernels compiled without CONFIG_NET.

Thanks!
Christian

> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Re: [PATCH net-next 2/2] netns: isolate seqnums to use per-netns locks
Hi Christian,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url: https://github.com/0day-ci/linux/commits/Christian-Brauner/netns-uevent-performance-tweaks/20180420-013717
config: alpha-alldefconfig (attached as .config)
compiler: alpha-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=alpha

All errors (new ones prefixed by >>):

   kernel/ksysfs.o: In function `uevent_seqnum_show':
>> (.text+0x18c): undefined reference to `get_ns_uevent_seqnum_by_vpid'
   (.text+0x19c): undefined reference to `get_ns_uevent_seqnum_by_vpid'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

.config.gz
Description: application/gzip
Re: [PATCH net-next 2/2] netns: isolate seqnums to use per-netns locks
On Fri, Apr 20, 2018 at 06:16:44PM +0200, Christian Brauner wrote:
> On Fri, Apr 20, 2018 at 03:56:28PM +0200, Christian Brauner wrote:
> > On Wed, Apr 18, 2018 at 11:52:47PM +0200, Christian Brauner wrote:
> > > On Wed, Apr 18, 2018 at 11:55:52AM -0500, Eric W. Biederman wrote:
> > > > Christian Brauner writes:
> > > >
> > > > > Now that it's possible to have a different set of uevents in different
> > > > > network namespaces, per-network namespace uevent sequence numbers are
> > > > > introduced. This increases performance as locking is now restricted to the
> > > > > network namespace affected by the uevent rather than locking everything.
> > > >
> > > > Numbers please. I personally expect that the netlink mc_list issues
> > > > will swamp any benefit you get from this.
> > >
> > > I don't see how this would be the case. The gist of this is:
> > > every time you send a uevent into a network namespace *not* owned by
> > > init_user_ns you currently *have* to take mutex_lock(uevent_sock_list),
> > > effectively blocking the host from processing uevents even though
> > > - the uevent you're receiving might be totally different from the
> > >   uevent that you're sending
> > > - the uevent socket of the non-init_user_ns owned network namespace
> > >   isn't even recorded in the list.
> > >
> > > The other argument is that we now have properly isolated network
> > > namespaces wrt uevents, such that each netns can have its own set of
> > > uevents. This can happen either through a sufficiently privileged
> > > userspace process sending it uevents that are dedicated to that specific
> > > netns, or - and this *has been true for a long time* - because network
> > > devices are *properly namespaced*, meaning a uevent for a network
> > > device is *tied to a network namespace*. For both cases the uevent
> > > sequence numbering will be absolutely misleading. For example, whenever
> > > you create e.g. a new veth device in a new network namespace it
> > > shouldn't be accounted against the initial network namespace but *only*
> > > against the network namespace that has that device added to it.
> >
> > Eric, I did the testing. Here's what I did:
> >
> > I compiled two 4.17-rc1 kernels:
> > - one with per-netns uevent seqnums with decoupled locking
> > - one without per-netns uevent seqnums with decoupled locking
> >
> > # Testcase 1:
> > Only injecting uevents into network namespaces not owned by the initial
> > user namespace.
> > - created 1000 new user namespace + network namespace pairs
> > - opened a uevent listener in each of those namespace pairs
> > - injected uevents into each of those network namespaces 10,000 times,
> >   meaning 10,000,000 (10 million) uevents were injected. (The high number
> >   of uevent injections should get rid of a lot of jitter.)
> > - calculated the mean transaction time:
> >   - *without* uevent sequence number namespacing: 67 μs
> >   - *with* uevent sequence number namespacing: 55 μs
> >   - a difference of 12 μs
> >
> > # Testcase 2:
> > Injecting uevents into network namespaces not owned by the initial user
> > namespace and network namespaces owned by the initial user namespace.
> > - created 500 new user namespace + network namespace pairs
> > - created 500 new network namespaces
> > - opened a uevent listener in each of those namespaces
> > - injected uevents into each of those network namespaces 10,000 times,
> >   meaning 10,000,000 (10 million) uevents were injected. (The high number
> >   of uevent injections should get rid of a lot of jitter.)
> > - calculated the mean transaction time:
> >   - *without* uevent sequence number namespacing: 572 μs
> >   - *with* uevent sequence number namespacing: 514 μs
> >   - a difference of 58 μs
> >
> > So there's a performance gain. The third case would be to create a bunch
> > of hanging processes that send SIGSTOP to themselves but do not actually
> > open a uevent socket in their respective namespaces, and then inject
> > uevents into them. I expect an even bigger performance benefit there,
> > since the rtnl_table_lock() isn't hit in this case because there are no
> > listeners.
>
> I did the third test-case as well:
> - created 500 new user namespace + network namespace pairs *without
>   uevent listeners*
> - created 500 new network namespaces *without uevent listeners*
> - injected uevents into each of those network namespaces 10,000 times,
>   meaning 10,000,000 (10 million) uevents were injected. (The high number
>   of uevent injections should get rid of a lot of jitter.)
> - calculated the mean transaction time:
>   - *without* uevent sequence number namespacing: 206 μs
>   - *with* uevent sequence number namespacing: 163 μs
>   - a difference of 43 μs
>
> So this test-case shows a performance improvement as well.

Just for fun, I did a simple statistical analysis using t-tests and they all
Re: [PATCH net-next 2/2] netns: isolate seqnums to use per-netns locks
On Fri, Apr 20, 2018 at 03:56:28PM +0200, Christian Brauner wrote:
> On Wed, Apr 18, 2018 at 11:52:47PM +0200, Christian Brauner wrote:
> > On Wed, Apr 18, 2018 at 11:55:52AM -0500, Eric W. Biederman wrote:
> > > Christian Brauner writes:
> > >
> > > > Now that it's possible to have a different set of uevents in different
> > > > network namespaces, per-network namespace uevent sequence numbers are
> > > > introduced. This increases performance as locking is now restricted to the
> > > > network namespace affected by the uevent rather than locking everything.
> > >
> > > Numbers please. I personally expect that the netlink mc_list issues
> > > will swamp any benefit you get from this.
> >
> > I don't see how this would be the case. The gist of this is:
> > every time you send a uevent into a network namespace *not* owned by
> > init_user_ns you currently *have* to take mutex_lock(uevent_sock_list),
> > effectively blocking the host from processing uevents even though
> > - the uevent you're receiving might be totally different from the
> >   uevent that you're sending
> > - the uevent socket of the non-init_user_ns owned network namespace
> >   isn't even recorded in the list.
> >
> > The other argument is that we now have properly isolated network
> > namespaces wrt uevents, such that each netns can have its own set of
> > uevents. This can happen either through a sufficiently privileged
> > userspace process sending it uevents that are dedicated to that specific
> > netns, or - and this *has been true for a long time* - because network
> > devices are *properly namespaced*, meaning a uevent for a network
> > device is *tied to a network namespace*. For both cases the uevent
> > sequence numbering will be absolutely misleading. For example, whenever
> > you create e.g. a new veth device in a new network namespace it
> > shouldn't be accounted against the initial network namespace but *only*
> > against the network namespace that has that device added to it.
>
> Eric, I did the testing. Here's what I did:
>
> I compiled two 4.17-rc1 kernels:
> - one with per-netns uevent seqnums with decoupled locking
> - one without per-netns uevent seqnums with decoupled locking
>
> # Testcase 1:
> Only injecting uevents into network namespaces not owned by the initial
> user namespace.
> - created 1000 new user namespace + network namespace pairs
> - opened a uevent listener in each of those namespace pairs
> - injected uevents into each of those network namespaces 10,000 times,
>   meaning 10,000,000 (10 million) uevents were injected. (The high number
>   of uevent injections should get rid of a lot of jitter.)
> - calculated the mean transaction time:
>   - *without* uevent sequence number namespacing: 67 μs
>   - *with* uevent sequence number namespacing: 55 μs
>   - a difference of 12 μs
>
> # Testcase 2:
> Injecting uevents into network namespaces not owned by the initial user
> namespace and network namespaces owned by the initial user namespace.
> - created 500 new user namespace + network namespace pairs
> - created 500 new network namespaces
> - opened a uevent listener in each of those namespaces
> - injected uevents into each of those network namespaces 10,000 times,
>   meaning 10,000,000 (10 million) uevents were injected. (The high number
>   of uevent injections should get rid of a lot of jitter.)
> - calculated the mean transaction time:
>   - *without* uevent sequence number namespacing: 572 μs
>   - *with* uevent sequence number namespacing: 514 μs
>   - a difference of 58 μs
>
> So there's a performance gain. The third case would be to create a bunch
> of hanging processes that send SIGSTOP to themselves but do not actually
> open a uevent socket in their respective namespaces, and then inject
> uevents into them. I expect an even bigger performance benefit there,
> since the rtnl_table_lock() isn't hit in this case because there are no
> listeners.

I did the third test-case as well:
- created 500 new user namespace + network namespace pairs *without
  uevent listeners*
- created 500 new network namespaces *without uevent listeners*
- injected uevents into each of those network namespaces 10,000 times,
  meaning 10,000,000 (10 million) uevents were injected. (The high number
  of uevent injections should get rid of a lot of jitter.)
- calculated the mean transaction time:
  - *without* uevent sequence number namespacing: 206 μs
  - *with* uevent sequence number namespacing: 163 μs
  - a difference of 43 μs

So this test-case shows a performance improvement as well.

Thanks!
Christian
Re: [PATCH net-next 2/2] netns: isolate seqnums to use per-netns locks
On Wed, Apr 18, 2018 at 11:52:47PM +0200, Christian Brauner wrote:
> On Wed, Apr 18, 2018 at 11:55:52AM -0500, Eric W. Biederman wrote:
> > Christian Brauner writes:
> >
> > > Now that it's possible to have a different set of uevents in different
> > > network namespaces, per-network namespace uevent sequence numbers are
> > > introduced. This increases performance as locking is now restricted to the
> > > network namespace affected by the uevent rather than locking everything.
> >
> > Numbers please. I personally expect that the netlink mc_list issues
> > will swamp any benefit you get from this.
>
> I don't see how this would be the case. The gist of this is:
> every time you send a uevent into a network namespace *not* owned by
> init_user_ns you currently *have* to take mutex_lock(uevent_sock_list),
> effectively blocking the host from processing uevents even though
> - the uevent you're receiving might be totally different from the
>   uevent that you're sending
> - the uevent socket of the non-init_user_ns owned network namespace
>   isn't even recorded in the list.
>
> The other argument is that we now have properly isolated network
> namespaces wrt uevents, such that each netns can have its own set of
> uevents. This can happen either through a sufficiently privileged
> userspace process sending it uevents that are dedicated to that specific
> netns, or - and this *has been true for a long time* - because network
> devices are *properly namespaced*, meaning a uevent for a network
> device is *tied to a network namespace*. For both cases the uevent
> sequence numbering will be absolutely misleading. For example, whenever
> you create e.g. a new veth device in a new network namespace it
> shouldn't be accounted against the initial network namespace but *only*
> against the network namespace that has that device added to it.

Eric, I did the testing. Here's what I did:

I compiled two 4.17-rc1 kernels:
- one with per-netns uevent seqnums with decoupled locking
- one without per-netns uevent seqnums with decoupled locking

# Testcase 1:
Only injecting uevents into network namespaces not owned by the initial
user namespace.
- created 1000 new user namespace + network namespace pairs
- opened a uevent listener in each of those namespace pairs
- injected uevents into each of those network namespaces 10,000 times,
  meaning 10,000,000 (10 million) uevents were injected. (The high number
  of uevent injections should get rid of a lot of jitter.)
- calculated the mean transaction time:
  - *without* uevent sequence number namespacing: 67 μs
  - *with* uevent sequence number namespacing: 55 μs
  - a difference of 12 μs

# Testcase 2:
Injecting uevents into network namespaces not owned by the initial user
namespace and network namespaces owned by the initial user namespace.
- created 500 new user namespace + network namespace pairs
- created 500 new network namespaces
- opened a uevent listener in each of those namespaces
- injected uevents into each of those network namespaces 10,000 times,
  meaning 10,000,000 (10 million) uevents were injected. (The high number
  of uevent injections should get rid of a lot of jitter.)
- calculated the mean transaction time:
  - *without* uevent sequence number namespacing: 572 μs
  - *with* uevent sequence number namespacing: 514 μs
  - a difference of 58 μs

So there's a performance gain. The third case would be to create a bunch
of hanging processes that send SIGSTOP to themselves but do not actually
open a uevent socket in their respective namespaces, and then inject
uevents into them. I expect an even bigger performance benefit there,
since the rtnl_table_lock() isn't hit in this case because there are no
listeners.

Christian
Re: [PATCH net-next 2/2] netns: isolate seqnums to use per-netns locks
On Wed, Apr 18, 2018 at 11:55:52AM -0500, Eric W. Biederman wrote:
> Christian Brauner writes:
>
> > Now that it's possible to have a different set of uevents in different
> > network namespaces, per-network namespace uevent sequence numbers are
> > introduced. This increases performance as locking is now restricted to the
> > network namespace affected by the uevent rather than locking everything.
>
> Numbers please. I personally expect that the netlink mc_list issues
> will swamp any benefit you get from this.

I don't see how this would be the case. The gist of this is:
every time you send a uevent into a network namespace *not* owned by
init_user_ns you currently *have* to take mutex_lock(uevent_sock_list),
effectively blocking the host from processing uevents even though
- the uevent you're receiving might be totally different from the
  uevent that you're sending
- the uevent socket of the non-init_user_ns owned network namespace
  isn't even recorded in the list.

The other argument is that we now have properly isolated network
namespaces wrt uevents, such that each netns can have its own set of
uevents. This can happen either through a sufficiently privileged
userspace process sending it uevents that are dedicated to that specific
netns, or - and this *has been true for a long time* - because network
devices are *properly namespaced*, meaning a uevent for a network
device is *tied to a network namespace*. For both cases the uevent
sequence numbering will be absolutely misleading. For example, whenever
you create e.g. a new veth device in a new network namespace it
shouldn't be accounted against the initial network namespace but *only*
against the network namespace that has that device added to it.

Thanks!
Christian
Re: [PATCH net-next 2/2] netns: isolate seqnums to use per-netns locks
Christian Brauner writes:

> Now that it's possible to have a different set of uevents in different
> network namespaces, per-network namespace uevent sequence numbers are
> introduced. This increases performance as locking is now restricted to the
> network namespace affected by the uevent rather than locking everything.

Numbers please. I personally expect that the netlink mc_list issues
will swamp any benefit you get from this.

Eric