Re: Understanding mutual exclusion between rtnl_lock and rcu_read_lock

2017-02-05 Thread Cong Wang
On Fri, Feb 3, 2017 at 7:25 PM, Joel Cunningham  wrote:
> From the documentation in dev.c:
> /*
>  * The @dev_base_head list is protected by @dev_base_lock and the rtnl
>  * semaphore.
>  *
>  * Pure readers hold dev_base_lock for reading, or rcu_read_lock()
>  *
>  * Writers must hold the rtnl semaphore while they loop through the
>  * dev_base_head list, and hold dev_base_lock for writing when they do the
>  * actual updates.  This allows pure readers to access the list even
>  * while a writer is preparing to update it.
>  *
>  * To put it another way, dev_base_lock is held for writing only to
>  * protect against pure readers; the rtnl semaphore provides the
>  * protection against other writers.
>  *
>  * See, for example usages, register_netdevice() and
>  * unregister_netdevice(), which must be called with the rtnl
>  * semaphore held.
>  */

This comment is pretty old; it predates the start of git history.


> Is the correct usage to hold both rtnl_lock() and dev_base_lock when 
> modifying a member of a struct net_device?  The wording seems vague as to 
> which synchronization issue holding both covers.  What does “do the actual 
> update” mean: updating the list, or a structure member?  If the latter, then 
> maybe the concurrent dev_ioctl() case has never been safe.
>

I think dev_base_lock is supposed to speed up readers that only read one
or a few fields from the netdevice; otherwise it would be pretty pointless,
since we already have the RTNL lock.

Unfortunately, as you noticed, not all of these fields are protected by
dev_base_lock, so readers who take only that read lock are not guaranteed
an atomic result.

RCU doesn't seem to be the solution here, since it still requires a whole
copy of the netdevice even when we only update, for example, the MTU.
This is very inconvenient.

It is also kinda messy due to the mix of dev_base_lock, RCU,
and RTNL. Sigh...


Re: Understanding mutual exclusion between rtnl_lock and rcu_read_lock

2017-02-03 Thread Joel Cunningham

> On Feb 3, 2017, at 3:40 PM, Cong Wang  wrote:
> 
> On Thu, Feb 2, 2017 at 6:05 PM, Joel Cunningham  
> wrote:
>> 
>> In the case of SIOCSIFHWADDR, we get a pointer to the net_device through 
>> __dev_get_by_name() and then pass it to dev_set_mac_address() to modify 
>> through ndo_set_mac_address().  I didn’t see any uses of RCU APIs on the 
>> writer side and that’s why I figured there was something going on with 
>> rtnl_lock() that I didn’t understand or that the dev_ioctl function wasn’t 
>> re-entrant from another CPU
>> 
> 
> You are right: that RCU read lock merely protects the netdevice from
> being unregistered concurrently; it can't prevent a concurrent dev_ifsioc().
> 
> I don't know why Eric changed it to an RCU read lock. It is not a hot path;
> using the rtnl lock is fine and guarantees an atomic read.

Thanks for confirming what I was seeing.  I took a look through the history and 
the change happened in 3710becf8a58a5c6c4e797e3a3c968c161abdb41.  It was 
previously taking dev_base_lock.

From the documentation in dev.c:
/*
 * The @dev_base_head list is protected by @dev_base_lock and the rtnl
 * semaphore.
 *
 * Pure readers hold dev_base_lock for reading, or rcu_read_lock()
 *
 * Writers must hold the rtnl semaphore while they loop through the
 * dev_base_head list, and hold dev_base_lock for writing when they do the
 * actual updates.  This allows pure readers to access the list even
 * while a writer is preparing to update it.
 *
 * To put it another way, dev_base_lock is held for writing only to
 * protect against pure readers; the rtnl semaphore provides the
 * protection against other writers.
 *
 * See, for example usages, register_netdevice() and
 * unregister_netdevice(), which must be called with the rtnl
 * semaphore held.
 */
Is the correct usage to hold both rtnl_lock() and dev_base_lock when 
modifying a member of a struct net_device?  The wording seems vague as to which 
synchronization issue holding both covers.  What does “do the actual update” 
mean: updating the list, or a structure member?  If the latter, then maybe the 
concurrent dev_ioctl() case has never been safe.

Joel

Re: Understanding mutual exclusion between rtnl_lock and rcu_read_lock

2017-02-03 Thread Cong Wang
On Thu, Feb 2, 2017 at 6:05 PM, Joel Cunningham  wrote:
>
> In the case of SIOCSIFHWADDR, we get a pointer to the net_device through 
> __dev_get_by_name() and then pass it to dev_set_mac_address() to modify 
> through ndo_set_mac_address().  I didn’t see any uses of RCU APIs on the 
> writer side and that’s why I figured there was something going on with 
> rtnl_lock() that I didn’t understand or that the dev_ioctl function wasn’t 
> re-entrant from another CPU
>

You are right: that RCU read lock merely protects the netdevice from
being unregistered concurrently; it can't prevent a concurrent dev_ifsioc().

I don't know why Eric changed it to an RCU read lock. It is not a hot path;
using the rtnl lock is fine and guarantees an atomic read.


Re: Understanding mutual exclusion between rtnl_lock and rcu_read_lock

2017-02-03 Thread Joel Cunningham

> On Feb 2, 2017, at 11:21 PM, Eric Dumazet  wrote:
> 
> On Thu, 2017-02-02 at 15:52 -0800, Alexander Duyck wrote:
>> On Thu, Feb 2, 2017 at 3:47 PM, Joel Cunningham  
>> wrote:
>>> Hi,
>>> 
>>> I’m studying the synchronization used on different parts of struct 
>>> net_device and I’m struggling to understand how structure member 
>>> modifications in dev_ioctl are synchronized.  Getters in 
>>> dev_ifsioc_locked() are only holding rcu_read_lock() while setters in 
>>> dev_ifsioc() are holding rtnl_lock, but not using RCU APIs.  I was 
>>> specifically looking at SIOCGIFHWADDR/SIOCSIFHWADDR.  What’s to prevent one 
>>> CPU from executing a getter and another CPU from executing a setter 
>>> resulting in possibly a torn read/write?  I didn’t see anything in 
>>> rtnl_lock() that would wait for any rcu_reader_lock() critical sections (on 
>>> other CPUs) to finish before acquiring the mutex.
>>> 
>>> Is there something about dev_ioctl that prevents parallel execution? or 
>>> maybe something I still don’t understand about the RCU implementation?
>>> 
>>> Thanks,
>>> 
>>> Joel
>> 
>> My advice would be to spend more time familiarizing yourself with RCU.
>> The advantage of RCU is that it allows for updates while other threads
>> are accessing the data.  The rtnl_lock is just meant to prevent
>> multiple writers from updating the data simultaneously.  So between
>> writers the rtnl_lock is used to keep things synchronized, but between
>> writers and readers the mechanism that is meant to protect the data
>> and keep it sane is RCU.
> 
> Note that sometimes we do not properly handle the case where one field can
> be written by a writer holding RTNL (or the socket lock, or something else).
> 
> We often believe the compiler won't do something stupid, but sometimes
> it can.
> 
> We definitely should scrutinize things a bit more, or maybe add __rcu-like
> annotations to catch potential problems earlier.

This is my hunch from looking at dev_ioctl().  For some of the other fields, 
there is additional support to detect a write during the read, but not for any 
of the ioctls handled in dev_ifsioc_locked().  For example, for SIOCGIFNAME, 
dev_ifname() calls netdev_get_name() to copy dev->name, which uses the 
devnet_rename_seq seqcount to detect whether another thread called 
dev_change_name() and updated the name.

I found more examples of accessing net_device fields in net-sysfs.c, and those 
instances all acquire dev_base_lock/rtnl_lock before reading fields.  Maybe 
dev_ioctl should be implemented this way.

> 
> We recently found an issue in drivers/net/macvtap.c and
> drivers/net/tun.c using q->vnet_hdr_sz without proper annotation.
> 
> The macvtap patch would be:
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 4026185658381df004a7d641e2be7bcb9a45b509..d11a807565acf371f9bbb4afbfaca1aacd000138 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -681,7 +681,7 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
>  	size_t linear;
>  
>  	if (q->flags & IFF_VNET_HDR) {
> -		vnet_hdr_len = q->vnet_hdr_sz;
> +		vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz);
>  
>  		err = -EINVAL;
>  		if (len < vnet_hdr_len)
> @@ -820,7 +820,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
>  
>  	if (q->flags & IFF_VNET_HDR) {
>  		struct virtio_net_hdr vnet_hdr;
> -		vnet_hdr_len = q->vnet_hdr_sz;
> +		vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz);
>  		if (iov_iter_count(iter) < vnet_hdr_len)
>  			return -EINVAL;
>  
> @@ -1090,7 +1090,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
>  		if (s < (int)sizeof(struct virtio_net_hdr))
>  			return -EINVAL;
>  
> -		q->vnet_hdr_sz = s;
> +		WRITE_ONCE(q->vnet_hdr_sz, s);
>  		return 0;
>  
>  	case TUNGETVNETLE:

Joel



Re: Understanding mutual exclusion between rtnl_lock and rcu_read_lock

2017-02-02 Thread Eric Dumazet
On Thu, 2017-02-02 at 15:52 -0800, Alexander Duyck wrote:
> On Thu, Feb 2, 2017 at 3:47 PM, Joel Cunningham  
> wrote:
> > Hi,
> >
> > I’m studying the synchronization used on different parts of struct 
> > net_device and I’m struggling to understand how structure member 
> > modifications in dev_ioctl are synchronized.  Getters in 
> > dev_ifsioc_locked() are only holding rcu_read_lock() while setters in 
> > dev_ifsioc() are holding rtnl_lock, but not using RCU APIs.  I was 
> > specifically looking at SIOCGIFHWADDR/SIOCSIFHWADDR.  What’s to prevent one 
> > CPU from executing a getter and another CPU from executing a setter 
> > resulting in possibly a torn read/write?  I didn’t see anything in 
> > rtnl_lock() that would wait for any rcu_reader_lock() critical sections (on 
> > other CPUs) to finish before acquiring the mutex.
> >
> > Is there something about dev_ioctl that prevents parallel execution? or 
> > maybe something I still don’t understand about the RCU implementation?
> >
> > Thanks,
> >
> > Joel
> 
> My advice would be to spend more time familiarizing yourself with RCU.
> The advantage of RCU is that it allows for updates while other threads
> are accessing the data.  The rtnl_lock is just meant to prevent
> multiple writers from updating the data simultaneously.  So between
> writers the rtnl_lock is used to keep things synchronized, but between
> writers and readers the mechanism that is meant to protect the data
> and keep it sane is RCU.

Note that sometimes we do not properly handle the case where one field can
be written by a writer holding RTNL (or the socket lock, or something else).

We often believe the compiler won't do something stupid, but sometimes
it can.

We definitely should scrutinize things a bit more, or maybe add __rcu-like
annotations to catch potential problems earlier.

We recently found an issue in drivers/net/macvtap.c and
drivers/net/tun.c using q->vnet_hdr_sz without proper annotation.

The macvtap patch would be:

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 4026185658381df004a7d641e2be7bcb9a45b509..d11a807565acf371f9bbb4afbfaca1aacd000138 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -681,7 +681,7 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
 	size_t linear;
 
 	if (q->flags & IFF_VNET_HDR) {
-		vnet_hdr_len = q->vnet_hdr_sz;
+		vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz);
 
 		err = -EINVAL;
 		if (len < vnet_hdr_len)
@@ -820,7 +820,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
 
 	if (q->flags & IFF_VNET_HDR) {
 		struct virtio_net_hdr vnet_hdr;
-		vnet_hdr_len = q->vnet_hdr_sz;
+		vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz);
 		if (iov_iter_count(iter) < vnet_hdr_len)
 			return -EINVAL;
 
@@ -1090,7 +1090,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 		if (s < (int)sizeof(struct virtio_net_hdr))
 			return -EINVAL;
 
-		q->vnet_hdr_sz = s;
+		WRITE_ONCE(q->vnet_hdr_sz, s);
 		return 0;
 
 	case TUNGETVNETLE:




Re: Understanding mutual exclusion between rtnl_lock and rcu_read_lock

2017-02-02 Thread Joel Cunningham

> On Feb 2, 2017, at 5:52 PM, Alexander Duyck  wrote:
> 
> On Thu, Feb 2, 2017 at 3:47 PM, Joel Cunningham  
> wrote:
>> Hi,
>> 
>> I’m studying the synchronization used on different parts of struct 
>> net_device and I’m struggling to understand how structure member 
>> modifications in dev_ioctl are synchronized.  Getters in dev_ifsioc_locked() 
>> are only holding rcu_read_lock() while setters in dev_ifsioc() are holding 
>> rtnl_lock, but not using RCU APIs.  I was specifically looking at 
>> SIOCGIFHWADDR/SIOCSIFHWADDR.  What’s to prevent one CPU from executing a 
>> getter and another CPU from executing a setter resulting in possibly a torn 
>> read/write?  I didn’t see anything in rtnl_lock() that would wait for any 
>> rcu_reader_lock() critical sections (on other CPUs) to finish before 
>> acquiring the mutex.
>> 
>> Is there something about dev_ioctl that prevents parallel execution? or 
>> maybe something I still don’t understand about the RCU implementation?
>> 
>> Thanks,
>> 
>> Joel
> 
> My advice would be to spend more time familiarizing yourself with RCU.
> The advantage of RCU is that it allows for updates while other threads
> are accessing the data.

Thanks for the follow up!

I have been trying to find more examples of RCU where the writer updates the 
structure in place without using RCU APIs, but so far I haven't found anything.

In the case of SIOCSIFHWADDR, we get a pointer to the net_device through 
__dev_get_by_name() and then pass it to dev_set_mac_address() to modify it 
through ndo_set_mac_address().  I didn't see any uses of RCU APIs on the 
writer side, and that's why I figured there was something going on with 
rtnl_lock() that I didn't understand, or that the dev_ioctl function wasn't 
re-entrant from another CPU.

> The rtnl_lock is just meant to prevent
> multiple writers from updating the data simultaneously.  So between
> writers the rtnl_lock is used to keep things synchronized, but between
> writers and readers the mechanism that is meant to protect the data
> and keep it sane is RCU.

Your description of rtnl_lock makes sense.  I'm still confused about how the 
setter-side code (in dev_ifsioc) is using RCU, since I don't see any RCU APIs 
being called.

Thanks,

Joel




Re: Understanding mutual exclusion between rtnl_lock and rcu_read_lock

2017-02-02 Thread Alexander Duyck
On Thu, Feb 2, 2017 at 3:47 PM, Joel Cunningham  wrote:
> Hi,
>
> I’m studying the synchronization used on different parts of struct net_device 
> and I’m struggling to understand how structure member modifications in 
> dev_ioctl are synchronized.  Getters in dev_ifsioc_locked() are only holding 
> rcu_read_lock() while setters in dev_ifsioc() are holding rtnl_lock, but not 
> using RCU APIs.  I was specifically looking at SIOCGIFHWADDR/SIOCSIFHWADDR.  
> What’s to prevent one CPU from executing a getter and another CPU from 
> executing a setter resulting in possibly a torn read/write?  I didn’t see 
> anything in rtnl_lock() that would wait for any rcu_reader_lock() critical 
> sections (on other CPUs) to finish before acquiring the mutex.
>
> Is there something about dev_ioctl that prevents parallel execution? or maybe 
> something I still don’t understand about the RCU implementation?
>
> Thanks,
>
> Joel

My advice would be to spend more time familiarizing yourself with RCU.
The advantage of RCU is that it allows for updates while other threads
are accessing the data.  The rtnl_lock is just meant to prevent
multiple writers from updating the data simultaneously.  So between
writers the rtnl_lock is used to keep things synchronized, but between
writers and readers the mechanism that is meant to protect the data
and keep it sane is RCU.

- Alex


Understanding mutual exclusion between rtnl_lock and rcu_read_lock

2017-02-02 Thread Joel Cunningham
Hi,

I’m studying the synchronization used on different parts of struct net_device 
and I’m struggling to understand how structure member modifications in 
dev_ioctl are synchronized.  Getters in dev_ifsioc_locked() are only holding 
rcu_read_lock(), while setters in dev_ifsioc() are holding rtnl_lock but not 
using RCU APIs.  I was specifically looking at SIOCGIFHWADDR/SIOCSIFHWADDR.  
What’s to prevent one CPU from executing a getter and another CPU from 
executing a setter, resulting in possibly a torn read/write?  I didn’t see 
anything in rtnl_lock() that would wait for any rcu_read_lock() critical 
sections (on other CPUs) to finish before acquiring the mutex.

Is there something about dev_ioctl that prevents parallel execution, or maybe 
something I still don’t understand about the RCU implementation?

Thanks,

Joel