Re: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions

2019-02-14 Thread Kimberly Brown
On Thu, Feb 14, 2019 at 08:54:31PM -0500, Sasha Levin wrote:
> On Sat, Feb 02, 2019 at 03:07:35PM -0500, Kimberly Brown wrote:
> > On Fri, Feb 01, 2019 at 06:24:24PM +, Dexuan Cui wrote:
> > > > From: Kimberly Brown 
> > > > Sent: Thursday, January 31, 2019 9:47 AM
> > > > ...
> > > > 2) Prevent a deadlock that can occur between the proposed mutex_lock()
> > > > call in the vmbus_chan_attr_show() function and the sysfs/kernfs 
> > > > functions.
> > > Hi Kim,
> > > Can you please share more details about the deadlock?
> > > It's unclear to me how the deadlock happens.
> > > 
> > 
> > Hi Dexuan,
> > 
> > I encountered the deadlock by:
> > 1) Adding a call to msleep() before acquiring the mutex in
> > vmbus_chan_attr_show()
> > 2) Opening a hv_netvsc subchannel's sysfs file
> > 3) Removing hv_netvsc while the sysfs file is opening
> 
> Dexuan, any objections to pulling this fix in?
> 

Hi Sasha,

This fix can cause a deadlock. I'm working on a different fix for the
original race condition problem.

Thanks,
Kim

> --
> Thanks,
> Sasha
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions

2019-02-14 Thread Sasha Levin

On Sat, Feb 02, 2019 at 03:07:35PM -0500, Kimberly Brown wrote:

On Fri, Feb 01, 2019 at 06:24:24PM +, Dexuan Cui wrote:

> From: Kimberly Brown 
> Sent: Thursday, January 31, 2019 9:47 AM
> ...
> 2) Prevent a deadlock that can occur between the proposed mutex_lock()
> call in the vmbus_chan_attr_show() function and the sysfs/kernfs functions.
Hi Kim,
Can you please share more details about the deadlock?
It's unclear to me how the deadlock happens.



Hi Dexuan,

I encountered the deadlock by:
1) Adding a call to msleep() before acquiring the mutex in
vmbus_chan_attr_show()
2) Opening a hv_netvsc subchannel's sysfs file
3) Removing hv_netvsc while the sysfs file is opening


Dexuan, any objections to pulling this fix in?

--
Thanks,
Sasha
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions

2019-02-02 Thread Kimberly Brown
On Fri, Feb 01, 2019 at 06:24:24PM +, Dexuan Cui wrote:
> > From: Kimberly Brown 
> > Sent: Thursday, January 31, 2019 9:47 AM
> > ...
> > 2) Prevent a deadlock that can occur between the proposed mutex_lock()
> > call in the vmbus_chan_attr_show() function and the sysfs/kernfs functions.
> Hi Kim,
> Can you please share more details about the deadlock? 
> It's unclear to me how the deadlock happens.
>

Hi Dexuan,

I encountered the deadlock by:
1) Adding a call to msleep() before acquiring the mutex in
vmbus_chan_attr_show()
2) Opening a hv_netvsc subchannel's sysfs file
3) Removing hv_netvsc while the sysfs file is opening

Here's the lockdep output:

[  164.167699] hv_vmbus: unregistering driver hv_netvsc

[  164.171660] ==
[  164.171661] WARNING: possible circular locking dependency detected
[  164.171663] 5.0.0-rc1+ #58 Not tainted
[  164.171664] --
[  164.171666] kworker/0:1/12 is trying to acquire lock:
[  164.171668] 664f9893 (kn->count#43){}, at: 
kernfs_remove+0x23/0x40
[  164.171676]
   but task is already holding lock:
[  164.171677] 7b9e8443 (_connection.channel_mutex){+.+.}, at: 
vmbus_onoffer_rescind+0x1ae/0x210 [hv_vmbus]
[  164.171686]
   which lock already depends on the new lock.

[  164.171687]
   the existing dependency chain (in reverse order) is:
[  164.171688]
   -> #1 (_connection.channel_mutex){+.+.}:
[  164.171694]__mutex_lock+0x65/0x9b0
[  164.171696]mutex_lock_nested+0x1b/0x20
[  164.171700]vmbus_chan_attr_show+0x3f/0x90 [hv_vmbus]
[  164.171703]sysfs_kf_seq_show+0xa4/0x130
[  164.171705]kernfs_seq_show+0x2d/0x30
[  164.171708]seq_read+0xe2/0x410
[  164.171711]kernfs_fop_read+0x14e/0x1a0
[  164.171714]__vfs_read+0x3a/0x1a0
[  164.171716]vfs_read+0x91/0x140
[  164.171718]ksys_read+0x58/0xc0
[  164.171721]__x64_sys_read+0x1a/0x20
[  164.171724]do_syscall_64+0x65/0x1b0
[  164.171727]entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  164.171728]
   -> #0 (kn->count#43){}:
[  164.171732]lock_acquire+0xa3/0x180
[  164.171734]__kernfs_remove+0x278/0x300
[  164.171737]kernfs_remove+0x23/0x40
[  164.171739]sysfs_remove_dir+0x51/0x60
[  164.171741]kobject_del.part.7+0x13/0x40
[  164.171743]kobject_put+0x6a/0x1a0
[  164.171748]hv_process_channel_removal+0xfe/0x180 [hv_vmbus]
[  164.171752]vmbus_onoffer_rescind+0x20a/0x210 [hv_vmbus]
[  164.171756]vmbus_onmessage+0x5f/0x150 [hv_vmbus]
[  164.171760]vmbus_onmessage_work+0x22/0x30 [hv_vmbus]
[  164.171763]process_one_work+0x291/0x5c0
[  164.171765]worker_thread+0x34/0x400
[  164.171767]kthread+0x124/0x140
[  164.171770]ret_from_fork+0x3a/0x50
[  164.171771]
   other info that might help us debug this:

[  164.171772]  Possible unsafe locking scenario:

[  164.171773]CPU0CPU1
[  164.171775]
[  164.171776]   lock(_connection.channel_mutex);
[  164.171777]lock(kn->count#43);
[  164.171779]
lock(_connection.channel_mutex);
[  164.171781]   lock(kn->count#43);
[  164.171783]
*** DEADLOCK ***

[  164.171785] 3 locks held by kworker/0:1/12:
[  164.171786]  #0: 2128b29f ((wq_completion)"events"){+.+.}, at: 
process_one_work+0x20f/0x5c0
[  164.171790]  #1: 41d2602c ((work_completion)(>work)){+.+.}, at: 
process_one_work+0x20f/0x5c0
[  164.171794]  #2: 7b9e8443 (_connection.channel_mutex){+.+.}, 
at: vmbus_onoffer_rescind+0x1ae/0x210 [hv_vmbus]
[  164.171799]
   stack backtrace:
[  164.171802] CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.0.0-rc1+ #58
[  164.171804] Hardware name: Microsoft Corporation Virtual Machine/Virtual 
Machine, BIOS Hyper-V UEFI Release v3.0 03/02/2018
[  164.171809] Workqueue: events vmbus_onmessage_work [hv_vmbus]
[  164.171811] Call Trace:
[  164.171816]  dump_stack+0x8e/0xd5
[  164.171819]  print_circular_bug.isra.37+0x1e7/0x1f5
[  164.171822]  __lock_acquire+0x1427/0x1490
[  164.171826]  ? sched_clock+0x9/0x10
[  164.171830]  lock_acquire+0xa3/0x180
[  164.171832]  ? lock_acquire+0xa3/0x180
[  164.171835]  ? kernfs_remove+0x23/0x40
[  164.171838]  __kernfs_remove+0x278/0x300
[  164.171840]  ? kernfs_remove+0x23/0x40
[  164.171843]  kernfs_remove+0x23/0x40
[  164.171846]  sysfs_remove_dir+0x51/0x60
[  164.171848]  kobject_del.part.7+0x13/0x40
[  164.171850]  kobject_put+0x6a/0x1a0
[  164.171855]  hv_process_channel_removal+0xfe/0x180 [hv_vmbus]
[  164.171859]  vmbus_onoffer_rescind+0x20a/0x210 [hv_vmbus]
[  164.171863]  vmbus_onmessage+0x5f/0x150 [hv_vmbus]
[  164.171868]  vmbus_onmessage_work+0x22/0x30 [hv_vmbus]
[  164.171870]  

RE: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions

2019-02-01 Thread Dexuan Cui
> From: Kimberly Brown 
> Sent: Thursday, January 31, 2019 9:47 AM
> ...
> 2) Prevent a deadlock that can occur between the proposed mutex_lock()
> call in the vmbus_chan_attr_show() function and the sysfs/kernfs functions.
Hi Kim,
Can you please share more details about the deadlock? 
It's unclear to me how the deadlock happens.

> I've identified two possible solutions to the deadlock:
> 
> 1) Add a new mutex to the vmbus_channel struct which protects changes to
> "channel->state". Use this new mutex in vmbus_chan_attr_show() instead of
> "vmbus_connection.channel_mutex".
> 
> 2) Use "vmbus_connection.channel_mutex" in vmbus_chan_attr_show() as
> originally proposed, and acquire it with mutex_trylock(). If the mutex
> cannot be acquired, return -EINVAL.
It looks more like a workaround. IMO we should try to find a real fix. :-)
 
> I'm leaning towards (2), using mutex_trylock().
> "vmbus_connection.channel_mutex"
> appears to be used only when a channel is being opened or closed, so
> vmbus_chan_attr_show() should be able to acquire the mutex when the
> channel is in use.
> 
> If anyone has suggestions, please let me know.
> 
> Thanks,
> Kim

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions

2019-02-01 Thread Sasha Levin

On Thu, Jan 31, 2019 at 12:47:07PM -0500, Kimberly Brown wrote:

On Thu, Jan 31, 2019 at 04:45:35PM +, Michael Kelley wrote:

From: Sasha Levin  Sent: Thursday, January 31, 2019 7:20 AM
>
> I've queued this one for hyper-fixes, thanks all!
>

Actually, please hold off on queuing this one.  In a conversation I had 
yesterday with Kim, they had identified a deadlock.  Kim was going to be 
looking at some revisions to avoid the deadlock.

Kim -- please confirm.

Michael


This is correct. I need to send a v2 of this patch which addresses two
issues:


Now dropped, thanks for the heads-up.

--
Thanks,
Sasha
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions

2019-01-31 Thread Kimberly Brown
On Thu, Jan 31, 2019 at 04:45:35PM +, Michael Kelley wrote:
> From: Sasha Levin  Sent: Thursday, January 31, 2019 7:20 AM
> > 
> > I've queued this one for hyper-fixes, thanks all!
> > 
> 
> Actually, please hold off on queuing this one.  In a conversation I had 
> yesterday with Kim, they had identified a deadlock.  Kim was going to be 
> looking at some revisions to avoid the deadlock.
> 
> Kim -- please confirm.
> 
> Michael

This is correct. I need to send a v2 of this patch which addresses two
issues:

1) Check channel->state inside the mutex acquire/release in
vmbus_chan_attr_show().

2) Prevent a deadlock that can occur between the proposed mutex_lock()
call in the vmbus_chan_attr_show() function and the sysfs/kernfs functions.


I've identified two possible solutions to the deadlock:

1) Add a new mutex to the vmbus_channel struct which protects changes to
"channel->state". Use this new mutex in vmbus_chan_attr_show() instead of
"vmbus_connection.channel_mutex".

2) Use "vmbus_connection.channel_mutex" in vmbus_chan_attr_show() as
originally proposed, and acquire it with mutex_trylock(). If the mutex
cannot be acquired, return -EINVAL.


I'm leaning towards (2), using mutex_trylock(). "vmbus_connection.channel_mutex"
appears to be used only when a channel is being opened or closed, so
vmbus_chan_attr_show() should be able to acquire the mutex when the
channel is in use.

If anyone has suggestions, please let me know.

Thanks,
Kim
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions

2019-01-31 Thread Michael Kelley
From: Sasha Levin  Sent: Thursday, January 31, 2019 7:20 AM
> 
> I've queued this one for hyper-fixes, thanks all!
> 

Actually, please hold off on queuing this one.  In a conversation I had 
yesterday with Kim, they had identified a deadlock.  Kim was going to be 
looking at some revisions to avoid the deadlock.

Kim -- please confirm.

Michael
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions

2019-01-31 Thread Sasha Levin

On Tue, Jan 29, 2019 at 07:20:28PM +, Dexuan Cui wrote:

From: Kimberly Brown 
> ...
> But as you pointed, at least for sub-channels, channel->ringbuffer_page
> can indeed disappear in vmbus_close() -> ... -> vmbus_free_ring(), and
> the "attribute->show()" could crash when the race happens.
> Adding channel_mutex here seems to be able to fix the race for
> sub-channels, as the same channel_mutex is used in
vmbus_disconnect_ring().
>
> For a primary channel, vmbus_close() -> vmbus_free_ring() can still
> free channel->ringbuffer_page without notifying the "attribute->show()".
> We may also need to acquire/release the channel_mutex in vmbus_close()?
>
> > Actually, there should be checks that "chan" is not null and that
> > "chan->state" is CHANNEL_OPENED_STATE within the locked section. I'll
> > need to fix that.
> I suppose "chan" can not be NULL here (see the above).
>
> Checking "chan->state" may not help to completely fix the race, because
> there is no locking/synchronization code in
> vmbus_close_internal() when we test and change "channel->state".
>

The calls to vmbus_close_internal() for the subchannels and the primary
channel are protected with channel_mutex in vmbus_disconnect_ring().
This prevents "channel->state" from changing while "attribute->show()" is
running.

Ah, I think you're right.


> I guess we may need to check if channel->ringbuffer_page is NULL in
> the "attribute->show()".
>

For the primary channel, vmbus_free_ring() is called after the
return from vmbus_disconnect_ring(). Therefore, the primary channel's
state is changed before "channel->ringbuffer_page" is set to NULL.
Checking the channel state should be sufficient to prevent the ring
buffers from being freed while "attribute->show()" is running. The
ring buffers can't be freed until the channel's state is changed, and
the channel state change is protected by the mutex.

I think you're right (I noticed in a previous mail you mentioned you would
improve your patch to check "chan->state" with the mutax held).


I think checking that "channel->ringbuffer_page" is not NULL would
also work, but, as you stated, we would need to aquire/release
channel_mutex in vmbus_close().

Then it looks unnecessary to check "channel->ringbuffer_page".


> PS, to prove that a race condition does exist and can really cause a panic or
> something, I usually add some msleep() delays in different paths so that I
> can reproduce the crash every time I take a special action, e.g. when I read
> the sysfs files of a NIC, I try to remove hv_netvsc driver. This way, I can 
prove
> a patch can indeed help, at least it can fix the crash which would happen
> without the patch. :-)
>

Thanks! I was able to free the ring buffers while "attribute->show()"
was running, which caused a null pointer dereference bug. As expected,
the mutex lock fixed it.

Awesome!


I've queued this one for hyper-fixes, thanks all!

--
Thanks,
Sasha
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions

2019-01-29 Thread Dexuan Cui
> From: Kimberly Brown 
> > ... 
> > But as you pointed, at least for sub-channels, channel->ringbuffer_page
> > can indeed disappear in vmbus_close() -> ... -> vmbus_free_ring(), and
> > the "attribute->show()" could crash when the race happens.
> > Adding channel_mutex here seems to be able to fix the race for
> > sub-channels, as the same channel_mutex is used in
> vmbus_disconnect_ring().
> >
> > For a primary channel, vmbus_close() -> vmbus_free_ring() can still
> > free channel->ringbuffer_page without notifying the "attribute->show()".
> > We may also need to acquire/release the channel_mutex in vmbus_close()?
> >
> > > Actually, there should be checks that "chan" is not null and that
> > > "chan->state" is CHANNEL_OPENED_STATE within the locked section. I'll
> > > need to fix that.
> > I suppose "chan" can not be NULL here (see the above).
> >
> > Checking "chan->state" may not help to completely fix the race, because
> > there is no locking/synchronization code in
> > vmbus_close_internal() when we test and change "channel->state".
> >
> 
> The calls to vmbus_close_internal() for the subchannels and the primary
> channel are protected with channel_mutex in vmbus_disconnect_ring().
> This prevents "channel->state" from changing while "attribute->show()" is
> running.
Ah, I think you're right. 
 
> > I guess we may need to check if channel->ringbuffer_page is NULL in
> > the "attribute->show()".
> >
> 
> For the primary channel, vmbus_free_ring() is called after the
> return from vmbus_disconnect_ring(). Therefore, the primary channel's
> state is changed before "channel->ringbuffer_page" is set to NULL.
> Checking the channel state should be sufficient to prevent the ring
> buffers from being freed while "attribute->show()" is running. The
> ring buffers can't be freed until the channel's state is changed, and
> the channel state change is protected by the mutex.
I think you're right (I noticed in a previous mail you mentioned you would
improve your patch to check "chan->state" with the mutax held).

> I think checking that "channel->ringbuffer_page" is not NULL would
> also work, but, as you stated, we would need to aquire/release
> channel_mutex in vmbus_close().
Then it looks unnecessary to check "channel->ringbuffer_page".
 
> > PS, to prove that a race condition does exist and can really cause a panic 
> > or
> > something, I usually add some msleep() delays in different paths so that I
> > can reproduce the crash every time I take a special action, e.g. when I read
> > the sysfs files of a NIC, I try to remove hv_netvsc driver. This way, I can 
> > prove
> > a patch can indeed help, at least it can fix the crash which would happen
> > without the patch. :-)
> >
> 
> Thanks! I was able to free the ring buffers while "attribute->show()"
> was running, which caused a null pointer dereference bug. As expected,
> the mutex lock fixed it.
Awesome!

-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions

2019-01-28 Thread Kimberly Brown
On Tue, Jan 22, 2019 at 06:40:30PM +, Dexuan Cui wrote:
> > From: Kimberly Brown 
> > Sent: Monday, January 21, 2019 10:43 PM
> > > > @@ -1421,7 +1422,10 @@ static ssize_t vmbus_chan_attr_show(struct
> > > > kobject *kobj,
> > > > if (chan->state != CHANNEL_OPENED_STATE)
> > > > return -EINVAL;
> > > >
> > > > -   return attribute->show(chan, buf);
> > > > +   mutex_lock(_connection.channel_mutex);
> > > > +   ret = attribute->show(chan, buf);
> > > > +   mutex_unlock(_connection.channel_mutex);
> > > > +   return ret;
> > > >  }
> > >
> > > It looks this patch is already able to fix the original issue:
> > > 6712cc9c2211 ("vmbus: don't return values for uninitalized channels"),
> > > as chan->state can't be CHANNEL_OPENED_STATE when
> > > channel->ringbuffer_page is not set up yet, or has been freed.
> > > -- Dexuan
> > 
> > I think that patch 6712cc9c2211 fixes the problem when the channel is
> > not set up yet, but it doesn't fix the problem when the channel is being
> > closed
> Yeah, now I see your point.
> 
> > The channel could be freed after the check that "chan->state" is
> > CHANNEL_OPENED_STATE, while the "attribute->show()" function is running.
> 
> IMO the channel struct itself can't be freed while the "attribute->show()"
> function is running, because I suppose the sysfs interface should have an 
> extra
> reference to channel->kobj (see vmbus_add_channel_kobj()), so before the sysfs
> files are removed, the channel struct itself can't disappear.
> (I didn't check the related code very carefully, so I could be wrong. :-) )
> 

I think that you're correct that the channel struct can't be freed while
the "attribute->show()" function is running. I tested this by calling
msleep() in chan_attr_show(), opening a subchannel sysfs file, and
unloading hv_netvsc. Unloading hv_netvsc should result in calls to
vmbus_chan_release() for each subchannel. I confirmed that
vmbus_chan_release() isn't called until the "attribute->show()" function
returns.


> But as you pointed, at least for sub-channels, channel->ringbuffer_page
> can indeed disappear in vmbus_close() -> ... -> vmbus_free_ring(), and
> the "attribute->show()" could crash when the race happens.
> Adding channel_mutex here seems to be able to fix the race for
> sub-channels, as the same channel_mutex is used in vmbus_disconnect_ring().
>
> For a primary channel, vmbus_close() -> vmbus_free_ring() can still
> free channel->ringbuffer_page without notifying the "attribute->show()".
> We may also need to acquire/release the channel_mutex in vmbus_close()?
>  
> > Actually, there should be checks that "chan" is not null and that
> > "chan->state" is CHANNEL_OPENED_STATE within the locked section. I'll
> > need to fix that.
> I suppose "chan" can not be NULL here (see the above).
> 
> Checking "chan->state" may not help to completely fix the race, because
> there is no locking/synchronization code in
> vmbus_close_internal() when we test and change "channel->state".
>

The calls to vmbus_close_internal() for the subchannels and the primary
channel are protected with channel_mutex in vmbus_disconnect_ring().
This prevents "channel->state" from changing while "attribute->show()" is
running.


> I guess we may need to check if channel->ringbuffer_page is NULL in
> the "attribute->show()". 
> 

For the primary channel, vmbus_free_ring() is called after the
return from vmbus_disconnect_ring(). Therefore, the primary channel's
state is changed before "channel->ringbuffer_page" is set to NULL.
Checking the channel state should be sufficient to prevent the ring
buffers from being freed while "attribute->show()" is running. The
ring buffers can't be freed until the channel's state is changed, and
the channel state change is protected by the mutex.

I think checking that "channel->ringbuffer_page" is not NULL would
also work, but, as you stated, we would need to aquire/release
channel_mutex in vmbus_close().


> PS, to prove that a race condition does exist and can really cause a panic or
> something, I usually add some msleep() delays in different paths so that I
> can reproduce the crash every time I take a special action, e.g. when I read
> the sysfs files of a NIC, I try to remove hv_netvsc driver. This way, I can 
> prove
> a patch can indeed help, at least it can fix the crash which would happen
> without the patch. :-)
> 

Thanks! I was able to free the ring buffers while "attribute->show()"
was running, which caused a null pointer dereference bug. As expected,
the mutex lock fixed it.


> -- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions

2019-01-22 Thread Dexuan Cui
> From: Kimberly Brown 
> Sent: Monday, January 21, 2019 10:43 PM
> > > @@ -1421,7 +1422,10 @@ static ssize_t vmbus_chan_attr_show(struct
> > > kobject *kobj,
> > >   if (chan->state != CHANNEL_OPENED_STATE)
> > >   return -EINVAL;
> > >
> > > - return attribute->show(chan, buf);
> > > + mutex_lock(_connection.channel_mutex);
> > > + ret = attribute->show(chan, buf);
> > > + mutex_unlock(_connection.channel_mutex);
> > > + return ret;
> > >  }
> >
> > It looks this patch is already able to fix the original issue:
> > 6712cc9c2211 ("vmbus: don't return values for uninitalized channels"),
> > as chan->state can't be CHANNEL_OPENED_STATE when
> > channel->ringbuffer_page is not set up yet, or has been freed.
> > -- Dexuan
> 
> I think that patch 6712cc9c2211 fixes the problem when the channel is
> not set up yet, but it doesn't fix the problem when the channel is being
> closed
Yeah, now I see your point.

> The channel could be freed after the check that "chan->state" is
> CHANNEL_OPENED_STATE, while the "attribute->show()" function is running.

IMO the channel struct itself can't be freed while the "attribute->show()"
function is running, because I suppose the sysfs interface should have an extra
reference to channel->kobj (see vmbus_add_channel_kobj()), so before the sysfs
files are removed, the channel struct itself can't disappear.
(I didn't check the related code very carefully, so I could be wrong. :-) )

But as you pointed, at least for sub-channels, channel->ringbuffer_page
can indeed disappear in vmbus_close() -> ... -> vmbus_free_ring(), and
the "attribute->show()" could crash when the race happens.
Adding channel_mutex here seems to be able to fix the race for
sub-channels, as the same channel_mutex is used in vmbus_disconnect_ring().

For a primary channel, vmbus_close() -> vmbus_free_ring() can still
free channel->ringbuffer_page without notifying the "attribute->show()".
We may also need to acquire/release the channel_mutex in vmbus_close()?
 
> Actually, there should be checks that "chan" is not null and that
> "chan->state" is CHANNEL_OPENED_STATE within the locked section. I'll
> need to fix that.
I suppose "chan" can not be NULL here (see the above).

Checking "chan->state" may not help to completely fix the race, because
there is no locking/synchronization code in
vmbus_close_internal() when we test and change "channel->state".

I guess we may need to check if channel->ringbuffer_page is NULL in
the "attribute->show()". 

PS, to prove that a race condition does exist and can really cause a panic or
something, I usually add some msleep() delays in different paths so that I
can reproduce the crash every time I take a special action, e.g. when I read
the sysfs files of a NIC, I try to remove hv_netvsc driver. This way, I can 
prove
a patch can indeed help, at least it can fix the crash which would happen
without the patch. :-)

-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions

2019-01-21 Thread Kimberly Brown
On Tue, Jan 22, 2019 at 03:46:48AM +, Dexuan Cui wrote:
> > From: Kimberly Brown 
> > Sent: Monday, January 21, 2019 6:08 PM
> > Subject: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show 
> > functions
> > 
> > The channel level "_show" functions are vulnerable to race conditions.
> > Add a mutex lock and unlock around the call to the channel level "_show"
> > functions in vmbus_chan_attr_show().
> > 
> > This problem was discussed here:
> > https://lkml.org/lkml/2018/10/18/830
> > 
> > --- a/drivers/hv/vmbus_drv.c
> > +++ b/drivers/hv/vmbus_drv.c
> > @@ -1414,6 +1414,7 @@ static ssize_t vmbus_chan_attr_show(struct kobject
> > *kobj,
> > = container_of(attr, struct vmbus_chan_attribute, attr);
> > const struct vmbus_channel *chan
> > = container_of(kobj, struct vmbus_channel, kobj);
> > +   ssize_t ret;
> > 
> > if (!attribute->show)
> > return -EIO;
> > @@ -1421,7 +1422,10 @@ static ssize_t vmbus_chan_attr_show(struct
> > kobject *kobj,
> > if (chan->state != CHANNEL_OPENED_STATE)
> > return -EINVAL;
> > 
> > -   return attribute->show(chan, buf);
> > +   mutex_lock(_connection.channel_mutex);
> > +   ret = attribute->show(chan, buf);
> > +   mutex_unlock(_connection.channel_mutex);
> > +   return ret;
> >  }
> 
> It looks this patch is already able to fix the original issue:
> 6712cc9c2211 ("vmbus: don't return values for uninitalized channels"),
> as chan->state can't be CHANNEL_OPENED_STATE when
> channel->ringbuffer_page is not set up yet, or has been freed.
> 
> Thanks,
> -- Dexuan

I think that patch 6712cc9c2211 fixes the problem when the channel is
not set up yet, but it doesn't fix the problem when the channel is being
closed. The channel could be freed after the check that "chan->state" is
CHANNEL_OPENED_STATE, while the "attribute->show()" function is running.

Actually, there should be checks that "chan" is not null and that
"chan->state" is CHANNEL_OPENED_STATE within the locked section. I'll
need to fix that.
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions

2019-01-21 Thread Dexuan Cui
> From: Kimberly Brown 
> Sent: Monday, January 21, 2019 6:08 PM
> Subject: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions
> 
> The channel level "_show" functions are vulnerable to race conditions.
> Add a mutex lock and unlock around the call to the channel level "_show"
> functions in vmbus_chan_attr_show().
> 
> This problem was discussed here:
> https://lkml.org/lkml/2018/10/18/830
> 
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -1414,6 +1414,7 @@ static ssize_t vmbus_chan_attr_show(struct kobject
> *kobj,
>   = container_of(attr, struct vmbus_chan_attribute, attr);
>   const struct vmbus_channel *chan
>   = container_of(kobj, struct vmbus_channel, kobj);
> + ssize_t ret;
> 
>   if (!attribute->show)
>   return -EIO;
> @@ -1421,7 +1422,10 @@ static ssize_t vmbus_chan_attr_show(struct
> kobject *kobj,
>   if (chan->state != CHANNEL_OPENED_STATE)
>   return -EINVAL;
> 
> - return attribute->show(chan, buf);
> + mutex_lock(_connection.channel_mutex);
> + ret = attribute->show(chan, buf);
> + mutex_unlock(_connection.channel_mutex);
> + return ret;
>  }

It looks this patch is already able to fix the original issue:
6712cc9c2211 ("vmbus: don't return values for uninitalized channels"),
as chan->state can't be CHANNEL_OPENED_STATE when
channel->ringbuffer_page is not set up yet, or has been freed.

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel