答复: [PATCH] USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci

2020-07-22 Thread Weitao Wang(BJ-RD)

On , Jul 22, 2020 at 02:44:14PM +0200, Alan wrote:
> On Wed, Jul 22, 2020 at 02:44:14PM +0200, Greg KH wrote:
> > On Wed, Jul 22, 2020 at 07:57:48PM +0800, WeitaoWangoc wrote:
> > > This bug is found in Zhaoxin platform, but it's a commom code bug.
> > > Fail sequence:
> > > step1: Unbind UHCI controller from native driver;
> > > step2: Bind UHCI controller to vfio-pci, which will put UHCI controller 
> > > in one
> vfio
> > >group's device list and set UHCI's dev->driver_data to struct
> vfio-pci(for UHCI)
> >
> > Hah, that works?  How do you do that properly?  What code does that?
>
> Yeah, that can't possibly work.  The USB core expects that any host
> controller device (or at least, any PCI host controller device) has its
> driver_data set to point to a struct usb_hcd.  It doesn't expect a host
> controller to be bound to anything other than a host controller driver.
>
> Things could easily go very wrong here.  For example, suppose at this
> point the ehci-hcd driver just happens to bind to the EHCI controller.
> When this happens, the EHCI controller hardware takes over all the USB
> connections that were routed to the UHCI controller.  How will vfio-pci
> deal with that?  Pretty ungracefully, I imagine.
>
> The only way to make this work at all is to unbind both uhci-hcd and
> ehci-hcd first.  Then after both are finished you can safely bind
> vfio-pci to the EHCI controller and the UHCI controllers (in that
> order).
>
I'm agree with you, unbind both uhci-hcd and ehci-hcd first then bind to
vfio-pci is a more reasonable sequence. Our experiments prove that this
sequence is indeed good as expected.
However, I did not find a formal document to prescribe this order.
Unfortunately, some application software such as virt-manager/qemu assign
UHCI/EHCI to guest OS has the same bind/unbind sequence as test “by hand”.
Do we need to consider compatibility with this application scenario?

The following log is captured when starting then shutdown the
virtual machine.

/* starting virtual machine*/
[ 2785.250001] audit: type=1400 audit(1594375837.191:48): apparmor="STATUS" 
operation="profile_load" profile="unconfined" 
name="libvirt-fa674e73-67a2-4672-8524-e62dea8a3c6c" pid=2008 
comm="apparmor_parser"
[ 2785.467510] uhci_hcd :00:10.0: remove, state 4
[ 2785.472426] usb usb1: USB disconnect, device number 1
/*bind :00:10.0 to vfio-pci*/
[ 2785.478798] uhci_hcd :00:10.0: USB bus 1 deregistered
[ 2785.568741] uhci_hcd :00:10.1: remove, state 1
[ 2785.573562] usb usb2: USB disconnect, device number 1
[ 2785.578793] usb 2-2: USB disconnect, device number 3
[ 2785.758016] uhci_hcd :00:10.1: USB bus 2 deregistered
/*bind :00:10.1 to vfio-pci*/
[ 2786.037448] ehci-pci :00:10.7: remove, state 4
[ 2786.042460] usb usb3: USB disconnect, device number 1
[ 2786.048700] ehci-pci :00:10.7: USB bus 3 deregistered
/*bind :00:10.7 to vfio-pci*/
[ 2787.518041] audit: type=1400 audit(1594375839.459:49): apparmor="STATUS" 
operation="profile_replace" profile="unconfined" 
name="libvirt-fa674e73-67a2-4672-8524-e62dea8a3c6c" pid=2034 
comm="apparmor_parser"
[ 2788.290706] audit: type=1400 audit(1594375840.231:50): apparmor="STATUS" 
operation="profile_replace" profile="unconfined" 
name="libvirt-fa674e73-67a2-4672-8524-e62dea8a3c6c" pid=2037 
comm="apparmor_parser"
[ 2788.960070] audit: type=1400 audit(1594375840.899:51): apparmor="STATUS" 
operation="profile_replace" info="same as current profile, skipping" 
profile="unconfined" name="libvirt-fa674e73-67a2-4672-8524-e62dea8a3c6c" 
pid=2040 comm="apparmor_parser"
[ 2788.968821] virbr0: port 2(vnet0) entered blocking state
[ 2788.988159] virbr0: port 2(vnet0) entered disabled state
[ 2788.993711] device vnet0 entered promiscuous mode
[ 2788.999453] virbr0: port 2(vnet0) entered blocking state
[ 2789.005053] virbr0: port 2(vnet0) entered listening state
[ 2789.098717] systemd-journald[286]: Successfully sent stream file descriptor 
to service manager.
[ 2789.564241] audit: type=1400 audit(1594375841.507:52): apparmor="STATUS" 
operation="profile_replace" profile="unconfined" 
name="libvirt-fa674e73-67a2-4672-8524-e62dea8a3c6c" pid=2065 
comm="apparmor_parser"
[ 2791.028028] virbr0: port 2(vnet0) entered learning state
[ 2793.047999] virbr0: port 2(vnet0) entered forwarding state
[ 2793.053449] virbr0: topology change detected, propagating
[ 2793.433604] vfio_cap_init: :00:10.7 hiding cap 0xa

/* shutdown virtual machine*/
[ 3838.772058] systemd-journald[286]: Successfully sent stream file descriptor 
to service manager.
[ 3838.815819] systemd-journald[286]: Successfully sent stream file descriptor 
to service manager.
[ 3838.871002] systemd-journald[286]: Successfully sent stream file descriptor 
to service manager.
[ 3838.884606] systemd-journald[286]: Successfully sent stream file descriptor 
to service manager.
[ 3838.894514] systemd-journald[286]: Successfully sent stream file descriptor 
to service manager.
[ 3838.896791] rfkill: 

答复: [PATCH] USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci

2020-07-22 Thread Weitao Wang(BJ-RD)
  
On Wed, Jul 22, 2020 at 02:44:14PM +0200, Greg KH wrote:
> On Wed, Jul 22, 2020 at 07:57:48PM +0800, WeitaoWangoc wrote:
> > This bug is found in Zhaoxin platform, but it's a commom code bug.
> > Fail sequence:
> > step1: Unbind UHCI controller from native driver;
> > step2: Bind UHCI controller to vfio-pci, which will put UHCI controller in 
> > one
> vfio
> >group's device list and set UHCI's dev->driver_data to struct
> vfio-pci(for UHCI)
>
> Hah, that works?  How do you do that properly?  What code does that?

Drivers/vfio/vfio.c
The function vfio_group_create_device will set UHCI dev->driver_data
to vfio-device struct.

> > step3: Unbind EHCI controller from native driver, will try to tell UHCI 
> > native
> driver
> >that "I'm removed by set companion_hcd->self.hs_companion to
> NULL. However,
> >companion_hcd get from UHCI's dev->driver_data that has modified
> by vfio-pci
> >already.So, the vfio-pci structure will be damaged!
>
> Because you are messing around with bind/unbind "by hand", which is
> never guaranteed to work properly.
>
> It's amazing any of that works at all...

If adjust the sequence of UHCI/EHCI unbind form native driver, that
can avoid Null Pointer dereference. Eg. Step3-->stet4-->step1-->step2.
Our experiments prove that this sequence is indeed good as expected.
However, some application software such as virt-manager/qemu assign
/EHCI to guest OS has the same bind/unbind sequence as test “by hand”.

> 4.19.65-2020051917-rainos #1
>
> 4.19 is a bit old, what about 5.7?

5.7.7 has the same issue.

> > +#define PCI_DEV_DRV_FLAG   2
>
> Why is the USB core allowed to touch a private PCI structure field?
>
> I do not think this is allowed.
>
> And why is this a USB host controller-specific issue, shouldn't this
> type of thing be fixed in the PCI core?

When ehci hcd_driver bind or unbind to ehci, it will touch/modify UHCI usb_hcd
that get from UHCI's dev->driver_data. However, UHCI maybe bind to a
another driver(vfio-pci) and it's driver_data will be rewritten. what we
should do is to prevent ehci_hcd to modify UHCI's dev->driver_data when UHCI
has already bound to vfio-pci.

Thanks
weitaowang



保密声明:
本邮件含有保密或专有信息,仅供指定收件人使用。严禁对本邮件或其内容做任何未经授权的查阅、使用、复制或转发。
CONFIDENTIAL NOTE:
This email contains confidential or legally privileged information and is for 
the sole use of its intended recipient. Any unauthorized review, use, copying 
or forwarding of this email or the content of this email is strictly prohibited.