Re: [Qemu-discuss] How to resolve "Failed to mmap" error?

2016-10-22 Thread Jakob Bohm

On 20/10/2016 06:21, A223 A223 wrote:

> Okay, I've enabled intremap=no_x2apic_optout on the host kernel and
> didn't see much of a change in performance.
> I feel I should mention that the qemu process on the host consumes
> about 8-30% CPU (as shown in top) when the guest (Windows 10) task
> manager shows <1.0% cpu utilization. More often than not, it is >15%.
> Is this normal?

Maybe try to look at the Interrupt, disk and network statistics
on the first and second tab of the Windows 8/10 Task Manager
and see if there is any correlation with the extra host CPU
use?  I know the details there are a bit meager compared to
what might be available with lower level Windows tools, but
at least it's easy.
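
A host-side complement to the Task Manager view is to sample
/proc/interrupts twice and compare the counters, so that a busy IRQ can
be lined up against the qemu CPU figure from top. The following is only
a minimal sketch: the function names are made up for illustration, and
it assumes the usual /proc/interrupts layout (a header row of CPU
columns, then one "label: counts... description" row per interrupt).

```python
def parse_interrupts(text):
    """Return {irq_label: (total_count_across_cpus, description)}."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:       # rows like ERR: have fewer columns
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def interrupt_rates(before, after, seconds):
    """Interrupts per second for each IRQ between two parsed snapshots."""
    rates = {}
    for irq, (count, description) in after.items():
        previous = before.get(irq, (0, description))[0]
        rates[irq] = ((count - previous) / seconds, description)
    return rates
```

To use it, read /proc/interrupts, wait a known interval, read it again,
and pass both parsed snapshots plus the interval to interrupt_rates().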

Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded



Re: [Qemu-discuss] How to resolve "Failed to mmap" error?

2016-10-20 Thread A223 A223
On Wed, Oct 19, 2016 at 9:21 PM, A223 A223  wrote:
> On Wed, Oct 19, 2016 at 12:35 PM, A223 A223  wrote:
>> On Wednesday, October 19, 2016, A223 A223  wrote:
>>>
>>> On Mon, Oct 17, 2016 at 10:24 PM, Alex Williamson
>>>  wrote:
>>> > I picked up a card based on the same Fresco FL1100 (rev 10) chip, I
>>> > don't think it needs the no-bus-reset quirk above, resets seem to work
>>> > fine for me.  In fact the card works as expected with the previous
>>> > patch I sent, no warning about mmaps.
>>>
>>> Thanks for giving that a shot.
>>> Yeah, I had the same result -- no warnings about mmaps once I applied
>>> your patch. I did encounter those lockups on the host, but it's still
>>> too hard for me to say whether that was related to your patch, because
>>> my sample size was way too small, as I mentioned.
>>>
>>> > There's no performance loss due
>>> > to the mapping of the MSI-X table BAR with or without that patch, that
>>> > BAR seems to only be used for MSI-X and nothing changes once setup, at
>>> > least with modern versions of Linux as the guest.  Tracing shows
>>> > absolutely no traps through QEMU once it's setup.  The device does run
>>> > at a high interrupt rate, which I expect is the primary source of
>>> > performance loss when running in a VM.  Hardware solutions like APICv
>>> > and Posted Interrupts should close that gap.
>>>
>>> Also, just to be clear, I'm using a Windows guest, so perhaps the
>>> guest is doing something out-of-line?
>>>
>>> I also have a couple of other questions about this if you don't mind:
>>> Will most USB 3.0 controllers run at a high interrupt rate, or is that
>>> unique to certain controllers such as this one? I'm planning to try an
>>> NEC chip soon as well, so I'm hoping that may be better.
>>> It appears that APICv is only available on Ivy Bridge E* chips, is
>>> that correct? I wasn't able to find which processors support posted
>>> interrupts. If you know offhand, please let me know. I will have to
>>> keep all of this in mind for my next hardware upgrade.
>>>
>>> > What you're seeing above still suggests to me that the device is not
>>> > returning from a bus reset, all reads from the device are returning
>>> > -1.  Since I'm unable to reproduce with my card, it would seem to imply
>>> > that the problem is not an intrinsic issue with the controller chip
>>> > itself.  Are you overclocking in any way?  Thanks,
>>>
>>> No overclocking, here. I'm going to give all of this another try this
>>> week sometime and see if I can find any other clues. I'll let you know
>>> what happens. Thanks, Andrew
>>
>>
>> In doing some more research about apicv and related topics, I came to find
>> this in my host's dmesg:
>>
>> [0.025739] DMAR-IR: x2apic is disabled because BIOS sets x2apic opt out bit.
>> [0.025749] DMAR-IR: Use 'intremap=no_x2apic_optout' to override the BIOS setting.
>> [0.025957] x2apic: IRQ remapping doesn't support X2APIC mode
>>
>> I assume then that it's falling back to (slower?) xAPIC. Could this be a
>> cause of my poor performance? Should I try overriding this with the kernel
>> param? I'm running a Haswell chip on a Z97 chipset. I'm under the impression
>> that x2apic was broken on far older chipsets running Nehalem and they got a
>> little overzealous with the blacklisting... So I should probably be fine
>> overriding? Thanks, Andrew
> Okay, I've enabled intremap=no_x2apic_optout on the host kernel and
> didn't see much of a change in performance.
> I feel I should mention that the qemu process on the host consumes
> about 8-30% CPU (as shown in top) when the guest (Windows 10) task
> manager shows <1.0% cpu utilization. More often than not, it is >15%.
> Is this normal?
> Presuming this is interrupt handling, I've tried:
> Enabling hv_apic
> Disabling hv_apic (No idea which is faster [hv_apic on or off] -- I've
> read seemingly contradictory reports online)
> Adding intremap=no_x2apic_optout to my host's kernel options (x2apic
> is now enabled on the host according to my dmesg output).
> CPU usage for the qemu process always hovers in that range on the host.
>
> More details on what I'm trying to do, here:
> I'm trying to see if qemu/kvm + passthrough is viable for low latency
> / high bandwidth USB applications such as cameras, sensors, etc.
> Specifically I've been trying to get an Oculus Rift VR headset and
> sensor working under this configuration. I can certainly see the CPU
> utilization shoot right up to 60-80% on the host qemu process when I
> plug the sensor into the USB port, and I get poor performance from the
> device (they're reported as dropped / truncated USB frames in the
> guest application). I just tried passing through an NEC / Renesas USB
> controller and had the same issue. I've also tried passing through the
> built-in Intel USB controller with similar results. Running
> bare-metal, things work fine.
>
> At this point, unless you or someone else has some bright ideas on how
> I might decrease the load when the int

Re: [Qemu-discuss] How to resolve "Failed to mmap" error?

2016-10-19 Thread A223 A223
Okay, I've enabled intremap=no_x2apic_optout on the host kernel and
didn't see much of a change in performance.
I feel I should mention that the qemu process on the host consumes
about 8-30% CPU (as shown in top) when the guest (Windows 10) task
manager shows <1.0% cpu utilization. More often than not, it is >15%.
Is this normal?
Presuming this is interrupt handling, I've tried:
- Enabling hv_apic
- Disabling hv_apic (No idea which is faster [hv_apic on or off] -- I've
  read seemingly contradictory reports online)
- Adding intremap=no_x2apic_optout to my host's kernel options (x2apic
  is now enabled on the host according to my dmesg output).
CPU usage for the qemu process always hovers in that range on the host.
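
For anyone repeating the x2apic check, here is a small sketch that
classifies saved dmesg output. The opt-out pattern is taken from the
DMAR-IR message quoted in this thread; the "enabled" patterns are my
assumption about the kernel's wording and may differ between kernel
versions, and the function name is invented for illustration.

```python
import re

# Wording from the DMAR-IR line quoted in this thread.
OPT_OUT = re.compile(r"x2apic is disabled because BIOS sets x2apic opt out")
# Assumed wording for the enabled case; verify against your own dmesg.
ENABLED = re.compile(r"x2apic enabled|Enabled IRQ remapping in x2apic mode")


def x2apic_status(dmesg_text):
    """Classify dmesg text as 'bios-opt-out', 'enabled', or 'unknown'."""
    if OPT_OUT.search(dmesg_text):
        return "bios-opt-out"
    if ENABLED.search(dmesg_text):
        return "enabled"
    return "unknown"
```

Feed it the text of `dmesg` captured after booting with and without
intremap=no_x2apic_optout to confirm the parameter took effect.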

More details on what I'm trying to do:
I'm trying to see if qemu/kvm + passthrough is viable for low-latency,
high-bandwidth USB applications such as cameras, sensors, etc.
Specifically, I've been trying to get an Oculus Rift VR headset and
sensor working under this configuration. I can see the CPU
utilization of the host qemu process shoot right up to 60-80% when I
plug the sensor into the USB port, and I get poor performance from the
device (reported as dropped / truncated USB frames in the
guest application). I just tried passing through an NEC / Renesas USB
controller and had the same issue. I've also tried passing through the
built-in Intel USB controller with similar results. Running
bare-metal, things work fine.

At this point, unless you or someone else has some bright ideas on how
I might decrease the load when the interrupt rate is very high, I will
probably throw in the towel on this effort until I can upgrade my
hardware to something that supports APICv or similar.

I'm going to attach my dmesg output in case it is of any use.

Thanks for all your help.


On Wed, Oct 19, 2016 at 12:35 PM, A223 A223  wrote:
> On Wednesday, October 19, 2016, A223 A223  wrote:
>>
>> On Mon, Oct 17, 2016 at 10:24 PM, Alex Williamson
>>  wrote:
>> > I picked up a card based on the same Fresco FL1100 (rev 10) chip, I
>> > don't think it needs the no-bus-reset quirk above, resets seem to work
>> > fine for me.  In fact the card works as expected with the previous
>> > patch I sent, no warning about mmaps.
>>
>> Thanks for giving that a shot.
>> Yeah, I had the same result -- no warnings about mmaps once I applied
>> your patch. I did encounter those lockups on the host, but it's still
>> too hard for me to say whether that was related to your patch, because
>> my sample size was way too small, as I mentioned.
>>
>> > There's no performance loss due
>> > to the mapping of the MSI-X table BAR with or without that patch, that
>> > BAR seems to only be used for MSI-X and nothing changes once setup, at
>> > least with modern versions of Linux as the guest.  Tracing shows
>> > absolutely no traps through QEMU once it's setup.  The device does run
>> > at a high interrupt rate, which I expect is the primary source of
>> > performance loss when running in a VM.  Hardware solutions like APICv
>> > and Posted Interrupts should close that gap.
>>
>> Also, just to be clear, I'm using a Windows guest, so perhaps the
>> guest is doing something out-of-line?
>>
>> I also have a couple of other questions about this if you don't mind:
>> Will most USB 3.0 controllers run at a high interrupt rate, or is that
>> unique to certain controllers such as this one? I'm planning to try an
>> NEC chip soon as well, so I'm hoping that may be better.
>> It appears that APICv is only available on Ivy Bridge E* chips, is
>> that correct? I wasn't able to find which processors support posted
>> interrupts. If you know offhand, please let me know. I will have to
>> keep all of this in mind for my next hardware upgrade.
>>
>> > What you're seeing above still suggests to me that the device is not
>> > returning from a bus reset, all reads from the device are returning
>> > -1.  Since I'm unable to reproduce with my card, it would seem to imply
>> > that the problem is not an intrinsic issue with the controller chip
>> > itself.  Are you overclocking in any way?  Thanks,
>>
>> No overclocking, here. I'm going to give all of this another try this
>> week sometime and see if I can find any other clues. I'll let you know
>> what happens. Thanks, Andrew
>
>
> In doing some more research about apicv and related topics, I came to find
> this in my host's dmesg:
>
> [0.025739] DMAR-IR: x2apic is disabled because BIOS sets x2apic opt out bit.
> [0.025749] DMAR-IR: Use 'intremap=no_x2apic_optout' to override the BIOS setting.
> [0.025957] x2apic: IRQ remapping doesn't support X2APIC mode
>
> I assume then that it's falling back to (slower?) xAPIC. Could this be a
> cause of my poor performance? Should I try overriding this with the kernel
> param? I'm running a Haswell chip on a Z97 chipset. I'm under the impression
> that x2apic was broken on far older chipsets running Nehalem and they got a
> little overzealous with the 

Re: [Qemu-discuss] How to resolve "Failed to mmap" error?

2016-10-19 Thread A223 A223
On Mon, Oct 17, 2016 at 10:24 PM, Alex Williamson
 wrote:
> I picked up a card based on the same Fresco FL1100 (rev 10) chip, I
> don't think it needs the no-bus-reset quirk above, resets seem to work
> fine for me.  In fact the card works as expected with the previous
> patch I sent, no warning about mmaps.

Thanks for giving that a shot.
Yeah, I had the same result -- no warnings about mmaps once I applied
your patch. I did encounter those lockups on the host, but it's still
too hard for me to say whether that was related to your patch, because
my sample size was way too small, as I mentioned.

> There's no performance loss due
> to the mapping of the MSI-X table BAR with or without that patch, that
> BAR seems to only be used for MSI-X and nothing changes once setup, at
> least with modern versions of Linux as the guest.  Tracing shows
> absolutely no traps through QEMU once it's setup.  The device does run
> at a high interrupt rate, which I expect is the primary source of
> performance loss when running in a VM.  Hardware solutions like APICv
> and Posted Interrupts should close that gap.

Also, just to be clear, I'm using a Windows guest, so perhaps the
guest is doing something out-of-line?

I also have a couple of other questions about this if you don't mind:
Will most USB 3.0 controllers run at a high interrupt rate, or is that
unique to certain controllers such as this one? I'm planning to try an
NEC chip soon as well, so I'm hoping that may be better.
It appears that APICv is only available on Ivy Bridge E* chips, is
that correct? I wasn't able to find which processors support posted
interrupts. If you know offhand, please let me know. I will have to
keep all of this in mind for my next hardware upgrade.

> What you're seeing above still suggests to me that the device is not
> returning from a bus reset, all reads from the device are returning
> -1.  Since I'm unable to reproduce with my card, it would seem to imply
> that the problem is not an intrinsic issue with the controller chip
> itself.  Are you overclocking in any way?  Thanks,

No overclocking, here. I'm going to give all of this another try this
week sometime and see if I can find any other clues. I'll let you know
what happens. Thanks, Andrew



Re: [Qemu-discuss] How to resolve "Failed to mmap" error?

2016-10-17 Thread Alex Williamson
On Fri, 14 Oct 2016 10:56:44 -0600
Alex Williamson  wrote:

> On Fri, 14 Oct 2016 10:47:42 -0600
> Alex Williamson  wrote:
> 
> > On Mon, 10 Oct 2016 20:05:02 -0700
> > A223 A223  wrote:
> >   
> > > I just tried testing the patch.
> > > 
> > > Initially, it seemed to work fine (log messages below).
> > > However, I am getting hard lockup of the host machine shortly after
> > > the vfio_region_setup related log lines print out to the screen. I
> > > > would say that roughly 3 of the 9 VM startups resulted in a hard lock
> > > like this. I wasn't getting these hard locks before applying the patch
> > > that I can remember.
> > > 
> > > I removed the patch and tried to replicate the hard lock and haven't
> > > been able to, though admittedly I was only able to try a few times.
> > > Unfortunately, repeatedly trying to boot the VM to test things starts
> > > to become time consuming, because once the VM has been through a
> > > single startup-shutdown cycle, qemu refuses to start, printing these
> > > errors:
> > > qemu-system-x86_64: -device
> > > vfio-pci,host=05:00.0,bus=root.1,addr=00.5: vfio: Error: Failed to
> > > setup INTx fd: Device or resource busy
> > > qemu-system-x86_64: -device
> > > vfio-pci,host=05:00.0,bus=root.1,addr=00.5: Device initialization
> > > failed
> > > 
> > > In the host's kernel log, there are a ton of these
> > > [  774.069113] vfio_ecap_init: :05:00.0 hiding ecap 0x@0xffc
> > > Followed by these kernel log errors:
> > > [  774.070330] genirq: Flags mismatch irq 16. 
> > > (vfio-intx(:05:00.0)) vs. 0080 (ehci_hcd:usb1)
> > > [  774.085595] vfio-pci :05:00.0: Refused to change power state,
> > > currently in D3
> > > [  774.797671] vfio-pci :05:00.0: timed out waiting for pending
> > > transaction; performing function level reset anyway
> > > [  775.945685] vfio-pci :05:00.0: Failed to return from FLR
> > > 
> > > This is not a problem with your patch, but it does complicate my
> > > ability to test the patch repeatedly, since a host restart is needed
> > > > between every try. If you have any idea what could be going on there,
> > > please do let me know.
> > 
> > Does this kernel patch help?
> > 
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 5bb5609..0e48631 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -3198,6 +3198,8 @@ static void quirk_no_bus_reset(struct pci_dev *dev)
> > >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0030, quirk_no_bus_reset);
> > >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0032, quirk_no_bus_reset);
> > >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003c, quirk_no_bus_reset);
> > +/* Fresco Logic FL1100 USB 3.0 Host Controller */
> > +DECLARE_PCI_FIXUP_HEADER(0x1b73, 0x1100, quirk_no_bus_reset);  
> 
> Please also verify with 'lspci -nns 5:' before trying this that
> 1b73:1100 matches your device.

I picked up a card based on the same Fresco FL1100 (rev 10) chip.  I
don't think it needs the no-bus-reset quirk above; resets seem to work
fine for me.  In fact the card works as expected with the previous
patch I sent, with no warning about mmaps.  There's no performance loss
due to the mapping of the MSI-X table BAR with or without that patch;
that BAR seems to be used only for MSI-X, and nothing changes once it's
set up, at least with modern versions of Linux as the guest.  Tracing
shows absolutely no traps through QEMU once it's set up.  The device
does run at a high interrupt rate, which I expect is the primary source
of performance loss when running in a VM.  Hardware solutions like APICv
and Posted Interrupts should close that gap.

What you're seeing above still suggests to me that the device is not
returning from a bus reset: all reads from the device are returning
-1.  Since I'm unable to reproduce with my card, it would seem to imply
that the problem is not an intrinsic issue with the controller chip
itself.  Are you overclocking in any way?  Thanks,

Alex



Re: [Qemu-discuss] How to resolve "Failed to mmap" error?

2016-10-14 Thread Alex Williamson
On Fri, 14 Oct 2016 10:47:42 -0600
Alex Williamson  wrote:

> On Mon, 10 Oct 2016 20:05:02 -0700
> A223 A223  wrote:
> 
> > I just tried testing the patch.
> > 
> > Initially, it seemed to work fine (log messages below).
> > However, I am getting hard lockup of the host machine shortly after
> > the vfio_region_setup related log lines print out to the screen. I
> > > would say that roughly 3 of the 9 VM startups resulted in a hard lock
> > like this. I wasn't getting these hard locks before applying the patch
> > that I can remember.
> > 
> > I removed the patch and tried to replicate the hard lock and haven't
> > been able to, though admittedly I was only able to try a few times.
> > Unfortunately, repeatedly trying to boot the VM to test things starts
> > to become time consuming, because once the VM has been through a
> > single startup-shutdown cycle, qemu refuses to start, printing these
> > errors:
> > qemu-system-x86_64: -device
> > vfio-pci,host=05:00.0,bus=root.1,addr=00.5: vfio: Error: Failed to
> > setup INTx fd: Device or resource busy
> > qemu-system-x86_64: -device
> > vfio-pci,host=05:00.0,bus=root.1,addr=00.5: Device initialization
> > failed
> > 
> > In the host's kernel log, there are a ton of these
> > [  774.069113] vfio_ecap_init: :05:00.0 hiding ecap 0x@0xffc
> > Followed by these kernel log errors:
> > [  774.070330] genirq: Flags mismatch irq 16. 
> > (vfio-intx(:05:00.0)) vs. 0080 (ehci_hcd:usb1)
> > [  774.085595] vfio-pci :05:00.0: Refused to change power state,
> > currently in D3
> > [  774.797671] vfio-pci :05:00.0: timed out waiting for pending
> > transaction; performing function level reset anyway
> > [  775.945685] vfio-pci :05:00.0: Failed to return from FLR
> > 
> > This is not a problem with your patch, but it does complicate my
> > ability to test the patch repeatedly, since a host restart is needed
> > > between every try. If you have any idea what could be going on there,
> > please do let me know.  
> 
> Does this kernel patch help?
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 5bb5609..0e48631 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3198,6 +3198,8 @@ static void quirk_no_bus_reset(struct pci_dev *dev)
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0030, quirk_no_bus_reset);
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0032, quirk_no_bus_reset);
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003c, quirk_no_bus_reset);
> +/* Fresco Logic FL1100 USB 3.0 Host Controller */
> +DECLARE_PCI_FIXUP_HEADER(0x1b73, 0x1100, quirk_no_bus_reset);

Please also verify with 'lspci -nns 5:' before trying this that
1b73:1100 matches your device.
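
The ID check can also be scripted. This is a rough sketch that pulls the
[vendor:device] pair out of one line of `lspci -nn` output and compares
it with the 1b73:1100 pair used in the quirk. The helper names are
hypothetical, and the sample line in the comment only illustrates the
general `lspci -nn` layout; it is not output from the machine discussed
here.

```python
import re

# Matches "[vvvv:dddd]"; the class code (e.g. "[0c03]") has no colon,
# so it is skipped.
ID_RE = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]", re.IGNORECASE)


def pci_ids(lspci_line):
    """Return (vendor, device) as lowercase hex strings, or None."""
    match = ID_RE.search(lspci_line)
    if match is None:
        return None
    return match.group(1).lower(), match.group(2).lower()


def matches_quirk(lspci_line, vendor="1b73", device="1100"):
    """True if the line describes the device the quirk entry targets."""
    return pci_ids(lspci_line) == (vendor, device)


# Illustrative line only:
# 05:00.0 USB controller [0c03]: Fresco Logic FL1100 USB 3.0 Host Controller [1b73:1100] (rev 10)
```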



Re: [Qemu-discuss] How to resolve "Failed to mmap" error?

2016-10-14 Thread Alex Williamson
On Mon, 10 Oct 2016 20:05:02 -0700
A223 A223  wrote:

> I just tried testing the patch.
> 
> Initially, it seemed to work fine (log messages below).
> However, I am getting hard lockup of the host machine shortly after
> the vfio_region_setup related log lines print out to the screen. I
> would say that roughly 3 of the 9 VM startups resulted in a hard lock
> like this. I wasn't getting these hard locks before applying the patch
> that I can remember.
> 
> I removed the patch and tried to replicate the hard lock and haven't
> been able to, though admittedly I was only able to try a few times.
> Unfortunately, repeatedly trying to boot the VM to test things starts
> to become time consuming, because once the VM has been through a
> single startup-shutdown cycle, qemu refuses to start, printing these
> errors:
> qemu-system-x86_64: -device
> vfio-pci,host=05:00.0,bus=root.1,addr=00.5: vfio: Error: Failed to
> setup INTx fd: Device or resource busy
> qemu-system-x86_64: -device
> vfio-pci,host=05:00.0,bus=root.1,addr=00.5: Device initialization
> failed
> 
> In the host's kernel log, there are a ton of these
> [  774.069113] vfio_ecap_init: :05:00.0 hiding ecap 0x@0xffc
> Followed by these kernel log errors:
> [  774.070330] genirq: Flags mismatch irq 16. 
> (vfio-intx(:05:00.0)) vs. 0080 (ehci_hcd:usb1)
> [  774.085595] vfio-pci :05:00.0: Refused to change power state,
> currently in D3
> [  774.797671] vfio-pci :05:00.0: timed out waiting for pending
> transaction; performing function level reset anyway
> [  775.945685] vfio-pci :05:00.0: Failed to return from FLR
> 
> This is not a problem with your patch, but it does complicate my
> ability to test the patch repeatedly, since a host restart is needed
> between every try. If you have any idea what could be going on there,
> please do let me know.

Does this kernel patch help?

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 5bb5609..0e48631 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3198,6 +3198,8 @@ static void quirk_no_bus_reset(struct pci_dev *dev)
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0030, quirk_no_bus_reset);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0032, quirk_no_bus_reset);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003c, quirk_no_bus_reset);
+/* Fresco Logic FL1100 USB 3.0 Host Controller */
+DECLARE_PCI_FIXUP_HEADER(0x1b73, 0x1100, quirk_no_bus_reset);
 
 static void quirk_no_pm_reset(struct pci_dev *dev)
 {

Also, when you're in this state where the Fresco USB3 controller
doesn't work, please record 'sudo lspci -vvvs 5:'  Thanks,

Alex



Re: [Qemu-discuss] How to resolve "Failed to mmap" error?

2016-10-10 Thread A223 A223
I just tried testing the patch.

Initially, it seemed to work fine (log messages below).
However, I am getting hard lockup of the host machine shortly after
the vfio_region_setup related log lines print out to the screen. I
would say that roughly 3 of the 9 VM startups resulted in a hard lock
like this. I wasn't getting these hard locks before applying the patch
that I can remember.

I removed the patch and tried to replicate the hard lock and haven't
been able to, though admittedly I was only able to try a few times.
Unfortunately, repeatedly trying to boot the VM to test things starts
to become time consuming, because once the VM has been through a
single startup-shutdown cycle, qemu refuses to start, printing these
errors:
qemu-system-x86_64: -device vfio-pci,host=05:00.0,bus=root.1,addr=00.5: vfio: Error: Failed to setup INTx fd: Device or resource busy
qemu-system-x86_64: -device vfio-pci,host=05:00.0,bus=root.1,addr=00.5: Device initialization failed

In the host's kernel log, there are a ton of these:
[  774.069113] vfio_ecap_init: :05:00.0 hiding ecap 0x@0xffc
Followed by these kernel log errors:
[  774.070330] genirq: Flags mismatch irq 16. (vfio-intx(:05:00.0)) vs. 0080 (ehci_hcd:usb1)
[  774.085595] vfio-pci :05:00.0: Refused to change power state, currently in D3
[  774.797671] vfio-pci :05:00.0: timed out waiting for pending transaction; performing function level reset anyway
[  775.945685] vfio-pci :05:00.0: Failed to return from FLR

This is not a problem with your patch, but it does complicate my
ability to test the patch repeatedly, since a host restart is needed
between every try. If you have any idea what could be going on there,
please do let me know.

Here is the vfio_region log entry output from qemu when started with your patch:
4767@1476148411.866618:vfio_region_setup Device :02:00.0, region 0 ":02:00.0 BAR 0", flags: 7, offset: 0, size: 100
4767@1476148411.866641:vfio_region_setup Device :02:00.0, region 1 ":02:00.0 BAR 1", flags: 7, offset: 100, size: 1000
4767@1476148411.866643:vfio_region_setup Device :02:00.0, region 2 ":02:00.0 BAR 2", flags: 0, offset: 200, size: 0
4767@1476148411.866649:vfio_region_setup Device :02:00.0, region 3 ":02:00.0 BAR 3", flags: 7, offset: 300, size: 200
4767@1476148411.866652:vfio_region_setup Device :02:00.0, region 4 ":02:00.0 BAR 4", flags: 0, offset: 400, size: 0
4767@1476148411.87:vfio_region_setup Device :02:00.0, region 5 ":02:00.0 BAR 5", flags: 3, offset: 500, size: 80
4767@1476148411.882580:vfio_region_setup Device :02:00.1, region 0 ":02:00.1 BAR 0", flags: 7, offset: 0, size: 4000
4767@1476148411.882589:vfio_region_setup Device :02:00.1, region 1 ":02:00.1 BAR 1", flags: 0, offset: 100, size: 0
4767@1476148411.882591:vfio_region_setup Device :02:00.1, region 2 ":02:00.1 BAR 2", flags: 0, offset: 200, size: 0
4767@1476148411.882593:vfio_region_setup Device :02:00.1, region 3 ":02:00.1 BAR 3", flags: 0, offset: 300, size: 0
4767@1476148411.882594:vfio_region_setup Device :02:00.1, region 4 ":02:00.1 BAR 4", flags: 0, offset: 400, size: 0
4767@1476148411.882596:vfio_region_setup Device :02:00.1, region 5 ":02:00.1 BAR 5", flags: 0, offset: 500, size: 0
4767@1476148411.930645:vfio_region_setup Device :05:00.0, region 0 ":05:00.0 BAR 0", flags: 7, offset: 0, size: 1
4767@1476148411.930654:vfio_region_setup Device :05:00.0, region 1 ":05:00.0 BAR 1", flags: 0, offset: 100, size: 0
4767@1476148411.930670:vfio_region_sparse_mmap_header Device :05:00.0 region 2: 1 sparse mmap entries
4767@1476148411.930673:vfio_region_sparse_mmap_entry sparse entry 0 [0x0 - 0x0]
4767@1476148411.930675:vfio_region_setup Device :05:00.0, region 2 ":05:00.0 BAR 2", flags: f, offset: 200, size: 1000
4767@1476148411.930688:vfio_region_setup Device :05:00.0, region 3 ":05:00.0 BAR 3", flags: 0, offset: 300, size: 0
4767@1476148411.930696:vfio_region_setup Device :05:00.0, region 4 ":05:00.0 BAR 4", flags: 7, offset: 400, size: 1000
4767@1476148411.930698:vfio_region_setup Device :05:00.0, region 5 ":05:00.0 BAR 5", flags: 0, offset: 500, size: 0
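
To make the flags column in the vfio_region_setup lines above easier to
read, here is a small decoder sketch. It assumes the flags value is
printed in hex and uses the region-info bit meanings from the kernel's
VFIO header (read = 0x1, write = 0x2, mmap = 0x4, caps = 0x8); the
function names are invented for illustration. The offset and size
columns are deliberately ignored.

```python
import re

# VFIO_REGION_INFO_FLAG_* bits, lowest bit first.
FLAG_NAMES = {0x1: "READ", 0x2: "WRITE", 0x4: "MMAP", 0x8: "CAPS"}

SETUP_RE = re.compile(r'region (\d+) "([^"]*)", flags: ([0-9a-fA-F]+),')


def decode_flags(flags):
    """Return the names of the bits set in a region flags value."""
    return [name for bit, name in sorted(FLAG_NAMES.items()) if flags & bit]


def parse_region_setup(log_text):
    """Extract (region index, region name, decoded flags) from trace text.

    Whitespace is normalized first so that entries wrapped across two
    lines by a mail client are still matched.
    """
    text = " ".join(log_text.split())
    regions = []
    for match in SETUP_RE.finditer(text):
        index = int(match.group(1))
        flags = int(match.group(3), 16)
        regions.append((index, match.group(2), decode_flags(flags)))
    return regions
```

Read this way, flags 7 is a readable, writable, mmap-capable region,
flags 3 is readable and writable but not mmap-capable, and flags f on
05:00.0 BAR 2 adds the capability bit, which fits the sparse mmap
header traced just before it.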





Re: [Qemu-discuss] How to resolve "Failed to mmap" error?

2016-10-07 Thread Alex Williamson
On Thu, 6 Oct 2016 23:19:13 -0700
A223 A223  wrote:

> On Thu, Oct 6, 2016 at 9:29 PM, Alex Williamson
>  wrote:
> > On Thu, 6 Oct 2016 21:07:06 -0700
> > A223 A223  wrote:
> >  
> >> On Thu, Oct 6, 2016 at 8:30 PM, A223 A223  wrote:  
> >> > On Thu, Oct 6, 2016 at 9:46 AM, Alex Williamson
> >> >  wrote:  
> >> >> On Wed, 5 Oct 2016 20:30:34 -0700
> >> >> A223 A223  wrote:
> >> >>  
> >> >>> On Wed, Oct 5, 2016 at 9:33 AM, Alex Williamson
> >> >>>  wrote:  
> >> >>> > On Wed, 5 Oct 2016 02:13:21 -0700
> >> >>> > A223 A223  wrote:  
> >> >>> >> How can I go about tracking down the root cause of this error?  
> >> >>> >
> > >> >>> > This often means that the device resources are in use by another,
> > >> >>> > non-pci driver.  Look in /proc/iomem and /proc/ioports to see if any
> > >> >>> > drivers are claiming sub-regions on the device.  You can also turn on
> > >> >>> > tracing and look for the event vfio_region_mmap_fault, which might give
> > >> >>> > us more information (see docs/tracing.txt).  Also, please provide 'sudo
> > >> >>> > lspci -vvvs 5:00.0' so we can see the device resources.  Thanks,
> >> >>>
> >> >>> The simpletrace output wouldn't parse for me using the python script
> >> >>> (I'll paste the trace at the end of this email), so I used the ftrace
> >> >>> backend instead (
> >> >>> https://drive.google.com/open?id=0Bwjufq6oAZMfMVpUYTUxODR0NW8 ). It's
> >> >>> quite large and I'm not sure what I'm looking for, so I didn't look at
> >> >>> it very carefully.  
> >> >>
> >> >> That's not the right trace, that's like kvm trace or something.  Try:
> >> >>
> >> >> # ./configure --enable-trace-backends=log
> >> >>
> >> >> # echo vfio_region_mmap_fault > events.txt
> >> >>
> >> >> Then add to your qemu commandline "-trace events=events.txt".  You
> >> >> should get maybe a couple extra log lines per boot, not gobs of output. 
> >> >>  
> >> >
> >> > Ah, got it. Here are the relevant log lines then:
> >> >
> >> > 10203@1475810801.243686:vfio_region_mmap_fault Region :05:00.0 BAR
> >> > 2 mmaps[0], [200 - 1ff], fault: -22
> >> > qemu-system-x86_64: -device
> >> > vfio-pci,host=05:00.0,bus=root.1,addr=00.5: Failed to mmap
> >> > :05:00.0 BAR 2. Performance may be slow  
> >>
> >> I took a look at everything again and noted this in the lspci output
> >> for the Fresco Logic USB controller:
> >> Vector table: BAR=2 offset=
> >> PBA: BAR=4 offset=
> >>
> >> Is it normal for both the Vector table and PBA to have a 0 offset?  
> >
> > Note that they're in different BARs, so it's a little unexpected but
> > it's not wrong.  The MSI-X table does appear to be the problem though.
> > Based on that trace line, the region size we're trying to mmap is
> > zero.  We're probably hitting a corner case of a page sized BAR
> > containing the MSI-X table, where we cannot mmap the page covering the
> > table.  We need to figure out where that's coming from.  Please add the
> > additional trace lines to your events.txt and try again:
> >
> > vfio_region_sparse_mmap_header
> > vfio_region_sparse_mmap_entry
> > vfio_region_setup
> > vfio_msix_fixup
> >
> > The bad news is that due to the MSI-X table, we're not going to be able
> > to mmap that BAR, the good news is that it probably doesn't actually
> > hurt performance.  With the table and PBA in separate BARs, my guess
> > would be that's the only thing in those BARs.  Thanks,  
> 
> Ah, overall bad news for me, as I now have absolutely no clue what's
> causing my USB performance/reliability issues in the VM :(
> 
> Nonetheless, here is the new log output:
> 
> 6137@1475820936.795846:vfio_region_setup Device :02:00.0, region 0
> ":02:00.0 BAR 0", flags: 7, offset: 0, size: 100
> 6137@1475820936.795871:vfio_region_setup Device :02:00.0, region 1
> ":02:00.0 BAR 1", flags: 7, offset: 100, size: 1000
> 6137@1475820936.795873:vfio_region_setup Device :02:00.0, region 2
> ":02:00.0 BAR 2", flags: 0, offset: 200, size: 0
> 6137@1475820936.795880:vfio_region_setup Device :02:00.0, region 3
> ":02:00.0 BAR 3", flags: 7, offset: 300, size: 200
> 6137@1475820936.795882:vfio_region_setup Device :02:00.0, region 4
> ":02:00.0 BAR 4", flags: 0, offset: 400, size: 0
> 6137@1475820936.795888:vfio_region_setup Device :02:00.0, region 5
> ":02:00.0 BAR 5", flags: 3, offset: 500, size: 80
> 6137@1475820936.811872:vfio_region_setup Device :02:00.1, region 0
> ":02:00.1 BAR 0", flags: 7, offset: 0, size: 4000
> 6137@1475820936.811883:vfio_region_setup Device :02:00.1, region 1
> ":02:00.1 BAR 1", flags: 0, offset: 100, size: 0
> 6137@1475820936.811885:vfio_region_setup Device :02:00.1, region 2
> ":02:00.1 BAR 2", flags: 0, offset: 200, size: 0
> 6137@1475820936.811887:vfio_region_setup Device :02:00.1, region 3
> ":02:00.1 BAR 3", flags: 0, offset: 300, size: 0
> 6137@1475820936.81188

Re: [Qemu-discuss] How to resolve "Failed to mmap" error?

2016-10-06 Thread A223 A223
On Thu, Oct 6, 2016 at 9:29 PM, Alex Williamson
 wrote:
> On Thu, 6 Oct 2016 21:07:06 -0700
> A223 A223  wrote:
>
>> On Thu, Oct 6, 2016 at 8:30 PM, A223 A223  wrote:
>> > On Thu, Oct 6, 2016 at 9:46 AM, Alex Williamson
>> >  wrote:
>> >> On Wed, 5 Oct 2016 20:30:34 -0700
>> >> A223 A223  wrote:
>> >>
>> >>> On Wed, Oct 5, 2016 at 9:33 AM, Alex Williamson
>> >>>  wrote:
>> >>> > On Wed, 5 Oct 2016 02:13:21 -0700
>> >>> > A223 A223  wrote:
>> >>> >> How can I go about tracking down the root cause of this error?
>> >>> >
>> >>> > This often means that the device resources are in use by another,
>> >>> > non-pci driver.  Look in /proc/iomem and /proc/ioports to see if any
>> >>> > drivers are claiming sub-regions on the device.  You can also turn on
>> >>> > tracing and look for the event vfio_region_mmap_fault, which might give
>> >>> > us more information (see docs/tracing.txt).  Also, please provide 'sudo
>> >>> > lspci -vvvs 5:00.0' so we can see the device resources.  Thanks,
>> >>>
>> >>> The simpletrace output wouldn't parse for me using the python script
>> >>> (I'll paste the trace at the end of this email), so I used the ftrace
>> >>> backend instead (
>> >>> https://drive.google.com/open?id=0Bwjufq6oAZMfMVpUYTUxODR0NW8 ). It's
>> >>> quite large and I'm not sure what I'm looking for, so I didn't look at
>> >>> it very carefully.
>> >>
>> >> That's not the right trace, that's like kvm trace or something.  Try:
>> >>
>> >> # ./configure --enable-trace-backends=log
>> >>
>> >> # echo vfio_region_mmap_fault > events.txt
>> >>
>> >> Then add to your qemu commandline "-trace events=events.txt".  You
>> >> should get maybe a couple extra log lines per boot, not gobs of output.
>> >
>> > Ah, got it. Here are the relevant log lines then:
>> >
>> > 10203@1475810801.243686:vfio_region_mmap_fault Region :05:00.0 BAR
>> > 2 mmaps[0], [200 - 1ff], fault: -22
>> > qemu-system-x86_64: -device
>> > vfio-pci,host=05:00.0,bus=root.1,addr=00.5: Failed to mmap
>> > :05:00.0 BAR 2. Performance may be slow
>>
>> I took a look at everything again and noted this in the lspci output
>> for the Fresco Logic USB controller:
>> Vector table: BAR=2 offset=
>> PBA: BAR=4 offset=
>>
>> Is it normal for both the Vector table and PBA to have a 0 offset?
>
> Note that they're in different BARs, so it's a little unexpected but
> it's not wrong.  The MSI-X table does appear to be the problem though.
> Based on that trace line, the region size we're trying to mmap is
> zero.  We're probably hitting a corner case of a page sized BAR
> containing the MSI-X table, where we cannot mmap the page covering the
> table.  We need to figure out where that's coming from.  Please add the
> additional trace lines to your events.txt and try again:
>
> vfio_region_sparse_mmap_header
> vfio_region_sparse_mmap_entry
> vfio_region_setup
> vfio_msix_fixup
>
> The bad news is that due to the MSI-X table, we're not going to be able
> to mmap that BAR, the good news is that it probably doesn't actually
> hurt performance.  With the table and PBA in separate BARs, my guess
> would be that's the only thing in those BARs.  Thanks,

Ah, overall bad news for me, as I now have absolutely no clue what's
causing my USB performance/reliability issues in the VM :(

Nonetheless, here is the new log output:

6137@1475820936.795846:vfio_region_setup Device :02:00.0, region 0
":02:00.0 BAR 0", flags: 7, offset: 0, size: 100
6137@1475820936.795871:vfio_region_setup Device :02:00.0, region 1
":02:00.0 BAR 1", flags: 7, offset: 100, size: 1000
6137@1475820936.795873:vfio_region_setup Device :02:00.0, region 2
":02:00.0 BAR 2", flags: 0, offset: 200, size: 0
6137@1475820936.795880:vfio_region_setup Device :02:00.0, region 3
":02:00.0 BAR 3", flags: 7, offset: 300, size: 200
6137@1475820936.795882:vfio_region_setup Device :02:00.0, region 4
":02:00.0 BAR 4", flags: 0, offset: 400, size: 0
6137@1475820936.795888:vfio_region_setup Device :02:00.0, region 5
":02:00.0 BAR 5", flags: 3, offset: 500, size: 80
6137@1475820936.811872:vfio_region_setup Device :02:00.1, region 0
":02:00.1 BAR 0", flags: 7, offset: 0, size: 4000
6137@1475820936.811883:vfio_region_setup Device :02:00.1, region 1
":02:00.1 BAR 1", flags: 0, offset: 100, size: 0
6137@1475820936.811885:vfio_region_setup Device :02:00.1, region 2
":02:00.1 BAR 2", flags: 0, offset: 200, size: 0
6137@1475820936.811887:vfio_region_setup Device :02:00.1, region 3
":02:00.1 BAR 3", flags: 0, offset: 300, size: 0
6137@1475820936.811888:vfio_region_setup Device :02:00.1, region 4
":02:00.1 BAR 4", flags: 0, offset: 400, size: 0
6137@1475820936.811890:vfio_region_setup Device :02:00.1, region 5
":02:00.1 BAR 5", flags: 0, offset: 500, size: 0
6137@1475820936.859957:vfio_region_setup Device :05:00.0, regio

Re: [Qemu-discuss] How to resolve "Failed to mmap" error?

2016-10-06 Thread Alex Williamson
On Thu, 6 Oct 2016 21:07:06 -0700
A223 A223  wrote:

> On Thu, Oct 6, 2016 at 8:30 PM, A223 A223  wrote:
> > On Thu, Oct 6, 2016 at 9:46 AM, Alex Williamson
> >  wrote:  
> >> On Wed, 5 Oct 2016 20:30:34 -0700
> >> A223 A223  wrote:
> >>  
> >>> On Wed, Oct 5, 2016 at 9:33 AM, Alex Williamson
> >>>  wrote:  
> >>> > On Wed, 5 Oct 2016 02:13:21 -0700
> >>> > A223 A223  wrote:  
> >>> >> How can I go about tracking down the root cause of this error?  
> >>> >
> >>> > This often means that the device resources are in use by another,
> >>> > non-pci driver.  Look in /proc/iomem and /proc/ioports to see if any
> >>> > drivers are claiming sub-regions on the device.  You can also turn on
> >>> > tracing and look for the event vfio_region_mmap_fault, which might give
> >>> > us more information (see docs/tracing.txt).  Also, please provide 'sudo
> >>> > lspci -vvvs 5:00.0' so we can see the device resources.  Thanks,  
> >>>
> >>> The simpletrace output wouldn't parse for me using the python script
> >>> (I'll paste the trace at the end of this email), so I used the ftrace
> >>> backend instead (
> >>> https://drive.google.com/open?id=0Bwjufq6oAZMfMVpUYTUxODR0NW8 ). It's
> >>> quite large and I'm not sure what I'm looking for, so I didn't look at
> >>> it very carefully.  
> >>
> >> That's not the right trace, that's like kvm trace or something.  Try:
> >>
> >> # ./configure --enable-trace-backends=log
> >>
> >> # echo vfio_region_mmap_fault > events.txt
> >>
> >> Then add to your qemu commandline "-trace events=events.txt".  You
> >> should get maybe a couple extra log lines per boot, not gobs of output.  
> >
> > Ah, got it. Here are the relevant log lines then:
> >
> > 10203@1475810801.243686:vfio_region_mmap_fault Region :05:00.0 BAR
> > 2 mmaps[0], [200 - 1ff], fault: -22
> > qemu-system-x86_64: -device
> > vfio-pci,host=05:00.0,bus=root.1,addr=00.5: Failed to mmap
> > :05:00.0 BAR 2. Performance may be slow  
> 
> I took a look at everything again and noted this in the lspci output
> for the Fresco Logic USB controller:
> Vector table: BAR=2 offset=
> PBA: BAR=4 offset=
> 
> Is it normal for both the Vector table and PBA to have a 0 offset?

Note that they're in different BARs, so it's a little unexpected but
it's not wrong.  The MSI-X table does appear to be the problem though.
Based on that trace line, the region size we're trying to mmap is
zero.  We're probably hitting a corner case of a page sized BAR
containing the MSI-X table, where we cannot mmap the page covering the
table.  We need to figure out where that's coming from.  Please add the
additional trace lines to your events.txt and try again:

vfio_region_sparse_mmap_header
vfio_region_sparse_mmap_entry
vfio_region_setup
vfio_msix_fixup

The bad news is that, due to the MSI-X table, we're not going to be able
to mmap that BAR; the good news is that it probably doesn't actually
hurt performance.  With the table and PBA in separate BARs, my guess
would be that's the only thing in those BARs.  Thanks,

Alex
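
The page-sized-BAR corner case described above comes down to a little
arithmetic. The following is a minimal sketch of that arithmetic, not QEMU's
actual code: the function name, the 4k page size and the entry count of 8 are
assumptions made for illustration only. Every page that overlaps the MSI-X
table is excluded, and whatever is left of the BAR can still be mmapped.

```python
PAGE_SIZE = 4096        # assumed host page size (4k)
MSIX_ENTRY_SIZE = 16    # bytes per MSI-X table entry


def mmappable_ranges(bar_size, table_offset, entries, page_size=PAGE_SIZE):
    """Return (offset, size) ranges of a BAR that remain mappable after
    excluding every page that overlaps the MSI-X table.

    Hypothetical helper for illustration; not part of QEMU.
    """
    # Round the start of the table down to a page boundary.
    start = table_offset & ~(page_size - 1)
    # Round the end of the table up to a page boundary.
    table_end = table_offset + entries * MSIX_ENTRY_SIZE
    end = (table_end + page_size - 1) & ~(page_size - 1)

    ranges = []
    if start > 0:
        ranges.append((0, start))               # part before the table
    if end < bar_size:
        ranges.append((end, bar_size - end))    # part after the table
    return ranges


if __name__ == "__main__":
    # A 4k BAR with the table at offset 0: nothing is left to mmap.
    print(mmappable_ranges(4096, 0, 8))
    # An 8k BAR with the table at offset 0: the second page is still mappable.
    print(mmappable_ranges(8192, 0, 8))
```

With a BAR of exactly one page and the table at offset 0, the excluded page
covers the whole BAR, which matches a zero-sized mmap region.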



Re: [Qemu-discuss] How to resolve "Failed to mmap" error?

2016-10-06 Thread A223 A223
On Thu, Oct 6, 2016 at 8:30 PM, A223 A223  wrote:
> On Thu, Oct 6, 2016 at 9:46 AM, Alex Williamson
>  wrote:
>> On Wed, 5 Oct 2016 20:30:34 -0700
>> A223 A223  wrote:
>>
>>> On Wed, Oct 5, 2016 at 9:33 AM, Alex Williamson
>>>  wrote:
>>> > On Wed, 5 Oct 2016 02:13:21 -0700
>>> > A223 A223  wrote:
>>> >> How can I go about tracking down the root cause of this error?
>>> >
>>> > This often means that the device resources are in use by another,
>>> > non-pci driver.  Look in /proc/iomem and /proc/ioports to see if any
>>> > drivers are claiming sub-regions on the device.  You can also turn on
>>> > tracing and look for the event vfio_region_mmap_fault, which might give
>>> > us more information (see docs/tracing.txt).  Also, please provide 'sudo
>>> > lspci -vvvs 5:00.0' so we can see the device resources.  Thanks,
>>>
>>> The simpletrace output wouldn't parse for me using the python script
>>> (I'll paste the trace at the end of this email), so I used the ftrace
>>> backend instead (
>>> https://drive.google.com/open?id=0Bwjufq6oAZMfMVpUYTUxODR0NW8 ). It's
>>> quite large and I'm not sure what I'm looking for, so I didn't look at
>>> it very carefully.
>>
>> That's not the right trace, that's like kvm trace or something.  Try:
>>
>> # ./configure --enable-trace-backends=log
>>
>> # echo vfio_region_mmap_fault > events.txt
>>
>> Then add to your qemu commandline "-trace events=events.txt".  You
>> should get maybe a couple extra log lines per boot, not gobs of output.
>
> Ah, got it. Here are the relevant log lines then:
>
> 10203@1475810801.243686:vfio_region_mmap_fault Region :05:00.0 BAR
> 2 mmaps[0], [200 - 1ff], fault: -22
> qemu-system-x86_64: -device
> vfio-pci,host=05:00.0,bus=root.1,addr=00.5: Failed to mmap
> :05:00.0 BAR 2. Performance may be slow

I took a look at everything again and noted this in the lspci output
for the Fresco Logic USB controller:
Vector table: BAR=2 offset=
PBA: BAR=4 offset=

Is it normal for both the Vector table and PBA to have a 0 offset?

Thanks,
Andrew



Re: [Qemu-discuss] How to resolve "Failed to mmap" error?

2016-10-06 Thread A223 A223
On Thu, Oct 6, 2016 at 9:46 AM, Alex Williamson
 wrote:
> On Wed, 5 Oct 2016 20:30:34 -0700
> A223 A223  wrote:
>
>> On Wed, Oct 5, 2016 at 9:33 AM, Alex Williamson
>>  wrote:
>> > On Wed, 5 Oct 2016 02:13:21 -0700
>> > A223 A223  wrote:
>> >> How can I go about tracking down the root cause of this error?
>> >
>> > This often means that the device resources are in use by another,
>> > non-pci driver.  Look in /proc/iomem and /proc/ioports to see if any
>> > drivers are claiming sub-regions on the device.  You can also turn on
>> > tracing and look for the event vfio_region_mmap_fault, which might give
>> > us more information (see docs/tracing.txt).  Also, please provide 'sudo
>> > lspci -vvvs 5:00.0' so we can see the device resources.  Thanks,
>>
>> The simpletrace output wouldn't parse for me using the python script
>> (I'll paste the trace at the end of this email), so I used the ftrace
>> backend instead (
>> https://drive.google.com/open?id=0Bwjufq6oAZMfMVpUYTUxODR0NW8 ). It's
>> quite large and I'm not sure what I'm looking for, so I didn't look at
>> it very carefully.
>
> That's not the right trace, that's like kvm trace or something.  Try:
>
> # ./configure --enable-trace-backends=log
>
> # echo vfio_region_mmap_fault > events.txt
>
> Then add to your qemu commandline "-trace events=events.txt".  You
> should get maybe a couple extra log lines per boot, not gobs of output.

Ah, got it. Here are the relevant log lines then:

10203@1475810801.243686:vfio_region_mmap_fault Region :05:00.0 BAR
2 mmaps[0], [200 - 1ff], fault: -22
qemu-system-x86_64: -device
vfio-pci,host=05:00.0,bus=root.1,addr=00.5: Failed to mmap
:05:00.0 BAR 2. Performance may be slow

Thanks,

Andrew
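
The trace line above can be decoded mechanically. This is a small sketch, not
part of QEMU: the regular expression and function name are made up for
illustration, the address range is assumed to be printed with an inclusive
end, and the hex values are used exactly as they appear in this archive
(which may have lost digits).

```python
import errno
import os
import re

# Trace line copied from the message above, with the wrapped lines joined.
LINE = ("10203@1475810801.243686:vfio_region_mmap_fault Region :05:00.0 BAR "
        "2 mmaps[0], [200 - 1ff], fault: -22")

# Hypothetical pattern for this one event; tolerates an optional 0x prefix.
PATTERN = re.compile(
    r"\[(?:0x)?([0-9a-fA-F]+) - (?:0x)?([0-9a-fA-F]+)\], fault: (-?\d+)"
)


def decode_mmap_fault(line):
    """Return (range_size, errno_name, message) for an mmap-fault trace line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a vfio_region_mmap_fault line")
    start = int(match.group(1), 16)
    end = int(match.group(2), 16)
    code = abs(int(match.group(3)))
    # Assuming an inclusive end, an end one below the start means size 0.
    size = end - start + 1
    return size, errno.errorcode.get(code, "unknown"), os.strerror(code)


if __name__ == "__main__":
    print(decode_mmap_fault(LINE))
```

Under those assumptions the range works out to zero bytes and the fault is
EINVAL, which is what mmap() returns on Linux for a zero-length mapping.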



Re: [Qemu-discuss] How to resolve "Failed to mmap" error?

2016-10-06 Thread Alex Williamson
On Wed, 5 Oct 2016 20:30:34 -0700
A223 A223  wrote:

> On Wed, Oct 5, 2016 at 9:33 AM, Alex Williamson
>  wrote:
> > On Wed, 5 Oct 2016 02:13:21 -0700
> > A223 A223  wrote:  
> >> How can I go about tracking down the root cause of this error?  
> >
> > This often means that the device resources are in use by another,
> > non-pci driver.  Look in /proc/iomem and /proc/ioports to see if any
> > drivers are claiming sub-regions on the device.  You can also turn on
> > tracing and look for the event vfio_region_mmap_fault, which might give
> > us more information (see docs/tracing.txt).  Also, please provide 'sudo
> > lspci -vvvs 5:00.0' so we can see the device resources.  Thanks,  
> 
> The simpletrace output wouldn't parse for me using the python script
> (I'll paste the trace at the end of this email), so I used the ftrace
> backend instead (
> https://drive.google.com/open?id=0Bwjufq6oAZMfMVpUYTUxODR0NW8 ). It's
> quite large and I'm not sure what I'm looking for, so I didn't look at
> it very carefully.

That's not the right trace, that's like kvm trace or something.  Try:

# ./configure --enable-trace-backends=log

# echo vfio_region_mmap_fault > events.txt

Then add "-trace events=events.txt" to your QEMU command line.  You
should get maybe a couple extra log lines per boot, not gobs of output.
 
> I did examine the output from the commands/files you mentioned
> (attached). The only thing I noticed is that 5:00.0 is sharing an IRQ
> with the onboard EHCI controller on the host, but I'm not sure if
> that's an issue. It's also quite possible I missed some other obvious
> problem. Please let me know if you have any insights.

I'm not spotting anything either.  Regarding your comment about the
upstream patch to do sub-page mappings, that's not relevant to your
device.  Page size is 4k and the BARs for your device are all at least
4k.  I do notice you have nvidia loaded on the host, I don't know why
it would be squatting on resources for your USB card, but I can't rule
it out.  Thanks,

Alex



Re: [Qemu-discuss] How to resolve "Failed to mmap" error?

2016-10-06 Thread A223 A223
On Wed, Oct 5, 2016 at 8:30 PM, A223 A223  wrote:
> On Wed, Oct 5, 2016 at 9:33 AM, Alex Williamson
>  wrote:
>> On Wed, 5 Oct 2016 02:13:21 -0700
>> A223 A223  wrote:
>>> How can I go about tracking down the root cause of this error?

Just to follow up with some additional thoughts: my host is currently
running Linux kernel 4.6.2, which was released in early June, so I was
just looking over the kvm git repo to see if any potentially relevant
changes were made in recent months.

I noticed this one:
https://patchwork.kernel.org/patch/8956511/
This change was merged by Linus to
http://git.kernel.org/pub/scm/virt/kvm/kvm.git/ as e55884d on
2016-06-28, well after kernel 4.6.2 was released, so I'm quite
confident I don't have it right now.


In your opinion, based on my /proc/iomem and /proc/ioports output, is
it possible that this commit could help my situation?



Re: [Qemu-discuss] How to resolve "Failed to mmap" error?

2016-10-06 Thread A223 A223
On Wed, Oct 5, 2016 at 9:33 AM, Alex Williamson
 wrote:
> On Wed, 5 Oct 2016 02:13:21 -0700
> A223 A223  wrote:
>> How can I go about tracking down the root cause of this error?
>
> This often means that the device resources are in use by another,
> non-pci driver.  Look in /proc/iomem and /proc/ioports to see if any
> drivers are claiming sub-regions on the device.  You can also turn on
> tracing and look for the event vfio_region_mmap_fault, which might give
> us more information (see docs/tracing.txt).  Also, please provide 'sudo
> lspci -vvvs 5:00.0' so we can see the device resources.  Thanks,

The simpletrace output wouldn't parse for me using the python script
(I'll paste the trace at the end of this email), so I used the ftrace
backend instead (
https://drive.google.com/open?id=0Bwjufq6oAZMfMVpUYTUxODR0NW8 ). It's
quite large and I'm not sure what I'm looking for, so I didn't look at
it very carefully.

I did examine the output from the commands/files you mentioned
(attached). The only thing I noticed is that 5:00.0 is sharing an IRQ
with the onboard EHCI controller on the host, but I'm not sure if
that's an issue. It's also quite possible I missed some other obvious
problem. Please let me know if you have any insights.

Regarding simpletrace, I would get the following errors when I ran the
python command:

./scripts/simpletrace.py trace-events trace-12695
Traceback (most recent call last):
  File "./scripts/simpletrace.py", line 195, in <module>
    run(Formatter())
  File "./scripts/simpletrace.py", line 170, in run
    process(events, sys.argv[2], analyzer, read_header=read_header)
  File "./scripts/simpletrace.py", line 145, in process
    for rec in read_trace_records(edict, log):
  File "./scripts/simpletrace.py", line 79, in read_trace_records
    rec = read_record(edict, fobj)
  File "./scripts/simpletrace.py", line 59, in read_record
    return get_record(edict, rechdr, fobj) # return tuple of record elements
  File "./scripts/simpletrace.py", line 40, in get_record
    event = edict[event_id]
KeyError: 1791

Thanks, Andrew
-0fff : reserved
1000-0009d7ff : System RAM
0009d800-0009 : reserved
000a-000b : PCI Bus :00
000c-000cfdff : Video ROM
000d-000d3fff : PCI Bus :00
000d4000-000d7fff : PCI Bus :00
000d8000-000dbfff : PCI Bus :00
000dc000-000d : PCI Bus :00
000e-000f : reserved
  000e-000e3fff : PCI Bus :00
  000e4000-000e7fff : PCI Bus :00
  000f-000f : System ROM
0010-9c916fff : System RAM
  0100-016786f4 : Kernel code
  016786f5-01d4cb7f : Kernel data
  01f0c000-02168fff : Kernel bss
9c917000-9c91dfff : ACPI Non-volatile Storage
9c91e000-9d3c6fff : System RAM
9d3c7000-9d8bbfff : reserved
9d8bc000-b9c15fff : System RAM
b9c16000-b9ca7fff : reserved
b9ca8000-b9d10fff : System RAM
b9d11000-b9e53fff : ACPI Non-volatile Storage
b9e54000-bbffefff : reserved
bbfff000-bbff : System RAM
bd00-bf1f : reserved
bf20-feaf : PCI Bus :00
  c000-cfff : :00:02.0
  d000-e1ff : PCI Bus :02
d000-dfff : :02:00.0
e000-e1ff : :02:00.0
  e800-f1ff : PCI Bus :01
e800-efff : :01:00.0
f000-f1ff : :01:00.0
  f400-f50f : PCI Bus :02
f400-f4ff : :02:00.0
f500-f507 : :02:00.0
f508-f5083fff : :02:00.1
  f600-f70f : PCI Bus :01
f600-f6ff : :01:00.0
  f600-f6ff : nvidia
f700-f707 : :01:00.0
f708-f7083fff : :01:00.1
  f708-f7083fff : ICH HD audio
  f740-f77f : :00:02.0
  f780-f78f : PCI Bus :05
f780-f780 : :05:00.0
f781-f7810fff : :05:00.0
f7811000-f7811fff : :05:00.0
  f790-f79f : PCI Bus :04
f790-f793 : :04:00.0
  f790-f793 : alx
  f7a0-f7a03fff : :00:1b.0
f7a0-f7a03fff : ICH HD audio
  f7a04000-f7a07fff : :00:03.0
  f7a08000-f7a080ff : :00:1f.3
  f7a09000-f7a097ff : :00:1f.2
f7a09000-f7a097ff : ahci
  f7a0a000-f7a0a3ff : :00:1d.0
f7a0a000-f7a0a3ff : ehci_hcd
  f7a0b000-f7a0b3ff : :00:1a.0
f7a0b000-f7a0b3ff : ehci_hcd
  f7a0c000-f7a0c00f : :00:16.0
f7a0c000-f7a0c00f : mei_me
  f7fe-f7fe : pnp 00:06
  f800-fbff : PCI MMCONFIG  [bus 00-3f]
f800-fbff : reserved
  f800-fbff : pnp 00:06
fec0-fec00fff : reserved
  fec0-fec003ff : IOAPIC 0
fed0-fed03fff : reserved
  fed0-fed003ff : HPET 0
fed0-fed003ff : PNP0103:00
fed1-fed17fff : pnp 00:06
fed18000-fed18fff : pnp 00:06
fed19000-fed19fff : pnp 00:06
fed1c000-fed1 : reserved
  fed1c000-fed1 : pnp 00:06
fed1f410-fed1f414 : iTCO_wdt.0.auto
fed2-fed3 : pnp 00:06
fed4-fed44fff : pnp 00:00
fed45000-fed8 : pnp 00:06
fed9-fed90fff : dmar0
fed91000-fed91fff : dmar1
fee0-fee00fff : Local API

Re: [Qemu-discuss] How to resolve "Failed to mmap" error?

2016-10-05 Thread Alex Williamson
On Wed, 5 Oct 2016 02:13:21 -0700
A223 A223  wrote:

> Hello,
> 
> When attempting PCI passthrough of a USB controller using the
> following parameter to qemu:
> -device vfio-pci,host=05:00.0,bus=root.1,addr=00.5
> 
> I receive the following error:
> qemu-system-x86_64: -device
> vfio-pci,host=05:00.0,bus=root.1,addr=00.5: Failed to mmap
> :05:00.0 BAR 2. Performance may be slow
> 
> Ultimately, the controller does somewhat work in the VM, but I am
> suffering some performance and reliability issues with it, so I
> suspect this error message is pointing to the issue.
> 
> How can I go about tracking down the root cause of this error?

This often means that the device resources are in use by another,
non-pci driver.  Look in /proc/iomem and /proc/ioports to see if any
drivers are claiming sub-regions on the device.  You can also turn on
tracing and look for the event vfio_region_mmap_fault, which might give
us more information (see docs/tracing.txt).  Also, please provide 'sudo
lspci -vvvs 5:00.0' so we can see the device resources.  Thanks,

Alex
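
The /proc/iomem check suggested above can be scripted. The following is a
rough sketch: the function name is hypothetical, and the sample text, its
addresses and "example_driver" are invented for illustration rather than
taken from the host in this thread. It relies only on the indentation that
/proc/iomem uses to show nesting, and reports any entry nested under a
resource named after the device.

```python
# Illustrative /proc/iomem-style text; addresses and driver name are made up.
SAMPLE = """\
f7800000-f78fffff : PCI Bus 0000:05
  f7800000-f780ffff : 0000:05:00.0
  f7810000-f7810fff : 0000:05:00.0
    f7810000-f7810fff : example_driver
"""


def nested_claims(iomem_text, device):
    """Return (device_range, claimant) pairs for every entry nested under a
    resource named `device` in /proc/iomem-style text."""
    claims = []
    stack = []  # (indent, name, address range) of the enclosing entries
    for raw in iomem_text.splitlines():
        if " : " not in raw:
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        addr_range, name = raw.strip().split(" : ", 1)
        # Drop entries that are not ancestors of the current line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        # Anything nested below the device's own resource is a claimant.
        for _, parent_name, parent_range in stack:
            if parent_name == device and name != device:
                claims.append((parent_range, name))
        stack.append((indent, name, addr_range))
    return claims


if __name__ == "__main__":
    print(nested_claims(SAMPLE, "0000:05:00.0"))
```

To run it against a live host, pass the contents of /proc/iomem instead of
SAMPLE; reading that file as root is needed to see real addresses. The same
approach should work for /proc/ioports, which uses the same layout.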