from:"Cao jin"

Re: [Qemu-devel] [PATCH v10 0/8] the reset of msix_init series

2017-04-28 Thread Cao jin

Hi Michael,

  I rebased this patchset against upstream and found no conflicts.
  I hope it is time to merge it.

On 03/06/2017 04:10 PM, Cao jin wrote:
> Michael,
> Is this series ok for 2.9?
> 

-- 
Sincerely,
Cao jin

Re: [Qemu-devel] [PATCH v6] vfio error recovery: kernel support

2017-04-06 Thread Cao jin



On 04/06/2017 06:36 AM, Michael S. Tsirkin wrote:
> On Wed, Apr 05, 2017 at 04:19:10PM -0600, Alex Williamson wrote:
>> On Thu, 6 Apr 2017 00:50:22 +0300
>> "Michael S. Tsirkin" <m...@redhat.com> wrote:
>>
>>> On Wed, Apr 05, 2017 at 01:38:22PM -0600, Alex Williamson wrote:
>>>> The previous intention of trying to handle all sorts of AER faults
>>>> clearly had more value, though even there the implementation and
>>>> configuration requirements restricted the practicality.  For instance
>>>> is AER support actually useful to a customer if it requires all ports
>>>> of a multifunction device assigned to the VM?  This seems more like a
>>>> feature targeting whole system partitioning rather than general VM
>>>> device assignment use cases.  Maybe that's ok, but it should be a clear
>>>> design decision.  
>>>
>>> Alex, what kind of testing do you expect to be necessary?
>>> Would you say testing on real hardware and making it trigger
>>> AER errors is a requirement?
>>
>> Testing various fatal, non-fatal, and corrected errors with aer-inject,
>> especially in multifunction configurations (where more than one port
>> is actually usable) would certainly be required.  If we have cases where
>> the driver for a companion function can escalate a non-fatal error to a
>> bus reset, that should be tested, even if it requires temporary hacks to
>> the host driver for the companion function to trigger that case.  AER
>> handling is not something that the typical user is going to experience,
>> so it should be thoroughly tested to make sure it works when needed
>> or there's little point to doing it at all.  Thanks,
>>
>> Alex
> 
> Some things can be tested within a VM. What would you
> say would be sufficient on a VM and what has to be
> tested on bare metal?
> 

Does the "bare metal" here mean something like XenServer?
-- 
Sincerely,
Cao jin

Re: [Qemu-devel] [PATCH v6] vfio error recovery: kernel support

2017-04-06 Thread Cao jin



On 04/06/2017 03:38 AM, Alex Williamson wrote:
> On Wed, 5 Apr 2017 16:54:33 +0800
> Cao jin <caoj.f...@cn.fujitsu.com> wrote:
> 
>> Sorry for the late reply; I was distracted by another problem for a while.
>>
>> On 03/31/2017 02:16 AM, Alex Williamson wrote:
>>> On Thu, 30 Mar 2017 21:00:35 +0300  
>>
>>>>>>>>   
>>>>>>>>>
>>>>>>>>> I also asked in my previous comments to provide examples of errors that
>>>>>>>>> might trigger correctable errors to the user, this comment seems to
>>>>>>>>> have been missed.  In my experience, AERs generated during device
>>>>>>>>> assignment are generally hardware faults or induced by bad guest
>>>>>>>>> drivers.  These are cases where a single fatal error is an appropriate
>>>>>>>>> and sufficient response.  We've scaled back this support to the point
>>>>>>>>> where we're only improving the situation of correctable errors and I'm
>>>>>>>>> not convinced this is worthwhile and we're not simply checking a box on
>>>>>>>>> an ill-conceived marketing requirements document.
>>>>>>>>
>>>>>>>> Sorry. I noticed that question: "what actual errors do we expect
>>>>>>>> userspace to see as non-fatal errors?", but I am confused by it.
>>>>>>>> Correctable, non-fatal, and fatal errors are clearly defined in the PCIe
>>>>>>>> spec, and the Uncorrectable Error Severity Register tells which errors
>>>>>>>> are fatal and which are non-fatal. This register is configurable, and
>>>>>>>> the settings are device specific, I guess. The AER core driver
>>>>>>>> distinguishes them by pci_channel_io_normal/pci_channel_io_frozen, so I
>>>>>>>> don't understand your question.
>>>>>>>>
>>>>>>>> Or do you mean we could list the default non-fatal errors of the
>>>>>>>> Uncorrectable Error Severity Register, as given by the PCIe spec?
>>>>>>>> 
>>>>>>>
>>>>>>> I'm trying to ask why is this patch series useful.  It's clearly
>>>>>>> possible for us to signal non-fatal errors for a device to a guest, but
>>>>>>> why is it necessarily a good idea to do so?  What additional RAS
>>>>>>> feature is gained by this?  Can we give a single example of a real
>>>>>>> world scenario where a guest has been shutdown due to a non-fatal error
>>>>>>> that the guest driver would have handled?  
>>>>>>
>>>>>> We've been discussing AER for months if not years.
>>>>>> Isn't it a bit too late to ask whether AER recovery
>>>>>> by guests is useful at all?
>>>>>
>>>>>
>>>>> Years, but I think that is more indicative of the persistence of the
>>>>> developers rather than growing acceptance on my part.  For the majority
>>>>> of that we were headed down the path of full AER support with the guest
>>>>> able to invoke bus resets.  It was a complicated solution, but it was
>>>>> more clear that it had some value.   Of course that's been derailed
>>>>> due to various issues and we're now on this partial implementation that
>>>>> only covers non-fatal errors that we assume the guest can recover from
>>>>> without providing it mechanisms to do bus resets.  Is there actual
>>>>> value to this or are we just trying to fill an AER checkbox on
>>>>> someone's marketing sheet?  I don't think it's too much to ask for a
>>>>> commit log to include evidence or discussion about how a feature is
>>>>> actually a benefit to implement.
>>>>
>>>> Seems rather self evident but ok.  So something like
>>>>
>>>> With this patch, guest is able to recover from non-fatal correctable
>>>> errors - as opposed to stopping the guest with no ability to
>>>> recover which was the only option previously.
>>>>
>>>> Would this address your question?  
>>>
>>>
>>> No, that's just restating the theoretical usefulness of this.  Have you
>>> ever seen

Re: [Qemu-devel] [PATCH v6] vfio error recovery: kernel support

2017-04-06 Thread Cao jin

On 04/06/2017 05:56 AM, Michael S. Tsirkin wrote:
> On Wed, Apr 05, 2017 at 04:54:33PM +0800, Cao jin wrote:
>> Admittedly, I have no experience inducing a non-fatal error; as I
>> understand it, device errors happen more or less by chance, depending on
>> the environment (temperature, humidity, etc.).
> 
> I'm not sure how to interpret this statement. I think what Alex is
> saying is simply that patches should include some justification. They
> make changes but what are they improving?
> For example:
> 
>   I tested device ABC in conditions DEF. Without a patch VM
>   stops. With the patches applied VM recovers and proceeds to
>   use the device normally.
> 
> is one reasonable justification imho.
> 

Got it. But unfortunately, until now, I haven't seen a VM stop caused by
a real non-fatal device error during device assignment (I have only seen
real fatal errors after starting a VM).
On the one hand, AER errors can occur in theory; on the other hand,
few people have seen a VM stop caused by AER. Now I am asked whether I
have real evidence or a scenario to prove that this patchset is really
useful. I don't, and we all know it is hard to trigger a real hardware
error, so it seems I am pushed into a corner.  I guess these questions
also apply to the AER driver's author: if the scenario were easy to
reproduce, there would be no need to write aer_inject to fake errors.

-- 
Sincerely,
Cao jin

Re: [Qemu-devel] [PATCH v6] vfio error recovery: kernel support

2017-04-05 Thread Cao jin

Sorry for the late reply; I was distracted by another problem for a while.

On 03/31/2017 02:16 AM, Alex Williamson wrote:
> On Thu, 30 Mar 2017 21:00:35 +0300

>>>>>> 
>>>>>>>
>>>>>>> I also asked in my previous comments to provide examples of errors that
>>>>>>> might trigger correctable errors to the user, this comment seems to
>>>>>>> have been missed.  In my experience, AERs generated during device
>>>>>>> assignment are generally hardware faults or induced by bad guest
>>>>>>> drivers.  These are cases where a single fatal error is an appropriate
>>>>>>> and sufficient response.  We've scaled back this support to the point
>>>>>>> where we're only improving the situation of correctable errors and I'm
>>>>>>> not convinced this is worthwhile and we're not simply checking a box on
>>>>>>> an ill-conceived marketing requirements document.  
>>>>>>
>>>>>> Sorry. I noticed that question: "what actual errors do we expect
>>>>>> userspace to see as non-fatal errors?", but I am confused by it.
>>>>>> Correctable, non-fatal, and fatal errors are clearly defined in the PCIe
>>>>>> spec, and the Uncorrectable Error Severity Register tells which errors
>>>>>> are fatal and which are non-fatal. This register is configurable, and
>>>>>> the settings are device specific, I guess. The AER core driver
>>>>>> distinguishes them by pci_channel_io_normal/pci_channel_io_frozen, so I
>>>>>> don't understand your question.
>>>>>>
>>>>>> Or do you mean we could list the default non-fatal errors of the
>>>>>> Uncorrectable Error Severity Register, as given by the PCIe spec?
>>>>>
>>>>> I'm trying to ask why is this patch series useful.  It's clearly
>>>>> possible for us to signal non-fatal errors for a device to a guest, but
>>>>> why is it necessarily a good idea to do so?  What additional RAS
>>>>> feature is gained by this?  Can we give a single example of a real
>>>>> world scenario where a guest has been shutdown due to a non-fatal error
>>>>> that the guest driver would have handled?
>>>>
>>>> We've been discussing AER for months if not years.
>>>> Isn't it a bit too late to ask whether AER recovery
>>>> by guests is useful at all?
>>>
>>>
>>> Years, but I think that is more indicative of the persistence of the
>>> developers rather than growing acceptance on my part.  For the majority
>>> of that we were headed down the path of full AER support with the guest
>>> able to invoke bus resets.  It was a complicated solution, but it was
>>> more clear that it had some value.   Of course that's been derailed
>>> due to various issues and we're now on this partial implementation that
>>> only covers non-fatal errors that we assume the guest can recover from
>>> without providing it mechanisms to do bus resets.  Is there actual
>>> value to this or are we just trying to fill an AER checkbox on
>>> someone's marketing sheet?  I don't think it's too much to ask for a
>>> commit log to include evidence or discussion about how a feature is
>>> actually a benefit to implement.  
>>
>> Seems rather self evident but ok.  So something like
>>
>> With this patch, guest is able to recover from non-fatal correctable
>> errors - as opposed to stopping the guest with no ability to
>> recover which was the only option previously.
>>
>> Would this address your question?
> 
> 
> No, that's just restating the theoretical usefulness of this.  Have you
> ever seen a non-fatal error?  Does it ever trigger?  If we can't
> provide a real world case of this being useful, can we at least discuss
> the types of things that might trigger a non-fatal error for which the
> guest could recover?  In patch 3/3 Cao Jin claimed we have a 50% chance
> of reducing VM stop conditions, but I suspect this is just a misuse of
> statistics, just because there are two choices, fatal vs non-fatal,
> does not make them equally likely.  Do we have any idea of the
> incidence rate of non-fatal errors?  Is it non-zero?  Thanks,
> 

Admittedly, I have no experience inducing a non-fatal error; as I
understand it, device errors happen more or less by chance, depending on
the environment (temperature, humidity, etc.).

After reading the discussion, can I conclude that the old design with
full AER support is preferred over this new one?  The core issue of the
old design is that the second host link reset makes the guest's subsequent
register reads fail. I think this can be solved by testing the device's
accessibility (read a device register; all F's means inaccessible. IIRC,
EEH on Power also tests device accessibility this way) and delaying the
guest's reads while the device is temporarily inaccessible.
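
To make the accessibility test above concrete, here is a minimal sketch in
plain C. It is not QEMU or kernel code: the callback type cfg_read_fn, the
helpers device_accessible() and wait_for_device(), and the fake device are
all hypothetical names made up for illustration. The only idea taken from
the text is that a register read returning all F's is treated as "device
not reachable yet", and the caller retries before letting the guest read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback: read a 32-bit config register at 'offset'. */
typedef uint32_t (*cfg_read_fn)(void *dev, uint32_t offset);

/* A read of all 1s at offset 0 is taken to mean "device not responding". */
static bool device_accessible(cfg_read_fn read_cfg, void *dev)
{
    return read_cfg(dev, 0x00) != UINT32_MAX;
}

/*
 * Poll until the device answers. Returns the number of attempts used,
 * or -1 if it never became accessible within max_tries attempts.
 */
static int wait_for_device(cfg_read_fn read_cfg, void *dev, int max_tries)
{
    assert(max_tries > 0);
    for (int i = 1; i <= max_tries; i++) {
        if (device_accessible(read_cfg, dev)) {
            return i;
        }
        /* Real code would sleep or back off here before retrying. */
    }
    return -1;
}

/* Fake device for demonstration: unreachable for the first N reads. */
struct fake_dev {
    int down_reads;
};

static uint32_t fake_read(void *opaque, uint32_t offset)
{
    struct fake_dev *d = opaque;

    (void)offset;
    if (d->down_reads > 0) {
        d->down_reads--;
        return UINT32_MAX;   /* all F's while the link is down */
    }
    return 0x12345678u;      /* arbitrary non-F's value once it is back */
}
```

With a fake device that is down for two reads, wait_for_device() returns on
the third attempt; a device that never comes back yields -1, which is where
a real implementation would fall back to stopping the VM.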

-- 
Sincerely,
Cao jin

Re: [Qemu-devel] [PATCH v3 2/3] vfio pci: new function to init AER capability

2017-03-28 Thread Cao jin



On 03/25/2017 06:12 AM, Alex Williamson wrote:
> On Thu, 23 Mar 2017 17:09:22 +0800
> Cao jin <caoj.f...@cn.fujitsu.com> wrote:
> 
> This is not a sufficiently trivial patch to leave the commit log empty.
> 
>> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
>> Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
>> ---

>> +pcie_cap_deverr_init(pdev);
> 
> This assumes we've set exp_cap, perhaps the code should validate this
> to avoid corner case configurations where the PCIe cap has been dropped
> yet AER is still present.  I'm not sure if it's possible, but I'd
> rather test than segfault.
> 
>> +return pcie_aer_init(pdev, cap_ver, pos, size, errp);
> 
> 
> I think here too, users may have existing configurations that could
> break by suddenly imposing a new topology requirement.
> 

I don't quite follow: what should be tested for pcie_aer_init()?

-- 
Sincerely,
Cao jin

Re: [Qemu-devel] [PATCH v3 3/3] vfio-pci: process non fatal error of AER

2017-03-28 Thread Cao jin



On 03/25/2017 06:12 AM, Alex Williamson wrote:
> On Thu, 23 Mar 2017 17:09:23 +0800
> Cao jin <caoj.f...@cn.fujitsu.com> wrote:
> 
>> Make use of the non-fatal error eventfd that the kernel module provides
>> to process AER non-fatal errors. Fatal errors still go through the
>> legacy path, which results in a VM stop.
>>
>> Register the handler and wait for notification. On notification, construct
>> an AER message and pass it to the root port. The root port then triggers an
>> interrupt to signal the guest, and the guest driver does the recovery.
> 
> Can we guarantee this is the better solution in all cases or could
> there be guests without AER support where the VM stop is the better
> solution?
> 

Currently, we only have VM stop on errors, which looks the same as a
sudden power-down to me.  With this solution, we have about a 50% chance
(the non-fatal half) of reducing the sudden power-down risk.

What if a guest doesn't support AER?  It looks the same as a host
without AER support. For now I can only speculate about the worst case, a
guest crash; would that be much different from a sudden power-down?

>>
>> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
>> Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
>> ---
>>  hw/vfio/pci.c              | 202 +
>>  hw/vfio/pci.h              |   2 +
>>  linux-headers/linux/vfio.h |   2 +
>>  3 files changed, 206 insertions(+)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 3d0d005..c6786d5 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -2432,6 +2432,200 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
>>      vfio_put_base_device(&vdev->vbasedev);
>>  }
>>  
>> +static void vfio_non_fatal_err_notifier_handler(void *opaque)
>> +{
>> +    VFIOPCIDevice *vdev = opaque;
>> +    PCIDevice *dev = &vdev->pdev;
>> +    PCIEAERMsg msg = {
>> +        .severity = PCI_ERR_ROOT_CMD_NONFATAL_EN,
>> +        .source_id = pci_requester_id(dev),
>> +    };
>> +
>> +    if (!event_notifier_test_and_clear(&vdev->non_fatal_err_notifier)) {
>> +        return;
>> +    }
>> +
>> +    /* Populate the aer msg and send it to root port */
>> +    if (dev->exp.aer_cap) {
> 
> Why would we have registered this notifier otherwise?
> 
>> +        uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
>> +        uint32_t uncor_status;
>> +        bool isfatal;
>> +
>> +        uncor_status = vfio_pci_read_config(dev,
>> +                           dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4);
>> +        if (!uncor_status) {
>> +            return;
>> +        }
>> +
>> +        isfatal = uncor_status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
>> +        if (isfatal) {
>> +            goto stop;
>> +        }
> 
> Huh?  How can we get a non-fatal error notice for a fatal error?  (and
> why are we saving this to a variable rather than testing it within the
> 'if' condition?
>

Both of these are for corner cases I am unsure about.
Is it possible that reading the register shows a fatal error?
Saving it into a variable is just personal taste: it looks neater.

>> +
>> +        error_report("%s sending non fatal event to root port. uncor status = "
>> +                     "0x%"PRIx32, vdev->vbasedev.name, uncor_status);
>> +        pcie_aer_msg(dev, &msg);
>> +        return;
>> +    }
>> +
>> +stop:
>> +    /* Terminate the guest in case of fatal error */
>> +    error_report("%s: Device detected a fatal error. VM stopped",
>> +                 vdev->vbasedev.name);
>> +    vm_stop(RUN_STATE_INTERNAL_ERROR);
> 
> Shouldn't we use the existing error index if we can't make use of
> correctable errors?
> 

Why? If the register read shows it is actually a fatal error, isn't that
the same as the fatal error handler being notified?  What would we use the
existing error index for?


>> @@ -2860,6 +3054,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>>          }
>>      }
>>  
>> +    vfio_register_passive_reset_notifier(vdev);
>> +    vfio_register_non_fatal_err_notifier(vdev);
> 
> I think it's wrong that we configure these unconditionally.  Why do we
> care about these unless we're configuring the guest to receive AER
> events?
> 

But do we have a way to know whether the guest has AER support? For now,
I don't think so.

If the guest doesn't have AER support, then in the worst case, a guest crash,
it is no worse than a sudden power-down.


-- 
Sincerely,
Cao jin

Re: [Qemu-devel] [PATCH v3 1/3] pcie aer: verify if AER functionality is available

2017-03-28 Thread Cao jin



On 03/25/2017 06:12 AM, Alex Williamson wrote:
> On Thu, 23 Mar 2017 17:09:21 +0800
> Cao jin <caoj.f...@cn.fujitsu.com> wrote:
> 
>> For devices that support AER, verify whether it can actually work in the
>> system:
>> 1. An AER-capable device is a PCIe device; it can't be plugged into a PCI bus.
>> 2. If the root port doesn't support AER, there is no need to expose the
>>    AER capability.
>>
>> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
>> Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
>> ---
>>  hw/pci/pcie_aer.c | 28 
>>  1 file changed, 28 insertions(+)
>>
>> diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
>> index daf1f65..a2e9818 100644
>> --- a/hw/pci/pcie_aer.c
>> +++ b/hw/pci/pcie_aer.c
>> @@ -100,6 +100,34 @@ static void aer_log_clear_all_err(PCIEAERLog *aer_log)
>>  int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, uint16_t offset,
>>                    uint16_t size, Error **errp)
>>  {
>> +    PCIDevice *parent_dev;
>> +    uint8_t type;
>> +    uint8_t parent_type;
>> +
>> +    /* Topology test: see if there is need to expose AER cap */
>> +    type = pcie_cap_get_type(dev);
>> +    parent_dev = pci_bridge_get_device(dev->bus);
>> +    while (parent_dev) {
>> +        parent_type = pcie_cap_get_type(parent_dev);
>> +
>> +        if (type == PCI_EXP_TYPE_ENDPOINT &&
>> +            (parent_type != PCI_EXP_TYPE_ROOT_PORT &&
>> +             parent_type != PCI_EXP_TYPE_DOWNSTREAM)) {
>> +            error_setg(errp, "Parent device is not a PCIe component");
>> +            return -ENOTSUP;
>> +        }
>> +
>> +        if (parent_type == PCI_EXP_TYPE_ROOT_PORT) {
>> +            if (!parent_dev->exp.aer_cap)
>> +            {
> 
> Curly brace at the end of the previous line.
> 
>> +                error_setg(errp, "Root port does not support AER");
>> +                return -ENOTSUP;
>> +            }
>> +        }
>> +
>> +        parent_dev = pci_bridge_get_device(parent_dev->bus);
>> +    }
>> +
>>      pcie_add_capability(dev, PCI_EXT_CAP_ID_ERR, cap_ver,
>>                          offset, size);
>>      dev->exp.aer_cap = offset;
> 
> This patch makes existing configurations including PCIe root ports,
> upstream ports, downstream ports, and e1000e fail if they do not meet
> this new configuration requirement.
> 

Yes, I noticed that e1000e can be realized on i440fx, which I think is
not possible in the real world, as point 1 of the commit log says.

But for those ports, under what conditions will they fail with this patch?
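
One way to explore that question is to model the loop from the quoted hunk
outside QEMU. The sketch below is only a simplified model under stated
assumptions: struct node, enum port_type and aer_topology_ok() are
hypothetical names, the parent pointer stands in for
pci_bridge_get_device(dev->bus) (assumed to be NULL on the root bus), and
has_aer stands in for a non-zero exp.aer_cap.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PCI_EXP_TYPE_* values used in the patch. */
enum port_type { T_ENDPOINT, T_ROOT_PORT, T_UPSTREAM, T_DOWNSTREAM, T_OTHER };

struct node {
    enum port_type type;
    bool has_aer;               /* models a non-zero exp.aer_cap */
    const struct node *parent;  /* bridge above this device, NULL on root bus */
};

/* Returns true if the modelled check would allow AER to be exposed on dev. */
static bool aer_topology_ok(const struct node *dev)
{
    for (const struct node *p = dev->parent; p != NULL; p = p->parent) {
        /* Same condition as the hunk: evaluated against every ancestor. */
        if (dev->type == T_ENDPOINT &&
            p->type != T_ROOT_PORT && p->type != T_DOWNSTREAM) {
            return false;   /* "Parent device is not a PCIe component" */
        }
        if (p->type == T_ROOT_PORT && !p->has_aer) {
            return false;   /* "Root port does not support AER" */
        }
    }
    return true;
}
```

In this model a root port sitting directly on the root bus always passes,
because the loop body never runs, while an upstream or downstream port fails
whenever a root port above it lacks AER. The model also rejects an endpoint
behind a switch once the walk reaches the upstream port, since the endpoint
condition is re-evaluated for each ancestor; whether the real code behaves
the same way depends on helpers that are not shown in this thread.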
-- 
Sincerely,
Cao jin

Re: [Qemu-devel] [PATCH v3 0/3] vfio-pci: support recovery of AER non fatal error

2017-03-28 Thread Cao jin

On 03/25/2017 06:12 AM, Alex Williamson wrote:
> On Thu, 23 Mar 2017 17:09:20 +0800
> Cao jin <caoj.f...@cn.fujitsu.com> wrote:
> 
>> v3 changelog:
>> 1. Address all comments from MST on patch 3, including removing the flags
>>    pci_aer_non_fatal & passive_reset and the boilerplate code.
>>    The corresponding kernel patch is v6.
>>
>> Test:
>> Tested with func1 passed through while func0 has no user.
> 
> So the slot_reset trigger really hasn't been tested at all?

No, because we don't have that kind of multi-function device. IIRC, in the
real world, the functions of most multi-function devices are all of the
same kind.

I plan to do the basic test described above before getting a Reviewed-by,
and to do the full test, as before, after review.

I will consider whether we can fake a slot_reset trigger.

-- 
Sincerely,
Cao jin

Re: [Qemu-devel] [PATCH v6] vfio error recovery: kernel support

2017-03-28 Thread Cao jin



On 03/25/2017 06:12 AM, Alex Williamson wrote:
> On Thu, 23 Mar 2017 17:07:31 +0800
> Cao jin <caoj.f...@cn.fujitsu.com> wrote:
> 
> A more appropriate patch subject would be:
> 
> vfio-pci: Report correctable errors and slot reset events to user
>

Correctable? That is confusing to me. Correctable errors have a clear
definition in the PCIe spec; shouldn't it be "non-fatal"?

>> From: "Michael S. Tsirkin" <m...@redhat.com>
> 
> This hardly seems accurate anymore.  You could say Suggested-by and let
> Michael add a sign-off, but it's changed since he sent it.
> 
>>
>> 0. What happens now (PCIE AER only)
>>    Fatal errors cause a link reset. Non fatal errors don't.
>>    All errors stop the QEMU guest eventually, but not immediately,
>>    because they are detected and reported asynchronously.
>>    Interrupts are forwarded as usual.
>>    Correctable errors are not reported to user at all.
>>
>>    Note:
>>    PPC EEH is different, but this approach won't affect EEH. EEH treats
>>    all errors as fatal ones in AER, so they will still be signalled to user
>>    via the legacy eventfd.  Besides, all devices/functions in a PE belong
>>    to the same IOMMU group, so the slot_reset handler in this approach
>>    won't affect EEH either.
>>
>> 1. Correctable errors
>>    Hardware can correct these errors without software intervention;
>>    clearing the error status is enough, and this is already done now.
>>    No need to recover from them; nothing changes, leave it as it is.
>>
>> 2. Fatal errors
>>    They will induce a link reset. This is troublesome when user is
>>    a QEMU guest. This approach doesn't touch the existing mechanism.
>>
>> 3. Non-fatal errors
>>    Before this patch, they are signalled to user the same way as fatal ones.
>>    With this patch, a new eventfd is introduced only for non-fatal error
>>    notification. By splitting non-fatal ones out, it will benefit AER
>>    recovery of a QEMU guest user.
>>
>>    To maintain backwards compatibility with userspace, non-fatal errors
>>    will continue to trigger via the existing error interrupt index if a
>>    non-fatal signaling mechanism has not been registered.
>>
>>    Note:
>>    In case of PCI Express errors, kernel might request a slot reset
>>    affecting our device (from our point of view this is a passive device
>>    reset as opposed to an active one requested by vfio itself).
>>    This might currently happen if a slot reset is requested by a driver
>>    (other than vfio) bound to another device function in the same slot.
>>    This will cause our device to lose its state so report this event to
>>    userspace.
> 
> I tried to convey this in my last comments, I don't think this is an
> appropriate commit log.  Lead with what is the problem you're trying to
> fix and why, what is the benefit to the user, and how is the change
> accomplished.  If you want to provide a State of Error Handling in
> VFIO, append it after the main points of the commit log.

ok.

> 
> I also asked in my previous comments to provide examples of errors that
> might trigger correctable errors to the user, this comment seems to
> have been missed.  In my experience, AERs generated during device
> assignment are generally hardware faults or induced by bad guest
> drivers.  These are cases where a single fatal error is an appropriate
> and sufficient response.  We've scaled back this support to the point
> where we're only improving the situation of correctable errors and I'm
> not convinced this is worthwhile and we're not simply checking a box on
> an ill-conceived marketing requirements document.

Sorry. I noticed that question: "what actual errors do we expect
userspace to see as non-fatal errors?", but I am confused by it.
Correctable, non-fatal, and fatal errors are clearly defined in the PCIe
spec, and the Uncorrectable Error Severity Register tells which errors
are fatal and which are non-fatal. This register is configurable, and
the settings are device specific, I guess. The AER core driver
distinguishes them by pci_channel_io_normal/pci_channel_io_frozen, so I
don't understand your question.

Or do you mean we could list the default non-fatal errors of the
Uncorrectable Error Severity Register, as given by the PCIe spec?
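
As a small illustration of how the severity register splits uncorrectable
errors, here is a sketch in plain C. The enum and classify_uncor() are
hypothetical names for this example only, and the bit positions used in the
usage note are arbitrary; the rule itself is the one already used in patch
3/3 (isfatal = uncor_status & severity): an error whose status bit is also
set in the severity register counts as fatal, any other set status bit
counts as non-fatal.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch. */
enum aer_severity { AER_NONE, AER_NONFATAL, AER_FATAL };

/*
 * uncor_status: value of the Uncorrectable Error Status register
 * uncor_sever:  value of the Uncorrectable Error Severity register
 *               (bit set = that error is reported as fatal)
 */
static enum aer_severity classify_uncor(uint32_t uncor_status,
                                        uint32_t uncor_sever)
{
    if (uncor_status == 0) {
        return AER_NONE;        /* nothing logged */
    }
    if (uncor_status & uncor_sever) {
        return AER_FATAL;       /* at least one logged error is marked fatal */
    }
    return AER_NONFATAL;        /* only errors marked non-fatal are logged */
}
```

For example, with only bit 4 set in the severity register, a status with
only bit 12 set classifies as non-fatal, while a status with bits 4 and 12
both set classifies as fatal. Because the severity register is writable,
the same status value can classify differently on differently configured
devices.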

> 
> I had also commented asking how the hypervisor is expected to know
> whether the guest supports AER.  With the existing support of a single
> fatal error, the hypervisor halts the VM regardless of the error
> severity or guest support.  Now we have the opportunity that the
> hypervisor can forward a correctable error to the guest... and hope the
> right thing occurs?

[Qemu-devel] [PATCH v3 2/3] vfio pci: new function to init AER capability

2017-03-23 Thread Cao jin

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/vfio/pci.c | 41 -
 hw/vfio/pci.h |  1 +
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 332f41d..3d0d005 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1855,18 +1855,42 @@ out:
     return 0;
 }
 
-static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
+static int vfio_setup_aer(VFIOPCIDevice *vdev, uint8_t cap_ver,
+                          int pos, uint16_t size, Error **errp)
+{
+    PCIDevice *pdev = &vdev->pdev;
+    uint32_t errcap;
+
+    errcap = vfio_pci_read_config(pdev, pos + PCI_ERR_CAP, 4);
+    /*
+     * The ability to record multiple headers is depending on
+     * the state of the Multiple Header Recording Capable bit and
+     * enabled by the Multiple Header Recording Enable bit.
+     */
+    if ((errcap & PCI_ERR_CAP_MHRC) &&
+        (errcap & PCI_ERR_CAP_MHRE)) {
+        pdev->exp.aer_log.log_max = PCIE_AER_LOG_MAX_DEFAULT;
+    } else {
+        pdev->exp.aer_log.log_max = 0;
+    }
+
+    pcie_cap_deverr_init(pdev);
+    return pcie_aer_init(pdev, cap_ver, pos, size, errp);
+}
+
+static int vfio_add_ext_cap(VFIOPCIDevice *vdev, Error **errp)
 {
     PCIDevice *pdev = &vdev->pdev;
     uint32_t header;
     uint16_t cap_id, next, size;
     uint8_t cap_ver;
     uint8_t *config;
+    int ret = 0;
 
     /* Only add extended caps if we have them and the guest can see them */
     if (!pci_is_express(pdev) || !pci_bus_is_express(pdev->bus) ||
         !pci_get_long(pdev->config + PCI_CONFIG_SPACE_SIZE)) {
-        return;
+        return 0;
     }
 
     /*
@@ -1915,6 +1939,9 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
                                    PCI_EXT_CAP_NEXT_MASK);
 
         switch (cap_id) {
+        case PCI_EXT_CAP_ID_ERR:
+            ret = vfio_setup_aer(vdev, cap_ver, next, size, errp);
+            break;
         case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */
         case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function virtualization */
             trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id, next);
@@ -1923,6 +1950,9 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
             pcie_add_capability(pdev, cap_id, cap_ver, next, size);
         }
 
+        if (ret) {
+            goto out;
+        }
     }
 
     /* Cleanup chain head ID if necessary */
@@ -1930,8 +1960,9 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
         pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0);
     }
 
+out:
     g_free(config);
-    return;
+    return ret;
 }
 
 static int vfio_add_capabilities(VFIOPCIDevice *vdev, Error **errp)
@@ -1949,8 +1980,8 @@ static int vfio_add_capabilities(VFIOPCIDevice *vdev, Error **errp)
         return ret;
     }
 
-    vfio_add_ext_cap(vdev);
-    return 0;
+    ret = vfio_add_ext_cap(vdev, errp);
+    return ret;
 }
 
 static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index a8366bb..34e8b04 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -15,6 +15,7 @@
 #include "qemu-common.h"
 #include "exec/memory.h"
 #include "hw/pci/pci.h"
+#include "hw/pci/pci_bridge.h"
 #include "hw/vfio/vfio-common.h"
 #include "qemu/event_notifier.h"
 #include "qemu/queue.h"
-- 
1.8.3.1

[Qemu-devel] [PATCH v3 0/3] vfio-pci: support recovery of AER non fatal error

2017-03-23 Thread Cao jin

v3 changelog:
1. Address all comments from MST on patch 3, including removing the flags
   pci_aer_non_fatal & passive_reset and the boilerplate code.
   The corresponding kernel patch is v6.

Test:
Tested with func1 passed through while func0 has no user.

Cao jin (3):
  pcie aer: verify if AER functionality is available
  vfio pci: new function to init AER capability
  vfio-pci: process non fatal error of AER

 hw/pci/pcie_aer.c  |  28 ++
 hw/vfio/pci.c  | 243 -
 hw/vfio/pci.h  |   3 +
 linux-headers/linux/vfio.h |   2 +
 4 files changed, 271 insertions(+), 5 deletions(-)

-- 
1.8.3.1

[Qemu-devel] [PATCH v3 3/3] vfio-pci: process non fatal error of AER

2017-03-23 Thread Cao jin

Make use of the non-fatal error eventfd that the kernel module provides
to process AER non-fatal errors. Fatal errors still go through the
legacy path, which results in a VM stop.

Register the handler and wait for notification. On notification, construct
an AER message and pass it to the root port. The root port then triggers an
interrupt to signal the guest, and the guest driver does the recovery.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/vfio/pci.c  | 202 +
 hw/vfio/pci.h  |   2 +
 linux-headers/linux/vfio.h |   2 +
 3 files changed, 206 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 3d0d005..c6786d5 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2432,6 +2432,200 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
 vfio_put_base_device(>vbasedev);
 }
 
+static void vfio_non_fatal_err_notifier_handler(void *opaque)
+{
+VFIOPCIDevice *vdev = opaque;
+PCIDevice *dev = >pdev;
+PCIEAERMsg msg = {
+.severity = PCI_ERR_ROOT_CMD_NONFATAL_EN,
+.source_id = pci_requester_id(dev),
+};
+
+if (!event_notifier_test_and_clear(>non_fatal_err_notifier)) {
+return;
+}
+
+/* Populate the aer msg and send it to root port */
+if (dev->exp.aer_cap) {
+uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
+uint32_t uncor_status;
+bool isfatal;
+
+uncor_status = vfio_pci_read_config(dev,
+dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4);
+if (!uncor_status) {
+return;
+}
+
+isfatal = uncor_status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
+if (isfatal) {
+goto stop;
+}
+
+error_report("%s sending non fatal event to root port. uncor status = "
+ "0x%"PRIx32, vdev->vbasedev.name, uncor_status);
+pcie_aer_msg(dev, );
+return;
+}
+
+stop:
+/* Terminate the guest in case of fatal error */
+error_report("%s: Device detected a fatal error. VM stopped",
+   vdev->vbasedev.name);
+vm_stop(RUN_STATE_INTERNAL_ERROR);
+}
+
+/*
+ * Register non fatal error notifier for devices supporting error recovery.
+ * If we encounter a failure in this function, we report an error
+ * and continue after disabling error recovery support for the device.
+ */
+static void vfio_register_non_fatal_err_notifier(VFIOPCIDevice *vdev)
+{
+int ret;
+int argsz;
+struct vfio_irq_set *irq_set;
+int32_t *pfd;
+
+if (event_notifier_init(>non_fatal_err_notifier, 0)) {
+error_report("vfio: Unable to init event notifier for non-fatal error 
detection");
+return;
+}
+
+argsz = sizeof(*irq_set) + sizeof(*pfd);
+
+irq_set = g_malloc0(argsz);
+irq_set->argsz = argsz;
+irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
+ VFIO_IRQ_SET_ACTION_TRIGGER;
+irq_set->index = VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX;
+irq_set->start = 0;
+irq_set->count = 1;
+pfd = (int32_t *)_set->data;
+
+*pfd = event_notifier_get_fd(>non_fatal_err_notifier);
+qemu_set_fd_handler(*pfd, vfio_non_fatal_err_notifier_handler, NULL, vdev);
+
+ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
+if (ret) {
+error_report("vfio: Failed to set up non-fatal error notification: %m");
+qemu_set_fd_handler(*pfd, NULL, NULL, vdev);
+event_notifier_cleanup(&vdev->non_fatal_err_notifier);
+}
+g_free(irq_set);
+}
+
+static void vfio_unregister_non_fatal_err_notifier(VFIOPCIDevice *vdev)
+{
+int argsz;
+struct vfio_irq_set *irq_set;
+int32_t *pfd;
+int ret;
+
+argsz = sizeof(*irq_set) + sizeof(*pfd);
+
+irq_set = g_malloc0(argsz);
+irq_set->argsz = argsz;
+irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
+ VFIO_IRQ_SET_ACTION_TRIGGER;
+irq_set->index = VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX;
+irq_set->start = 0;
+irq_set->count = 1;
+pfd = (int32_t *)&irq_set->data;
+*pfd = -1;
+
+ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
+if (ret) {
+error_report("vfio: Failed to de-assign error fd: %m");
+}
+g_free(irq_set);
+qemu_set_fd_handler(event_notifier_get_fd(&vdev->non_fatal_err_notifier),
+NULL, NULL, vdev);
+event_notifier_cleanup(&vdev->non_fatal_err_notifier);
+}
+
+static void vfio_passive_reset_notifier_handler(void *opaque)
+{
+VFIOPCIDevice *vdev = opaque;
+
+if (!event_notifier_test_and_clear(&vdev->passive_reset_notifier)) {
+return;
+}
+
+error_report("%s: Device lost state due to host device reset. VM stopped",
+   vdev->vbasedev.name);
+vm_stop(RUN_STATE_INTERNAL_ERROR);
+}

[Qemu-devel] [PATCH v3 1/3] pcie aer: verify if AER functionality is available

2017-03-23 Thread Cao jin

For devices that support AER, verify whether it can actually work in the
system:
1. An AER-capable device is a PCIe device; it can't be plugged into a PCI bus
2. If the root port doesn't support AER, there is no need to expose the
   AER capability
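
The two rules above can be sketched as a small standalone check. This is
only an illustration under stated assumptions: struct node, enum port_type
and aer_usable() are hypothetical stand-ins invented here, not QEMU types
or functions. The loop simply mirrors the walk up the parent bridges that
the patch adds to pcie_aer_init().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified port types; not QEMU's PCI_EXP_TYPE_* values. */
enum port_type {
    TYPE_ENDPOINT,
    TYPE_ROOT_PORT,
    TYPE_DOWNSTREAM,
    TYPE_PCI_BRIDGE
};

/* Hypothetical stand-in for a device and the bridge above it. */
struct node {
    enum port_type type;
    bool has_aer;               /* device exposes an AER capability */
    const struct node *parent;  /* upstream bridge, NULL on the root bus */
};

/* Return true if exposing AER on this device makes sense in the topology. */
bool aer_usable(const struct node *dev)
{
    const struct node *p;

    assert(dev != NULL);

    for (p = dev->parent; p != NULL; p = p->parent) {
        /* Rule 1: an endpoint must sit below PCIe ports, not a PCI bridge. */
        if (dev->type == TYPE_ENDPOINT &&
            p->type != TYPE_ROOT_PORT && p->type != TYPE_DOWNSTREAM) {
            return false;
        }
        /* Rule 2: the root port has to support AER itself. */
        if (p->type == TYPE_ROOT_PORT && !p->has_aer) {
            return false;
        }
    }
    return true;
}
```

With this sketch, an endpoint below an AER-capable root port passes, while
the same endpoint below a conventional PCI bridge, or below a root port
without AER, is rejected.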

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/pci/pcie_aer.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
index daf1f65..a2e9818 100644
--- a/hw/pci/pcie_aer.c
+++ b/hw/pci/pcie_aer.c
@@ -100,6 +100,34 @@ static void aer_log_clear_all_err(PCIEAERLog *aer_log)
 int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, uint16_t offset,
   uint16_t size, Error **errp)
 {
+PCIDevice *parent_dev;
+uint8_t type;
+uint8_t parent_type;
+
+/* Topology test: see if there is need to expose AER cap */
+type = pcie_cap_get_type(dev);
+parent_dev = pci_bridge_get_device(dev->bus);
+while (parent_dev) {
+parent_type = pcie_cap_get_type(parent_dev);
+
+if (type == PCI_EXP_TYPE_ENDPOINT &&
+(parent_type != PCI_EXP_TYPE_ROOT_PORT &&
+ parent_type != PCI_EXP_TYPE_DOWNSTREAM)) {
+error_setg(errp, "Parent device is not a PCIe component");
+return -ENOTSUP;
+}
+
+if (parent_type == PCI_EXP_TYPE_ROOT_PORT) {
+if (!parent_dev->exp.aer_cap)
+{
+error_setg(errp, "Root port does not support AER");
+return -ENOTSUP;
+}
+}
+
+parent_dev = pci_bridge_get_device(parent_dev->bus);
+}
+
 pcie_add_capability(dev, PCI_EXT_CAP_ID_ERR, cap_ver,
 offset, size);
 dev->exp.aer_cap = offset;
-- 
1.8.3.1

[Qemu-devel] [PATCH v6] vfio error recovery: kernel support

2017-03-23 Thread Cao jin

From: "Michael S. Tsirkin" <m...@redhat.com>

0. What happens now (PCIE AER only)
   Fatal errors cause a link reset. Non fatal errors don't.
   All errors stop the QEMU guest eventually, but not immediately,
   because it's detected and reported asynchronously.
   Interrupts are forwarded as usual.
   Correctable errors are not reported to user at all.

   Note:
   PPC EEH is different, but this approach won't affect EEH. EEH treats
   all errors as fatal ones in AER, so they will still be signalled to user
   via the legacy eventfd.  Besides, all devices/functions in a PE belong
   to the same IOMMU group, so the slot_reset handler in this approach
   won't affect EEH either.

1. Correctable errors
   Hardware can correct these errors without software intervention;
   clearing the error status is enough, which is what is already done now.
   There is nothing to recover and nothing changes here.

2. Fatal errors
   They will induce a link reset. This is troublesome when user is
   a QEMU guest. This approach doesn't touch the existing mechanism.

3. Non-fatal errors
   Before this patch, they are signalled to user the same way as fatal ones.
   With this patch, a new eventfd is introduced only for non-fatal error
   notification. By splitting non-fatal ones out, it will benefit AER
   recovery of a QEMU guest user.

   To maintain backwards compatibility with userspace, non-fatal errors
   will continue to trigger via the existing error interrupt index if a
   non-fatal signaling mechanism has not been registered.

   Note:
   In case of PCI Express errors, kernel might request a slot reset
   affecting our device (from our point of view this is a passive device
   reset as opposed to an active one requested by vfio itself).
   This might currently happen if a slot reset is requested by a driver
   (other than vfio) bound to another device function in the same slot.
   This will cause our device to lose its state so report this event to
   userspace.
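
A minimal sketch of the signalling choice described in section 3, assuming
only what the err_detected hunk in this patch shows: the non-fatal eventfd
is used when the channel state is normal and that eventfd has been
registered, otherwise the legacy error eventfd is used. All names below
(enum channel_state, struct triggers, pick_err_detected_target()) are
hypothetical and exist only for this illustration; they are not kernel
identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the channel states that matter here. */
enum channel_state {
    CHANNEL_IO_NORMAL,   /* non-fatal error, I/O can continue */
    CHANNEL_IO_FROZEN    /* fatal error, I/O is blocked */
};

/* Which eventfd, if any, would be signalled. */
enum target {
    TARGET_NONE,
    TARGET_LEGACY_ERR,
    TARGET_NON_FATAL
};

/* Hypothetical record of which eventfds userspace has registered. */
struct triggers {
    bool err;        /* legacy error eventfd */
    bool non_fatal;  /* new non-fatal error eventfd */
};

enum target pick_err_detected_target(enum channel_state state,
                                     const struct triggers *t)
{
    assert(t != NULL);

    /* New userspace: non-fatal errors get their own notification. */
    if (state == CHANNEL_IO_NORMAL && t->non_fatal) {
        return TARGET_NON_FATAL;
    }
    /* Old userspace, or a fatal error: fall back to the legacy eventfd. */
    if (t->err) {
        return TARGET_LEGACY_ERR;
    }
    return TARGET_NONE;
}
```

An old userspace that registers only the legacy eventfd keeps seeing every
error there, which is the backwards-compatibility behaviour stated above.
An error reported with a frozen channel state never takes the non-fatal
branch, whichever eventfds are registered.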

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
v6 changelog:
Address all the comments from MST.

 drivers/vfio/pci/vfio_pci.c | 49 +++--
 drivers/vfio/pci/vfio_pci_intrs.c   | 38 
 drivers/vfio/pci/vfio_pci_private.h |  2 ++
 include/uapi/linux/vfio.h   |  2 ++
 4 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 324c52e..71f9a8a 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -441,7 +441,9 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
 
return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
}
-   } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
+   } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX ||
+  irq_type == VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX ||
+  irq_type == VFIO_PCI_PASSIVE_RESET_IRQ_INDEX) {
if (pci_is_pcie(vdev->pdev))
return 1;
} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
@@ -796,6 +798,8 @@ static long vfio_pci_ioctl(void *device_data,
case VFIO_PCI_REQ_IRQ_INDEX:
break;
case VFIO_PCI_ERR_IRQ_INDEX:
+   case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
+   case VFIO_PCI_PASSIVE_RESET_IRQ_INDEX:
if (pci_is_pcie(vdev->pdev))
break;
/* pass thru to return error */
@@ -1282,7 +1286,9 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
 
mutex_lock(&vdev->igate);
 
-   if (vdev->err_trigger)
+   if (state == pci_channel_io_normal && vdev->non_fatal_err_trigger)
+   eventfd_signal(vdev->non_fatal_err_trigger, 1);
+   else if (vdev->err_trigger)
eventfd_signal(vdev->err_trigger, 1);
 
mutex_unlock(&vdev->igate);
@@ -1292,8 +1298,47 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
return PCI_ERS_RESULT_CAN_RECOVER;
 }
 
+/*
+ * In case of PCI Express errors, kernel might request a slot reset
+ * affecting our device (from our point of view, this is a passive device
+ * reset as opposed to an active one requested by vfio itself).
+ * This might currently happen if a slot reset is requested by a driver
+ * (other than vfio) bound to another device function in the same slot.
+ * This will cause our device to lose its state, so report this event to
+ * userspace.
+ */
+static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev)
+{
+   struct vfio_pci_device *vdev;
+   struct vfio_device *device;
+   static pci_ers_result_t err = PCI_ERS_RESULT_NONE;
+
+   device = vfio_device_get_from_dev(&pdev->dev);
+   if (!device)
+   goto

Re: [Qemu-devel] [PATCH v2 3/3] vfio-pci: process non fatal error of AER

2017-03-22 Thread Cao jin



On 03/22/2017 09:27 PM, Michael S. Tsirkin wrote:
> On Wed, Mar 22, 2017 at 06:36:52PM +0800, Cao jin wrote:
>> Make use of the non-fatal error eventfd that the kernel module provides
>> to process AER non-fatal errors. Fatal errors still go the
>> legacy way, which results in a VM stop.
>>
>> Register the handler and wait for notification. On notification, construct an
>> AER message and pass it to the root port. The root port will trigger an interrupt
>> to signal the guest, then the guest driver will do the recovery.
>>
>> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
>> Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
>> ---
>>  hw/vfio/pci.c  | 247 +
>>  hw/vfio/pci.h  |   4 +
>>  linux-headers/linux/vfio.h |   2 +
>>  3 files changed, 253 insertions(+)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 3d0d005..4912bc6 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -2422,6 +2422,34 @@ static void vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
>>   "Could not enable error recovery for the device",
>>   vbasedev->name);
>>  }
>> +
>> +irq_info.index = VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX;
>> +irq_info.count = 0; /* clear */
>> +ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
>> +if (ret) {
>> +/* This can fail for an old kernel or legacy PCI dev */
>> +trace_vfio_populate_device_get_irq_info_failure();
>> +} else if (irq_info.count == 1) {
>> +vdev->pci_aer_non_fatal = true;
>> +} else {
>> +error_report(WARN_PREFIX
>> + "Couldn't enable non fatal error recovery for the device",
>> + vbasedev->name);
> 
> when does this trigger?
> 
>> +}
>> +
>> +irq_info.index = VFIO_PCI_PASSIVE_RESET_IRQ_INDEX;
>> +irq_info.count = 0; /* clear */
>> +ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
>> +if (ret) {
>> +/* This can fail for an old kernel or legacy PCI dev */
>> +trace_vfio_populate_device_get_irq_info_failure();
>> +} else if (irq_info.count == 1) {
>> +vdev->passive_reset = true;
>> +} else {
>> +error_report(WARN_PREFIX
>> + "Don't support passive reset notification",
>> + vbasedev->name);
> 
> when does this happen?
> what does this message mean?
> 

Both are boilerplate code, the same as for err_notifier. They are triggered
when running the latest QEMU on an older kernel.  I will drop this code & the flags.

>> +}
>>  }
>>  
>>  static void vfio_put_device(VFIOPCIDevice *vdev)
>> @@ -2432,6 +2460,221 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
>>  vfio_put_base_device(&vdev->vbasedev);
>>  }
>>  
>> +static void vfio_non_fatal_err_notifier_handler(void *opaque)
>> +{
>> +VFIOPCIDevice *vdev = opaque;
>> +PCIDevice *dev = &vdev->pdev;
>> +PCIEAERMsg msg = {
>> +.severity = PCI_ERR_ROOT_CMD_NONFATAL_EN,
>> +.source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
>> +};
>> +
> 
> Should this just use pci_requester_id?
> 
> 
> At least Peter thought so.
> 
>> +if (!event_notifier_test_and_clear(&vdev->non_fatal_err_notifier)) {
>> +return;
>> +}
>> +
>> +/* Populate the aer msg and send it to root port */
>> +if (dev->exp.aer_cap) {
>> +uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
>> +uint32_t uncor_status;
>> +bool isfatal;
>> +
>> +uncor_status = vfio_pci_read_config(dev,
>> +dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4);
>> +if (!uncor_status) {
>> +return;
>> +}
>> +
>> +isfatal = uncor_status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
>> +if (isfatal) {
>> +goto stop;
>> +}
>> +
>> +error_report("%s sending non fatal event to root port. uncor status = "
>> + "0x%"PRIx32, vdev->vbasedev.name, uncor_status);
>> +pcie_aer_msg(dev, &msg);
>> +return;
>> +}
>> +
>> +stop:
>> +/* Terminate the guest in case of fatal error */
>> +error_report

Re: [Qemu-devel] [PATCH v5] vfio error recovery: kernel support

2017-03-22 Thread Cao jin



On 03/22/2017 09:10 PM, Michael S. Tsirkin wrote:
> Minor comments on commit log below.
> 
> On Wed, Mar 22, 2017 at 06:34:23PM +0800, Cao jin wrote:
>> From: "Michael S. Tsirkin" <m...@redhat.com>
>>

> 
>> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
>> Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
>> ---
>>
>> v5 changelog:
>> 1. Add another new eventfd passive_reset_trigger & the boilerplate code,
>>used in slot_reset. Add comment for slot_reset().
>> 2. Rewrite the commit log.
>>
>>  drivers/vfio/pci/vfio_pci.c | 49 +++--
>>  drivers/vfio/pci/vfio_pci_intrs.c   | 38 
>>  drivers/vfio/pci/vfio_pci_private.h |  2 ++
>>  include/uapi/linux/vfio.h   |  2 ++
>>  4 files changed, 89 insertions(+), 2 deletions(-)
>>

> 
> 
>> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev)
>> +{
>> +struct vfio_pci_device *vdev;
>> +struct vfio_device *device;
>> +static pci_ers_result_t err = PCI_ERS_RESULT_NONE;
>> +
>> +device = vfio_device_get_from_dev(&pdev->dev);
>> +if (!device)
>> +goto err_dev;
>> +
>> +vdev = vfio_device_data(device);
>> +if (!vdev)
>> +goto err_data;
>> +
>> +mutex_lock(&vdev->igate);
>> +
>> +if (vdev->passive_reset_trigger)
>> +eventfd_signal(vdev->passive_reset_trigger, 1);
>> +else if (vdev->err_trigger)
>> +eventfd_signal(vdev->err_trigger, 1);
> 
> why is this chunk here? why not just do
> 
>   if (vdev->passive_reset_trigger)
>   eventfd_signal(vdev->passive_reset_trigger, 1);
> 
> without a fallback?
> 
> 

I thought it is one way of "passing maximum info to userspace and let it
decide."

-- 
Sincerely,
Cao jin

[Qemu-devel] [PATCH v2 2/3] vfio pci: new function to init AER capability

2017-03-22 Thread Cao jin

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/vfio/pci.c | 41 -
 hw/vfio/pci.h |  1 +
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 332f41d..3d0d005 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1855,18 +1855,42 @@ out:
 return 0;
 }
 
-static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
+static int vfio_setup_aer(VFIOPCIDevice *vdev, uint8_t cap_ver,
+  int pos, uint16_t size, Error **errp)
+{
+PCIDevice *pdev = &vdev->pdev;
+uint32_t errcap;
+
+errcap = vfio_pci_read_config(pdev, pos + PCI_ERR_CAP, 4);
+/*
+ * The ability to record multiple headers is depending on
+ * the state of the Multiple Header Recording Capable bit and
+ * enabled by the Multiple Header Recording Enable bit.
+ */
+if ((errcap & PCI_ERR_CAP_MHRC) &&
+(errcap & PCI_ERR_CAP_MHRE)) {
+pdev->exp.aer_log.log_max = PCIE_AER_LOG_MAX_DEFAULT;
+} else {
+pdev->exp.aer_log.log_max = 0;
+}
+
+pcie_cap_deverr_init(pdev);
+return pcie_aer_init(pdev, cap_ver, pos, size, errp);
+}
+
+static int vfio_add_ext_cap(VFIOPCIDevice *vdev, Error **errp)
 {
 PCIDevice *pdev = &vdev->pdev;
 uint32_t header;
 uint16_t cap_id, next, size;
 uint8_t cap_ver;
 uint8_t *config;
+int ret = 0;
 
 /* Only add extended caps if we have them and the guest can see them */
 if (!pci_is_express(pdev) || !pci_bus_is_express(pdev->bus) ||
 !pci_get_long(pdev->config + PCI_CONFIG_SPACE_SIZE)) {
-return;
+return 0;
 }
 
 /*
@@ -1915,6 +1939,9 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
PCI_EXT_CAP_NEXT_MASK);
 
 switch (cap_id) {
+case PCI_EXT_CAP_ID_ERR:
+ret = vfio_setup_aer(vdev, cap_ver, next, size, errp);
+break;
 case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */
 case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function virtualization */
 trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id, next);
@@ -1923,6 +1950,9 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
 pcie_add_capability(pdev, cap_id, cap_ver, next, size);
 }
 
+if (ret) {
+goto out;
+}
 }
 
 /* Cleanup chain head ID if necessary */
@@ -1930,8 +1960,9 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
 pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0);
 }
 
+out:
 g_free(config);
-return;
+return ret;
 }
 
 static int vfio_add_capabilities(VFIOPCIDevice *vdev, Error **errp)
@@ -1949,8 +1980,8 @@ static int vfio_add_capabilities(VFIOPCIDevice *vdev, Error **errp)
 return ret;
 }
 
-vfio_add_ext_cap(vdev);
-return 0;
+ret = vfio_add_ext_cap(vdev, errp);
+return ret;
 }
 
 static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index a8366bb..34e8b04 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -15,6 +15,7 @@
 #include "qemu-common.h"
 #include "exec/memory.h"
 #include "hw/pci/pci.h"
+#include "hw/pci/pci_bridge.h"
 #include "hw/vfio/vfio-common.h"
 #include "qemu/event_notifier.h"
 #include "qemu/queue.h"
-- 
1.8.3.1

[Qemu-devel] [PATCH v2 1/3] pcie aer: verify if AER functionality is available

2017-03-22 Thread Cao jin

For devices that support AER, verify whether it can actually work in the
system:
1. An AER-capable device is a PCIe device; it can't be plugged into a PCI bus
2. If the root port doesn't support AER, there is no need to expose the
   AER capability

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/pci/pcie_aer.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
index daf1f65..a2e9818 100644
--- a/hw/pci/pcie_aer.c
+++ b/hw/pci/pcie_aer.c
@@ -100,6 +100,34 @@ static void aer_log_clear_all_err(PCIEAERLog *aer_log)
 int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, uint16_t offset,
   uint16_t size, Error **errp)
 {
+PCIDevice *parent_dev;
+uint8_t type;
+uint8_t parent_type;
+
+/* Topology test: see if there is need to expose AER cap */
+type = pcie_cap_get_type(dev);
+parent_dev = pci_bridge_get_device(dev->bus);
+while (parent_dev) {
+parent_type = pcie_cap_get_type(parent_dev);
+
+if (type == PCI_EXP_TYPE_ENDPOINT &&
+(parent_type != PCI_EXP_TYPE_ROOT_PORT &&
+ parent_type != PCI_EXP_TYPE_DOWNSTREAM)) {
+error_setg(errp, "Parent device is not a PCIe component");
+return -ENOTSUP;
+}
+
+if (parent_type == PCI_EXP_TYPE_ROOT_PORT) {
+if (!parent_dev->exp.aer_cap)
+{
+error_setg(errp, "Root port does not support AER");
+return -ENOTSUP;
+}
+}
+
+parent_dev = pci_bridge_get_device(parent_dev->bus);
+}
+
 pcie_add_capability(dev, PCI_EXT_CAP_ID_ERR, cap_ver,
 offset, size);
 dev->exp.aer_cap = offset;
-- 
1.8.3.1

[Qemu-devel] [PATCH v2 3/3] vfio-pci: process non fatal error of AER

2017-03-22 Thread Cao jin

Make use of the non-fatal error eventfd that the kernel module provides
to process AER non-fatal errors. Fatal errors still go the
legacy way, which results in a VM stop.

Register the handler and wait for notification. On notification, construct an
AER message and pass it to the root port. The root port will trigger an interrupt
to signal the guest, then the guest driver will do the recovery.
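
As a rough sketch of the classification step the handler relies on, assuming
only the AER rule that a set bit in the Uncorrectable Error Severity
register marks the matching error as fatal: the event is treated as fatal
when any reported uncorrectable error has its severity bit set.
uncor_is_fatal(), requester_id() and sketch_selfcheck() are hypothetical
helper names made up for this example; they are not QEMU APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * uncor_status: value of the Uncorrectable Error Status register.
 * uncor_sever:  value of the Uncorrectable Error Severity register.
 * Fatal if at least one reported error is configured as fatal.
 */
bool uncor_is_fatal(uint32_t uncor_status, uint32_t uncor_sever)
{
    return (uncor_status & uncor_sever) != 0;
}

/* Source ID for the AER message: bus in bits 15:8, devfn in bits 7:0. */
uint16_t requester_id(uint8_t bus, uint8_t devfn)
{
    return (uint16_t)((bus << 8) | devfn);
}

/* Small self-check of the two helpers above. */
void sketch_selfcheck(void)
{
    assert(uncor_is_fatal(0x10, 0x10));
    assert(!uncor_is_fatal(0x10, 0x20));
}
```

Only when the status is non-zero and uncor_is_fatal() is false does the
message go to the root port; a fatal result takes the VM-stop path instead.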

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/vfio/pci.c  | 247 +
 hw/vfio/pci.h  |   4 +
 linux-headers/linux/vfio.h |   2 +
 3 files changed, 253 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 3d0d005..4912bc6 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2422,6 +2422,34 @@ static void vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
  "Could not enable error recovery for the device",
  vbasedev->name);
 }
+
+irq_info.index = VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX;
+irq_info.count = 0; /* clear */
+ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
+if (ret) {
+/* This can fail for an old kernel or legacy PCI dev */
+trace_vfio_populate_device_get_irq_info_failure();
+} else if (irq_info.count == 1) {
+vdev->pci_aer_non_fatal = true;
+} else {
+error_report(WARN_PREFIX
+ "Couldn't enable non fatal error recovery for the device",
+ vbasedev->name);
+}
+
+irq_info.index = VFIO_PCI_PASSIVE_RESET_IRQ_INDEX;
+irq_info.count = 0; /* clear */
+ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
+if (ret) {
+/* This can fail for an old kernel or legacy PCI dev */
+trace_vfio_populate_device_get_irq_info_failure();
+} else if (irq_info.count == 1) {
+vdev->passive_reset = true;
+} else {
+error_report(WARN_PREFIX
+ "Don't support passive reset notification",
+ vbasedev->name);
+}
 }
 
 static void vfio_put_device(VFIOPCIDevice *vdev)
@@ -2432,6 +2460,221 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
 vfio_put_base_device(&vdev->vbasedev);
 }
 
+static void vfio_non_fatal_err_notifier_handler(void *opaque)
+{
+VFIOPCIDevice *vdev = opaque;
+PCIDevice *dev = &vdev->pdev;
+PCIEAERMsg msg = {
+.severity = PCI_ERR_ROOT_CMD_NONFATAL_EN,
+.source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
+};
+
+if (!event_notifier_test_and_clear(&vdev->non_fatal_err_notifier)) {
+return;
+}
+
+/* Populate the aer msg and send it to root port */
+if (dev->exp.aer_cap) {
+uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
+uint32_t uncor_status;
+bool isfatal;
+
+uncor_status = vfio_pci_read_config(dev,
+dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4);
+if (!uncor_status) {
+return;
+}
+
+isfatal = uncor_status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
+if (isfatal) {
+goto stop;
+}
+
+error_report("%s sending non fatal event to root port. uncor status = "
+ "0x%"PRIx32, vdev->vbasedev.name, uncor_status);
+pcie_aer_msg(dev, &msg);
+return;
+}
+
+stop:
+/* Terminate the guest in case of fatal error */
+error_report("%s(%s) fatal error detected. Please collect any data"
+" possible and then kill the guest", __func__, vdev->vbasedev.name);
+vm_stop(RUN_STATE_INTERNAL_ERROR);
+}
+
+/*
+ * Register non fatal error notifier for devices supporting error recovery.
+ * If we encounter a failure in this function, we report an error
+ * and continue after disabling error recovery support for the device.
+ */
+static void vfio_register_non_fatal_err_notifier(VFIOPCIDevice *vdev)
+{
+int ret;
+int argsz;
+struct vfio_irq_set *irq_set;
+int32_t *pfd;
+
+if (!vdev->pci_aer_non_fatal) {
+return;
+}
+
+if (event_notifier_init(&vdev->non_fatal_err_notifier, 0)) {
+error_report("vfio: Unable to init event notifier for non-fatal error detection");
+vdev->pci_aer_non_fatal = false;
+return;
+}
+
+argsz = sizeof(*irq_set) + sizeof(*pfd);
+
+irq_set = g_malloc0(argsz);
+irq_set->argsz = argsz;
+irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
+ VFIO_IRQ_SET_ACTION_TRIGGER;
+irq_set->index = VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX;
+irq_set->start = 0;
+irq_set->count = 1;
+pfd = (int32_t *)&irq_set->data;
+
+*pfd = event_notifier_get_fd(&vdev->non_fatal_err_notifier);
+qemu_set_fd_handler(*pfd, vfio_non_fatal_err_notif

[Qemu-devel] [PATCH v2 0/3] vfio-pci: support recovery of AER non fatal error

2017-03-22 Thread Cao jin

v2 changelog:
Add the boilerplate code for new eventfd in patch 3. The corresponding
kernel patch is v5.

Test:
Tested with func1 passed through while func0 has no user.

Cao jin (3):
  pcie aer: verify if AER functionality is available
  vfio pci: new function to init AER capability
  vfio-pci: process non fatal error of AER

 hw/pci/pcie_aer.c  |  28 +
 hw/vfio/pci.c  | 288 -
 hw/vfio/pci.h  |   5 +
 linux-headers/linux/vfio.h |   2 +
 4 files changed, 318 insertions(+), 5 deletions(-)

-- 
1.8.3.1

[Qemu-devel] [PATCH v5] vfio error recovery: kernel support

2017-03-22 Thread Cao jin

From: "Michael S. Tsirkin" <m...@redhat.com>

0. What happens now (PCIE AER only)
   Fatal errors cause a link reset. Non fatal errors don't.
   All errors stop the QEMU guest eventually, but not immediately,
   because it's detected and reported asynchronously.
   Interrupts are forwarded as usual.
   Correctable errors are not reported to user at all.

   Note:
   PPC EEH is different, but this approach won't affect EEH, because
   EEH treats all errors as fatal ones in AER and will still signal user
   via the legacy eventfd. And all devices/functions in a PE belong to
   the same IOMMU group, so the slot_reset handler in this approach
   won't affect EEH either.

1. Correctable errors
   Hardware can correct these errors without software intervention;
   clearing the error status is enough, which is what is already done now.
   There is nothing to recover and nothing changes here.

2. Fatal errors
   They will induce a link reset. This is troublesome when user is
   a QEMU guest. This approach doesn't touch the existing mechanism.

3. Non-fatal errors
   Before, they were signalled to user the same way as fatal ones. In this approach,
   a new eventfd is introduced only for non-fatal error notification. By
   splitting non-fatal ones out, it will benefit AER recovery of a QEMU guest
   user by reporting them to the guest separately.

   To maintain backwards compatibility with userspace, non-fatal errors
   will continue to trigger via the existing error interrupt index if a
   non-fatal signaling mechanism has not been registered.

   Note:
   Consider a multi-function device whose functions use different device
   drivers, where one function is bound to vfio while the others are not
   (i.e., the functions belong to different IOMMU groups). For this case a new
   slot_reset handler & another new eventfd are introduced. This is
   useful when a device driver wants a slot reset while vfio-pci doesn't,
   which means the vfio-pci device will get a passive reset. The user is
   signalled via another new eventfd named passive_reset_trigger; this helps to
   avoid signalling the user twice via the same legacy error trigger.

For the original design and discussion, refer to:
https://www.spinics.net/lists/linux-virtualization/msg29843.html


Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---

v5 changelog:
1. Add another new eventfd passive_reset_trigger & the boilerplate code,
   used in slot_reset. Add comment for slot_reset().
2. Rewrite the commit log.

 drivers/vfio/pci/vfio_pci.c | 49 +++--
 drivers/vfio/pci/vfio_pci_intrs.c   | 38 
 drivers/vfio/pci/vfio_pci_private.h |  2 ++
 include/uapi/linux/vfio.h   |  2 ++
 4 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 324c52e..375ba20 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -441,7 +441,9 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
 
return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
}
-   } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
+   } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX ||
+  irq_type == VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX ||
+  irq_type == VFIO_PCI_PASSIVE_RESET_IRQ_INDEX) {
if (pci_is_pcie(vdev->pdev))
return 1;
} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
@@ -796,6 +798,8 @@ static long vfio_pci_ioctl(void *device_data,
case VFIO_PCI_REQ_IRQ_INDEX:
break;
case VFIO_PCI_ERR_IRQ_INDEX:
+   case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
+   case VFIO_PCI_PASSIVE_RESET_IRQ_INDEX:
if (pci_is_pcie(vdev->pdev))
break;
/* pass thru to return error */
@@ -1282,7 +1286,9 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
 
mutex_lock(&vdev->igate);
 
-   if (vdev->err_trigger)
+   if (state == pci_channel_io_normal && vdev->non_fatal_err_trigger)
+   eventfd_signal(vdev->non_fatal_err_trigger, 1);
+   else if (vdev->err_trigger)
eventfd_signal(vdev->err_trigger, 1);
 
mutex_unlock(&vdev->igate);
@@ -1292,8 +1298,47 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
return PCI_ERS_RESULT_CAN_RECOVER;
 }
 
+/*
+ * In case a function/device is bound to vfio while other collateral ones
+ * are still controlled by a device driver (i.e., they belong to different IOMMU
+ * groups), and the device driver wants a slot reset when seeing AER errors while
+ * vfio-pci doesn't, signal the user via the dedicated eventfd in preference to
+ * the legacy one.
+ */
+static pci_ers_result_t vfio_pci_a

Re: [Qemu-devel] [PATCH] vfio pci: kernel support of error recovery only for non fatal error

2017-03-21 Thread Cao jin



On 03/20/2017 10:30 PM, Alex Williamson wrote:
> On Mon, 20 Mar 2017 20:50:39 +0800
> Cao jin <caoj.f...@cn.fujitsu.com> wrote:
> 
>> Sorry for the late reply.
>>
>> On 03/14/2017 06:06 AM, Alex Williamson wrote:
>>> On Mon, 27 Feb 2017 15:28:43 +0800
>>> Cao jin <caoj.f...@cn.fujitsu.com> wrote:
>>>   
>>>> 0. What happens now (PCIE AER only)
>>>>Fatal errors cause a link reset.
>>>>Non fatal errors don't.
>>>>All errors stop the VM eventually, but not immediately
>>>>because it's detected and reported asynchronously.
>>>>Interrupts are forwarded as usual.
>>>>Correctable errors are not reported to guest at all.
>>>>Note: PPC EEH is different. This focuses on AER.  
>>>
>>> Perhaps you're only focusing on AER, but don't the error handlers we're
>>> using support both AER and EEH generically?  I don't think we can
>>> completely disregard how this affects EEH behavior, if at all.
>>>   
>>
>> After taking a rough look at EEH, I find that EEH always feeds
>> error_detected with pci_channel_io_frozen, so from the perspective of
>> error_detected, EEH is not affected.
>>
>> I am not sure about one question: when assigning devices on an spapr host,
>> should all functions/devices in a PE be bound to vfio? I am kind of
>> confused about the relationship between a PE & a TCE IOMMU group.
> 
> AIUI, yes all devices within the PE are part of the same IOMMU group
> and therefore all endpoints must be bound to vfio or pci-stub.
> 

Thanks. Then I think this approach won't affect EEH. I was concerned that
the slot_reset issue you mentioned might affect EEH, but if all of them
must be bound to vfio, it seems the issue can't happen with EEH.

-- 
Sincerely,
Cao jin

Re: [Qemu-devel] [PATCH] vfio pci: kernel support of error recovery only for non fatal error

2017-03-20 Thread Cao jin

Sorry for the late reply.

On 03/14/2017 06:06 AM, Alex Williamson wrote:
> On Mon, 27 Feb 2017 15:28:43 +0800
> Cao jin <caoj.f...@cn.fujitsu.com> wrote:
> 
>> 0. What happens now (PCIE AER only)
>>Fatal errors cause a link reset.
>>Non fatal errors don't.
>>All errors stop the VM eventually, but not immediately
>>because it's detected and reported asynchronously.
>>Interrupts are forwarded as usual.
>>Correctable errors are not reported to guest at all.
>>Note: PPC EEH is different. This focuses on AER.
> 
> Perhaps you're only focusing on AER, but don't the error handlers we're
> using support both AER and EEH generically?  I don't think we can
> completely disregard how this affects EEH behavior, if at all.
> 

After taking a rough look at EEH, I find that EEH always feeds
error_detected with pci_channel_io_frozen, so from the perspective of
error_detected, EEH is not affected.

I am not sure about one question: when assigning devices on an spapr host,
should all functions/devices in a PE be bound to vfio? I am kind of
confused about the relationship between a PE & a TCE IOMMU group.

>>
>> 1. Correctable errors
>>There is no need to report these to guest. So let's not.
> 
> What does this patch change to make this happen?  I don't see
> anything.  Was this always the case?  No change?
> 

yes, no change on correctable error.

>>
>> 2. Fatal errors
>>It's not easy to handle them gracefully since link reset
>>is needed. As a first step, let's use the existing mechanism
>>in that case.
> 
> Ok, so no change here either.
> 
>> 2. Non-fatal errors
>>Here we could make progress by reporting them to guest
>>and have guest handle them.
> 
> In practice, what actual errors do we expect userspace to see as
> non-fatal errors? It would be useful for the commit log to describe
> the actual benefit we're going to see by splitting out non-fatal errors
> for the user (not always a guest) to see separately.  Justify that this
> is actually useful.
> 
>>
>>Issues:
>>a. this behaviour should only be enabled with new userspace,
>>   old userspace should work without changes.
>>
>>   Suggestion: One way to address this would be to add a new eventfd
>>   non_fatal_err_trigger. If not set, invoke err_trigger.
> 
> This outline format was really more useful for Michael to try to
> generate discussion, for a commit log, I'd much rather see a definitive
> statement such as:
> 
>  "To maintain backwards compatibility with userspace, non-fatal errors
>  will continue to trigger via the existing error interrupt index if a
>  non-fatal signaling mechanism has not been registered."
> 
>>b. drivers are supposed to stop MMIO when error is reported,
>>   if vm keeps going, we will keep doing MMIO/config.
>>
>>   Suggestion 1: ignore this. vm stop happens much later when
>>   userspace runs anyway, so we are not making things much worse.
>>
>>   Suggestion 2: try to stop MMIO/config, resume on resume call
>>
>>   Patch below implements Suggestion 1.
>>
>>   Note that although this is really against the documentation, which
>>   states error_detected() is the point at which the driver should quiesce
>>   the device and not touch it further (until diagnostic poking at
>>   mmio_enabled or full access at resume callback).
>>
>>   Fixing this won't be easy. However, this is not a regression.
>>
>>   Also note this does nothing about interrupts, documentation
>>   suggests returning IRQ_NONE until reset.
>>   Again, not a regression.
> 
> So again, no change here.  I'm not sure what this adds to the commit
> log, perhaps we can reference this as a link to Michael's original
> proposal.
>  
>>c. PF driver might detect that function is completely broken,
>>   if vm keeps going, we will keep doing MMIO/config.
>>
>>   Suggestion 1: ignore this. vm stop happens much later when
>>   userspace runs anyway, so we are not making things much worse.
>>
>>   Suggestion 2: detect this and invoke err_trigger to stop VM.
>>
>>   Patch below implements Suggestion 2.
> 
> This needs more description and seems a bit misleading.  This patch
> adds a slot_reset handler, such that if the slot is reset, we notify
> the user, essentially promoting the non-fatal error to fatal.  But what
> condition gets us to this point?  AIUI, AER is a voting scheme and if
> any driver affected says they need a reset, everyone gets a reset.  So
> the PF driver we're tal

Re: [Qemu-devel] [PATCH 0/3] vfio-pci: support recovery of AER non fatal error

2017-03-07 Thread Cao jin

ping

On 02/27/2017 03:30 PM, Cao jin wrote:
> This is a nearly new design of the feature, so the version numbering
> restarts from 0.
> 
> About the test:
> The hardware instability seen before still occurs. The test server is in
> another country at site A, and my contact in that country is at site B, so
> it is not convenient to get help (plugging in a cable or checking the hardware).
> So my NIC (which has 2 functions) still has only func1 connected to the gateway.
> If anyone else who has the hardware could test the patches, that would be
> a great help.
> 
> 
> Basically, the unsteady hardware shows two symptoms:
> 1. Start the vm; the hardware emits a fatal error by itself before I do
>    anything, which causes the vm to stop.
> 2. Start the vm, assign an IP to func1, then ping the gateway; it shows
>    "Destination Host Unreachable" after dozens or hundreds of successful
>    pings, and guest dmesg shows nothing abnormal.  I think this symptom is
>    *strong evidence* of unsteady hardware; I speculate that the cable has
>    a problem.
> 
>    On the other hand, I also saw a perfect result 2 times in my numerous
>    tests, with just func1 assigned while func0 has no user. It could ping for
>    several hours (more than 15000 pings) without any problem; during that
>    period I injected non-fatal errors into func0 & func1, and error recovery
>    was very good.
> 
>    So, most of the time, I must run the test quickly, before the hardware
>    goes crazy, until I get what I expected.
> 
> 
> Test:
> scenario 1: assign func1 to vm while func0 has no user.
> scenario 2: assign both functions to 1 vm, with the same topology as host.
> scenario 3: assign both functions to 1 vm, under different bus.
> scenario 4: assign each function to a separate vm.
> 
> The steps are: assign an IP to func1, ping the gateway, inject non-fatal
> errors into both functions, and see whether func1 can still ping after recovery.
> 
> Although we don't have a cable for func0, in tests like scenario 4,
> injecting an error into func0 doesn't affect func1's recovery, so I think
> this shows that one function's recovery doesn't affect the other.
> 
> 
> Extra info FYI:
> 1. During the test, some debug lines were added to vfio_err_notifier_handler
>    to read the uncor status register when a fatal error occurred; it shows
>    all F's every time.
> 2. Based on the v10 patch & the corresponding kernel part, modified as per
>    the comments: revert the eventfd handling (don't signal uncor status), and
>    a guest link reset will induce a host link reset. The test result shows:
>    non-fatal error recovery is good; fatal error recovery has the same result
>    as what Alex found before (guest kernel crash), because the guest device
>    driver's error_detected() accesses the MMIO registers and gets all F's.
> 
> 
> Cao jin (3):
>   pcie aer: verify if AER functionality is available
>   vfio pci: new function to init AER capability
>   vfio-pci: process non fatal error of AER
> 
>  hw/pci/pcie_aer.c  |  28 +++
>  hw/vfio/pci.c      | 180 +++--
>  hw/vfio/pci.h  |   3 +
>  linux-headers/linux/vfio.h |   1 +
>  4 files changed, 207 insertions(+), 5 deletions(-)
> 

-- 
Sincerely,
Cao jin

[Qemu-devel] [PATCH v10] msix: rename and create a wrapper

2017-03-07 Thread Cao jin

Rename msix_init to msix_validate_and_init, which doesn't assert.
Add a new wrapper, msix_init, which asserts on programming errors.

CC: Alex Williamson <alex.william...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
revert the modification in msix_init_exclusive_bar()

 hw/pci/msix.c | 23 ++-
 hw/vfio/pci.c | 12 ++--
 include/hw/pci/msix.h |  5 +
 3 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index bb54e8b0ac37..02697de32dfc 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -239,6 +239,19 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
 }
 }
 
+/* Just a wrapper to check the return value */
+int msix_init(struct PCIDevice *dev, unsigned short nentries,
+  MemoryRegion *table_bar, uint8_t table_bar_nr,
+  unsigned table_offset, MemoryRegion *pba_bar,
+  uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos,
+  Error **errp)
+{
+int ret = msix_validate_and_init(dev, nentries, table_bar, table_bar_nr,
+table_offset, pba_bar, pba_bar_nr, pba_offset, cap_pos, errp);
+
+assert(ret != -EINVAL);
+return ret;
+}
 /*
  * Make PCI device @dev MSI-X capable
  * @nentries is the max number of MSI-X vectors that the device support.
@@ -259,11 +272,11 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
  * also means a programming error, except device assignment, which can check
  * if a real HW is broken.
  */
-int msix_init(struct PCIDevice *dev, unsigned short nentries,
-  MemoryRegion *table_bar, uint8_t table_bar_nr,
-  unsigned table_offset, MemoryRegion *pba_bar,
-  uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos,
-  Error **errp)
+int msix_validate_and_init(struct PCIDevice *dev, unsigned short nentries,
+   MemoryRegion *table_bar, uint8_t table_bar_nr,
+   unsigned table_offset, MemoryRegion *pba_bar,
+   uint8_t pba_bar_nr, unsigned pba_offset,
+   uint8_t cap_pos, Error **errp)
 {
 int cap;
 unsigned table_size, pba_size;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 332f41d6627f..06828b537a75 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1436,12 +1436,12 @@ static int vfio_msix_setup(VFIOPCIDevice *vdev, int pos, Error **errp)
 
 vdev->msix->pending = g_malloc0(BITS_TO_LONGS(vdev->msix->entries) *
 sizeof(unsigned long));
-ret = msix_init(&vdev->pdev, vdev->msix->entries,
-vdev->bars[vdev->msix->table_bar].region.mem,
-vdev->msix->table_bar, vdev->msix->table_offset,
-vdev->bars[vdev->msix->pba_bar].region.mem,
-vdev->msix->pba_bar, vdev->msix->pba_offset, pos,
-&err);
+ret = msix_validate_and_init(&vdev->pdev, vdev->msix->entries,
+ vdev->bars[vdev->msix->table_bar].region.mem,
+ vdev->msix->table_bar, vdev->msix->table_offset,
+ vdev->bars[vdev->msix->pba_bar].region.mem,
+ vdev->msix->pba_bar, vdev->msix->pba_offset, pos,
+ &err);
 if (ret < 0) {
 if (ret == -ENOTSUP) {
 error_report_err(err);
diff --git a/include/hw/pci/msix.h b/include/hw/pci/msix.h
index 1f27658d352f..815e59bc96f3 100644
--- a/include/hw/pci/msix.h
+++ b/include/hw/pci/msix.h
@@ -11,6 +11,11 @@ int msix_init(PCIDevice *dev, unsigned short nentries,
   unsigned table_offset, MemoryRegion *pba_bar,
   uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos,
   Error **errp);
+int msix_validate_and_init(PCIDevice *dev, unsigned short nentries,
+   MemoryRegion *table_bar, uint8_t table_bar_nr,
+   unsigned table_offset, MemoryRegion *pba_bar,
+   uint8_t pba_bar_nr, unsigned pba_offset,
+   uint8_t cap_pos, Error **errp);
 int msix_init_exclusive_bar(PCIDevice *dev, unsigned short nentries,
 uint8_t bar_nr, Error **errp);
 
-- 
2.1.0

Re: [Qemu-devel] [PATCH v10 1/8] msix: Rename and create a wrapper

2017-03-07 Thread Cao jin



On 03/07/2017 03:44 PM, Markus Armbruster wrote:
> Uh, two weeks since you posted this already.  I apologize for taking so
> long to review.
> 
> Cao jin <caoj.f...@cn.fujitsu.com> writes:
> 
>> Rename msix_init to msix_validate_and_init, and use it from vfio, which
>> might get a reasonable -EINVAL.  Add a new wrapper, msix_init, which asserts
>> on the programming error for debugging purposes, and use it from other devices.
>>
>> CC: Alex Williamson <alex.william...@redhat.com>
>> CC: Markus Armbruster <arm...@redhat.com>
>> CC: Marcel Apfelbaum <mar...@redhat.com>
>> CC: Michael S. Tsirkin <m...@redhat.com>
>>
>> Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
>> ---
>>  hw/pci/msix.c | 30 +-
>>  hw/vfio/pci.c | 12 ++--
>>  include/hw/pci/msix.h |  5 +
>>  3 files changed, 32 insertions(+), 15 deletions(-)
>>
>> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
>> index bb54e8b0ac37..2b7541ab2c8d 100644
>> --- a/hw/pci/msix.c
>> +++ b/hw/pci/msix.c
>> @@ -239,6 +239,19 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
>>  }
>>  }
>>  
>> +/* Just a wrapper to check the return value */
>> +int msix_init(struct PCIDevice *dev, unsigned short nentries,
>> +  MemoryRegion *table_bar, uint8_t table_bar_nr,
>> +  unsigned table_offset, MemoryRegion *pba_bar,
>> +  uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos,
>> +  Error **errp)
>> +{
>> +int ret = msix_validate_and_init(dev, nentries, table_bar, table_bar_nr,
>> +table_offset, pba_bar, pba_bar_nr, pba_offset, cap_pos, errp);
>> +
>> +assert(ret != -EINVAL);
>> +return ret;
>> +}
>>  /*
>>   * Make PCI device @dev MSI-X capable
>>   * @nentries is the max number of MSI-X vectors that the device support.
>> @@ -259,11 +272,11 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
>>   * also means a programming error, except device assignment, which can check
>>   * if a real HW is broken.
>>   */
>> -int msix_init(struct PCIDevice *dev, unsigned short nentries,
>> -  MemoryRegion *table_bar, uint8_t table_bar_nr,
>> -  unsigned table_offset, MemoryRegion *pba_bar,
>> -  uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos,
>> -  Error **errp)
>> +int msix_validate_and_init(struct PCIDevice *dev, unsigned short nentries,
>> +   MemoryRegion *table_bar, uint8_t table_bar_nr,
>> +   unsigned table_offset, MemoryRegion *pba_bar,
>> +   uint8_t pba_bar_nr, unsigned pba_offset,
>> +   uint8_t cap_pos, Error **errp)
>>  {
>>  int cap;
>>  unsigned table_size, pba_size;
>> @@ -361,10 +374,9 @@ int msix_init_exclusive_bar(PCIDevice *dev, unsigned short nentries,
>>  memory_region_init(&dev->msix_exclusive_bar, OBJECT(dev), name, bar_size);
>>  g_free(name);
>>  
>> -ret = msix_init(dev, nentries, &dev->msix_exclusive_bar, bar_nr,
>> -0, &dev->msix_exclusive_bar,
>> -bar_nr, bar_pba_offset,
>> -0, errp);
>> +ret = msix_validate_and_init(dev, nentries, &dev->msix_exclusive_bar,
>> + bar_nr, 0, &dev->msix_exclusive_bar,
>> + bar_nr, bar_pba_offset, 0, errp);
>>  if (ret) {
>>  return ret;
>>  }
> 
> This change assumes that for the callers of msix_init_exclusive_bar(),
> -EINVAL (capability overlap) is not a programming error.  Is that true?
> 

Oh, it looks like you are right. I will revert this part and send a new
version in reply to this one.
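
A minimal, self-contained sketch of the split discussed here, assuming a much simplified validity check: the validating function returns -EINVAL to its caller, while the wrapper treats -EINVAL as a programming error and asserts. The names demo_validate_and_init() and demo_init(), their parameters, and the overlap check are hypothetical stand-ins, not the real msix code.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for msix_validate_and_init(): it reports problems
 * to the caller instead of aborting.
 *   -ENOTSUP: the board's MSI support is broken (modeled by msi_nonbroken)
 *   -EINVAL:  the PBA starts inside the table region (simplified overlap test)
 */
static int demo_validate_and_init(int msi_nonbroken, unsigned table_offset,
                                  unsigned table_size, unsigned pba_offset)
{
    if (!msi_nonbroken) {
        return -ENOTSUP;
    }
    if (pba_offset >= table_offset &&
        pba_offset < table_offset + table_size) {
        return -EINVAL;
    }
    return 0;
}

/*
 * Hypothetical stand-in for the msix_init() wrapper: for emulated devices
 * a bad layout can only be a bug, so -EINVAL trips the assertion.
 * -ENOTSUP is still passed through to the caller.
 */
static int demo_init(int msi_nonbroken, unsigned table_offset,
                     unsigned table_size, unsigned pba_offset)
{
    int ret = demo_validate_and_init(msi_nonbroken, table_offset,
                                     table_size, pba_offset);

    assert(ret != -EINVAL);
    return ret;
}
```

With this split, callers for which a bad layout can only be a bug go through the asserting wrapper, and a caller that may see a broken real device (device assignment) calls the validating function and handles -EINVAL itself.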

-- 
Sincerely,
Cao jin

Re: [Qemu-devel] [PATCH v10 0/8] the reset of msix_init series

2017-03-06 Thread Cao jin

Michael,
Is this series ok for 2.9?

-- 
Sincerely,
Cao jin

On 02/25/2017 04:26 PM, Cao jin wrote:
> v10 changelog:
> 1. Drop the disliked patch and introduce a new patch 1 according to mst's
>    comments.
> 2. Based on the new patch, remove the following statements
> 
> /* Any error other than -ENOTSUP(board's MSI support is broken)
>  * is a programming error */
> assert(!ret || ret == -ENOTSUP);
> 
>    for the affected devices: megasas, hcd-xhci. These are trivial changes,
>    so I left the R-bs where they were.
> 
> Test:
> 1. Detailed test via command line as v9
> 2. make check hangs at: GTESTER check-qtest-x86_64. After ctrl-C, it says:
> 
> make: *** [check-qtest-x86_64] Interrupt
> qemu-system-x86_64: Failed to read msg header. Read -1 instead of 12. Original request 11.
> qemu-system-x86_64: vhost VQ 0 ring restore failed: -1: Input/output error (5)
> qemu-system-x86_64: Failed to set msg fds.
> qemu-system-x86_64: vhost VQ 0 ring restore failed: -1: Invalid argument (22)
> 
> qemu-system-x86_64: Failed to set msg fds.
> qemu-system-x86_64: vhost VQ 1 ring restore failed: -1: Invalid argument (22)
> 
>    Is it a regression, or did I miss something?
> 
> CC: Jason Wang <jasow...@redhat.com>
> CC: Gerd Hoffmann <kra...@redhat.com>
> CC: Dmitry Fleytman <dmi...@daynix.com>
> CC: Michael S. Tsirkin <m...@redhat.com>
> CC: Hannes Reinecke <h...@suse.de>
> CC: Paolo Bonzini <pbonz...@redhat.com>
> CC: Alex Williamson <alex.william...@redhat.com>
> CC: Markus Armbruster <arm...@redhat.com>
> CC: Marcel Apfelbaum <mar...@redhat.com>
> 
> Cao jin (8):
>   msix: Rename and create a wrapper
>   megasas: change behaviour of msix switch
>   hcd-xhci: change behaviour of msix switch
>   megasas: undo the overwrites of msi user configuration
>   vmxnet3: fix reference leak issue
>   vmxnet3: remove unnecessary internal msix flag
>   msi_init: convert assert to return -errno
>   megasas: remove unnecessary megasas_use_msix()
> 
>  hw/net/vmxnet3.c  | 40 +++-
>  hw/pci/msi.c  |  9 ++---
>  hw/pci/msix.c | 30 +-
>  hw/scsi/megasas.c | 48 +---
>  hw/usb/hcd-xhci.c | 29 +
>  hw/vfio/pci.c | 12 ++--
>  include/hw/pci/msix.h |  5 +
>  7 files changed, 99 insertions(+), 74 deletions(-)
>

[Qemu-devel] [PATCH v4] vfio error recovery: kernel support

2017-02-27 Thread Cao jin

From: "Michael S. Tsirkin" <m...@redhat.com>

0. What happens now (PCIE AER only)
   Fatal errors cause a link reset.
   Non fatal errors don't.
   All errors stop the VM eventually, but not immediately
   because it's detected and reported asynchronously.
   Interrupts are forwarded as usual.
   Correctable errors are not reported to guest at all.
   Note: PPC EEH is different. This focuses on AER.

1. Correctable errors
   There is no need to report these to guest. So let's not.

2. Fatal errors
   It's not easy to handle them gracefully since link reset
   is needed. As a first step, let's use the existing mechanism
   in that case.

3. Non-fatal errors
   Here we could make progress by reporting them to guest
   and have guest handle them.

   Issues:
   a. this behaviour should only be enabled with new userspace,
  old userspace should work without changes.

  Suggestion: One way to address this would be to add a new eventfd
  non_fatal_err_trigger. If not set, invoke err_trigger.

   b. drivers are supposed to stop MMIO when error is reported,
  if vm keeps going, we will keep doing MMIO/config.

  Suggestion 1: ignore this. vm stop happens much later when
  userspace runs anyway, so we are not making things much worse.

  Suggestion 2: try to stop MMIO/config, resume on resume call

  Patch below implements Suggestion 1.

  Note that this is really against the documentation, which
  states error_detected() is the point at which the driver should quiesce
  the device and not touch it further (until diagnostic poking at
  mmio_enabled or full access at resume callback).

  Fixing this won't be easy. However, this is not a regression.

  Also note this does nothing about interrupts, documentation
  suggests returning IRQ_NONE until reset.
  Again, not a regression.

   c. PF driver might detect that function is completely broken,
  if vm keeps going, we will keep doing MMIO/config.

  Suggestion 1: ignore this. vm stop happens much later when
  userspace runs anyway, so we are not making things much worse.

  Suggestion 2: detect this and invoke err_trigger to stop VM.

  Patch below implements Suggestion 2.
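
The fallback described in issue (a) can be sketched as a small decision function. This is a standalone illustration, assuming the two eventfds are modeled as plain booleans and the channel state as a two-value enum; all demo_* names are hypothetical and nothing here is kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the channel state passed to error_detected(). */
enum demo_channel_state { DEMO_IO_NORMAL, DEMO_IO_FROZEN };

/* Which eventfd, if any, would be signalled. */
enum demo_target { DEMO_SIGNAL_NONE, DEMO_SIGNAL_ERR, DEMO_SIGNAL_NON_FATAL };

/* Hypothetical model of the per-device state: which eventfds are set. */
struct demo_vdev {
    bool has_err_trigger;           /* legacy error eventfd registered */
    bool has_non_fatal_err_trigger; /* new non-fatal eventfd registered */
};

/*
 * Non-fatal errors (channel still usable) go to the new eventfd only when
 * userspace registered it; otherwise everything falls back to the legacy
 * eventfd, so old userspace keeps its existing behaviour.
 */
static enum demo_target demo_pick_trigger(const struct demo_vdev *vdev,
                                          enum demo_channel_state state)
{
    assert(vdev != NULL);

    if (state == DEMO_IO_NORMAL && vdev->has_non_fatal_err_trigger) {
        return DEMO_SIGNAL_NON_FATAL;
    }
    if (vdev->has_err_trigger) {
        return DEMO_SIGNAL_ERR;
    }
    return DEMO_SIGNAL_NONE;
}
```

The order of the two tests is what provides backward compatibility: the legacy trigger is consulted whenever the new one is absent or the error is not a non-fatal one.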

Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
Inherit from MST's v3.
v4 changelog:
1. Tiny modification on commit message.
2. Remove unrelated virtio code.
3. VFIO_DEVICE_SET_IRQS process for VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX

 drivers/vfio/pci/vfio_pci.c | 38 +++--
 drivers/vfio/pci/vfio_pci_intrs.c   | 19 +++
 drivers/vfio/pci/vfio_pci_private.h |  1 +
 include/uapi/linux/vfio.h   |  1 +
 4 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 324c52e..3551cc9 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -441,7 +441,8 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
 
return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
}
-   } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
+   } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX ||
+  irq_type == VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX) {
if (pci_is_pcie(vdev->pdev))
return 1;
} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
@@ -796,6 +797,7 @@ static long vfio_pci_ioctl(void *device_data,
case VFIO_PCI_REQ_IRQ_INDEX:
break;
case VFIO_PCI_ERR_IRQ_INDEX:
+   case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
if (pci_is_pcie(vdev->pdev))
break;
/* pass thru to return error */
@@ -1282,7 +1284,9 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
 
mutex_lock(&vdev->igate);
 
-   if (vdev->err_trigger)
+   if (state == pci_channel_io_normal && vdev->non_fatal_err_trigger)
+   eventfd_signal(vdev->non_fatal_err_trigger, 1);
+   else if (vdev->err_trigger)
eventfd_signal(vdev->err_trigger, 1);
 
mutex_unlock(&vdev->igate);
@@ -1292,8 +1296,38 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
return PCI_ERS_RESULT_CAN_RECOVER;
 }
 
+static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev)
+{
+   struct vfio_pci_device *vdev;
+   struct vfio_device *device;
+   static pci_ers_result_t err = PCI_ERS_RESULT_NONE;
+
+   device = vfio_device_get_from_dev(&pdev->dev);
+   if (!device)
+   goto err_dev;
+
+   vdev = vfio_device_data(device);
+   if (!vdev)
+   goto err_data;
+
+   mutex_lock(&vdev->igate);
+
+   if (vdev->err_trigger)
+   eventfd_signal(vdev->err_trigger, 1);
+
+   mu

Re: [Qemu-devel] [PATCH] vfio pci: kernel support of error recovery only for non fatal error

2017-02-27 Thread Cao jin



On 02/28/2017 12:16 AM, Michael S. Tsirkin wrote:
> On Mon, Feb 27, 2017 at 03:28:43PM +0800, Cao jin wrote:
>> Subject: Re: [PATCH] vfio pci: kernel support of error recovery only for non
>> fatal error
> 
> Don't make the subject so long. This is why I had
>   [PATCH v3] vfio error recovery: kernel support
> you also want to add versioning as you inherited my v3,
> you should make this v4 etc.
> 

OK. I didn't see [PATCH v3]; I guess I was not CCed.

>> 0. What happens now (PCIE AER only)
>>Fatal errors cause a link reset.
>>Non fatal errors don't.
>>All errors stop the VM eventually, but not immediately
>>because it's detected and reported asynchronously.
>>Interrupts are forwarded as usual.
>>Correctable errors are not reported to guest at all.
>>Note: PPC EEH is different. This focuses on AER.
>>
>> 1. Correctable errors
>>There is no need to report these to guest. So let's not.
>>
>> 2. Fatal errors
>>It's not easy to handle them gracefully since link reset
>>is needed. As a first step, let's use the existing mechanism
>>in that case.
>>
>> 3. Non-fatal errors
>>Here we could make progress by reporting them to guest
>>and have guest handle them.
>>
>>Issues:
>>a. this behaviour should only be enabled with new userspace,
>>   old userspace should work without changes.
>>
>>   Suggestion: One way to address this would be to add a new eventfd
>>   non_fatal_err_trigger. If not set, invoke err_trigger.
>>
>>b. drivers are supposed to stop MMIO when error is reported,
>>   if vm keeps going, we will keep doing MMIO/config.
>>
>>   Suggestion 1: ignore this. vm stop happens much later when
>>   userspace runs anyway, so we are not making things much worse.
>>
>>   Suggestion 2: try to stop MMIO/config, resume on resume call
>>
>>   Patch below implements Suggestion 1.
>>
>>   Note that this is really against the documentation, which
>>   states error_detected() is the point at which the driver should quiesce
>>   the device and not touch it further (until diagnostic poking at
>>   mmio_enabled or full access at resume callback).
>>
>>   Fixing this won't be easy. However, this is not a regression.
>>
>>   Also note this does nothing about interrupts, documentation
>>   suggests returning IRQ_NONE until reset.
>>   Again, not a regression.
>>
>>c. PF driver might detect that function is completely broken,
>>   if vm keeps going, we will keep doing MMIO/config.
>>
>>   Suggestion 1: ignore this. vm stop happens much later when
>>   userspace runs anyway, so we are not making things much worse.
>>
>>   Suggestion 2: detect this and invoke err_trigger to stop VM.
>>
>>   Patch below implements Suggestion 2.
>>
>> Suggested-by: Michael S. Tsirkin <m...@redhat.com>
> 
> It's more than this, you are really reusing parts of my patch,
> so you should say so and include my signature.
> 
> If you only added a line or two you can keep
> the original author. To do this you add
>   From: Michael S. Tsirkin <m...@redhat.com>
> before commit text.

I was really confused about this for a while; thanks for the clarification.

-- 
Sincerely,
Cao jin

> 
>> Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
>> ---
> 
> Changelog from my v3?
> 
>>  drivers/vfio/pci/vfio_pci.c | 38 +++--
>>  drivers/vfio/pci/vfio_pci_intrs.c   | 19 +++
>>  drivers/vfio/pci/vfio_pci_private.h |  1 +
>>  include/uapi/linux/vfio.h   |  1 +
>>  4 files changed, 57 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index 324c52e..3551cc9 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -441,7 +441,8 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
>>  
>>  return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
>>  }
>> -} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
>> +} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX ||
>> +   irq_type == VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX) {
>>  if (pci_is_pcie(vdev->pdev))
>>  return 1;
>>  } else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
>> @@ -796,6 +797,7 @@ s

[Qemu-devel] [PATCH 2/3] vfio pci: new function to init AER capability

2017-02-26 Thread Cao jin

Enable AER opportunistically.
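
A minimal sketch of how the error log depth could be chosen from the device's Advanced Error Capabilities and Control register, assuming the PCIe layout in which bit 9 is Multiple Header Recording Capable and bit 10 is Multiple Header Recording Enable. The demo_* names and the log depth of 8 are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions in the AER Capabilities and Control register. */
#define DEMO_ERR_CAP_MHRC (1u << 9)   /* Multiple Header Recording Capable */
#define DEMO_ERR_CAP_MHRE (1u << 10)  /* Multiple Header Recording Enable */

/* Hypothetical log depth used when multiple headers can be recorded. */
#define DEMO_AER_LOG_MAX 8u

static_assert(DEMO_ERR_CAP_MHRC != DEMO_ERR_CAP_MHRE,
              "capable and enable must be distinct bits");

/*
 * Multiple headers are recorded only if the device is both capable of it
 * and has it enabled; otherwise no extra log entries are kept.
 */
static unsigned demo_aer_log_max(uint32_t errcap)
{
    bool capable = (errcap & DEMO_ERR_CAP_MHRC) != 0;
    bool enabled = (errcap & DEMO_ERR_CAP_MHRE) != 0;

    return (capable && enabled) ? DEMO_AER_LOG_MAX : 0u;
}
```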

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/vfio/pci.c | 41 -
 hw/vfio/pci.h |  1 +
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 332f41d..3d0d005 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1855,18 +1855,42 @@ out:
 return 0;
 }
 
-static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
+static int vfio_setup_aer(VFIOPCIDevice *vdev, uint8_t cap_ver,
+  int pos, uint16_t size, Error **errp)
+{
+PCIDevice *pdev = &vdev->pdev;
+uint32_t errcap;
+
+errcap = vfio_pci_read_config(pdev, pos + PCI_ERR_CAP, 4);
+/*
+ * The ability to record multiple headers is depending on
+ * the state of the Multiple Header Recording Capable bit and
+ * enabled by the Multiple Header Recording Enable bit.
+ */
+if ((errcap & PCI_ERR_CAP_MHRC) &&
+(errcap & PCI_ERR_CAP_MHRE)) {
+pdev->exp.aer_log.log_max = PCIE_AER_LOG_MAX_DEFAULT;
+} else {
+pdev->exp.aer_log.log_max = 0;
+}
+
+pcie_cap_deverr_init(pdev);
+return pcie_aer_init(pdev, cap_ver, pos, size, errp);
+}
+
+static int vfio_add_ext_cap(VFIOPCIDevice *vdev, Error **errp)
 {
 PCIDevice *pdev = &vdev->pdev;
 uint32_t header;
 uint16_t cap_id, next, size;
 uint8_t cap_ver;
 uint8_t *config;
+int ret = 0;
 
 /* Only add extended caps if we have them and the guest can see them */
 if (!pci_is_express(pdev) || !pci_bus_is_express(pdev->bus) ||
 !pci_get_long(pdev->config + PCI_CONFIG_SPACE_SIZE)) {
-return;
+return 0;
 }
 
 /*
@@ -1915,6 +1939,9 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
PCI_EXT_CAP_NEXT_MASK);
 
 switch (cap_id) {
+case PCI_EXT_CAP_ID_ERR:
+ret = vfio_setup_aer(vdev, cap_ver, next, size, errp);
+break;
 case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */
 case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function virtualization */
 trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id, next);
@@ -1923,6 +1950,9 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
 pcie_add_capability(pdev, cap_id, cap_ver, next, size);
 }
 
+if (ret) {
+goto out;
+}
 }
 
 /* Cleanup chain head ID if necessary */
@@ -1930,8 +1960,9 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
 pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0);
 }
 
+out:
 g_free(config);
-return;
+return ret;
 }
 
 static int vfio_add_capabilities(VFIOPCIDevice *vdev, Error **errp)
@@ -1949,8 +1980,8 @@ static int vfio_add_capabilities(VFIOPCIDevice *vdev, Error **errp)
 return ret;
 }
 
-vfio_add_ext_cap(vdev);
-return 0;
+ret = vfio_add_ext_cap(vdev, errp);
+return ret;
 }
 
 static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index a8366bb..34e8b04 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -15,6 +15,7 @@
 #include "qemu-common.h"
 #include "exec/memory.h"
 #include "hw/pci/pci.h"
+#include "hw/pci/pci_bridge.h"
 #include "hw/vfio/vfio-common.h"
 #include "qemu/event_notifier.h"
 #include "qemu/queue.h"
-- 
1.8.3.1

[Qemu-devel] [PATCH 3/3] vfio-pci: process non fatal error of AER

2017-02-26 Thread Cao jin

Make use of the non-fatal error eventfd that the kernel module provides
to process AER non-fatal errors. Fatal errors still take the legacy path,
which results in a VM stop.

Register the handler and wait for notification. On notification, construct
an AER message and pass it to the root port. The root port then triggers an
interrupt to signal the guest, and the guest driver does the recovery.
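
Two pieces of the handler logic can be sketched in isolation: building the requester ID that goes into the AER message, and deciding whether a reported uncorrectable error is fatal. This is a standalone illustration, assuming the usual PCI encoding (bus number in the high byte, device and function packed into devfn); the demo_* helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack a device number (0..31) and function number (0..7) into devfn. */
static uint8_t demo_devfn(unsigned device, unsigned function)
{
    assert(device < 32 && function < 8);
    return (uint8_t)((device << 3) | function);
}

/* Source ID of the AER message: bus number in bits 15:8, devfn in bits 7:0. */
static uint16_t demo_aer_source_id(uint8_t bus, uint8_t devfn)
{
    return (uint16_t)(((unsigned)bus << 8) | devfn);
}

/*
 * An uncorrectable error is fatal when any status bit that is set is also
 * marked as fatal in the severity register.
 */
static bool demo_is_fatal(uint32_t uncor_status, uint32_t uncor_severity)
{
    return (uncor_status & uncor_severity) != 0;
}
```

For example, function 1 of device 0 on bus 3 yields source ID 0x0301, and a status bit that is clear in the severity register is treated as non-fatal.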

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/vfio/pci.c  | 139 +
 hw/vfio/pci.h  |   2 +
 linux-headers/linux/vfio.h |   1 +
 3 files changed, 142 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 3d0d005..55c6e05 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2422,6 +2422,21 @@ static void vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
  "Could not enable error recovery for the device",
  vbasedev->name);
 }
+
+irq_info.index = VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX;
+irq_info.count = 0; /* clear */
+ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
+if (ret) {
+/* This can fail for an old kernel or legacy PCI dev */
+trace_vfio_populate_device_get_irq_info_failure();
+} else if (irq_info.count == 1) {
+vdev->pci_aer_non_fatal = true;
+} else {
+error_report(WARN_PREFIX
+ "Couldn't enable non fatal error recovery for the device",
+ vbasedev->name);
+}
+
 }
 
 static void vfio_put_device(VFIOPCIDevice *vdev)
@@ -2432,6 +2447,128 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
 vfio_put_base_device(&vdev->vbasedev);
 }
 
+static void vfio_non_fatal_err_notifier_handler(void *opaque)
+{
+VFIOPCIDevice *vdev = opaque;
+PCIDevice *dev = &vdev->pdev;
+PCIEAERMsg msg = {
+.severity = PCI_ERR_ROOT_CMD_NONFATAL_EN,
+.source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
+};
+
+if (!event_notifier_test_and_clear(&vdev->non_fatal_err_notifier)) {
+return;
+}
+
+/* Populate the aer msg and send it to root port */
+if (dev->exp.aer_cap) {
+uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
+uint32_t uncor_status;
+bool isfatal;
+
+uncor_status = vfio_pci_read_config(dev,
+dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4);
+if (!uncor_status) {
+return;
+}
+
+isfatal = uncor_status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
+if (isfatal) {
+goto stop;
+}
+
+error_report("%s sending non fatal event to root port. uncor status = "
+ "0x%"PRIx32, vdev->vbasedev.name, uncor_status);
+pcie_aer_msg(dev, &msg);
+return;
+}
+
+stop:
+/* Terminate the guest in case of fatal error */
+error_report("%s(%s) fatal error detected. Please collect any data"
+" possible and then kill the guest", __func__, vdev->vbasedev.name);
+vm_stop(RUN_STATE_INTERNAL_ERROR);
+}
+
+/*
+ * Register non fatal error notifier for devices supporting error recovery.
+ * If we encounter a failure in this function, we report an error
+ * and continue after disabling error recovery support for the device.
+ */
+static void vfio_register_non_fatal_err_notifier(VFIOPCIDevice *vdev)
+{
+int ret;
+int argsz;
+struct vfio_irq_set *irq_set;
+int32_t *pfd;
+
+if (!vdev->pci_aer_non_fatal) {
+return;
+}
+
+if (event_notifier_init(&vdev->non_fatal_err_notifier, 0)) {
+error_report("vfio: Unable to init event notifier for non-fatal error detection");
+vdev->pci_aer_non_fatal = false;
+return;
+}
+
+argsz = sizeof(*irq_set) + sizeof(*pfd);
+
+irq_set = g_malloc0(argsz);
+irq_set->argsz = argsz;
+irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
+ VFIO_IRQ_SET_ACTION_TRIGGER;
+irq_set->index = VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX;
+irq_set->start = 0;
+irq_set->count = 1;
+pfd = (int32_t *)&irq_set->data;
+
+*pfd = event_notifier_get_fd(&vdev->non_fatal_err_notifier);
+qemu_set_fd_handler(*pfd, vfio_non_fatal_err_notifier_handler, NULL, vdev);
+
+ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
+if (ret) {
+error_report("vfio: Failed to set up non-fatal error notification");
+qemu_set_fd_handler(*pfd, NULL, NULL, vdev);
+event_notifier_cleanup(&vdev->non_fatal_err_notifier);
+vdev->pci_aer_non_fatal = false;
+}
+g_free(irq_set);
+}
+
+static void vfio_unregister_non_fatal_err_notifier(VFIOPCIDevice *vdev)
+{
+int argsz;
+struct vfio_irq_set *irq_set;
+int32_t *pfd;
+int ret;
+
+i

[Qemu-devel] [PATCH 1/3] pcie aer: verify if AER functionality is available

2017-02-26 Thread Cao jin

For devices which support AER, verify whether it can work in the system:
1. An AER-capable device is a PCIe device; it can't be plugged into a PCI bus
2. If the root port doesn't support AER, then there is no need to expose the
   AER capability
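
The two rules above can be sketched as a walk up a toy device tree. This is a standalone illustration and not the patch code: the demo_* types are hypothetical, and the sketch assumes that rule 1 applies to the endpoint's immediate parent only, while rule 2 applies to any root port found further up.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified set of device/port types. */
enum demo_port_type {
    DEMO_ENDPOINT,
    DEMO_ROOT_PORT,
    DEMO_UPSTREAM,
    DEMO_DOWNSTREAM,
    DEMO_PCI_BRIDGE,    /* conventional PCI bridge, not a PCIe port */
};

struct demo_dev {
    enum demo_port_type type;
    bool has_aer;                   /* device exposes an AER capability */
    const struct demo_dev *parent;  /* bridge above it, NULL on the root bus */
};

/* Returns 0 if AER may be exposed for @dev, -ENOTSUP otherwise. */
static int demo_aer_topology_check(const struct demo_dev *dev)
{
    const struct demo_dev *p;

    assert(dev != NULL);

    /* Rule 1: an endpoint must sit directly below a PCIe port. */
    if (dev->type == DEMO_ENDPOINT && dev->parent != NULL &&
        dev->parent->type != DEMO_ROOT_PORT &&
        dev->parent->type != DEMO_DOWNSTREAM) {
        return -ENOTSUP;
    }

    /* Rule 2: every root port on the path must support AER itself. */
    for (p = dev->parent; p != NULL; p = p->parent) {
        if (p->type == DEMO_ROOT_PORT && !p->has_aer) {
            return -ENOTSUP;
        }
    }
    return 0;
}
```

A device with no parent bridge passes the check in this sketch, because the loop never runs.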

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/pci/pcie_aer.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
index daf1f65..a2e9818 100644
--- a/hw/pci/pcie_aer.c
+++ b/hw/pci/pcie_aer.c
@@ -100,6 +100,34 @@ static void aer_log_clear_all_err(PCIEAERLog *aer_log)
 int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, uint16_t offset,
   uint16_t size, Error **errp)
 {
+PCIDevice *parent_dev;
+uint8_t type;
+uint8_t parent_type;
+
+/* Topology test: see if there is need to expose AER cap */
+type = pcie_cap_get_type(dev);
+parent_dev = pci_bridge_get_device(dev->bus);
+while (parent_dev) {
+parent_type = pcie_cap_get_type(parent_dev);
+
+if (type == PCI_EXP_TYPE_ENDPOINT &&
+(parent_type != PCI_EXP_TYPE_ROOT_PORT &&
+ parent_type != PCI_EXP_TYPE_DOWNSTREAM)) {
+error_setg(errp, "Parent device is not a PCIe component");
+return -ENOTSUP;
+}
+
+if (parent_type == PCI_EXP_TYPE_ROOT_PORT) {
+if (!parent_dev->exp.aer_cap)
+{
+error_setg(errp, "Root port does not support AER");
+return -ENOTSUP;
+}
+}
+
+parent_dev = pci_bridge_get_device(parent_dev->bus);
+}
+
 pcie_add_capability(dev, PCI_EXT_CAP_ID_ERR, cap_ver,
 offset, size);
 dev->exp.aer_cap = offset;
-- 
1.8.3.1

[Qemu-devel] [PATCH 0/3] vfio-pci: support recovery of AER non fatal error

2017-02-26 Thread Cao jin

This is a nearly new design of the feature, so the version numbering restarts
from 0.

About the test:
The hardware instability seen before still occurs. The test server is in
another country at site A, and my contact in that country is at site B, so
it is not convenient to get help (plugging in a cable or checking the hardware).
So my NIC (which has 2 functions) still has only func1 connected to the gateway.
If anyone else who has the hardware could test the patches, that would be
a great help.


Basically, the unsteady hardware shows two symptoms:
1. Start the vm; the hardware emits a fatal error by itself before I do
   anything, which causes the vm to stop.
2. Start the vm, assign an IP to func1, then ping the gateway; it shows
   "Destination Host Unreachable" after dozens or hundreds of successful
   pings, and guest dmesg shows nothing abnormal.  I think this symptom is
   *strong evidence* of unsteady hardware; I speculate that the cable has
   a problem.

   On the other hand, I also saw a perfect result 2 times in my numerous
   tests, with just func1 assigned while func0 has no user. It could ping for
   several hours (more than 15000 pings) without any problem; during that
   period I injected non-fatal errors into func0 & func1, and error recovery
   was very good.

   So, most of the time, I must run the test quickly, before the hardware
   goes crazy, until I get what I expected.


Test:
scenario 1: assign func1 to vm while func0 has no user.
scenario 2: assign both functions to 1 vm, with the same topology as host.
scenario 3: assign both functions to 1 vm, under different bus.
scenario 4: assign each function to a separate vm.

The steps are: assign an IP to func1, ping the gateway, inject a non-fatal
error into both functions, and see if func1 can still ping after recovery.

Although we don't have a cable for func0, in tests like scenario 4, injecting
into func0 doesn't affect func1's recovery, so I think this shows that one
function's recovery doesn't affect the other.


Extra info FYI:
1. During the test, I added some debug lines to vfio_err_notifier_handler to
   read the uncor status register when a fatal error occurred;
   it showed all F's every time.
2. Based on the v10 patch & the corresponding kernel part, modified as per the
   comments: revert the eventfd handling (don't signal uncor status), and
   guest link reset induces a host link reset. The test result shows:
   non-fatal error recovery is good; fatal error recovery has the same result
   as what Alex found before (guest kernel crash), because the guest device
   driver's error_detected() accesses the MMIO registers and gets all F's.


Cao jin (3):
  pcie aer: verify if AER functionality is available
  vfio pci: new function to init AER capability
  vfio-pci: process non fatal error of AER

 hw/pci/pcie_aer.c  |  28 +++
 hw/vfio/pci.c  | 180 +++--
 hw/vfio/pci.h  |   3 +
 linux-headers/linux/vfio.h |   1 +
 4 files changed, 207 insertions(+), 5 deletions(-)

-- 
1.8.3.1

[Qemu-devel] [PATCH] vfio pci: kernel support of error recovery only for non fatal error

2017-02-26 Thread Cao jin

0. What happens now (PCIE AER only)
   Fatal errors cause a link reset.
   Non fatal errors don't.
   All errors stop the VM eventually, but not immediately
   because it's detected and reported asynchronously.
   Interrupts are forwarded as usual.
   Correctable errors are not reported to guest at all.
   Note: PPC EEH is different. This focuses on AER.

1. Correctable errors
   There is no need to report these to guest. So let's not.

2. Fatal errors
   It's not easy to handle them gracefully since link reset
   is needed. As a first step, let's use the existing mechanism
   in that case.

3. Non-fatal errors
   Here we could make progress by reporting them to guest
   and have guest handle them.

   Issues:
   a. this behaviour should only be enabled with new userspace,
      old userspace should work without changes.

      Suggestion: One way to address this would be to add a new eventfd
      non_fatal_err_trigger. If not set, invoke err_trigger.

   b. drivers are supposed to stop MMIO when error is reported,
      if vm keeps going, we will keep doing MMIO/config.

      Suggestion 1: ignore this. vm stop happens much later when
      userspace runs anyway, so we are not making things much worse.

      Suggestion 2: try to stop MMIO/config, resume on resume call

      Patch below implements Suggestion 1.

      Note that this is really against the documentation, which
      states error_detected() is the point at which the driver should quiesce
      the device and not touch it further (until diagnostic poking at
      mmio_enabled or full access at resume callback).

      Fixing this won't be easy. However, this is not a regression.

      Also note this does nothing about interrupts, documentation
      suggests returning IRQ_NONE until reset.
      Again, not a regression.

   c. PF driver might detect that function is completely broken,
      if vm keeps going, we will keep doing MMIO/config.

      Suggestion 1: ignore this. vm stop happens much later when
      userspace runs anyway, so we are not making things much worse.

      Suggestion 2: detect this and invoke err_trigger to stop VM.

      Patch below implements Suggestion 2.
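
For readers who want the eventfd selection rule from issue (a) in isolation,
here is a minimal standalone sketch. It is not the kernel code: every type and
name below (sketch_triggers, sketch_pick_trigger, the SKETCH_* enums) is
hypothetical and only models which of the two eventfds would be signalled for
a given channel state.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's channel states. */
enum sketch_channel_state {
    SKETCH_IO_NORMAL,       /* non-fatal error: link still usable */
    SKETCH_IO_FROZEN,       /* fatal error: link reset needed */
    SKETCH_IO_PERM_FAILURE
};

enum sketch_trigger {
    SKETCH_TRIGGER_NONE,
    SKETCH_TRIGGER_ERR,        /* models err_trigger */
    SKETCH_TRIGGER_NON_FATAL   /* models non_fatal_err_trigger */
};

/* Which eventfds userspace has registered (assumption: booleans suffice). */
struct sketch_triggers {
    bool err_set;
    bool non_fatal_set;
};

/*
 * New userspace registers both eventfds and gets the non-fatal one for
 * non-fatal errors.  Old userspace registers only err_trigger, so every
 * error falls through to it and behaviour is unchanged.
 */
enum sketch_trigger sketch_pick_trigger(const struct sketch_triggers *t,
                                        enum sketch_channel_state state)
{
    if (state == SKETCH_IO_NORMAL && t->non_fatal_set)
        return SKETCH_TRIGGER_NON_FATAL;
    if (t->err_set)
        return SKETCH_TRIGGER_ERR;
    return SKETCH_TRIGGER_NONE;
}
```

The fall-through order is what keeps old userspace working: without
non_fatal_err_trigger set, the second branch is the only one that can fire.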

Suggested-by: Michael S. Tsirkin <m...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 drivers/vfio/pci/vfio_pci.c | 38 +++--
 drivers/vfio/pci/vfio_pci_intrs.c   | 19 +++
 drivers/vfio/pci/vfio_pci_private.h |  1 +
 include/uapi/linux/vfio.h   |  1 +
 4 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 324c52e..3551cc9 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -441,7 +441,8 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device 
*vdev, int irq_type)
 
return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
}
-   } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
+   } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX ||
+  irq_type == VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX) {
if (pci_is_pcie(vdev->pdev))
return 1;
} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
@@ -796,6 +797,7 @@ static long vfio_pci_ioctl(void *device_data,
case VFIO_PCI_REQ_IRQ_INDEX:
break;
case VFIO_PCI_ERR_IRQ_INDEX:
+   case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
if (pci_is_pcie(vdev->pdev))
break;
/* pass thru to return error */
@@ -1282,7 +1284,9 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
 
    mutex_lock(&vdev->igate);
 
-   if (vdev->err_trigger)
+   if (state == pci_channel_io_normal && vdev->non_fatal_err_trigger)
+       eventfd_signal(vdev->non_fatal_err_trigger, 1);
+   else if (vdev->err_trigger)
        eventfd_signal(vdev->err_trigger, 1);
 
    mutex_unlock(&vdev->igate);
@@ -1292,8 +1296,38 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
    return PCI_ERS_RESULT_CAN_RECOVER;
 }
 
+static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev)
+{
+   struct vfio_pci_device *vdev;
+   struct vfio_device *device;
+   static pci_ers_result_t err = PCI_ERS_RESULT_NONE;
+
+   device = vfio_device_get_from_dev(&pdev->dev);
+   if (!device)
+       goto err_dev;
+
+   vdev = vfio_device_data(device);
+   if (!vdev)
+       goto err_data;
+
+   mutex_lock(&vdev->igate);
+
+   if (vdev->err_trigger)
+       eventfd_signal(vdev->err_trigger, 1);
+
+   mutex_unlock(&vdev->igate);
+
+   err = PCI_ERS_RESULT_RECOVERED;
+
+err_data:
+   vfio_device_put(device);
+err_dev:
+   return err;
+}
+
+
 static const struct pci_error_handlers vfio_er

[Qemu-devel] [PATCH v10 7/8] msi_init: convert assert to return -errno

2017-02-25 Thread Cao jin

According to the discussion:
http://lists.nongnu.org/archive/html/qemu-devel/2016-09/msg08215.html

Let leaf functions return a reasonable -errno, and let the caller decide how
to handle the return value.
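
As a standalone illustration of the check that replaces the three asserts, the
sketch below validates a vector count the same way (power of 2, at most 32)
and reports -EINVAL instead of aborting. The names sketch_is_power_of_2 and
sketch_check_msi_vectors are hypothetical, and SKETCH_MSI_VECTORS_MAX is
assumed to mirror PCI_MSI_VECTORS_MAX.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumption: mirrors PCI_MSI_VECTORS_MAX (range 1 - 32 per the patch). */
#define SKETCH_MSI_VECTORS_MAX 32

/* True for 1, 2, 4, 8, ...; false for 0 and anything with two bits set. */
bool sketch_is_power_of_2(unsigned int v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Leaf-style check: report the problem as -EINVAL and leave it to the
 * caller to decide whether that is fatal.
 */
int sketch_check_msi_vectors(unsigned int nr_vectors)
{
    if (!sketch_is_power_of_2(nr_vectors) ||
        nr_vectors > SKETCH_MSI_VECTORS_MAX) {
        return -EINVAL;
    }
    return 0;
}
```

Note that the power-of-2 test already rejects 0, so no separate "greater than
zero" check is needed.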

Suggested-by: Markus Armbruster <arm...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>

Reviewed-by: Markus Armbruster <arm...@redhat.com>
Reviewed-by: Marcel Apfelbaum <mar...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/pci/msi.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/pci/msi.c b/hw/pci/msi.c
index a87b2278a373..af3efbe525ce 100644
--- a/hw/pci/msi.c
+++ b/hw/pci/msi.c
@@ -201,9 +201,12 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
" 64bit %d mask %d\n",
offset, nr_vectors, msi64bit, msi_per_vector_mask);
 
-assert(!(nr_vectors & (nr_vectors - 1)));   /* power of 2 */
-assert(nr_vectors > 0);
-assert(nr_vectors <= PCI_MSI_VECTORS_MAX);
+/* vector sanity test: should in range 1 - 32, should be power of 2 */
+if (!is_power_of_2(nr_vectors) || nr_vectors > PCI_MSI_VECTORS_MAX) {
+error_setg(errp, "Invalid vector number: %d", nr_vectors);
+return -EINVAL;
+}
+
 /* the nr of MSI vectors is up to 32 */
 vectors_order = ctz32(nr_vectors);
 
-- 
2.1.0

[Qemu-devel] [PATCH v10 8/8] megasas: remove unnecessary megasas_use_msix()

2017-02-25 Thread Cao jin

Also move a hunk up, to keep the msix init related code together.

CC: Hannes Reinecke <h...@suse.de>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>

Reviewed-by: Markus Armbruster <arm...@redhat.com>
Reviewed-by: Hannes Reinecke <h...@suse.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/scsi/megasas.c | 19 ++-
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index ca98ae7cc329..49f38002448e 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -155,11 +155,6 @@ static bool megasas_use_queue64(MegasasState *s)
 return s->flags & MEGASAS_MASK_USE_QUEUE64;
 }
 
-static bool megasas_use_msix(MegasasState *s)
-{
-return s->msix != ON_OFF_AUTO_OFF;
-}
-
 static bool megasas_is_jbod(MegasasState *s)
 {
 return s->flags & MEGASAS_MASK_USE_JBOD;
@@ -2306,9 +2301,7 @@ static void megasas_scsi_uninit(PCIDevice *d)
 {
 MegasasState *s = MEGASAS(d);
 
-if (megasas_use_msix(s)) {
-msix_uninit(d, >mmio_io, >mmio_io);
-}
+msix_uninit(d, >mmio_io, >mmio_io);
 msi_uninit(d);
 }
 
@@ -2358,7 +2351,7 @@ static void megasas_scsi_realize(PCIDevice *dev, Error 
**errp)
 
 memory_region_init_io(>mmio_io, OBJECT(s), _mmio_ops, s,
   "megasas-mmio", 0x4000);
-if (megasas_use_msix(s)) {
+if (s->msix != ON_OFF_AUTO_OFF) {
 ret = msix_init(dev, 15, >mmio_io, b->mmio_bar, 0x2000,
 >mmio_io, b->mmio_bar, 0x3800, 0x68, );
 if (ret && s->msix == ON_OFF_AUTO_ON) {
@@ -2375,6 +2368,10 @@ static void megasas_scsi_realize(PCIDevice *dev, Error 
**errp)
 error_free(err);
 }
 
+if (s->msix != ON_OFF_AUTO_OFF) {
+msix_vector_use(dev, 0);
+}
+
 memory_region_init_io(>port_io, OBJECT(s), _port_ops, s,
   "megasas-io", 256);
 memory_region_init_io(>queue_io, OBJECT(s), _queue_ops, s,
@@ -2390,10 +2387,6 @@ static void megasas_scsi_realize(PCIDevice *dev, Error 
**errp)
 pci_register_bar(dev, b->mmio_bar, bar_type, >mmio_io);
 pci_register_bar(dev, 3, bar_type, >queue_io);
 
-if (megasas_use_msix(s)) {
-msix_vector_use(dev, 0);
-}
-
 s->fw_state = MFI_FWSTATE_READY;
 if (!s->sas_addr) {
 s->sas_addr = ((NAA_LOCALLY_ASSIGNED_ID << 24) |
-- 
2.1.0

[Qemu-devel] [PATCH v10 0/8] the reset of msix_init series

2017-02-25 Thread Cao jin

v10 changelog:
1. drop the disliked patch, introduce a new patch 1 according to mst's comments.
2. based on the new patch, remove the following statements

/* Any error other than -ENOTSUP(board's MSI support is broken)
 * is a programming error */
assert(!ret || ret == -ENOTSUP);

   for the affected devices: megasas, hcd-xhci. These are trivial changes,
   so I left the R-bs where they were.

Test:
1. Detailed test via command line, as in v9
2. make check hangs at: GTESTER check-qtest-x86_64. After ctrl-C, it says:

make: *** [check-qtest-x86_64] Interrupt
qemu-system-x86_64: Failed to read msg header. Read -1 instead of 12. Original request 11.
qemu-system-x86_64: vhost VQ 0 ring restore failed: -1: Input/output error (5)
qemu-system-x86_64: Failed to set msg fds.
qemu-system-x86_64: vhost VQ 0 ring restore failed: -1: Invalid argument (22)

qemu-system-x86_64: Failed to set msg fds.
qemu-system-x86_64: vhost VQ 1 ring restore failed: -1: Invalid argument (22)

   Is it a regression, or did I miss something?

CC: Jason Wang <jasow...@redhat.com>
CC: Gerd Hoffmann <kra...@redhat.com>
CC: Dmitry Fleytman <dmi...@daynix.com>
CC: Michael S. Tsirkin <m...@redhat.com>
CC: Hannes Reinecke <h...@suse.de>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Alex Williamson <alex.william...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>

Cao jin (8):
  msix: Rename and create a wrapper
  megasas: change behaviour of msix switch
  hcd-xhci: change behaviour of msix switch
  megasas: undo the overwrites of msi user configuration
  vmxnet3: fix reference leak issue
  vmxnet3: remove unnecessary internal msix flag
  msi_init: convert assert to return -errno
  megasas: remove unnecessary megasas_use_msix()

 hw/net/vmxnet3.c  | 40 +++-
 hw/pci/msi.c  |  9 ++---
 hw/pci/msix.c | 30 +-
 hw/scsi/megasas.c | 48 +---
 hw/usb/hcd-xhci.c | 29 +
 hw/vfio/pci.c | 12 ++--
 include/hw/pci/msix.h |  5 +
 7 files changed, 99 insertions(+), 74 deletions(-)

-- 
2.1.0

[Qemu-devel] [PATCH v10 6/8] vmxnet3: remove unnecessary internal msix flag

2017-02-25 Thread Cao jin

The internal flag msix_used is unnecessary; it has the same effect as
msix_enabled().

The corresponding msi flag was already dropped in commit 1070048e.

CC: Dmitry Fleytman <dmi...@daynix.com>
CC: Jason Wang <jasow...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>

Reviewed-by: Markus Armbruster <arm...@redhat.com>
Reviewed-by: Dmitry Fleytman <dmi...@daynix.com>
Acked-by: Marcel Apfelbaum <mar...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/net/vmxnet3.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 81ab131e4a47..a3ed77f04b7c 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -281,8 +281,6 @@ typedef struct {
 Vmxnet3RxqDescr rxq_descr[VMXNET3_DEVICE_MAX_RX_QUEUES];
 Vmxnet3TxqDescr txq_descr[VMXNET3_DEVICE_MAX_TX_QUEUES];
 
-/* Whether MSI-X support was installed successfully */
-bool msix_used;
 hwaddr drv_shmem;
 hwaddr temp_shared_guest_driver_memory;
 
@@ -359,7 +357,7 @@ static bool _vmxnet3_assert_interrupt_line(VMXNET3State *s, 
uint32_t int_idx)
 {
 PCIDevice *d = PCI_DEVICE(s);
 
-if (s->msix_used && msix_enabled(d)) {
+if (msix_enabled(d)) {
 VMW_IRPRN("Sending MSI-X notification for vector %u", int_idx);
 msix_notify(d, int_idx);
 return false;
@@ -383,7 +381,7 @@ static void _vmxnet3_deassert_interrupt_line(VMXNET3State 
*s, int lidx)
  * This function should never be called for MSI(X) interrupts
  * because deassertion never required for message interrupts
  */
-assert(!s->msix_used || !msix_enabled(d));
+assert(!msix_enabled(d));
 /*
  * This function should never be called for MSI(X) interrupts
  * because deassertion never required for message interrupts
@@ -421,7 +419,7 @@ static void vmxnet3_trigger_interrupt(VMXNET3State *s, int 
lidx)
 s->interrupt_states[lidx].is_pending = true;
 vmxnet3_update_interrupt_line_state(s, lidx);
 
-if (s->msix_used && msix_enabled(d) && s->auto_int_masking) {
+if (msix_enabled(d) && s->auto_int_masking) {
 goto do_automask;
 }
 
@@ -1428,7 +1426,7 @@ static void vmxnet3_update_features(VMXNET3State *s)
 
 static bool vmxnet3_verify_intx(VMXNET3State *s, int intx)
 {
-return s->msix_used || msi_enabled(PCI_DEVICE(s))
+return msix_enabled(PCI_DEVICE(s)) || msi_enabled(PCI_DEVICE(s))
 || intx == pci_get_byte(s->parent_obj.config + PCI_INTERRUPT_PIN) - 1;
 }
 
@@ -1445,18 +1443,18 @@ static void vmxnet3_validate_interrupts(VMXNET3State *s)
 int i;
 
 VMW_CFPRN("Verifying event interrupt index (%d)", s->event_int_idx);
-vmxnet3_validate_interrupt_idx(s->msix_used, s->event_int_idx);
+vmxnet3_validate_interrupt_idx(msix_enabled(PCI_DEVICE(s)), 
s->event_int_idx);
 
 for (i = 0; i < s->txq_num; i++) {
 int idx = s->txq_descr[i].intr_idx;
 VMW_CFPRN("Verifying TX queue %d interrupt index (%d)", i, idx);
-vmxnet3_validate_interrupt_idx(s->msix_used, idx);
+vmxnet3_validate_interrupt_idx(msix_enabled(PCI_DEVICE(s)), idx);
 }
 
 for (i = 0; i < s->rxq_num; i++) {
 int idx = s->rxq_descr[i].intr_idx;
 VMW_CFPRN("Verifying RX queue %d interrupt index (%d)", i, idx);
-vmxnet3_validate_interrupt_idx(s->msix_used, idx);
+vmxnet3_validate_interrupt_idx(msix_enabled(PCI_DEVICE(s)), idx);
 }
 }
 
@@ -2185,6 +2183,7 @@ vmxnet3_use_msix_vectors(VMXNET3State *s, int num_vectors)
 static bool
 vmxnet3_init_msix(VMXNET3State *s)
 {
+bool msix;
 PCIDevice *d = PCI_DEVICE(s);
 int res = msix_init(d, VMXNET3_MAX_INTRS,
 >msix_bar,
@@ -2194,18 +2193,19 @@ vmxnet3_init_msix(VMXNET3State *s)
 VMXNET3_MSIX_OFFSET(s), NULL);
 
 if (0 > res) {
-VMW_WRPRN("Failed to initialize MSI-X, error %d", res);
-s->msix_used = false;
+VMW_WRPRN("Failed to initialize MSI-X, board's MSI support is broken");
+msix = false;
 } else {
 if (!vmxnet3_use_msix_vectors(s, VMXNET3_MAX_INTRS)) {
 VMW_WRPRN("Failed to use MSI-X vectors, error %d", res);
 msix_uninit(d, >msix_bar, >msix_bar);
-s->msix_used = false;
+msix = false;
 } else {
-s->msix_used = true;
+msix = true;
 }
 }
-return s->msix_used;
+
+return msix;
 }
 
 static void
@@ -2213,7 +2213,7 @@ vmxnet3_cleanup_msix(VMXNET3State *s)
 {
 PCIDevice *d = PCI_DEVICE(s);
 
-if (s->msix_used) {
+if (msix_enabled(d)) {
 vmxnet3_unuse_msix_vectors(s, VMXNET3_MAX_INTRS);
 msix_uninit(d, >msix_bar, >msix_bar);
 }
-- 
2.1.0

[Qemu-devel] [PATCH v10 4/8] megasas: undo the overwrites of msi user configuration

2017-02-25 Thread Cao jin

Commit afea4e14 seems to have forgotten to undo the overwrite of the user's
msi configuration; overwriting it is inappropriate.

CC: Hannes Reinecke <h...@suse.de>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>

Reviewed-by: Markus Armbruster <arm...@redhat.com>
Acked-by: Marcel Apfelbaum <mar...@redhat.com>
Reviewed-by: Hannes Reinecke <h...@suse.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/scsi/megasas.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index eab480da354e..ca98ae7cc329 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -2350,11 +2350,10 @@ static void megasas_scsi_realize(PCIDevice *dev, Error 
**errp)
 "msi=off with this machine type.\n");
 error_propagate(errp, err);
 return;
-} else if (ret) {
-/* With msi=auto, we fall back to MSI off silently */
-s->msi = ON_OFF_AUTO_OFF;
-error_free(err);
 }
+assert(!err || s->msix == ON_OFF_AUTO_AUTO);
+/* With msi=auto, we fall back to MSI off silently */
+error_free(err);
 }
 
 memory_region_init_io(>mmio_io, OBJECT(s), _mmio_ops, s,
-- 
2.1.0

[Qemu-devel] [PATCH v10 1/8] msix: Rename and create a wrapper

2017-02-25 Thread Cao jin

Rename msix_init to msix_validate_and_init, and use it from vfio, which
might legitimately get -EINVAL.  Add a new wrapper, msix_init, which asserts
on the programming error for debugging purposes, and use it from other devices.
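
The split can be pictured with a reduced standalone sketch. It is not the QEMU
code: sketch_validate_and_init and sketch_init are hypothetical names, the
parameter list is cut down to two values, and the 2048-entry bound is taken
from the MSI-X table size limit.

```c
#include <assert.h>
#include <errno.h>

/* Assumption: an MSI-X table holds at most 2048 entries. */
#define SKETCH_MSIX_MAX_ENTRIES 2048

/*
 * Validating variant, for callers whose parameters come from real
 * hardware (device assignment): bad input is reported, not asserted.
 */
int sketch_validate_and_init(unsigned int nentries, int platform_has_msi)
{
    if (!platform_has_msi) {
        return -ENOTSUP;   /* board's MSI support is broken */
    }
    if (nentries < 1 || nentries > SKETCH_MSIX_MAX_ENTRIES) {
        return -EINVAL;    /* bad parameters */
    }
    return 0;
}

/*
 * Wrapper for emulated devices: their parameters are hard-coded, so
 * -EINVAL can only mean a programming error and is asserted on.
 */
int sketch_init(unsigned int nentries, int platform_has_msi)
{
    int ret = sketch_validate_and_init(nentries, platform_has_msi);

    assert(ret != -EINVAL);
    return ret;
}
```

With this shape, -ENOTSUP still reaches every caller, while -EINVAL is only
visible to callers that opt into the validating variant.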

CC: Alex Williamson <alex.william...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>

Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/pci/msix.c | 30 +-
 hw/vfio/pci.c | 12 ++--
 include/hw/pci/msix.h |  5 +
 3 files changed, 32 insertions(+), 15 deletions(-)

diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index bb54e8b0ac37..2b7541ab2c8d 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -239,6 +239,19 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned 
nentries)
 }
 }
 
+/* Just a wrapper to check the return value */
+int msix_init(struct PCIDevice *dev, unsigned short nentries,
+  MemoryRegion *table_bar, uint8_t table_bar_nr,
+  unsigned table_offset, MemoryRegion *pba_bar,
+  uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos,
+  Error **errp)
+{
+int ret = msix_validate_and_init(dev, nentries, table_bar, table_bar_nr,
+table_offset, pba_bar, pba_bar_nr, pba_offset, cap_pos, errp);
+
+assert(ret != -EINVAL);
+return ret;
+}
 /*
  * Make PCI device @dev MSI-X capable
  * @nentries is the max number of MSI-X vectors that the device support.
@@ -259,11 +272,11 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned 
nentries)
  * also means a programming error, except device assignment, which can check
  * if a real HW is broken.
  */
-int msix_init(struct PCIDevice *dev, unsigned short nentries,
-  MemoryRegion *table_bar, uint8_t table_bar_nr,
-  unsigned table_offset, MemoryRegion *pba_bar,
-  uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos,
-  Error **errp)
+int msix_validate_and_init(struct PCIDevice *dev, unsigned short nentries,
+   MemoryRegion *table_bar, uint8_t table_bar_nr,
+   unsigned table_offset, MemoryRegion *pba_bar,
+   uint8_t pba_bar_nr, unsigned pba_offset,
+   uint8_t cap_pos, Error **errp)
 {
 int cap;
 unsigned table_size, pba_size;
@@ -361,10 +374,9 @@ int msix_init_exclusive_bar(PCIDevice *dev, unsigned short 
nentries,
 memory_region_init(>msix_exclusive_bar, OBJECT(dev), name, bar_size);
 g_free(name);
 
-ret = msix_init(dev, nentries, >msix_exclusive_bar, bar_nr,
-0, >msix_exclusive_bar,
-bar_nr, bar_pba_offset,
-0, errp);
+ret = msix_validate_and_init(dev, nentries, >msix_exclusive_bar,
+ bar_nr, 0, >msix_exclusive_bar,
+ bar_nr, bar_pba_offset, 0, errp);
 if (ret) {
 return ret;
 }
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 332f41d6627f..06828b537a75 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1436,12 +1436,12 @@ static int vfio_msix_setup(VFIOPCIDevice *vdev, int 
pos, Error **errp)
 
 vdev->msix->pending = g_malloc0(BITS_TO_LONGS(vdev->msix->entries) *
 sizeof(unsigned long));
-ret = msix_init(>pdev, vdev->msix->entries,
-vdev->bars[vdev->msix->table_bar].region.mem,
-vdev->msix->table_bar, vdev->msix->table_offset,
-vdev->bars[vdev->msix->pba_bar].region.mem,
-vdev->msix->pba_bar, vdev->msix->pba_offset, pos,
-);
+ret = msix_validate_and_init(>pdev, vdev->msix->entries,
+ vdev->bars[vdev->msix->table_bar].region.mem,
+ vdev->msix->table_bar, vdev->msix->table_offset,
+ vdev->bars[vdev->msix->pba_bar].region.mem,
+ vdev->msix->pba_bar, vdev->msix->pba_offset, pos,
+ );
 if (ret < 0) {
 if (ret == -ENOTSUP) {
 error_report_err(err);
diff --git a/include/hw/pci/msix.h b/include/hw/pci/msix.h
index 1f27658d352f..815e59bc96f3 100644
--- a/include/hw/pci/msix.h
+++ b/include/hw/pci/msix.h
@@ -11,6 +11,11 @@ int msix_init(PCIDevice *dev, unsigned short nentries,
   unsigned table_offset, MemoryRegion *pba_bar,
   uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos,
   Error **errp);
+int msix_validate_and_init(PCIDevice *dev, unsigned short nentries,
+   MemoryRegion *table_bar, uint8_t table_bar_nr,
+   unsigned table_offset, MemoryRegion *pba_

[Qemu-devel] [PATCH v10 2/8] megasas: change behaviour of msix switch

2017-02-25 Thread Cao jin

Resolve the TODO: msix=auto means msix on; if the user specifies msix=on,
then device creation fails on msix_init failure.
Also undo the overwrite of the user's msix configuration.

CC: Michael S. Tsirkin <m...@redhat.com>
CC: Hannes Reinecke <h...@suse.de>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>

Reviewed-by: Markus Armbruster <arm...@redhat.com>
Acked-by: Marcel Apfelbaum <mar...@redhat.com>
Reviewed-by: Hannes Reinecke <h...@suse.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/scsi/megasas.c | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index e3d59b7c83c7..eab480da354e 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -2359,18 +2359,28 @@ static void megasas_scsi_realize(PCIDevice *dev, Error **errp)
 
     memory_region_init_io(&s->mmio_io, OBJECT(s), &megasas_mmio_ops, s,
                           "megasas-mmio", 0x4000);
+    if (megasas_use_msix(s)) {
+        ret = msix_init(dev, 15, &s->mmio_io, b->mmio_bar, 0x2000,
+                        &s->mmio_io, b->mmio_bar, 0x3800, 0x68, &err);
+        if (ret && s->msix == ON_OFF_AUTO_ON) {
+            /* Can't satisfy user's explicit msix=on request, fail */
+            error_append_hint(&err, "You have to use msix=auto (default) or "
+                    "msix=off with this machine type.\n");
+            /* No instance_finalize method, need to free the resource here */
+            object_unref(OBJECT(&s->mmio_io));
+            error_propagate(errp, err);
+            return;
+        }
+        assert(!err || s->msix == ON_OFF_AUTO_AUTO);
+        /* With msix=auto, we fall back to MSI off silently */
+        error_free(err);
+    }
+
     memory_region_init_io(&s->port_io, OBJECT(s), &megasas_port_ops, s,
                           "megasas-io", 256);
     memory_region_init_io(&s->queue_io, OBJECT(s), &megasas_queue_ops, s,
                           "megasas-queue", 0x4);
 
-    if (megasas_use_msix(s) &&
-        msix_init(dev, 15, &s->mmio_io, b->mmio_bar, 0x2000,
-                  &s->mmio_io, b->mmio_bar, 0x3800, 0x68, NULL)) {
-        /* TODO: check msix_init's error, and should fail on msix=on */
-        s->msix = ON_OFF_AUTO_OFF;
-    }
-
     if (pci_is_express(dev)) {
         pcie_endpoint_cap_init(dev, 0xa0);
     }
-- 
2.1.0

[Qemu-devel] [PATCH v10 3/8] hcd-xhci: change behaviour of msix switch

2017-02-25 Thread Cao jin

Resolve the TODO: msix=auto means msix on; if the user specifies msix=on,
then device creation fails on msix_init failure.

CC: Gerd Hoffmann <kra...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>

Reviewed-by: Gerd Hoffmann <kra...@redhat.com>
Reviewed-by: Markus Armbruster <arm...@redhat.com>
Acked-by: Marcel Apfelbaum <mar...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/usb/hcd-xhci.c | 29 +
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index 28dd2f2c9a97..4551a3758b3e 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -3554,12 +3554,14 @@ static void usb_xhci_realize(struct PCIDevice *dev, 
Error **errp)
 if (xhci->numintrs < 1) {
 xhci->numintrs = 1;
 }
+
 if (xhci->numslots > MAXSLOTS) {
 xhci->numslots = MAXSLOTS;
 }
 if (xhci->numslots < 1) {
 xhci->numslots = 1;
 }
+
 if (xhci_get_flag(xhci, XHCI_FLAG_ENABLE_STREAMS)) {
 xhci->max_pstreams_mask = 7; /* == 256 primary streams */
 } else {
@@ -3587,6 +3589,25 @@ static void usb_xhci_realize(struct PCIDevice *dev, Error **errp)
     xhci->mfwrap_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, xhci_mfwrap_timer, xhci);
 
     memory_region_init(&xhci->mem, OBJECT(xhci), "xhci", LEN_REGS);
+    if (xhci->msix != ON_OFF_AUTO_OFF) {
+        ret = msix_init(dev, xhci->numintrs,
+                        &xhci->mem, 0, OFF_MSIX_TABLE,
+                        &xhci->mem, 0, OFF_MSIX_PBA,
+                        0x90, &err);
+        if (ret && xhci->msix == ON_OFF_AUTO_ON) {
+            /* Can't satisfy user's explicit msix=on request, fail */
+            error_append_hint(&err, "You have to use msix=auto (default) or "
+                    "msix=off with this machine type.\n");
+            /* No instance_finalize method, need to free the resource here */
+            object_unref(OBJECT(&xhci->mem));
+            error_propagate(errp, err);
+            return;
+        }
+        assert(!err || xhci->msix == ON_OFF_AUTO_AUTO);
+        /* With msix=auto, we fall back to MSI off silently */
+        error_free(err);
+    }
+
     memory_region_init_io(&xhci->mem_cap, OBJECT(xhci), &xhci_cap_ops, xhci,
                           "capabilities", LEN_CAP);
     memory_region_init_io(&xhci->mem_oper, OBJECT(xhci), &xhci_oper_ops, xhci,
@@ -3619,14 +3640,6 @@ static void usb_xhci_realize(struct PCIDevice *dev, Error **errp)
         ret = pcie_endpoint_cap_init(dev, 0xa0);
         assert(ret >= 0);
     }
-
-    if (xhci->msix != ON_OFF_AUTO_OFF) {
-        /* TODO check for errors, and should fail when msix=on */
-        msix_init(dev, xhci->numintrs,
-                  &xhci->mem, 0, OFF_MSIX_TABLE,
-                  &xhci->mem, 0, OFF_MSIX_PBA,
-                  0x90, NULL);
-    }
 }
 
 static void usb_xhci_exit(PCIDevice *dev)
-- 
2.1.0

[Qemu-devel] [PATCH v10 5/8] vmxnet3: fix reference leak issue

2017-02-25 Thread Cao jin

On the migration target, msix_vector_use() is called a second time in
vmxnet3_post_load(), without a matching second call to msix_vector_unuse(),
which results in a vector reference leak.
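
A toy model may make the leak easier to see. It assumes each vector carries a
simple use counter that "use" increments and "unuse" decrements; the struct
and function names are hypothetical and are not the QEMU implementation.

```c
#include <stddef.h>

#define SKETCH_NUM_VECTORS 4

/* Hypothetical device state: one use counter per vector. */
struct sketch_dev {
    unsigned int used[SKETCH_NUM_VECTORS];
};

/* Models marking every vector as used (one reference each). */
void sketch_use_all(struct sketch_dev *d)
{
    for (size_t v = 0; v < SKETCH_NUM_VECTORS; v++) {
        d->used[v]++;
    }
}

/* Models dropping one reference per vector at cleanup. */
void sketch_unuse_all(struct sketch_dev *d)
{
    for (size_t v = 0; v < SKETCH_NUM_VECTORS; v++) {
        if (d->used[v] > 0) {
            d->used[v]--;
        }
    }
}

/* References still held after cleanup; non-zero means a leak. */
unsigned int sketch_outstanding_refs(const struct sketch_dev *d)
{
    unsigned int total = 0;

    for (size_t v = 0; v < SKETCH_NUM_VECTORS; v++) {
        total += d->used[v];
    }
    return total;
}
```

Calling sketch_use_all() twice (realize, then post_load) followed by a single
sketch_unuse_all() leaves one reference per vector behind; dropping the second
call, as the patch does for post_load, brings the count back to zero.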

CC: Dmitry Fleytman <dmi...@daynix.com>
CC: Jason Wang <jasow...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>

Reviewed-by: Markus Armbruster <arm...@redhat.com>
Reviewed-by: Dmitry Fleytman <dmi...@daynix.com>
Acked-by: Marcel Apfelbaum <mar...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/net/vmxnet3.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index e13a798b3b00..81ab131e4a47 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -2556,21 +2556,11 @@ static int vmxnet3_put_rxq_descr(QEMUFile *f, void *pv, 
size_t size,
 static int vmxnet3_post_load(void *opaque, int version_id)
 {
 VMXNET3State *s = opaque;
-PCIDevice *d = PCI_DEVICE(s);
 
 net_tx_pkt_init(>tx_pkt, PCI_DEVICE(s),
 s->max_tx_frags, s->peer_has_vhdr);
 net_rx_pkt_init(>rx_pkt, s->peer_has_vhdr);
 
-if (s->msix_used) {
-if  (!vmxnet3_use_msix_vectors(s, VMXNET3_MAX_INTRS)) {
-VMW_WRPRN("Failed to re-use MSI-X vectors");
-msix_uninit(d, >msix_bar, >msix_bar);
-s->msix_used = false;
-return -1;
-}
-}
-
 vmxnet3_validate_queues(s);
 vmxnet3_validate_interrupts(s);
 
-- 
2.1.0

Re: [Qemu-devel] seek help for AER

2017-02-21 Thread Cao jin



On 02/22/2017 12:07 AM, Alex Williamson wrote:
> On Tue, 21 Feb 2017 18:21:53 +0800
> Cao jin <caoj.f...@cn.fujitsu.com> wrote:
> 
>> Hi,
>>
>> First, sorry for such a long time delay on the AER job. I was on 12 days
>> holiday, and start to work on the patch 2 weeks ago, because I use a
>> newer version kernel(4.10 rc8) to start off the work, several tiny
>> problems slow me.
>>
>> Now I meet a confusing issue on aer_inject module. When I use aer_inject
>> command before, it says:
>>
>> Error: Failed to write, No such device
>>
>> then in dmesg, it says:
>>
>> pcieport :00:09.0: aer_inject: AER device not found
>>
>> and then I add some debug line in find_aer_device_iter() to check
>> device->bus-name, find that it is "pci", so that is why I can't inject,
>> it should have been "pci_express". But everything is same as before
>> except a newer host kernel.
>>
>> I still don't have a clue on this problem for almost 2 days, could you help?
> 
> If it only happens with a new host kernel, I would suggest finding a
> kernel where it works and bisecting between working and non-working
> kernels to find the commit that introduced the change.  You may have
> found a regression in the kernel that should be reported upstream.
> Thanks,
> 
> Alex
> 

Finally found the reason: I missed the kernel command line parameter
pcie_ports=native. Sorry for the noise.

It seems that running make install for the kernel blows away the content of
the old grub.cfg, so I could not check the parameters used with the previous
kernel.
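
For anyone hitting the same thing: the parameters of the running kernel can be
read from /proc/cmdline. Below is a small hypothetical helper (the name
sketch_cmdline_has_param is mine) that checks a command line string for an
exact, space-delimited parameter; reading the file itself is left to the
caller.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if 'param' appears in 'cmdline' as a whole token, i.e.
 * delimited by the string boundaries, spaces, or a trailing newline.
 */
bool sketch_cmdline_has_param(const char *cmdline, const char *param)
{
    size_t plen = strlen(param);
    const char *p = cmdline;

    if (plen == 0) {
        return false;
    }
    while ((p = strstr(p, param)) != NULL) {
        bool starts = (p == cmdline) || (p[-1] == ' ');
        char end = p[plen];
        bool ends = (end == '\0') || (end == ' ') || (end == '\n');

        if (starts && ends) {
            return true;
        }
        p++;   /* keep searching after a partial match */
    }
    return false;
}
```

Passing the content of /proc/cmdline and "pcie_ports=native" would tell
whether the parameter is in effect for the current boot.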

-- 
Sincerely,
Cao jin

[Qemu-devel] seek help for AER

2017-02-21 Thread Cao jin

Hi,

First, sorry for the long delay on the AER job. I was on a 12-day
holiday and started to work on the patch 2 weeks ago; because I used a
newer kernel (4.10-rc8) to start off the work, several tiny
problems slowed me down.

Now I have hit a confusing issue with the aer_inject module. When I use the
aer_inject command as before, it says:

Error: Failed to write, No such device

then in dmesg, it says:

pcieport :00:09.0: aer_inject: AER device not found

Then I added some debug lines to find_aer_device_iter() to check
device->bus->name, and found that it is "pci"; that is why I can't inject,
since it should have been "pci_express". But everything is the same as before
except for the newer host kernel.

I still don't have a clue about this problem after almost 2 days; could you help?

The topology on my host is:

+-09.0-[06]--+-00.0 Intel Corporation 82576 Gigabit Network Connection
| \-00.1 Intel Corporation 82576 Gigabit Network Connection

-- 
Sincerely,
Cao jin

Re: [Qemu-devel] [PATCH] pcie: simplify pcie_add_capability()

2017-02-15 Thread Cao jin

Hi Peter,

On 02/14/2017 03:51 PM, Peter Xu wrote:
> When we add PCIe extended capabilities, we should be following the rule
> that we add the head extended cap (at offset 0x100) first, then the rest
> of them. Meanwhile, we are always adding new capability bits at the end
> of the list. Here the "next" looks meaningless in all cases since it
> should always be zero (along with the "header").
> 
> Simplify the function a bit, and it looks more readable now.
> 

Please see whether this suggestion could be incorporated into your patch :)
http://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg01418.html
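
As background for the "next" field discussed in the quoted description, here
is a standalone sketch of how a PCIe extended capability header is packed:
capability ID in bits 15:0, version in bits 19:16, next offset in bits 31:20.
The helper names are hypothetical; they are not the QEMU or Linux macros.

```c
#include <stdint.h>

/* Pack an extended capability header from its three fields. */
uint32_t sketch_ext_cap_header(uint16_t cap_id, uint8_t cap_ver,
                               uint16_t next)
{
    return (uint32_t)cap_id |
           ((uint32_t)(cap_ver & 0xf) << 16) |
           ((uint32_t)(next & 0xffc) << 20);
}

/* Extract the next-capability offset (dword aligned) from a header. */
uint16_t sketch_ext_cap_next(uint32_t header)
{
    return (uint16_t)((header >> 20) & 0xffc);
}
```

A capability appended at the end of the list has a next offset of 0, which is
why the header can always be written with next = 0 when adding a new one.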

-- 
Sincerely,
Cao jin

> Signed-off-by: Peter Xu <pet...@redhat.com>
> ---
>  hw/pci/pcie.c | 15 ---
>  1 file changed, 4 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index cbd4bb4..e0e6f6a 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -664,30 +664,23 @@ void pcie_add_capability(PCIDevice *dev,
>   uint16_t cap_id, uint8_t cap_ver,
>   uint16_t offset, uint16_t size)
>  {
> -uint32_t header;
> -uint16_t next;
> -
>  assert(offset >= PCI_CONFIG_SPACE_SIZE);
>  assert(offset < offset + size);
>  assert(offset + size <= PCIE_CONFIG_SPACE_SIZE);
>  assert(size >= 8);
>  assert(pci_is_express(dev));
>  
> -if (offset == PCI_CONFIG_SPACE_SIZE) {
> -header = pci_get_long(dev->config + offset);
> -next = PCI_EXT_CAP_NEXT(header);
> -} else {
> +if (offset != PCI_CONFIG_SPACE_SIZE) {
>  uint16_t prev;
>  
>  /* 0 is reserved cap id. use internally to find the last capability
> in the linked list */
> -next = pcie_find_capability_list(dev, 0, &prev);
> -
> +assert(pcie_find_capability_list(dev, 0, &prev) == 0);
>  assert(prev >= PCI_CONFIG_SPACE_SIZE);
> -assert(next == 0);
>  pcie_ext_cap_set_next(dev, prev, offset);
>  }
> -pci_set_long(dev->config + offset, PCI_EXT_CAP(cap_id, cap_ver, next));
> +
> +pci_set_long(dev->config + offset, PCI_EXT_CAP(cap_id, cap_ver, 0));
>  
>  /* Make capability read-only by default */
>  memset(dev->wmask + offset, 0, size);
>

Re: [Qemu-devel] [PATCH RFC] vfio error recovery: kernel support

2017-01-24 Thread Cao jin



On 01/21/2017 01:01 AM, Michael S. Tsirkin wrote:
> On Fri, Jan 20, 2017 at 06:13:22PM +0800, Cao jin wrote:
>>
>>
>> On 01/20/2017 04:16 AM, Michael S. Tsirkin wrote:
>>> This is a design and an initial patch for kernel side for AER
>>> support in VFIO.
>>>
>>> 0. What happens now (PCIE AER only)
>>>Fatal errors cause a link reset.
>>>Non fatal errors don't.
>>>All errors stop the VM eventually, but not immediately
>>>because it's detected and reported asynchronously.
>>>Interrupts are forwarded as usual.
>>>Correctable errors are not reported to guest at all.
>>>Note: PPC EEH is different. This focuses on AER.
>>>
>>> 1. Correctable errors
>>>I don't see a need to report these to guest. So let's not.
>>>
>>> 2. Fatal errors
>>>It's not easy to handle them gracefully since link reset
>>>is needed. As a first step, let's use the existing mechanism
>>>in that case.
>>>
>>> 2. Non-fatal errors
>>>Here we could make progress by reporting them to guest
>>>and have guest handle them.
>>>Issues:
>>> a. this behaviour should only be enabled with new userspace
>>>old userspace should work without changes
>>> Suggestion: One way to address this would be to add a new eventfd
>>> non_fatal_err_trigger. If not set, invoke err_trigger.
>>>
>>> b. drivers are supposed to stop MMIO when error is reported
>>> if vm keeps going, we will keep doing MMIO/config
>>> Suggestion 1: ignore this. vm stop happens much later when userspace 
>>> runs anyway,
>>> so we are not making things much worse
>>> Suggestion 2: try to stop MMIO/config, resume on resume call
>>>
>>> Patch below implements Suggestion 1.
>>>
>>> c. PF driver might detect that function is completely broken,
>>> if vm keeps going, we will keep doing MMIO/config
>>> Suggestion 1: ignore this. vm stop happens much later when userspace 
>>> runs anyway,
>>> so we are not making things much worse
>>> Suggestion 2: detect this and invoke err_trigger to stop VM
>>>
>>> Patch below implements Suggestion 2.
>>>
>>> Aside: we currently return PCI_ERS_RESULT_DISCONNECT when device
>>> is not attached. This seems bogus, likely based on the confusing name.
>>> We probably should return PCI_ERS_RESULT_CAN_RECOVER.
>>>
>>> The following patch does not change that.
>>>
>>> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
>>>
>>> ---
>>>
>>> The patch is completely untested. Let's discuss the design first.
>>> Cao jin, if this is deemed acceptable please take it from here.
>>>
>>
>> Ok, thanks very much.
>>
>>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>>> index dce511f..fdca683 100644
>>> --- a/drivers/vfio/pci/vfio_pci.c
>>> +++ b/drivers/vfio/pci/vfio_pci.c
>>> @@ -1292,7 +1292,9 @@ static pci_ers_result_t 
>>> vfio_pci_aer_err_detected(struct pci_dev *pdev,
>>>  
>>> mutex_lock(&vdev->igate);
>>>  
>>> -   if (vdev->err_trigger)
>>> +   if (state == pci_channel_io_normal && vdev->non_fatal_err_trigger)
>>> +   eventfd_signal(vdev->err_trigger, 1);
>>> +   else if (vdev->err_trigger)
>>> eventfd_signal(vdev->err_trigger, 1);
>>>  
>>> mutex_unlock(&vdev->igate);
>>> @@ -1302,8 +1304,38 @@ static pci_ers_result_t 
>>> vfio_pci_aer_err_detected(struct pci_dev *pdev,
>>> return PCI_ERS_RESULT_CAN_RECOVER;
>>>  }
>>>  
>>> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev,
>>> +   pci_channel_state_t state)
>>> +{
>>> +   struct vfio_pci_device *vdev;
>>> +   struct vfio_device *device;
>>> +
>>> +   device = vfio_device_get_from_dev(&pdev->dev);
>>> +   if (!device)
>>> +   goto err_dev;
>>> +
>>> +   vdev = vfio_device_data(device);
>>> +   if (!vdev)
>>> +   goto err_dev;
>>> +
>>> +   mutex_lock(&vdev->igate);
>>> +
>>> +   if (vdev->err_trigger)
>>> +   eventfd_signal(vdev->err_trigger, 1);
>>> +
>>> +   mutex_unlock(&vdev->igate);
>>> +
&

Re: [Qemu-devel] [PATCH RFC] vfio error recovery: kernel support

2017-01-20 Thread Cao jin



On 01/20/2017 04:16 AM, Michael S. Tsirkin wrote:
> This is a design and an initial patch for kernel side for AER
> support in VFIO.
> 
> 0. What happens now (PCIE AER only)
>Fatal errors cause a link reset.
>Non fatal errors don't.
>All errors stop the VM eventually, but not immediately
>because it's detected and reported asynchronously.
>Interrupts are forwarded as usual.
>Correctable errors are not reported to guest at all.
>Note: PPC EEH is different. This focuses on AER.
> 
> 1. Correctable errors
>I don't see a need to report these to guest. So let's not.
> 
> 2. Fatal errors
>It's not easy to handle them gracefully since link reset
>is needed. As a first step, let's use the existing mechanism
>in that case.
>
> 2. Non-fatal errors
>Here we could make progress by reporting them to guest
>and have guest handle them.
>Issues:
> a. this behaviour should only be enabled with new userspace
>old userspace should work without changes
> Suggestion: One way to address this would be to add a new eventfd
> non_fatal_err_trigger. If not set, invoke err_trigger.
> 
> b. drivers are supposed to stop MMIO when error is reported
> if vm keeps going, we will keep doing MMIO/config
> Suggestion 1: ignore this. vm stop happens much later when userspace runs 
> anyway,
> so we are not making things much worse
> Suggestion 2: try to stop MMIO/config, resume on resume call
> 
> Patch below implements Suggestion 1.
> 
> c. PF driver might detect that function is completely broken,
> if vm keeps going, we will keep doing MMIO/config
> Suggestion 1: ignore this. vm stop happens much later when userspace runs 
> anyway,
> so we are not making things much worse
> Suggestion 2: detect this and invoke err_trigger to stop VM
> 
> Patch below implements Suggestion 2.
> 
> Aside: we currently return PCI_ERS_RESULT_DISCONNECT when device
> is not attached. This seems bogus, likely based on the confusing name.
> We probably should return PCI_ERS_RESULT_CAN_RECOVER.
> 
> The following patch does not change that.
> 
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> 
> ---
> 
> The patch is completely untested. Let's discuss the design first.
> Cao jin, if this is deemed acceptable please take it from here.
> 

Ok, thanks very much.

> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index dce511f..fdca683 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1292,7 +1292,9 @@ static pci_ers_result_t 
> vfio_pci_aer_err_detected(struct pci_dev *pdev,
>  
>   mutex_lock(&vdev->igate);
>  
> - if (vdev->err_trigger)
> + if (state == pci_channel_io_normal && vdev->non_fatal_err_trigger)
> + eventfd_signal(vdev->err_trigger, 1);
> + else if (vdev->err_trigger)
>   eventfd_signal(vdev->err_trigger, 1);
>  
>   mutex_unlock(&vdev->igate);
> @@ -1302,8 +1304,38 @@ static pci_ers_result_t 
> vfio_pci_aer_err_detected(struct pci_dev *pdev,
>   return PCI_ERS_RESULT_CAN_RECOVER;
>  }
>  
> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev,
> + pci_channel_state_t state)
> +{
> + struct vfio_pci_device *vdev;
> + struct vfio_device *device;
> +
> + device = vfio_device_get_from_dev(&pdev->dev);
> + if (!device)
> + goto err_dev;
> +
> + vdev = vfio_device_data(device);
> + if (!vdev)
> + goto err_dev;
> +
> + mutex_lock(&vdev->igate);
> +
> + if (vdev->err_trigger)
> + eventfd_signal(vdev->err_trigger, 1);
> +
> + mutex_unlock(&vdev->igate);
> +
> + vfio_device_put(device);
> +
> +err_data:
> + vfio_device_put(device);
> +err_dev:
> + return PCI_ERS_RESULT_RECOVERED;
> +}
> +
>  static const struct pci_error_handlers vfio_err_handlers = {
>   .error_detected = vfio_pci_aer_err_detected,
> + .slot_reset = vfio_pci_aer_slot_reset,
>  };
>  

For .slot_reset to be called, .error_detected has to return
PCI_ERS_RESULT_NEED_RESET; pci-error-recovery.txt says so, and so does
the code.

Is .slot_reset just a copy of .error_detected for now, with some tricks
to be added here later? Otherwise I don't get why .slot_reset signals
the user again.
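
To spell out the flow I am referring to, here is a simplified sketch of
how the result of .error_detected selects the next recovery step, as I
read pci-error-recovery.txt. The enum and function names are made up for
illustration (they are not the kernel's), and the sketch ignores that
the real core merges the votes of all drivers on the affected segment.

```c
#include <assert.h>

/* Simplified mirror of what a driver's error_detected may return. */
enum demo_ers_result {
    DEMO_ERS_CAN_RECOVER,
    DEMO_ERS_NEED_RESET,
    DEMO_ERS_DISCONNECT,
};

/* Hypothetical labels for the next step taken by the recovery core. */
enum demo_next_step {
    DEMO_STEP_MMIO_ENABLED,   /* call .mmio_enabled, no reset */
    DEMO_STEP_SLOT_RESET,     /* reset the slot, then call .slot_reset */
    DEMO_STEP_GIVE_UP,        /* no further recovery callbacks */
};

enum demo_next_step demo_next_recovery_step(enum demo_ers_result result)
{
    switch (result) {
    case DEMO_ERS_CAN_RECOVER:
        return DEMO_STEP_MMIO_ENABLED;
    case DEMO_ERS_NEED_RESET:
        return DEMO_STEP_SLOT_RESET;
    case DEMO_ERS_DISCONNECT:
        return DEMO_STEP_GIVE_UP;
    }
    /* Not reachable for valid enum values. */
    assert(!"unknown demo_ers_result");
    return DEMO_STEP_GIVE_UP;
}
```

Since the patch returns PCI_ERS_RESULT_CAN_RECOVER from .error_detected,
this model never reaches the slot reset step, which is the reason for my
question.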

>  static struct pci_driver vfio_pci_driver = {
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c 
> b/drivers/vfio/pci/vfio_pci_intrs.c
> index 1c46045..e883db5 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
>

Re: [Qemu-devel] [PATCH RFC v11 3/4] vfio-pci: pass the aer error to guest

2017-01-19 Thread Cao jin



On 01/19/2017 06:31 AM, Alex Williamson wrote:
> On Sat, 31 Dec 2016 17:13:07 +0800
> Cao jin <caoj.f...@cn.fujitsu.com> wrote:
> 
>> From: Chen Fan <chen.fan.f...@cn.fujitsu.com>
>>

>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 76a8ac3..9861f72 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -2470,21 +2470,55 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
>>  static void vfio_err_notifier_handler(void *opaque)
>>  {
>>  VFIOPCIDevice *vdev = opaque;
>> +PCIDevice *dev = &vdev->pdev;
>> +PCIEAERMsg msg = {
>> +.severity = 0,
>> +.source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
>> +};
>> +int len;
>> +uint64_t uncor_status;
>> +
>> +/* Read uncorrectable error status from driver */
>> +len = read(vdev->err_notifier.rfd, &uncor_status, sizeof(uncor_status));
> 
> Whoa this seems bogus.  In the kernel eventfd_signal() adds the value
> to the internal counter.  We can't guarantee that the user is going to
> immediately respond to every signal, multiple signals might occur, each
> incrementing the counter.  I don't think we can use the eventfd like
> this.
> 

Ah, I see your concern now; it makes sense. Michael made the same
comment, but I didn't get his point at that time...

I explained the reason in the v10 cover letter (item 5 of the
changelog): I found that reading the error status register in the error
handler sometimes returns all F's. I may go back to v10, incorporate
your comments, and test whether the issue still exists. For now, since
we only handle non-fatal errors, we can still use the eventfd as before.
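
For the record, the counter behaviour you describe can be seen from user
space with a plain eventfd(2). The sketch below is only a demonstration
with a made-up helper name (demo_eventfd_two_writes_one_read); it is not
QEMU or vfio code. It writes two values before the reader gets to run
and then reads once.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Write two values to a fresh eventfd, then read it once.
 * Returns the value read, or UINT64_MAX on any failure.
 * The values are assumed to be small enough not to overflow the counter.
 */
uint64_t demo_eventfd_two_writes_one_read(uint64_t first, uint64_t second)
{
    uint64_t out = UINT64_MAX;
    int fd;

    /* Require non-zero values so the counter is non-zero and the
     * blocking read() below returns immediately. */
    assert(first != 0 && second != 0);

    fd = eventfd(0, 0);
    if (fd < 0) {
        return UINT64_MAX;
    }

    /* Each write adds to the counter; a read returns the sum and
     * resets the counter to zero. */
    if (write(fd, &first, sizeof(first)) != (ssize_t)sizeof(first) ||
        write(fd, &second, sizeof(second)) != (ssize_t)sizeof(second) ||
        read(fd, &out, sizeof(out)) != (ssize_t)sizeof(out)) {
        out = UINT64_MAX;
    }

    close(fd);
    return out;
}
```

The single read returns the sum of both writes, so if the written values
were status bitmasks, the reader could not recover either of them.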

-- 
Sincerely,
Cao jin
>> +if (len != sizeof(uncor_status)) {
>> +error_report("vfio-pci: uncor error status reading returns"
>> + " invalid number of bytes: %d", len);
>> +return; //Or goto stop?
> 
> It's bogus use of the eventfd anyway afaict, but
> event_notifier_test_and_clear() certainly handles at least EINTR.
> 
>> +}
>> +
>> +if (!(vdev->features & VFIO_FEATURE_ENABLE_AER)) {
>> +goto stop;
>> +}
> 
> I'm not sure this should be user selected anymore.
> 
>> +
>> +/* Populate the aer msg and send it to root port */
>> +if (dev->exp.aer_cap) {
>> +uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
>> +bool isfatal = uncor_status &
>> +   pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
>> +
>> +if (isfatal) {
>> +goto stop;
>> +}
> 
> QEMU uses spaces not tabs.
> 
>> +
>> +msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
>> + PCI_ERR_ROOT_CMD_NONFATAL_EN;
> 
> We don't get here if isfatal.
> 
>>  
>> -if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
>> +error_report("vfio-pci device %d sending AER to root port. uncor"
>> + " status = 0x%"PRIx64, dev->devfn, uncor_status);
>> +pcie_aer_msg(dev, &msg);
>>  return;
>>  }
>>  
>> +stop:
>>  /*
>> - * TBD. Retrieve the error details and decide what action
>> - * needs to be taken. One of the actions could be to pass
>> - * the error to the guest and have the guest driver recover
>> - * from the error. This requires that PCIe capabilities be
>> - * exposed to the guest. For now, we just terminate the
>> - * guest to contain the error.
>> + * Terminate the guest in case of
>> + * 1. AER capability is not exposed to guest.
>> + * 2. AER capability is exposed, but error is fatal, only non-fatal
>> + * error is handled now.
> 
> You're also currently requiring the user to enable aer per device.
> 
>>   */
>>  
>> -error_report("%s(%s) Unrecoverable error detected. Please collect any 
>> data possible and then kill the guest", __func__, vdev->vbasedev.name);
>> +error_report("%s(%s) fatal error detected. Please collect any data"
>> +" possible and then kill the guest", __func__, 
>> vdev->vbasedev.name);
>>  
>>  vm_stop(RUN_STATE_INTERNAL_ERROR);
>>  }
> 
> 
> 
> .
>

Re: [Qemu-devel] [PATCH RFC v11 4/4] vfio: add 'aer' property to expose aercap

2017-01-19 Thread Cao jin



On 01/19/2017 06:36 AM, Alex Williamson wrote:
> On Sat, 31 Dec 2016 17:13:08 +0800
> Cao jin <caoj.f...@cn.fujitsu.com> wrote:
> 
>> From: Chen Fan <chen.fan.f...@cn.fujitsu.com>
>>
>> Add 'aer' property, let user choose whether expose the aer capability
>> or not.
> 
> But that's not what it does, it only controls the behavior in response
> to non-fatal errors, the capability is exposed regardless.
> 

This commit log is a leftover from earlier versions. And defaulting to
off is a result of the configuration restriction and your previous
discussion, right?

In the current version, if the 'aer' property is off, we just allocate
the config space via pcie_add_capability() and don't initialize the AER
capability, so the value there is all 0s. Does that still mean
"capability is exposed regardless"?

>> Should disable aer feature by default, because only non-fatal
>> error is supported now. 
> 
> Why does that mean it should be disabled by default?  What bad thing
> happens if we enable this opportunistically?
> 
>> Signed-off-by: Chen Fan <chen.fan.f...@cn.fujitsu.com>
>> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
>> Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
>> ---
>>  hw/vfio/pci.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 9861f72..fc9db66 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3057,6 +3057,8 @@ static Property vfio_pci_dev_properties[] = {
>>  DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
>> sub_device_id, PCI_ANY_ID),
>>  DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
>> +DEFINE_PROP_BIT("aer", VFIOPCIDevice, features,
>> +VFIO_FEATURE_ENABLE_AER_BIT, false),
>>  /*
>>   * TODO - support passed fds... is this necessary?
>>   * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
> 
> 
> 
> .
> 

-- 
Sincerely,
Cao jin

Re: [Qemu-devel] [PATCH RFC v11 2/4] vfio: new function to init aer cap for vfio device

2017-01-19 Thread Cao jin



On 01/19/2017 06:09 AM, Alex Williamson wrote:
> On Sat, 31 Dec 2016 17:13:06 +0800
> Cao jin <caoj.f...@cn.fujitsu.com> wrote:
> 
>> From: Chen Fan <chen.fan.f...@cn.fujitsu.com>
>>
>> Introduce new function to initilize AER capability registers
>> for vfio-pci device.
>>
>> Signed-off-by: Chen Fan <chen.fan.f...@cn.fujitsu.com>
>> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
>> Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
>> ---
>>  hw/vfio/pci.c | 87 
>> +++
>>  hw/vfio/pci.h |  3 +++
>>  2 files changed, 85 insertions(+), 5 deletions(-)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index d7dbe0e..76a8ac3 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -1851,18 +1851,81 @@ out:
>>  return 0;
>>  }
>>  
>> -static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
>> +static int vfio_setup_aer(VFIOPCIDevice *vdev, uint8_t cap_ver,
>> +  int pos, uint16_t size, Error **errp)
>> +{
>> +PCIDevice *pdev = &vdev->pdev;
>> +PCIDevice *dev_iter;
>> +uint8_t type;
>> +uint32_t errcap;
>> +
>> +/* In case the physical device has AER cap while user doesn't enable 
>> AER,
>> + * still allocate the config space in the emulated device for AER */
> 
> Bad comment style
> 
> /*
>  * Multi-line comments should
>  * look like this.
>  */
> 
> /* Single line comments may look like this */
> 
> /* Muli-line comments may
>  * absolutely not look like this */
> 
>> +if (!(vdev->features & VFIO_FEATURE_ENABLE_AER)) {
>> +pcie_add_capability(pdev, PCI_EXT_CAP_ID_ERR,
>> +cap_ver, pos, size);
>> +return 0;
>> +}
>> +
>> +dev_iter = pci_bridge_get_device(pdev->bus);
>> +if (!dev_iter) {
>> +goto error;
>> +}
>> +
>> +while (dev_iter) {
>> +if (!pci_is_express(dev_iter)) {
>> +goto error;
>> +}
>> +
>> +type = pcie_cap_get_type(dev_iter);
>> +if ((type != PCI_EXP_TYPE_ROOT_PORT &&
>> + type != PCI_EXP_TYPE_UPSTREAM &&
>> + type != PCI_EXP_TYPE_DOWNSTREAM)) {
>> +goto error;
>> +}
>> +
>> +if (!dev_iter->exp.aer_cap) {
>> +goto error;
>> +}
>> +
>> +dev_iter = pci_bridge_get_device(dev_iter->bus);
>> +}
>> +
>> +errcap = vfio_pci_read_config(pdev, pos + PCI_ERR_CAP, 4);
>> +/*
>> + * The ability to record multiple headers is depending on
>> + * the state of the Multiple Header Recording Capable bit and
>> + * enabled by the Multiple Header Recording Enable bit.
>> + */
>> +if ((errcap & PCI_ERR_CAP_MHRC) &&
>> +(errcap & PCI_ERR_CAP_MHRE)) {
>> +pdev->exp.aer_log.log_max = PCIE_AER_LOG_MAX_DEFAULT;
>> +} else {
>> +pdev->exp.aer_log.log_max = 0;
>> +}
>> +
>> +pcie_cap_deverr_init(pdev);
>> +return pcie_aer_init(pdev, cap_ver, pos, size);
>> +
>> +error:
>> +error_setg(errp, "vfio: Unable to enable AER for device %s, parent bus "
>> +   "does not support AER signaling", vdev->vbasedev.name);
>> +return -1;
> 
> If we're only handling non-fatal errors, should we place the burden on
> the user to know when to add the aer flag for the device or should we
> just enable it opportunistically as available?

If we only handle non-fatal errors, it makes sense to me that we don't
need the aer property as a switch and can enable AER opportunistically
when available; it does no harm.

But consider that:
1. Non-fatal error support is incomplete AER functionality, so I think
making it default to off is reasonable.

2. We may support fatal errors too in the future, which will bring back
the configuration restriction we used before. In that case, defaulting
to off was the result of your discussion (I guess). If that is the final
choice, is adopting it a little earlier acceptable?

3. From another perspective, if the 'aer' property only shows up in the
future once we support fatal errors too, would it seem that the 'aer'
property is dedicated to fatal errors only? (Of course we could make it
the switch for both kinds of error at that time.)

> Should pcie_aer_init() be the one testing the topology?  It doesn't
> seem vfio specific anymore.
>

It is indeed not vfio specific, but I guess not. Question: could an
AER-capable device be plugged under a (root/downstream) port that is
not AER-capable? I think the answer is YES. If the topology test were
done in pcie_aer_init(), that would mean the answer is NO.
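
To make the question concrete, here is a small model of the upstream
walk that the quoted vfio_setup_aer() performs: start at the parent
bridge and require every bridge up to the root to be PCIe, to be a
root/upstream/downstream port, and to carry an AER capability. All type
and function names below (demo_bridge, demo_path_supports_aer) are made
up for illustration; QEMU's real structures are not used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_port_type {
    DEMO_ROOT_PORT,
    DEMO_UPSTREAM,
    DEMO_DOWNSTREAM,
    DEMO_OTHER,
};

struct demo_bridge {
    bool is_express;
    enum demo_port_type type;
    bool has_aer_cap;
    const struct demo_bridge *parent;   /* NULL at the top of the tree */
};

/*
 * Return true when every bridge from 'parent' up to the root can
 * signal AER. A device with no bridge above it does not qualify.
 */
bool demo_path_supports_aer(const struct demo_bridge *parent)
{
    int depth = 0;

    if (parent == NULL) {
        return false;
    }

    for (; parent != NULL; parent = parent->parent) {
        /* A PCI hierarchy has at most 256 buses; a longer chain
         * would mean the parent links form a loop. */
        depth++;
        assert(depth <= 256);

        if (!parent->is_express ||
            parent->type == DEMO_OTHER ||
            !parent->has_aer_cap) {
            return false;
        }
    }
    return true;
}
```

In this model, an AER-capable device under a port without the AER
capability is a perfectly valid topology; it only means the walk returns
false for it. That is why I would rather keep the test in the caller.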

-- 
Sincerely,
Cao jin

Re: [Qemu-devel] [PATCH v9 04/11] msix: check msix_init's return value

2017-01-19 Thread Cao jin



On 01/18/2017 11:21 PM, Michael S. Tsirkin wrote:
> On Wed, Jan 18, 2017 at 02:29:19PM +0800, Cao jin wrote:
>>
>>
>> On 01/18/2017 12:01 AM, Michael S. Tsirkin wrote:
>>> On Tue, Jan 17, 2017 at 02:50:38PM +0800, Cao jin wrote:
>>>> forget to cc maintainers in this new patch
>>>>
>>>> On 01/17/2017 02:18 PM, Cao jin wrote:
>>>>> Doesn't do it for megasas & hcd-xhci, later patches will fix them.
>>>>>
>>>>> Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
>>>
>>> I don't like this one, frankly. That's a bunch of code duplication.
>>
>> Yes, code duplication, seems inevitable if move the asserts into a
>> separate patch.
>>
>>> I suspect vfio is the only one who might reasonably get EINVAL here.
>>> So how about e.g. msix_validate_and_init that doesn't assert and use that
>>> from vfio, then switch msix_init to assert instead?
>>>
>>
>> Not sure if I get your idea. Do you mean: do param check via assert in
>> msix_init(), so that no need check its returned error outside, and
>> introduce new api msix_validate_and_init(same content as msix_init,
>> except param check) dedicated to vfio?
> 
> Something like this.
> 
>> If I understand you right, the way we do param check for msi_init[*] &
>> msix_init will be inconsistent.
> 
> Right, we should consolidate these for msi too.
> 
> 

I am confused: for msi_init, converting the asserts to return -errno was
the choice that came out of a long discussion:
http://lists.nongnu.org/archive/html/qemu-devel/2016-09/msg08215.html

So now we would revert that again? And IIRC, I did use assert in
msix_init for the sanity tests, and reverted it as suggested. This is
also what we have done for msi_init: assert on the returned error
outside. If it needs to be modified as you suggest, I see lots of places
that need to be taken care of. Is that worth the trouble?

I see a simpler way out: drop this one from the patchset. At least there
is no regression; just a few devices don't assert on the returned error
while others (megasas, hcd-xhci) do. What would you say?
-- 
Sincerely,
Cao jin

Re: [Qemu-devel] [PATCH RFC v11 0/4] vfio-pci: pass non-fatal error to guest

2017-01-18 Thread Cao jin

On 01/19/2017 05:43 AM, Alex Williamson wrote:
> On Sat, 31 Dec 2016 17:13:04 +0800

> How can you know if he other function was affected if you don't even
> have a cable connected?

I will try to ask for another cable for the other function.

>  How is testing on something that doesn't seem
> to work correctly already valid?  Thanks,

I am not sure I understand you correctly. Even though I saw those
abnormal phenomena in the tests, I finally got exactly the result I
expected in each scenario. Isn't that enough to prove that this new
functionality works? Especially since these abnormal phenomena exist
even without this patchset.

-- 
Sincerely,
Cao jin

Re: [Qemu-devel] [PATCH RFC v11 0/4] vfio-pci: pass non-fatal error to guest

2017-01-18 Thread Cao jin



On 01/19/2017 05:43 AM, Alex Williamson wrote:
> On Sat, 31 Dec 2016 17:13:04 +0800
> Cao jin <caoj.f...@cn.fujitsu.com> wrote:
> 
>> As previous discussion suggest, we could take a step back to handle non-fatal
>> error first, this will make this patchset much more thinner, because we could
>> drop all the configuration restriction related patches.
>>
>> FYI: patch 1 has been cherry picked into another series, and wait to be 
>> merged
>> first, so this patchset can't compile in your host.
>>
>> v11 changelog:
>> 1. drop a bunch of code which check the configuration.
>> 2. modify patch 3 to handle non-fatal error only, fatal error still
>>results in vm stop.
>>Doesn't modify as suggestion "add another eventfd do distinguish fatal &
>>non-fatal error", because 1st, user has the ability to distinguish them
>>just from the uncorrectable error status; 2nd, for back compatible, e.g.
>>an old user handle both error, rely on the current error eventfd.
>>
>> Test:
>> Test it with intel 82576 NIC, which has 2 functions, function 1 has cable
>> connected to gateway, function 0 has no link. Test in 4 scenario.
>> 1. just assign function 1 to one vm, function 0 has no user
>> 2. assign 2 function to one vm, totally comply previous configuraton 
>> restrction
>> 3. assign 2 function to one vm, under different virtual bus
>> 4, assign functions to 2 different vm
>>
>> The test steps are the same as v10: assign ip to function 1, add route info,
>> and ping the gateway. The results meet expectation. But the unsteady hardware
>> often emit fatal error, still don't know why. And igb driver in guest seems
>> has bug: ping gateway for a while, even if I don't do anything, it will show
>> "Destination Host Unreachable" after many successful ping. But obviously,
>> neither of these has relationship with this patchset.
> 
> So something doesn't work right regardless and this doesn't describe
> what testing was done of the functionality added here.  Were AER events
> injected?  Did fatal ones cause a vmstop, did non-fatal ones continue?
> How can you know if he other function was affected if you don't even
> have a cable connected?  How is testing on something that doesn't seem
> to work correctly already valid?  Thanks,
> 
> Alex
> 

Well, of course I tested the functionality added here via the aer-inject
tool & driver, the same test as I did for v10. The unsteady hardware
slowed my testing and made me repeat the test again & again & again
until I saw what I expected. Maybe I should share some details.

General test steps (Linux only):
  * assign an IP to function 1 in the guest, add route info, and ping
    the gateway
  * on the host, inject various non-fatal errors into both devices.
The robustness test is: repeat the injection manually and quickly.

Expectation: in all scenarios, function 1, which is pinging, still works
after non-fatal error recovery; function 0 (in the host, the same guest,
or another guest) doesn't show any abnormal phenomenon (guest crash, or
any abnormal log in dmesg); a fatal error causes a vmstop (surely).

I did the test in each scenario and got what I expected: a fatal error
definitely causes a vmstop; and even though bothered by the unsteady
hardware, non-fatal error recovery is OK.

IIRC, in the v10 tests I even found that the NIC would emit a fatal
error after I rebooted the host, before I started the VM. This is what I
call unsteady hardware (I will try to figure out in which scenarios the
hardware tends to be unsteady). So I think something doesn't work right,
but in my opinion it has nothing to do with this patchset.

-- 
Sincerely,
Cao jin

>> Chen Fan (3):
>>   vfio: new function to init aer cap for vfio device
>>   vfio-pci: pass the aer error to guest
>>   vfio: add 'aer' property to expose aercap
>>
>> Dou Liyang (1):
>>   pcie_aer: support configurable AER capa version
>>
>>  hw/net/e1000e.c|   3 +-
>>  hw/pci-bridge/ioh3420.c|   2 +-
>>  hw/pci-bridge/xio3130_downstream.c |   2 +-
>>  hw/pci-bridge/xio3130_upstream.c   |   2 +-
>>  hw/pci/pcie_aer.c  |   5 +-
>>  hw/vfio/pci.c  | 139 
>> +
>>  hw/vfio/pci.h  |   3 +
>>  include/hw/pci/pcie_aer.h  |   3 +-
>>  8 files changed, 139 insertions(+), 20 deletions(-)
>>
> 
> 
> 
> .
>

Re: [Qemu-devel] [PATCH v2] vfio/pci: Support error recovery

2017-01-18 Thread Cao jin



On 01/19/2017 05:32 AM, Alex Williamson wrote:
> On Tue, 10 Jan 2017 17:11:01 +0200
> "Michael S. Tsirkin" <m...@redhat.com> wrote:
> 
>> On Tue, Jan 10, 2017 at 07:46:17PM +0800, Cao jin wrote:
>>>
>>>
>>> On 01/10/2017 07:04 AM, Michael S. Tsirkin wrote:  
>>>> On Sat, Dec 31, 2016 at 05:15:36PM +0800, Cao jin wrote:  
>>>>> Support serious device error recovery  
>>>>
>>>> serious?
>>>>  
>>>
>>> Sorry for my poor vocabulary if it confuses people. I wanted to express
>>> the meaning that: vfio-pci actually cannot do a real recovery for device
>>> even if it provides the callbacks, it relies on the user to do a
>>> effective(or word "serious"?) recovery.
>>>
>>> Welcome the amendment on the commit log.  
>>
>> It's up to Alex, maybe he's able to figure it all out from
>> code, but the rest of us could benefit from a description
>> of what the patch does from userspace point of view.
>>
>> Also, is it a pre-requisite of the userspace patches you posted?
> 
> This is the same blocking user accesses while the device is in recovery
> that you thought was ineffective/wrong before.  Why do we still need it
> if QEMU isn't trying to handle fatal errors?  If the kernel is doing a
> reset shouldn't the user consider the device dead?  A commit log
> explaining this is absolutely necessary.  Thanks,
> 
> Alex
> 

Yes, it is the same blocking of user access as before, and I did say it
is not as effective as we expected; I drew the figure to illustrate my
analysis. I think the blocking is right, maybe just not enough to work
well, because it is possible that vfio's blocking is over while the
hardware reset is not yet done, which results in an inaccessible device.

Leaving the blocking here does no harm for now, and it could be useful
in the future (when we handle fatal errors).

We don't forward fatal error events to the guest, so why would the guest
kernel do a reset? Or do you mean that some device drivers would do a
hardware reset on a non-fatal error?

-- 
Sincerely,
Cao jin

>>>>>
>>>>> Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
>>>>> ---
>>>>>  drivers/vfio/pci/vfio_pci.c | 70 
>>>>> +++--
>>>>>  drivers/vfio/pci/vfio_pci_private.h |  2 ++
>>>>>  2 files changed, 70 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>>>>> index 712a849..752af20 100644
>>>>> --- a/drivers/vfio/pci/vfio_pci.c
>>>>> +++ b/drivers/vfio/pci/vfio_pci.c
>>>>> @@ -534,6 +534,15 @@ static long vfio_pci_ioctl(void *device_data,
>>>>>  {
>>>>>   struct vfio_pci_device *vdev = device_data;
>>>>>   unsigned long minsz;
>>>>> + int ret;
>>>>> +
>>>>> + if (vdev->aer_recovering && (cmd == VFIO_DEVICE_SET_IRQS ||
>>>>> + cmd == VFIO_DEVICE_RESET || cmd == VFIO_DEVICE_PCI_HOT_RESET)) {
>>>>> + ret = wait_for_completion_interruptible(
>>>>> + &vdev->aer_completion);
>>>>
>>>> don't split it like that.
>>>>   
>>>>> + if (ret)
>>>>> + return ret;
>>>>> + }
>>>>>  
>>>>>   if (cmd == VFIO_DEVICE_GET_INFO) {
>>>>>   struct vfio_device_info info;
>>>>> @@ -953,6 +962,15 @@ static ssize_t vfio_pci_rw(void *device_data, char 
>>>>> __user *buf,
>>>>>  {
>>>>>   unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
>>>>>   struct vfio_pci_device *vdev = device_data;
>>>>> + int ret;
>>>>> +
>>>>> + /* block all kinds of access during host recovery */
>>>>> + if (vdev->aer_recovering) {
>>>>> + ret = wait_for_completion_interruptible(
>>>>> + &vdev->aer_completion);
>>>>> + if (ret)
>>>>> + return ret;
>>>>> + }
>>>>>  
>>>>>   if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions)
>>>>>   return -EINVAL;
>>>>> @@ -1117,6 +1135,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, 
>>>>> const struct pci_device_id *id)
>>>>>   vdev->irq_type = VFIO_PCI_NUM_IRQS;
>>>>>   mutex_init(&vdev->igate);
>>>>>   spin_lock_i

Re: [Qemu-devel] [PATCH v9 04/11] msix: check msix_init's return value

2017-01-17 Thread Cao jin

On 01/18/2017 12:01 AM, Michael S. Tsirkin wrote:
> On Tue, Jan 17, 2017 at 02:50:38PM +0800, Cao jin wrote:
>> forget to cc maintainers in this new patch
>>
>> On 01/17/2017 02:18 PM, Cao jin wrote:
>>> Doesn't do it for megasas & hcd-xhci, later patches will fix them.
>>>
>>> Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
> 
> I don't like this one, frankly. That's a bunch of code duplication.

Yes, it is code duplication; it seems inevitable if the asserts are
moved into a separate patch.

> I suspect vfio is the only one who might reasonably get EINVAL here.
> So how about e.g. msix_validate_and_init that doesn't assert and use that
> from vfio, then switch msix_init to assert instead?
> 

I am not sure I get your idea. Do you mean: do the parameter check via
assert in msix_init(), so that there is no need to check its returned
error outside, and introduce a new API msix_validate_and_init (same
content as msix_init, except for the parameter check) dedicated to vfio?

If I understand you correctly, the way we do the parameter check for
msi_init[*] & msix_init will be inconsistent.

[*] patch: msi_init: convert assert to return -errno
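
To check that I read the proposal right, here is a stand-alone sketch of
the two API styles. The names demo_validate_and_init and demo_init, the
parameter list, and the limits are simplified assumptions for
illustration only; they are not the real msix_init() signature or its
full set of checks.

```c
#include <assert.h>
#include <errno.h>

/* Assumed limits for the sketch: table size and number of BARs. */
#define DEMO_MAX_VECTORS 2048
#define DEMO_MAX_BAR_NR  5

/*
 * Style 1: report bad parameters to the caller. A caller such as
 * vfio, which takes its parameters from real hardware, can handle
 * the error at run time.
 * Returns 0 on success, -EINVAL on invalid parameters.
 */
int demo_validate_and_init(unsigned nentries, unsigned table_bar_nr,
                           unsigned pba_bar_nr)
{
    if (nentries < 1 || nentries > DEMO_MAX_VECTORS) {
        return -EINVAL;
    }
    if (table_bar_nr > DEMO_MAX_BAR_NR || pba_bar_nr > DEMO_MAX_BAR_NR) {
        return -EINVAL;
    }
    /* Real initialization would happen here. */
    return 0;
}

/*
 * Style 2: treat bad parameters as a programming error. Emulated
 * devices with hard-coded parameters would use this and never check
 * a return value.
 */
void demo_init(unsigned nentries, unsigned table_bar_nr,
               unsigned pba_bar_nr)
{
    int ret = demo_validate_and_init(nentries, table_bar_nr, pba_bar_nr);

    assert(ret == 0);
    (void)ret;  /* silence the unused warning when asserts are disabled */
}
```

With this split, only the first function can return -EINVAL to its
caller, while the second aborts on the same input.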

-- 
Sincerely,
Cao jin

Re: [Qemu-devel] [PATCH v2] vfio/pci: Support error recovery

2017-01-17 Thread Cao jin

Alex,
Do you have any comments on this version and the QEMU parts?

-- 
Sincerely,
Cao jin

On 12/31/2016 05:15 PM, Cao jin wrote:
> Support serious device error recovery
> 
> Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
> ---
>  drivers/vfio/pci/vfio_pci.c | 70 
> +++--
>  drivers/vfio/pci/vfio_pci_private.h |  2 ++
>  2 files changed, 70 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 712a849..752af20 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -534,6 +534,15 @@ static long vfio_pci_ioctl(void *device_data,
>  {
>   struct vfio_pci_device *vdev = device_data;
>   unsigned long minsz;
> + int ret;
> +
> + if (vdev->aer_recovering && (cmd == VFIO_DEVICE_SET_IRQS ||
> + cmd == VFIO_DEVICE_RESET || cmd == VFIO_DEVICE_PCI_HOT_RESET)) {
> + ret = wait_for_completion_interruptible(
> + >aer_completion);
> + if (ret)
> + return ret;
> + }
>  
>   if (cmd == VFIO_DEVICE_GET_INFO) {
>   struct vfio_device_info info;
> @@ -953,6 +962,15 @@ static ssize_t vfio_pci_rw(void *device_data, char 
> __user *buf,
>  {
>   unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
>   struct vfio_pci_device *vdev = device_data;
> + int ret;
> +
> + /* block all kinds of access during host recovery */
> + if (vdev->aer_recovering) {
> + ret = wait_for_completion_interruptible(
> + >aer_completion);
> + if (ret)
> + return ret;
> + }
>  
>   if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions)
>   return -EINVAL;
> @@ -1117,6 +1135,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const 
> struct pci_device_id *id)
>   vdev->irq_type = VFIO_PCI_NUM_IRQS;
>   mutex_init(>igate);
>   spin_lock_init(>irqlock);
> + init_completion(>aer_completion);
>  
>   ret = vfio_add_group_dev(>dev, _pci_ops, vdev);
>   if (ret) {
> @@ -1176,6 +1195,9 @@ static pci_ers_result_t 
> vfio_pci_aer_err_detected(struct pci_dev *pdev,
>  {
>   struct vfio_pci_device *vdev;
>   struct vfio_device *device;
> + u32 uncor_status;
> + unsigned int aer_cap_offset;
> + int ret;
>  
>   device = vfio_device_get_from_dev(>dev);
>   if (device == NULL)
> @@ -1187,10 +1209,29 @@ static pci_ers_result_t 
> vfio_pci_aer_err_detected(struct pci_dev *pdev,
>   return PCI_ERS_RESULT_DISCONNECT;
>   }
>  
> + /*
> +  * get device's uncorrectable error status as soon as possible,
> +  * and signal it to user space. The later we read it, the possibility
> +  * the register value is mangled grows.
> +  */
> + aer_cap_offset = pci_find_ext_capability(vdev->pdev, 
> PCI_EXT_CAP_ID_ERR);
> + ret = pci_read_config_dword(vdev->pdev, aer_cap_offset +
> +PCI_ERR_UNCOR_STATUS, _status);
> +if (ret)
> +return PCI_ERS_RESULT_DISCONNECT;
> +
> + pr_info("device %d got AER detect notification. uncorrectable error 
> status = 0x%x\n", pdev->devfn, uncor_status);//to be removed
>   mutex_lock(>igate);
>  
> - if (vdev->err_trigger)
> - eventfd_signal(vdev->err_trigger, 1);
> + vdev->aer_recovering = true;
> + reinit_completion(>aer_completion);
> +
> + if (vdev->err_trigger && uncor_status) {
> + pr_info("device %d signal uncor status 0x%x to user",
> + pdev->devfn, uncor_status);
> + /* signal uncorrectable error status to user space */
> + eventfd_signal(vdev->err_trigger, uncor_status);
> +}
>  
>   mutex_unlock(>igate);
>  
> @@ -1199,8 +1240,33 @@ static pci_ers_result_t 
> vfio_pci_aer_err_detected(struct pci_dev *pdev,
>   return PCI_ERS_RESULT_CAN_RECOVER;
>  }
>  
> +static void vfio_pci_aer_resume(struct pci_dev *pdev)
> +{
> + struct vfio_pci_device *vdev;
> + struct vfio_device *device;
> +
> + device = vfio_device_get_from_dev(>dev);
> + if (device == NULL)
> + return;
> +
> + vdev = vfio_device_data(device);
> + if (vdev == NULL) {
> + vfio_device_put(device);
> + return;
> + }
> +
> + mutex_lock(>igate);
> + vdev->aer_recovering = false;
> + mutex_unlock(>igate);
> +
> + co

Re: [Qemu-devel] [PATCH v9 04/11] msix: check msix_init's return value

2017-01-16 Thread Cao jin

I forgot to CC the maintainers on this new patch.

On 01/17/2017 02:18 PM, Cao jin wrote:
> Doesn't do it for megasas & hcd-xhci, later patches will fix them.
> 
> Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
> ---
>  hw/net/e1000e.c|  4 
>  hw/net/rocker/rocker.c |  5 +
>  hw/net/vmxnet3.c   |  6 +-
>  hw/virtio/virtio-pci.c | 13 +++--
>  4 files changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
> index ed04adce061c..74cbbef30366 100644
> --- a/hw/net/e1000e.c
> +++ b/hw/net/e1000e.c
> @@ -294,6 +294,10 @@ e1000e_init_msix(E1000EState *s)
>  E1000E_MSIX_IDX, E1000E_MSIX_PBA,
>  0xA0, NULL);
>  
> +/* Any error other than -ENOTSUP(board's MSI support is broken)
> + * is a programming error. Fall back to INTx silently on -ENOTSUP */
> +assert(!res || res == -ENOTSUP);
> +
>  if (res < 0) {
>  trace_e1000e_msix_init_fail(res);
>  } else {
> diff --git a/hw/net/rocker/rocker.c b/hw/net/rocker/rocker.c
> index 6e70fddee36b..e394fd61fe64 100644
> --- a/hw/net/rocker/rocker.c
> +++ b/hw/net/rocker/rocker.c
> @@ -1264,6 +1264,11 @@ static int rocker_msix_init(Rocker *r)
>  >msix_bar,
>  ROCKER_PCI_MSIX_BAR_IDX, ROCKER_PCI_MSIX_PBA_OFFSET,
>  0, _err);
> +
> +/* Any error other than -ENOTSUP(board's MSI support is broken)
> + * is a programming error. */
> +assert(!err || err == -ENOTSUP);
> +
>  if (err) {
>  error_report_err(local_err);
>  return err;
> diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
> index 7b2971fe5902..a433cc017cb1 100644
> --- a/hw/net/vmxnet3.c
> +++ b/hw/net/vmxnet3.c
> @@ -2193,8 +2193,12 @@ vmxnet3_init_msix(VMXNET3State *s)
>  VMXNET3_MSIX_BAR_IDX, VMXNET3_OFF_MSIX_PBA(s),
>  VMXNET3_MSIX_OFFSET(s), NULL);
>  
> +/* Any error other than -ENOTSUP(board's MSI support is broken)
> + * is a programming error. Fall back to INTx on -ENOTSUP */
> +assert(!res || res == -ENOTSUP);
> +
>  if (0 > res) {
> -VMW_WRPRN("Failed to initialize MSI-X, error %d", res);
> +VMW_WRPRN("Failed to initialize MSI-X, board's MSI support is 
> broken");
>  s->msix_used = false;
>  } else {
>  if (!vmxnet3_use_msix_vectors(s, VMXNET3_MAX_INTRS)) {
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 4c2c4941d245..2417c78c477e 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -1670,13 +1670,14 @@ static void virtio_pci_device_plugged(DeviceState *d, 
> Error **errp)
>  
>  if (proxy->nvectors) {
>  int err = msix_init_exclusive_bar(>pci_dev, proxy->nvectors,
> -  proxy->msix_bar_idx, NULL);
> +  proxy->msix_bar_idx, errp);
> +
> +/* Any error other than -ENOTSUP(board's MSI support is broken)
> + * is a programming error. */
> +assert(!err || err == -ENOTSUP);
> +
>  if (err) {
> -/* Notice when a system that supports MSIx can't initialize it */
> -        if (err != -ENOTSUP) {
> -error_report("unable to init msix vectors to %" PRIu32,
> - proxy->nvectors);
> -}
> +error_report_err(*errp);
>  proxy->nvectors = 0;
>  }
>  }
> 

-- 
Sincerely,
Cao jin

[Qemu-devel] [PATCH v9 07/11] megasas: undo the overwrites of msi user configuration

2017-01-16 Thread Cao jin

Commit afea4e14 seems to have forgotten to undo the overwrites of the
user's msi configuration, which is inappropriate.

CC: Hannes Reinecke <h...@suse.de>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>

Reviewed-by: Markus Armbruster <arm...@redhat.com>
Acked-by: Marcel Apfelbaum <mar...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/scsi/megasas.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index 14d6e0c6d565..c208d520c4df 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -2350,11 +2350,10 @@ static void megasas_scsi_realize(PCIDevice *dev, Error 
**errp)
 "msi=off with this machine type.\n");
 error_propagate(errp, err);
 return;
-} else if (ret) {
-/* With msi=auto, we fall back to MSI off silently */
-s->msi = ON_OFF_AUTO_OFF;
-error_free(err);
 }
+assert(!err || s->msix == ON_OFF_AUTO_AUTO);
+/* With msi=auto, we fall back to MSI off silently */
+error_free(err);
 }
 
 memory_region_init_io(>mmio_io, OBJECT(s), _mmio_ops, s,
-- 
2.1.0

[Qemu-devel] [PATCH v9 06/11] hcd-xhci: change behaviour of msix switch

2017-01-16 Thread Cao jin

Resolve the TODO: msix=auto means msix on; if the user specifies msix=on,
then device creation fails when msix_init fails.

CC: Gerd Hoffmann <kra...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>

Reviewed-by: Gerd Hoffmann <kra...@redhat.com>
Reviewed-by: Markus Armbruster <arm...@redhat.com>
Acked-by: Marcel Apfelbaum <mar...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/usb/hcd-xhci.c | 32 
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index fe64dd8525c7..aaca57cb5f1f 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -3636,12 +3636,14 @@ static void usb_xhci_realize(struct PCIDevice *dev, 
Error **errp)
 if (xhci->numintrs < 1) {
 xhci->numintrs = 1;
 }
+
 if (xhci->numslots > MAXSLOTS) {
 xhci->numslots = MAXSLOTS;
 }
 if (xhci->numslots < 1) {
 xhci->numslots = 1;
 }
+
 if (xhci_get_flag(xhci, XHCI_FLAG_ENABLE_STREAMS)) {
 xhci->max_pstreams_mask = 7; /* == 256 primary streams */
 } else {
@@ -3669,6 +3671,28 @@ static void usb_xhci_realize(struct PCIDevice *dev, 
Error **errp)
 xhci->mfwrap_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, xhci_mfwrap_timer, 
xhci);
 
 memory_region_init(>mem, OBJECT(xhci), "xhci", LEN_REGS);
+if (xhci->msix != ON_OFF_AUTO_OFF) {
+ret = msix_init(dev, xhci->numintrs,
+&xhci->mem, 0, OFF_MSIX_TABLE,
+&xhci->mem, 0, OFF_MSIX_PBA,
+0x90, &err);
+/* Any error other than -ENOTSUP(board's MSI support is broken)
+ * is a programming error */
+assert(!ret || ret == -ENOTSUP);
+if (ret && xhci->msix == ON_OFF_AUTO_ON) {
+/* Can't satisfy user's explicit msix=on request, fail */
+error_append_hint(&err, "You have to use msix=auto (default) or "
+"msix=off with this machine type.\n");
+/* No instance_finalize method, need to free the resource here */
+object_unref(OBJECT(&xhci->mem));
+error_propagate(errp, err);
+return;
+}
+assert(!err || xhci->msix == ON_OFF_AUTO_AUTO);
+/* With msix=auto, we fall back to MSI off silently */
+error_free(err);
+}
+
 memory_region_init_io(>mem_cap, OBJECT(xhci), _cap_ops, xhci,
   "capabilities", LEN_CAP);
 memory_region_init_io(>mem_oper, OBJECT(xhci), _oper_ops, xhci,
@@ -3701,14 +3725,6 @@ static void usb_xhci_realize(struct PCIDevice *dev, 
Error **errp)
 ret = pcie_endpoint_cap_init(dev, 0xa0);
 assert(ret >= 0);
 }
-
-if (xhci->msix != ON_OFF_AUTO_OFF) {
-/* TODO check for errors, and should fail when msix=on */
-msix_init(dev, xhci->numintrs,
-  >mem, 0, OFF_MSIX_TABLE,
-  >mem, 0, OFF_MSIX_PBA,
-  0x90, NULL);
-}
 }
 
 static void usb_xhci_exit(PCIDevice *dev)
-- 
2.1.0
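
The on/off/auto behaviour described in the commit message above can be summarised as a small decision function. This is a standalone sketch, not the QEMU implementation: the enum and function names are hypothetical, and the only inputs are the user's setting and the value returned by the init call.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the user's msix=on/off/auto setting. */
enum sketch_on_off_auto { SKETCH_AUTO, SKETCH_ON, SKETCH_OFF };

/* What realize would do next, in this simplified model. */
enum sketch_outcome {
    SKETCH_DISABLED,        /* msix=off: the init call is never made */
    SKETCH_USE_MSIX,        /* init succeeded */
    SKETCH_FALLBACK,        /* msix=auto and init failed: go on without MSI-X */
    SKETCH_FAIL_REALIZE     /* msix=on and init failed: device creation fails */
};

static enum sketch_outcome
sketch_msix_policy(enum sketch_on_off_auto setting, int init_ret)
{
    if (setting == SKETCH_OFF) {
        return SKETCH_DISABLED;
    }
    /* Any error other than -ENOTSUP is treated as a programming error. */
    assert(init_ret == 0 || init_ret == -ENOTSUP);
    if (init_ret == 0) {
        return SKETCH_USE_MSIX;
    }
    return setting == SKETCH_ON ? SKETCH_FAIL_REALIZE : SKETCH_FALLBACK;
}
```

Note that the sketch never rewrites the user's setting on fallback; the outcome is derived from the setting and the return value each time.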

[Qemu-devel] [PATCH v9 09/11] vmxnet3: remove unnecessary internal msix flag

2017-01-16 Thread Cao jin

The internal flag msix_used is unnecessary; it has the same effect as
msix_enabled().

The corresponding msi flag was already dropped in commit 1070048e.

CC: Dmitry Fleytman <dmi...@daynix.com>
CC: Jason Wang <jasow...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>

Reviewed-by: Markus Armbruster <arm...@redhat.com>
Reviewed-by: Dmitry Fleytman <dmi...@daynix.com>
Acked-by: Marcel Apfelbaum <mar...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/net/vmxnet3.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 45e125e92c8a..af39965c8cc2 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -281,8 +281,6 @@ typedef struct {
 Vmxnet3RxqDescr rxq_descr[VMXNET3_DEVICE_MAX_RX_QUEUES];
 Vmxnet3TxqDescr txq_descr[VMXNET3_DEVICE_MAX_TX_QUEUES];
 
-/* Whether MSI-X support was installed successfully */
-bool msix_used;
 hwaddr drv_shmem;
 hwaddr temp_shared_guest_driver_memory;
 
@@ -359,7 +357,7 @@ static bool _vmxnet3_assert_interrupt_line(VMXNET3State *s, 
uint32_t int_idx)
 {
 PCIDevice *d = PCI_DEVICE(s);
 
-if (s->msix_used && msix_enabled(d)) {
+if (msix_enabled(d)) {
 VMW_IRPRN("Sending MSI-X notification for vector %u", int_idx);
 msix_notify(d, int_idx);
 return false;
@@ -383,7 +381,7 @@ static void _vmxnet3_deassert_interrupt_line(VMXNET3State 
*s, int lidx)
  * This function should never be called for MSI(X) interrupts
  * because deassertion never required for message interrupts
  */
-assert(!s->msix_used || !msix_enabled(d));
+assert(!msix_enabled(d));
 /*
  * This function should never be called for MSI(X) interrupts
  * because deassertion never required for message interrupts
@@ -421,7 +419,7 @@ static void vmxnet3_trigger_interrupt(VMXNET3State *s, int 
lidx)
 s->interrupt_states[lidx].is_pending = true;
 vmxnet3_update_interrupt_line_state(s, lidx);
 
-if (s->msix_used && msix_enabled(d) && s->auto_int_masking) {
+if (msix_enabled(d) && s->auto_int_masking) {
 goto do_automask;
 }
 
@@ -1428,7 +1426,7 @@ static void vmxnet3_update_features(VMXNET3State *s)
 
 static bool vmxnet3_verify_intx(VMXNET3State *s, int intx)
 {
-return s->msix_used || msi_enabled(PCI_DEVICE(s))
+return msix_enabled(PCI_DEVICE(s)) || msi_enabled(PCI_DEVICE(s))
 || intx == pci_get_byte(s->parent_obj.config + PCI_INTERRUPT_PIN) - 1;
 }
 
@@ -1445,18 +1443,18 @@ static void vmxnet3_validate_interrupts(VMXNET3State *s)
 int i;
 
 VMW_CFPRN("Verifying event interrupt index (%d)", s->event_int_idx);
-vmxnet3_validate_interrupt_idx(s->msix_used, s->event_int_idx);
+vmxnet3_validate_interrupt_idx(msix_enabled(PCI_DEVICE(s)), 
s->event_int_idx);
 
 for (i = 0; i < s->txq_num; i++) {
 int idx = s->txq_descr[i].intr_idx;
 VMW_CFPRN("Verifying TX queue %d interrupt index (%d)", i, idx);
-vmxnet3_validate_interrupt_idx(s->msix_used, idx);
+vmxnet3_validate_interrupt_idx(msix_enabled(PCI_DEVICE(s)), idx);
 }
 
 for (i = 0; i < s->rxq_num; i++) {
 int idx = s->rxq_descr[i].intr_idx;
 VMW_CFPRN("Verifying RX queue %d interrupt index (%d)", i, idx);
-vmxnet3_validate_interrupt_idx(s->msix_used, idx);
+vmxnet3_validate_interrupt_idx(msix_enabled(PCI_DEVICE(s)), idx);
 }
 }
 
@@ -2185,6 +2183,7 @@ vmxnet3_use_msix_vectors(VMXNET3State *s, int num_vectors)
 static bool
 vmxnet3_init_msix(VMXNET3State *s)
 {
+bool msix;
 PCIDevice *d = PCI_DEVICE(s);
 int res = msix_init(d, VMXNET3_MAX_INTRS,
 >msix_bar,
@@ -2199,17 +2198,18 @@ vmxnet3_init_msix(VMXNET3State *s)
 
 if (0 > res) {
 VMW_WRPRN("Failed to initialize MSI-X, board's MSI support is broken");
-s->msix_used = false;
+msix = false;
 } else {
 if (!vmxnet3_use_msix_vectors(s, VMXNET3_MAX_INTRS)) {
 VMW_WRPRN("Failed to use MSI-X vectors, error %d", res);
 msix_uninit(d, >msix_bar, >msix_bar);
-s->msix_used = false;
+msix = false;
 } else {
-s->msix_used = true;
+msix = true;
 }
 }
-return s->msix_used;
+
+return msix;
 }
 
 static void
@@ -2217,7 +2217,7 @@ vmxnet3_cleanup_msix(VMXNET3State *s)
 {
 PCIDevice *d = PCI_DEVICE(s);
 
-if (s->msix_used) {
+if (msix_enabled(d)) {
 vmxnet3_unuse_msix_vectors(s, VMXNET3_MAX_INTRS);
 msix_uninit(d, >msix_bar, >msix_bar);
 }
-- 
2.1.0

[Qemu-devel] [PATCH v9 08/11] vmxnet3: fix reference leak issue

2017-01-16 Thread Cao jin

On the migration target, msix_vector_use() is called a second time in
vmxnet3_post_load(), without a matching second call to msix_vector_unuse(),
which results in a vector reference leak.

CC: Dmitry Fleytman <dmi...@daynix.com>
CC: Jason Wang <jasow...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>

Reviewed-by: Markus Armbruster <arm...@redhat.com>
Reviewed-by: Dmitry Fleytman <dmi...@daynix.com>
Acked-by: Marcel Apfelbaum <mar...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/net/vmxnet3.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index a433cc017cb1..45e125e92c8a 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -2552,21 +2552,11 @@ static void vmxnet3_put_rxq_descr(QEMUFile *f, void 
*pv, size_t size)
 static int vmxnet3_post_load(void *opaque, int version_id)
 {
 VMXNET3State *s = opaque;
-PCIDevice *d = PCI_DEVICE(s);
 
 net_tx_pkt_init(>tx_pkt, PCI_DEVICE(s),
 s->max_tx_frags, s->peer_has_vhdr);
 net_rx_pkt_init(>rx_pkt, s->peer_has_vhdr);
 
-if (s->msix_used) {
-if  (!vmxnet3_use_msix_vectors(s, VMXNET3_MAX_INTRS)) {
-VMW_WRPRN("Failed to re-use MSI-X vectors");
-msix_uninit(d, >msix_bar, >msix_bar);
-s->msix_used = false;
-return -1;
-}
-}
-
 vmxnet3_validate_queues(s);
 vmxnet3_validate_interrupts(s);
 
-- 
2.1.0
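
The leak described in the commit message above can be modelled with a per-vector use counter. This is a simplified standalone sketch under the assumption that each "use" increments a counter and each "unuse" decrements it; the `sketch_` names are hypothetical and it does not reproduce the vmxnet3 or msix code.

```c
#define SKETCH_NR_VECTORS 4

/* Hypothetical device model: one use counter per vector. */
struct sketch_dev {
    unsigned used[SKETCH_NR_VECTORS];
};

static void sketch_vector_use(struct sketch_dev *d, unsigned vec)
{
    d->used[vec]++;
}

static void sketch_vector_unuse(struct sketch_dev *d, unsigned vec)
{
    if (d->used[vec] > 0) {
        d->used[vec]--;
    }
}

/*
 * Returns the references still held on vector 0 after teardown.
 * reuse_in_post_load models the extra "use" on the migration target.
 */
static unsigned sketch_refs_after_teardown(int reuse_in_post_load)
{
    struct sketch_dev d = { { 0 } };

    sketch_vector_use(&d, 0);           /* device initialisation */
    if (reuse_in_post_load) {
        sketch_vector_use(&d, 0);       /* second use, never balanced */
    }
    sketch_vector_unuse(&d, 0);         /* single unuse at cleanup */
    return d.used[0];
}
```

In this model, teardown leaves one reference behind when the post-load path takes the vector again, and none when it does not, which is why the patch simply drops the second use.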

[Qemu-devel] [PATCH v9 05/11] megasas: change behaviour of msix switch

2017-01-16 Thread Cao jin

Resolve the TODO: msix=auto means msix on; if the user specifies msix=on,
then device creation fails when msix_init fails.
Also undo the overwrites of the user's msix configuration.

CC: Michael S. Tsirkin <m...@redhat.com>
CC: Hannes Reinecke <h...@suse.de>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>

Reviewed-by: Markus Armbruster <arm...@redhat.com>
Acked-by: Marcel Apfelbaum <mar...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/scsi/megasas.c | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index 095dba8b36de..14d6e0c6d565 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -2359,18 +2359,31 @@ static void megasas_scsi_realize(PCIDevice *dev, Error 
**errp)
 
 memory_region_init_io(>mmio_io, OBJECT(s), _mmio_ops, s,
   "megasas-mmio", 0x4000);
+if (megasas_use_msix(s)) {
+ret = msix_init(dev, 15, &s->mmio_io, b->mmio_bar, 0x2000,
+&s->mmio_io, b->mmio_bar, 0x3800, 0x68, &err);
+/* Any error other than -ENOTSUP(board's MSI support is broken)
+ * is a programming error */
+assert(!ret || ret == -ENOTSUP);
+if (ret && s->msix == ON_OFF_AUTO_ON) {
+/* Can't satisfy user's explicit msix=on request, fail */
+error_append_hint(&err, "You have to use msix=auto (default) or "
+"msix=off with this machine type.\n");
+/* No instance_finalize method, need to free the resource here */
+object_unref(OBJECT(&s->mmio_io));
+error_propagate(errp, err);
+return;
+}
+assert(!err || s->msix == ON_OFF_AUTO_AUTO);
+/* With msix=auto, we fall back to MSI off silently */
+error_free(err);
+}
+
 memory_region_init_io(>port_io, OBJECT(s), _port_ops, s,
   "megasas-io", 256);
 memory_region_init_io(>queue_io, OBJECT(s), _queue_ops, s,
   "megasas-queue", 0x4);
 
-if (megasas_use_msix(s) &&
-msix_init(dev, 15, >mmio_io, b->mmio_bar, 0x2000,
-  >mmio_io, b->mmio_bar, 0x3800, 0x68, NULL)) {
-/* TODO: check msix_init's error, and should fail on msix=on */
-s->msix = ON_OFF_AUTO_OFF;
-}
-
 if (pci_is_express(dev)) {
 pcie_endpoint_cap_init(dev, 0xa0);
 }
-- 
2.1.0

[Qemu-devel] [PATCH v9 10/11] msi_init: convert assert to return -errno

2017-01-16 Thread Cao jin

According to the discussion:
http://lists.nongnu.org/archive/html/qemu-devel/2016-09/msg08215.html

Let the leaf function return a reasonable -errno, and let the caller decide
how to handle the return value.

Suggested-by: Markus Armbruster <arm...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>

Reviewed-by: Markus Armbruster <arm...@redhat.com>
Reviewed-by: Marcel Apfelbaum <mar...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/pci/msi.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/pci/msi.c b/hw/pci/msi.c
index a87b2278a373..af3efbe525ce 100644
--- a/hw/pci/msi.c
+++ b/hw/pci/msi.c
@@ -201,9 +201,12 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
" 64bit %d mask %d\n",
offset, nr_vectors, msi64bit, msi_per_vector_mask);
 
-assert(!(nr_vectors & (nr_vectors - 1)));   /* power of 2 */
-assert(nr_vectors > 0);
-assert(nr_vectors <= PCI_MSI_VECTORS_MAX);
+/* vector sanity test: should in range 1 - 32, should be power of 2 */
+if (!is_power_of_2(nr_vectors) || nr_vectors > PCI_MSI_VECTORS_MAX) {
+error_setg(errp, "Invalid vector number: %d", nr_vectors);
+return -EINVAL;
+}
+
 /* the nr of MSI vectors is up to 32 */
 vectors_order = ctz32(nr_vectors);
 
-- 
2.1.0
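
The vector sanity check introduced by the patch above can be tried in isolation with the following sketch. It is a standalone approximation, not the QEMU source: the `sketch_` helper names and the local constant are assumptions, and the power-of-two helper is written so that it rejects 0 explicitly, which is what lets a single condition cover the three removed asserts.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed limit, mirroring the "up to 32" comment in the patch. */
#define SKETCH_MSI_VECTORS_MAX 32

/* True when exactly one bit is set; 0 is rejected. */
static bool sketch_is_power_of_2(uint32_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Returns 0 for a valid vector count, -EINVAL otherwise. */
static int sketch_check_nr_vectors(uint32_t nr_vectors)
{
    if (!sketch_is_power_of_2(nr_vectors) ||
        nr_vectors > SKETCH_MSI_VECTORS_MAX) {
        return -EINVAL;
    }
    return 0;
}
```

Returning -EINVAL instead of asserting leaves the decision to the caller: a device with fixed parameters can still assert on the result, while a caller that takes values from outside can report the error.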

[Qemu-devel] [PATCH v9 03/11] pci: Convert msix_init() to Error and fix callers

2017-01-16 Thread Cao jin

msix_init() reports errors with error_report(), which is wrong when
it's used in realize().  The same issue was fixed for msi_init() in
commit 1108b2f. In order to make the API change as small as possible,
leave the return value check to a later patch.

For some devices (like e1000e, vmxnet3, nvme) that won't fail because of
msix_init's failure, suppress the error report by passing a NULL error
object.

Bonus: add a comment for msix_init.

CC: Jiri Pirko <j...@resnulli.us>
CC: Gerd Hoffmann <kra...@redhat.com>
CC: Dmitry Fleytman <dmi...@daynix.com>
CC: Jason Wang <jasow...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>
CC: Hannes Reinecke <h...@suse.de>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Alex Williamson <alex.william...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/block/nvme.c|  2 +-
 hw/misc/ivshmem.c  |  8 
 hw/net/e1000e.c|  2 +-
 hw/net/rocker/rocker.c |  4 +++-
 hw/net/vmxnet3.c   |  2 +-
 hw/pci/msix.c  | 36 +++-
 hw/scsi/megasas.c  |  4 +++-
 hw/usb/hcd-xhci.c  |  4 ++--
 hw/vfio/pci.c  |  8 ++--
 hw/virtio/virtio-pci.c |  4 ++--
 include/hw/pci/msix.h  |  5 +++--
 11 files changed, 57 insertions(+), 22 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index d479fd22f573..ae91a18f1724 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -872,7 +872,7 @@ static int nvme_init(PCIDevice *pci_dev)
 pci_register_bar(>parent_obj, 0,
 PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
 >iomem);
-msix_init_exclusive_bar(>parent_obj, n->num_queues, 4);
+msix_init_exclusive_bar(>parent_obj, n->num_queues, 4, NULL);
 
 id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
 id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index abeaf3da0800..70e71a597b9c 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -749,13 +749,13 @@ static void ivshmem_reset(DeviceState *d)
 }
 }
 
-static int ivshmem_setup_interrupts(IVShmemState *s)
+static int ivshmem_setup_interrupts(IVShmemState *s, Error **errp)
 {
 /* allocate QEMU callback data for receiving interrupts */
 s->msi_vectors = g_malloc0(s->vectors * sizeof(MSIVector));
 
 if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
-if (msix_init_exclusive_bar(PCI_DEVICE(s), s->vectors, 1)) {
+if (msix_init_exclusive_bar(PCI_DEVICE(s), s->vectors, 1, errp)) {
 return -1;
 }
 
@@ -897,8 +897,8 @@ static void ivshmem_common_realize(PCIDevice *dev, Error 
**errp)
 qemu_chr_fe_set_handlers(>server_chr, ivshmem_can_receive,
  ivshmem_read, NULL, s, NULL, true);
 
-if (ivshmem_setup_interrupts(s) < 0) {
-error_setg(errp, "failed to initialize interrupts");
+if (ivshmem_setup_interrupts(s, errp) < 0) {
+error_prepend(errp, "Failed to initialize interrupts: ");
 return;
 }
 }
diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index 4994e1ca0062..ed04adce061c 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -292,7 +292,7 @@ e1000e_init_msix(E1000EState *s)
 E1000E_MSIX_IDX, E1000E_MSIX_TABLE,
 >msix,
 E1000E_MSIX_IDX, E1000E_MSIX_PBA,
-0xA0);
+0xA0, NULL);
 
 if (res < 0) {
 trace_e1000e_msix_init_fail(res);
diff --git a/hw/net/rocker/rocker.c b/hw/net/rocker/rocker.c
index e9d215aa4df1..6e70fddee36b 100644
--- a/hw/net/rocker/rocker.c
+++ b/hw/net/rocker/rocker.c
@@ -1256,14 +1256,16 @@ static int rocker_msix_init(Rocker *r)
 {
 PCIDevice *dev = PCI_DEVICE(r);
 int err;
+Error *local_err = NULL;
 
 err = msix_init(dev, ROCKER_MSIX_VEC_COUNT(r->fp_ports),
 >msix_bar,
 ROCKER_PCI_MSIX_BAR_IDX, ROCKER_PCI_MSIX_TABLE_OFFSET,
 >msix_bar,
 ROCKER_PCI_MSIX_BAR_IDX, ROCKER_PCI_MSIX_PBA_OFFSET,
-0);
+0, _err);
 if (err) {
+error_report_err(local_err);
 return err;
 }
 
diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 92f6af9620f1..7b2971fe5902 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -2191,7 +2191,7 @@ vmxnet3_init_msix(VMXNET3State *s)
 VMXNET3_MSIX_BAR_IDX, VMXNET3_OFF_MSIX_TABLE,
 >msix_bar,
 VMXNET3_MSIX_BAR_IDX, VMXNET3_OFF_MSIX_PBA(s),
-VMXNET3_MSIX_OFFSET(s));
+VMXNET3_MSIX_OFFSET(s), NULL);
 
 if (0 > re

[Qemu-devel] [PATCH v9 11/11] megasas: remove unnecessary megasas_use_msix()

2017-01-16 Thread Cao jin

Also move one hunk up, to keep the msix-init-related code together.

CC: Hannes Reinecke <h...@suse.de>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>

Reviewed-by: Markus Armbruster <arm...@redhat.com>
Reviewed-by: Hannes Reinecke <h...@suse.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/scsi/megasas.c | 19 ++-
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index c208d520c4df..73eab7844ee3 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -155,11 +155,6 @@ static bool megasas_use_queue64(MegasasState *s)
 return s->flags & MEGASAS_MASK_USE_QUEUE64;
 }
 
-static bool megasas_use_msix(MegasasState *s)
-{
-return s->msix != ON_OFF_AUTO_OFF;
-}
-
 static bool megasas_is_jbod(MegasasState *s)
 {
 return s->flags & MEGASAS_MASK_USE_JBOD;
@@ -2305,9 +2300,7 @@ static void megasas_scsi_uninit(PCIDevice *d)
 {
 MegasasState *s = MEGASAS(d);
 
-if (megasas_use_msix(s)) {
-msix_uninit(d, >mmio_io, >mmio_io);
-}
+msix_uninit(d, >mmio_io, >mmio_io);
 msi_uninit(d);
 }
 
@@ -2358,7 +2351,7 @@ static void megasas_scsi_realize(PCIDevice *dev, Error 
**errp)
 
 memory_region_init_io(>mmio_io, OBJECT(s), _mmio_ops, s,
   "megasas-mmio", 0x4000);
-if (megasas_use_msix(s)) {
+if (s->msix != ON_OFF_AUTO_OFF) {
 ret = msix_init(dev, 15, >mmio_io, b->mmio_bar, 0x2000,
 >mmio_io, b->mmio_bar, 0x3800, 0x68, );
 /* Any error other than -ENOTSUP(board's MSI support is broken)
@@ -2378,6 +2371,10 @@ static void megasas_scsi_realize(PCIDevice *dev, Error 
**errp)
 error_free(err);
 }
 
+if (s->msix != ON_OFF_AUTO_OFF) {
+msix_vector_use(dev, 0);
+}
+
 memory_region_init_io(>port_io, OBJECT(s), _port_ops, s,
   "megasas-io", 256);
 memory_region_init_io(>queue_io, OBJECT(s), _queue_ops, s,
@@ -2393,10 +2390,6 @@ static void megasas_scsi_realize(PCIDevice *dev, Error 
**errp)
 pci_register_bar(dev, b->mmio_bar, bar_type, >mmio_io);
 pci_register_bar(dev, 3, bar_type, >queue_io);
 
-if (megasas_use_msix(s)) {
-msix_vector_use(dev, 0);
-}
-
 s->fw_state = MFI_FWSTATE_READY;
 if (!s->sas_addr) {
 s->sas_addr = ((NAA_LOCALLY_ASSIGNED_ID << 24) |
-- 
2.1.0

[Qemu-devel] [PATCH v9 04/11] msix: check msix_init's return value

2017-01-16 Thread Cao jin

This patch doesn't do it for megasas & hcd-xhci; later patches will fix them.

Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/net/e1000e.c|  4 
 hw/net/rocker/rocker.c |  5 +
 hw/net/vmxnet3.c   |  6 +-
 hw/virtio/virtio-pci.c | 13 +++--
 4 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index ed04adce061c..74cbbef30366 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -294,6 +294,10 @@ e1000e_init_msix(E1000EState *s)
 E1000E_MSIX_IDX, E1000E_MSIX_PBA,
 0xA0, NULL);
 
+/* Any error other than -ENOTSUP(board's MSI support is broken)
+ * is a programming error. Fall back to INTx silently on -ENOTSUP */
+assert(!res || res == -ENOTSUP);
+
 if (res < 0) {
 trace_e1000e_msix_init_fail(res);
 } else {
diff --git a/hw/net/rocker/rocker.c b/hw/net/rocker/rocker.c
index 6e70fddee36b..e394fd61fe64 100644
--- a/hw/net/rocker/rocker.c
+++ b/hw/net/rocker/rocker.c
@@ -1264,6 +1264,11 @@ static int rocker_msix_init(Rocker *r)
 >msix_bar,
 ROCKER_PCI_MSIX_BAR_IDX, ROCKER_PCI_MSIX_PBA_OFFSET,
 0, _err);
+
+/* Any error other than -ENOTSUP(board's MSI support is broken)
+ * is a programming error. */
+assert(!err || err == -ENOTSUP);
+
 if (err) {
 error_report_err(local_err);
 return err;
diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 7b2971fe5902..a433cc017cb1 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -2193,8 +2193,12 @@ vmxnet3_init_msix(VMXNET3State *s)
 VMXNET3_MSIX_BAR_IDX, VMXNET3_OFF_MSIX_PBA(s),
 VMXNET3_MSIX_OFFSET(s), NULL);
 
+/* Any error other than -ENOTSUP(board's MSI support is broken)
+ * is a programming error. Fall back to INTx on -ENOTSUP */
+assert(!res || res == -ENOTSUP);
+
 if (0 > res) {
-VMW_WRPRN("Failed to initialize MSI-X, error %d", res);
+VMW_WRPRN("Failed to initialize MSI-X, board's MSI support is broken");
 s->msix_used = false;
 } else {
 if (!vmxnet3_use_msix_vectors(s, VMXNET3_MAX_INTRS)) {
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 4c2c4941d245..2417c78c477e 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1670,13 +1670,14 @@ static void virtio_pci_device_plugged(DeviceState *d, 
Error **errp)
 
 if (proxy->nvectors) {
 int err = msix_init_exclusive_bar(>pci_dev, proxy->nvectors,
-  proxy->msix_bar_idx, NULL);
+  proxy->msix_bar_idx, errp);
+
+/* Any error other than -ENOTSUP(board's MSI support is broken)
+ * is a programming error. */
+assert(!err || err == -ENOTSUP);
+
 if (err) {
-/* Notice when a system that supports MSIx can't initialize it */
-if (err != -ENOTSUP) {
-error_report("unable to init msix vectors to %" PRIu32,
- proxy->nvectors);
-}
+error_report_err(*errp);
 proxy->nvectors = 0;
 }
 }
-- 
2.1.0

[Qemu-devel] [PATCH v9 02/11] hcd-xhci: check & correct param before using it

2017-01-16 Thread Cao jin

usb_xhci_realize() corrects invalid values of property "intrs"
automatically, but the uncorrected value is passed to msi_init(),
which chokes on invalid values.  Delay that until after the
correction.

Resources allocated by usb_xhci_init() are leaked when msi_init()
fails.  Fix by calling it after msi_init().

CC: Gerd Hoffmann <kra...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>

Reviewed-by: Markus Armbruster <arm...@redhat.com>
Acked-by: Marcel Apfelbaum <mar...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/usb/hcd-xhci.c | 37 ++---
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index 4acf0c6dd8c0..0ace273da472 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -3627,25 +3627,6 @@ static void usb_xhci_realize(struct PCIDevice *dev, 
Error **errp)
 dev->config[PCI_CACHE_LINE_SIZE] = 0x10;
 dev->config[0x60] = 0x30; /* release number */
 
-usb_xhci_init(xhci);
-
-if (xhci->msi != ON_OFF_AUTO_OFF) {
-ret = msi_init(dev, 0x70, xhci->numintrs, true, false, );
-/* Any error other than -ENOTSUP(board's MSI support is broken)
- * is a programming error */
-assert(!ret || ret == -ENOTSUP);
-if (ret && xhci->msi == ON_OFF_AUTO_ON) {
-/* Can't satisfy user's explicit msi=on request, fail */
-error_append_hint(, "You have to use msi=auto (default) or "
-"msi=off with this machine type.\n");
-error_propagate(errp, err);
-return;
-}
-assert(!err || xhci->msi == ON_OFF_AUTO_AUTO);
-/* With msi=auto, we fall back to MSI off silently */
-error_free(err);
-}
-
 if (xhci->numintrs > MAXINTRS) {
 xhci->numintrs = MAXINTRS;
 }
@@ -3667,6 +3648,24 @@ static void usb_xhci_realize(struct PCIDevice *dev, 
Error **errp)
 xhci->max_pstreams_mask = 0;
 }
 
+if (xhci->msi != ON_OFF_AUTO_OFF) {
+ret = msi_init(dev, 0x70, xhci->numintrs, true, false, );
+/* Any error other than -ENOTSUP(board's MSI support is broken)
+ * is a programming error */
+assert(!ret || ret == -ENOTSUP);
+if (ret && xhci->msi == ON_OFF_AUTO_ON) {
+/* Can't satisfy user's explicit msi=on request, fail */
+error_append_hint(, "You have to use msi=auto (default) or "
+"msi=off with this machine type.\n");
+error_propagate(errp, err);
+return;
+}
+assert(!err || xhci->msi == ON_OFF_AUTO_AUTO);
+/* With msi=auto, we fall back to MSI off silently */
+error_free(err);
+}
+
+usb_xhci_init(xhci);
 xhci->mfwrap_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, xhci_mfwrap_timer, xhci);
 
 memory_region_init(&xhci->mem, OBJECT(xhci), "xhci", LEN_REGS);
-- 
2.1.0

[Qemu-devel] [PATCH v9 00/11] Convert msix_init() to error

2017-01-16 Thread Cao jin

v9 changelog:
1. Split previous patch 3 into two separate patches (3 & 4), per mst's review.

test:
1. make check ok.
2. detailed test on megasas/megasas-gen2, hcd-xhci, vmxnet3.
   megasas/megasas-gen2(M q35...bus=pcie.0): install a distro
   ./qemu-system-x86_64 --enable-kvm -m 1024
   -device megasas,id=scsi0,bus=pci.0
   -drive file=/xx/scsi-disk.img,if=none,id=drive-scsi0
   -device scsi-disk,bus=scsi0.0,channel=0,scsi-id=4,lun=0,drive=drive-scsi0,id=scsi0-4
   -cdrom /xx/Fedora-Server-DVD-x86_64-23.iso -boot once=d --monitor stdio

hcd-xhci: fdisk, mkfs.ext4, write file to usbstick.img
./qemu-system-x86_64 -M q35 --enable-kvm -m 1024
-drive if=none,id=usbstick,file=/xx/usbstick.img
-device nec-usb-xhci,id=usb,p2=8,p3=8,bus=pcie.0
-device usb-storage,bus=usb.0,drive=usbstick
/xx/FedoraServer23-X86_64.img --monitor stdio

vmxnet3: pinging another destination on the host's network works, and so
 does migrating to another qemu instance on the same host. After
 migration, ping no longer works, but that is outside this patchset's
 scope: the same issue exists upstream.
./qemu-system-x86_64 -M q35 --enable-kvm -m 1024
-netdev tap,id=mynet0 -device vmxnet3,netdev=mynet0
/xx/FedoraServer23-X86_64.img --monitor stdio

CC: Jiri Pirko <j...@resnulli.us>
CC: Gerd Hoffmann <kra...@redhat.com>
CC: Dmitry Fleytman <dmi...@daynix.com>
CC: Jason Wang <jasow...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>
CC: Hannes Reinecke <h...@suse.de>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Alex Williamson <alex.william...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>

Cao jin (11):
  msix: Follow CODING_STYLE
  hcd-xhci: check & correct param before using it
  pci: Convert msix_init() to Error and fix callers
  msix: check msix_init's return value
  megasas: change behaviour of msix switch
  hcd-xhci: change behaviour of msix switch
  megasas: undo the overwrites of msi user configuration
  vmxnet3: fix reference leak issue
  vmxnet3: remove unnecessary internal msix flag
  msi_init: convert assert to return -errno
  megasas: remove unnecessary megasas_use_msix()

 hw/block/nvme.c|  2 +-
 hw/misc/ivshmem.c  |  8 +++---
 hw/net/e1000e.c|  6 -
 hw/net/rocker/rocker.c |  9 ++-
 hw/net/vmxnet3.c   | 46 +++--
 hw/pci/msi.c   |  9 ---
 hw/pci/msix.c  | 44 +++-
 hw/scsi/megasas.c  | 49 ---
 hw/usb/hcd-xhci.c  | 69 ++
 hw/vfio/pci.c  |  8 --
 hw/virtio/virtio-pci.c | 13 +-
 include/hw/pci/msix.h  |  5 ++--
 12 files changed, 167 insertions(+), 101 deletions(-)

-- 
2.1.0

[Qemu-devel] [PATCH v9 01/11] msix: Follow CODING_STYLE

2017-01-16 Thread Cao jin

CC: Markus Armbruster <arm...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>

Reviewed-by: Markus Armbruster <arm...@redhat.com>
Acked-by: Marcel Apfelbaum <mar...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/pci/msix.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index 0ec1cb14fc60..0cee631ecc55 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -447,8 +447,10 @@ void msix_notify(PCIDevice *dev, unsigned vector)
 {
 MSIMessage msg;
 
-if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
+if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector]) {
 return;
+}
+
 if (msix_is_masked(dev, vector)) {
 msix_set_pending(dev, vector);
 return;
@@ -483,8 +485,10 @@ void msix_reset(PCIDevice *dev)
 /* Mark vector as used. */
 int msix_vector_use(PCIDevice *dev, unsigned vector)
 {
-if (vector >= dev->msix_entries_nr)
+if (vector >= dev->msix_entries_nr) {
 return -EINVAL;
+}
+
 dev->msix_entry_used[vector]++;
 return 0;
 }
-- 
2.1.0

Re: [Qemu-devel] [PATCH v8 00/10] Convert msix_init() to error

2017-01-13 Thread Cao jin



On 01/13/2017 04:22 PM, Markus Armbruster wrote:
> Cao jin <caoj.f...@cn.fujitsu.com> writes:
> 
>> Only a tiny modification in patch "megasas: remove unnecessary
>> megasas_use_msix()" to fix a megasas issue.
> 
> Please have a look at Michael's review in
> Message-ID: <20170112163519-mutt-send-email-...@kernel.org>
> 

Oh, thanks for the reminder, I missed that mail.

>> v8 changelog:
>> 1. reorder: place the "megasas: remove unnecessary megasas_use_msix()"
>>as the last one. and fix the bug in it, detailed description in it,
>>also removed the R-b of it.
>> 2. Add the Acked-by from Marcel for first 9 patches; add the R-b from Markus
>>to "hcd-xhci: check & correct param before using it".
>>
>> Test:
>> 1. make check ok
>> 2. command line test for all affected device, make sure their realization
>>is ok.
>> 3. detailed test for megasas, hcd-xhci, vmxnet3.
>>megasas: install a linux distro is ok
>>./qemu-system-x86_64 --enable-kvm -m 1024
>>-device megasas,id=scsi0,bus=pci.0
>>-drive file=/xx/scsi-disk.img,if=none,id=drive-scsi0
>>-device scsi-disk,bus=scsi0.0,channel=0,scsi-id=4,lun=0,drive=drive-scsi0,id=scsi0-4
>>-cdrom /xx/Fedora-Server-DVD-x86_64-23.iso --monitor stdio
>>
>>hcd-xhci: partition the usbstick.img, mkfs, write to file, is ok
>>./qemu-system-x86_64 -M q35 -m 1024 --enable-kvm
>>-drive if=none,id=usbstick,file=/xx/usbstick.img
>>-device nec-usb-xhci,id=usb,p2=8,p3=8,bus=pcie.0
>>-device usb-storage,bus=usb.0,drive=usbstick --monitor stdio
>>/xx/FedoraWorkStatsion23-x86_64.img
>>
>>vmxnet3: ping another destination belongs to host's network is ok.
>>But no migration test, because I don't have a spare machine for now.
>>./qemu-system-x86_64 -M q35 -m 1024 --enable-kvm
>>-netdev tap,id=mynet0 -device vmxnet3,netdev=mynet0
>>--monitor stdio /home/pino/vm/FedoraWorkStatsion23-x86_64.img
> 
> You could migrate to a new QEMU on the same machine.
> 
> Or, if you don't want to deal with two instances of QEMU running at the
> same time, migrate to file:
> 
> (qemu) migrate "exec:cat >zzz"
> 
> Start the target with -incoming "exec:cat zzz".  Better than nothing.
> 

Thanks for the info, I will try

-- 
Sincerely,
Cao jin

[Qemu-devel] [PATCH v8 07/10] vmxnet3: fix reference leak issue

2017-01-12 Thread Cao jin

On the migration target, msix_vector_use() is called a second time in
vmxnet3_post_load(), without a matching second call to
msix_vector_unuse(), which results in a vector reference leak.

CC: Dmitry Fleytman <dmi...@daynix.com>
CC: Jason Wang <jasow...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>

Reviewed-by: Markus Armbruster <arm...@redhat.com>
Reviewed-by: Dmitry Fleytman <dmi...@daynix.com>
Acked-by: Marcel Apfelbaum <mar...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/net/vmxnet3.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index a433cc017cb1..45e125e92c8a 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -2552,21 +2552,11 @@ static void vmxnet3_put_rxq_descr(QEMUFile *f, void *pv, size_t size)
 static int vmxnet3_post_load(void *opaque, int version_id)
 {
 VMXNET3State *s = opaque;
-PCIDevice *d = PCI_DEVICE(s);
 
 net_tx_pkt_init(&s->tx_pkt, PCI_DEVICE(s),
 s->max_tx_frags, s->peer_has_vhdr);
 net_rx_pkt_init(&s->rx_pkt, s->peer_has_vhdr);
 
-if (s->msix_used) {
-if  (!vmxnet3_use_msix_vectors(s, VMXNET3_MAX_INTRS)) {
-VMW_WRPRN("Failed to re-use MSI-X vectors");
-msix_uninit(d, &s->msix_bar, &s->msix_bar);
-s->msix_used = false;
-return -1;
-}
-}
-
 vmxnet3_validate_queues(s);
 vmxnet3_validate_interrupts(s);
 
-- 
2.1.0
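
The leak described in this patch follows from msix_vector_use() being a
plain counter increment (hw/pci/msix.c does dev->msix_entry_used[vector]++).
Below is a minimal sketch of that counting, not the QEMU code itself:
FakeDev, fake_vector_use(), fake_vector_unuse() and
refs_left_after_cleanup() are hypothetical stand-ins, and the unuse helper
is simplified. It shows why a second "use" in post_load with only one
"unuse" at cleanup leaves a reference behind.

```c
#include <assert.h>
#include <errno.h>

#define FAKE_NR_VECTORS 4   /* hypothetical table size, illustration only */

/* Hypothetical stand-in for the per-vector use counters of a PCI device. */
typedef struct FakeDev {
    unsigned entry_used[FAKE_NR_VECTORS];
} FakeDev;

/* Mirrors the counting behaviour of msix_vector_use(): take a reference. */
static int fake_vector_use(FakeDev *dev, unsigned vector)
{
    if (vector >= FAKE_NR_VECTORS) {
        return -EINVAL;
    }
    dev->entry_used[vector]++;
    return 0;
}

/* Simplified unuse: drop one reference, if any is held. */
static void fake_vector_unuse(FakeDev *dev, unsigned vector)
{
    if (vector >= FAKE_NR_VECTORS || !dev->entry_used[vector]) {
        return;
    }
    dev->entry_used[vector]--;
}

/*
 * Returns the references left on vector 0 after realize, an optional
 * extra use in post_load, and a single unuse at cleanup time.
 */
static unsigned refs_left_after_cleanup(int use_again_in_post_load)
{
    FakeDev dev = { { 0 } };

    fake_vector_use(&dev, 0);           /* realize */
    if (use_again_in_post_load) {
        fake_vector_use(&dev, 0);       /* post_load on migration target */
    }
    fake_vector_unuse(&dev, 0);         /* cleanup: only one unuse */
    return dev.entry_used[0];
}
```

With the extra use, one reference survives cleanup; without it, the
counter returns to zero, which is the state the patch restores.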

[Qemu-devel] [PATCH v8 08/10] vmxnet3: remove unnecessary internal msix flag

2017-01-12 Thread Cao jin

The internal flag msix_used is unnecessary: it has the same effect as
msix_enabled().

The corresponding msi flag was already dropped in commit 1070048e.

CC: Dmitry Fleytman <dmi...@daynix.com>
CC: Jason Wang <jasow...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>

Reviewed-by: Markus Armbruster <arm...@redhat.com>
Reviewed-by: Dmitry Fleytman <dmi...@daynix.com>
Acked-by: Marcel Apfelbaum <mar...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/net/vmxnet3.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 45e125e92c8a..af39965c8cc2 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -281,8 +281,6 @@ typedef struct {
 Vmxnet3RxqDescr rxq_descr[VMXNET3_DEVICE_MAX_RX_QUEUES];
 Vmxnet3TxqDescr txq_descr[VMXNET3_DEVICE_MAX_TX_QUEUES];
 
-/* Whether MSI-X support was installed successfully */
-bool msix_used;
 hwaddr drv_shmem;
 hwaddr temp_shared_guest_driver_memory;
 
@@ -359,7 +357,7 @@ static bool _vmxnet3_assert_interrupt_line(VMXNET3State *s, 
uint32_t int_idx)
 {
 PCIDevice *d = PCI_DEVICE(s);
 
-if (s->msix_used && msix_enabled(d)) {
+if (msix_enabled(d)) {
 VMW_IRPRN("Sending MSI-X notification for vector %u", int_idx);
 msix_notify(d, int_idx);
 return false;
@@ -383,7 +381,7 @@ static void _vmxnet3_deassert_interrupt_line(VMXNET3State 
*s, int lidx)
  * This function should never be called for MSI(X) interrupts
  * because deassertion never required for message interrupts
  */
-assert(!s->msix_used || !msix_enabled(d));
+assert(!msix_enabled(d));
 /*
  * This function should never be called for MSI(X) interrupts
  * because deassertion never required for message interrupts
@@ -421,7 +419,7 @@ static void vmxnet3_trigger_interrupt(VMXNET3State *s, int 
lidx)
 s->interrupt_states[lidx].is_pending = true;
 vmxnet3_update_interrupt_line_state(s, lidx);
 
-if (s->msix_used && msix_enabled(d) && s->auto_int_masking) {
+if (msix_enabled(d) && s->auto_int_masking) {
 goto do_automask;
 }
 
@@ -1428,7 +1426,7 @@ static void vmxnet3_update_features(VMXNET3State *s)
 
 static bool vmxnet3_verify_intx(VMXNET3State *s, int intx)
 {
-return s->msix_used || msi_enabled(PCI_DEVICE(s))
+return msix_enabled(PCI_DEVICE(s)) || msi_enabled(PCI_DEVICE(s))
 || intx == pci_get_byte(s->parent_obj.config + PCI_INTERRUPT_PIN) - 1;
 }
 
@@ -1445,18 +1443,18 @@ static void vmxnet3_validate_interrupts(VMXNET3State *s)
 int i;
 
 VMW_CFPRN("Verifying event interrupt index (%d)", s->event_int_idx);
-vmxnet3_validate_interrupt_idx(s->msix_used, s->event_int_idx);
+vmxnet3_validate_interrupt_idx(msix_enabled(PCI_DEVICE(s)), s->event_int_idx);
 
 for (i = 0; i < s->txq_num; i++) {
 int idx = s->txq_descr[i].intr_idx;
 VMW_CFPRN("Verifying TX queue %d interrupt index (%d)", i, idx);
-vmxnet3_validate_interrupt_idx(s->msix_used, idx);
+vmxnet3_validate_interrupt_idx(msix_enabled(PCI_DEVICE(s)), idx);
 }
 
 for (i = 0; i < s->rxq_num; i++) {
 int idx = s->rxq_descr[i].intr_idx;
 VMW_CFPRN("Verifying RX queue %d interrupt index (%d)", i, idx);
-vmxnet3_validate_interrupt_idx(s->msix_used, idx);
+vmxnet3_validate_interrupt_idx(msix_enabled(PCI_DEVICE(s)), idx);
 }
 }
 
@@ -2185,6 +2183,7 @@ vmxnet3_use_msix_vectors(VMXNET3State *s, int num_vectors)
 static bool
 vmxnet3_init_msix(VMXNET3State *s)
 {
+bool msix;
 PCIDevice *d = PCI_DEVICE(s);
 int res = msix_init(d, VMXNET3_MAX_INTRS,
 &s->msix_bar,
@@ -2199,17 +2198,18 @@ vmxnet3_init_msix(VMXNET3State *s)
 
 if (0 > res) {
 VMW_WRPRN("Failed to initialize MSI-X, board's MSI support is broken");
-s->msix_used = false;
+msix = false;
 } else {
 if (!vmxnet3_use_msix_vectors(s, VMXNET3_MAX_INTRS)) {
 VMW_WRPRN("Failed to use MSI-X vectors, error %d", res);
 msix_uninit(d, &s->msix_bar, &s->msix_bar);
-s->msix_used = false;
+msix = false;
 } else {
-s->msix_used = true;
+msix = true;
 }
 }
-return s->msix_used;
+
+return msix;
 }
 
 static void
@@ -2217,7 +2217,7 @@ vmxnet3_cleanup_msix(VMXNET3State *s)
 {
 PCIDevice *d = PCI_DEVICE(s);
 
-if (s->msix_used) {
+if (msix_enabled(d)) {
 vmxnet3_unuse_msix_vectors(s, VMXNET3_MAX_INTRS);
 msix_uninit(d, &s->msix_bar, &s->msix_bar);
 }
-- 
2.1.0

[Qemu-devel] [PATCH v8 03/10] pci: Convert msix_init() to Error and fix callers to check it

2017-01-12 Thread Cao jin

msix_init() reports errors with error_report(), which is wrong when
it's used in realize().  The same issue was fixed for msi_init() in
commit 1108b2f.

For devices (such as e1000e and vmxnet3) that do not fail when msix_init()
fails, suppress the error report by passing a NULL error object.

Bonus: add comment for msix_init.
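
To illustrate the two calling styles this conversion allows, here is a
small self-contained sketch. It does not use QEMU's real Error API:
SketchError, sketch_error_set() and sketch_msix_init() are hypothetical
stand-ins, and the error message is made up. A lenient caller passes NULL
and falls back silently; a strict caller passes an error pointer and hands
the error on.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, much reduced stand-in for an error object. */
typedef struct SketchError {
    const char *msg;
} SketchError;

/* Store an error only if the caller supplied somewhere to put it. */
static void sketch_error_set(SketchError **errp, const char *msg)
{
    SketchError *err;

    if (!errp) {
        return;                 /* caller asked for no error object */
    }
    err = malloc(sizeof(*err));
    if (!err) {
        abort();
    }
    err->msg = msg;
    *errp = err;
}

/* Stand-in for msix_init(): fails with -ENOTSUP on a board without MSI. */
static int sketch_msix_init(int board_has_msi, SketchError **errp)
{
    if (!board_has_msi) {
        sketch_error_set(errp, "MSI-X is not supported by this board");
        return -ENOTSUP;
    }
    return 0;
}

/* Lenient caller: returns 1 if MSI-X is usable, 0 after silent fallback. */
static int sketch_lenient_realize(int board_has_msi)
{
    int ret = sketch_msix_init(board_has_msi, NULL);

    /* anything but success or -ENOTSUP would be a programming error */
    assert(!ret || ret == -ENOTSUP);
    return ret == 0;
}

/* Strict caller: the error object travels up to its own caller. */
static int sketch_strict_realize(int board_has_msi, SketchError **errp)
{
    return sketch_msix_init(board_has_msi, errp);
}
```

The lenient path never allocates an error, so there is nothing to free or
report; the strict path leaves ownership of the error with the caller.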

CC: Jiri Pirko <j...@resnulli.us>
CC: Gerd Hoffmann <kra...@redhat.com>
CC: Dmitry Fleytman <dmi...@daynix.com>
CC: Jason Wang <jasow...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>
CC: Hannes Reinecke <h...@suse.de>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Alex Williamson <alex.william...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>

Reviewed-by: Markus Armbruster <arm...@redhat.com>
Reviewed-by: Hannes Reinecke <h...@suse.com>
Acked-by: Marcel Apfelbaum <mar...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/block/nvme.c|  5 -
 hw/misc/ivshmem.c  |  8 
 hw/net/e1000e.c|  6 +-
 hw/net/rocker/rocker.c |  7 ++-
 hw/net/vmxnet3.c   |  8 ++--
 hw/pci/msix.c  | 34 +-
 hw/scsi/megasas.c  |  5 -
 hw/usb/hcd-xhci.c  | 13 -
 hw/vfio/pci.c  |  8 ++--
 hw/virtio/virtio-pci.c | 11 +--
 include/hw/pci/msix.h  |  5 +++--
 11 files changed, 80 insertions(+), 30 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index d479fd22f573..2d703c8a712a 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -831,6 +831,7 @@ static int nvme_init(PCIDevice *pci_dev)
 {
 NvmeCtrl *n = NVME(pci_dev);
 NvmeIdCtrl *id = &n->id_ctrl;
+Error *err = NULL;
 
 int i;
 int64_t bs_size;
@@ -872,7 +873,9 @@ static int nvme_init(PCIDevice *pci_dev)
 pci_register_bar(&n->parent_obj, 0,
 PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
 &n->iomem);
-msix_init_exclusive_bar(&n->parent_obj, n->num_queues, 4);
+if (msix_init_exclusive_bar(&n->parent_obj, n->num_queues, 4, &err)) {
+error_report_err(err);
+}
 
 id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
 id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index abeaf3da0800..70e71a597b9c 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -749,13 +749,13 @@ static void ivshmem_reset(DeviceState *d)
 }
 }
 
-static int ivshmem_setup_interrupts(IVShmemState *s)
+static int ivshmem_setup_interrupts(IVShmemState *s, Error **errp)
 {
 /* allocate QEMU callback data for receiving interrupts */
 s->msi_vectors = g_malloc0(s->vectors * sizeof(MSIVector));
 
 if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
-if (msix_init_exclusive_bar(PCI_DEVICE(s), s->vectors, 1)) {
+if (msix_init_exclusive_bar(PCI_DEVICE(s), s->vectors, 1, errp)) {
 return -1;
 }
 
@@ -897,8 +897,8 @@ static void ivshmem_common_realize(PCIDevice *dev, Error **errp)
 qemu_chr_fe_set_handlers(&s->server_chr, ivshmem_can_receive,
  ivshmem_read, NULL, s, NULL, true);
 
-if (ivshmem_setup_interrupts(s) < 0) {
-error_setg(errp, "failed to initialize interrupts");
+if (ivshmem_setup_interrupts(s, errp) < 0) {
+error_prepend(errp, "Failed to initialize interrupts: ");
 return;
 }
 }
diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index 4994e1ca0062..74cbbef30366 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -292,7 +292,11 @@ e1000e_init_msix(E1000EState *s)
 E1000E_MSIX_IDX, E1000E_MSIX_TABLE,
 &s->msix,
 E1000E_MSIX_IDX, E1000E_MSIX_PBA,
-0xA0);
+0xA0, NULL);
+
+/* Any error other than -ENOTSUP(board's MSI support is broken)
+ * is a programming error. Fall back to INTx silently on -ENOTSUP */
+assert(!res || res == -ENOTSUP);
 
 if (res < 0) {
 trace_e1000e_msix_init_fail(res);
diff --git a/hw/net/rocker/rocker.c b/hw/net/rocker/rocker.c
index e9d215aa4df1..8f829f2946d8 100644
--- a/hw/net/rocker/rocker.c
+++ b/hw/net/rocker/rocker.c
@@ -1256,14 +1256,19 @@ static int rocker_msix_init(Rocker *r)
 {
 PCIDevice *dev = PCI_DEVICE(r);
 int err;
+Error *local_err = NULL;
 
 err = msix_init(dev, ROCKER_MSIX_VEC_COUNT(r->fp_ports),
 &r->msix_bar,
 ROCKER_PCI_MSIX_BAR_IDX, ROCKER_PCI_MSIX_TABLE_OFFSET,
 &r->msix_bar,
 ROCKER_PCI_MSIX_BAR_IDX, ROCKER_PCI_MSIX_PBA_OFFSET,
-0);
+0, &local_err);
+/* Any error other than -ENOTSUP(board's MSI support is broken)
+ * i

[Qemu-devel] [PATCH v8 10/10] megasas: remove unnecessary megasas_use_msix()

2017-01-12 Thread Cao jin

Also move one hunk up, to keep the msix init related code together.

CC: Hannes Reinecke <h...@suse.de>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>

Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/scsi/megasas.c | 19 ++-
 1 file changed, 6 insertions(+), 13 deletions(-)

msix_init() doesn't set the MSI-X enable bit, so using msix_enabled()
is not right here; restore the old check without the
megasas_use_msix() wrapper.

diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index c208d520c4df..73eab7844ee3 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -155,11 +155,6 @@ static bool megasas_use_queue64(MegasasState *s)
 return s->flags & MEGASAS_MASK_USE_QUEUE64;
 }
 
-static bool megasas_use_msix(MegasasState *s)
-{
-return s->msix != ON_OFF_AUTO_OFF;
-}
-
 static bool megasas_is_jbod(MegasasState *s)
 {
 return s->flags & MEGASAS_MASK_USE_JBOD;
@@ -2305,9 +2300,7 @@ static void megasas_scsi_uninit(PCIDevice *d)
 {
 MegasasState *s = MEGASAS(d);
 
-if (megasas_use_msix(s)) {
-msix_uninit(d, &s->mmio_io, &s->mmio_io);
-}
+msix_uninit(d, &s->mmio_io, &s->mmio_io);
 msi_uninit(d);
 }
 
@@ -2358,7 +2351,7 @@ static void megasas_scsi_realize(PCIDevice *dev, Error 
**errp)
 
 memory_region_init_io(&s->mmio_io, OBJECT(s), &megasas_mmio_ops, s,
   "megasas-mmio", 0x4000);
-if (megasas_use_msix(s)) {
+if (s->msix != ON_OFF_AUTO_OFF) {
 ret = msix_init(dev, 15, &s->mmio_io, b->mmio_bar, 0x2000,
 &s->mmio_io, b->mmio_bar, 0x3800, 0x68, &err);
 /* Any error other than -ENOTSUP(board's MSI support is broken)
@@ -2378,6 +2371,10 @@ static void megasas_scsi_realize(PCIDevice *dev, Error 
**errp)
 error_free(err);
 }
 
+if (s->msix != ON_OFF_AUTO_OFF) {
+msix_vector_use(dev, 0);
+}
+
 memory_region_init_io(&s->port_io, OBJECT(s), &megasas_port_ops, s,
   "megasas-io", 256);
 memory_region_init_io(&s->queue_io, OBJECT(s), &megasas_queue_ops, s,
@@ -2393,10 +2390,6 @@ static void megasas_scsi_realize(PCIDevice *dev, Error **errp)
 pci_register_bar(dev, b->mmio_bar, bar_type, &s->mmio_io);
 pci_register_bar(dev, 3, bar_type, &s->queue_io);
 
-if (megasas_use_msix(s)) {
-msix_vector_use(dev, 0);
-}
-
 s->fw_state = MFI_FWSTATE_READY;
 if (!s->sas_addr) {
 s->sas_addr = ((NAA_LOCALLY_ASSIGNED_ID << 24) |
-- 
2.1.0

[Qemu-devel] [PATCH v8 04/10] megasas: change behaviour of msix switch

2017-01-12 Thread Cao jin

Resolve the TODO: msix=auto means msix on, with a silent fallback if
msix_init() fails; if the user specifies msix=on, device creation fails
on msix_init() failure.
Also undo the overwrite of the user's msix configuration.

CC: Michael S. Tsirkin <m...@redhat.com>
CC: Hannes Reinecke <h...@suse.de>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>

Reviewed-by: Markus Armbruster <arm...@redhat.com>
Reviewed-by: Hannes Reinecke <h...@suse.com>
Acked-by: Marcel Apfelbaum <mar...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/scsi/megasas.c | 28 
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index a946424ab234..14d6e0c6d565 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -2359,19 +2359,31 @@ static void megasas_scsi_realize(PCIDevice *dev, Error 
**errp)
 
 memory_region_init_io(&s->mmio_io, OBJECT(s), &megasas_mmio_ops, s,
   "megasas-mmio", 0x4000);
+if (megasas_use_msix(s)) {
+ret = msix_init(dev, 15, &s->mmio_io, b->mmio_bar, 0x2000,
+&s->mmio_io, b->mmio_bar, 0x3800, 0x68, &err);
+/* Any error other than -ENOTSUP(board's MSI support is broken)
+ * is a programming error */
+assert(!ret || ret == -ENOTSUP);
+if (ret && s->msix == ON_OFF_AUTO_ON) {
+/* Can't satisfy user's explicit msix=on request, fail */
+error_append_hint(&err, "You have to use msix=auto (default) or "
+"msix=off with this machine type.\n");
+/* No instance_finalize method, need to free the resource here */
+object_unref(OBJECT(&s->mmio_io));
+error_propagate(errp, err);
+return;
+}
+assert(!err || s->msix == ON_OFF_AUTO_AUTO);
+/* With msix=auto, we fall back to MSI off silently */
+error_free(err);
+}
+
 memory_region_init_io(&s->port_io, OBJECT(s), &megasas_port_ops, s,
   "megasas-io", 256);
 memory_region_init_io(&s->queue_io, OBJECT(s), &megasas_queue_ops, s,
   "megasas-queue", 0x4);
 
-if (megasas_use_msix(s) &&
-msix_init(dev, 15, &s->mmio_io, b->mmio_bar, 0x2000,
-  &s->mmio_io, b->mmio_bar, 0x3800, 0x68, &err)) {
-/*TODO: check msix_init's error, and should fail on msix=on */
-error_report_err(err);
-s->msix = ON_OFF_AUTO_OFF;
-}
-
 if (pci_is_express(dev)) {
 pcie_endpoint_cap_init(dev, 0xa0);
 }
-- 
2.1.0
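
The on/off/auto policy implemented by this patch can be summarised as a
small decision function. The sketch below is only an illustration under
stated assumptions: SketchOnOffAuto and SketchOutcome are local,
hypothetical types (QEMU's real OnOffAuto enum comes from QAPI), and
init_ret stands for whatever msix_init() would have returned.

```c
#include <assert.h>
#include <errno.h>

/* Local copy of the tri-state switch, for illustration only. */
typedef enum {
    SKETCH_AUTO,
    SKETCH_ON,
    SKETCH_OFF,
} SketchOnOffAuto;

/* What happens to the device for a given switch value and init result. */
typedef enum {
    SKETCH_REALIZE_FAILED,   /* msix=on could not be satisfied */
    SKETCH_MSIX_ACTIVE,      /* MSI-X initialised successfully */
    SKETCH_MSIX_SKIPPED,     /* msix=off, or silent fallback with msix=auto */
} SketchOutcome;

static SketchOutcome sketch_apply_msix_switch(SketchOnOffAuto msix,
                                              int init_ret)
{
    if (msix == SKETCH_OFF) {
        return SKETCH_MSIX_SKIPPED;     /* init is never attempted */
    }

    /* Any error other than -ENOTSUP is treated as a programming error. */
    assert(!init_ret || init_ret == -ENOTSUP);

    if (init_ret && msix == SKETCH_ON) {
        return SKETCH_REALIZE_FAILED;   /* explicit request, so fail */
    }
    return init_ret ? SKETCH_MSIX_SKIPPED : SKETCH_MSIX_ACTIVE;
}
```

Note that the function never rewrites the switch value itself, which
matches the intent of no longer overwriting the user's configuration.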

[Qemu-devel] [PATCH v8 05/10] hcd-xhci: change behaviour of msix switch

2017-01-12 Thread Cao jin

Resolve the TODO: msix=auto means msix on, with a silent fallback if
msix_init() fails; if the user specifies msix=on, device creation fails
on msix_init() failure.

CC: Gerd Hoffmann <kra...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>

Reviewed-by: Gerd Hoffmann <kra...@redhat.com>
Reviewed-by: Markus Armbruster <arm...@redhat.com>
Acked-by: Marcel Apfelbaum <mar...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/usb/hcd-xhci.c | 35 ---
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index 153b006ca01b..aaca57cb5f1f 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -3636,12 +3636,14 @@ static void usb_xhci_realize(struct PCIDevice *dev, 
Error **errp)
 if (xhci->numintrs < 1) {
 xhci->numintrs = 1;
 }
+
 if (xhci->numslots > MAXSLOTS) {
 xhci->numslots = MAXSLOTS;
 }
 if (xhci->numslots < 1) {
 xhci->numslots = 1;
 }
+
 if (xhci_get_flag(xhci, XHCI_FLAG_ENABLE_STREAMS)) {
 xhci->max_pstreams_mask = 7; /* == 256 primary streams */
 } else {
@@ -3669,6 +3671,28 @@ static void usb_xhci_realize(struct PCIDevice *dev, 
Error **errp)
 xhci->mfwrap_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, xhci_mfwrap_timer, 
xhci);
 
 memory_region_init(&xhci->mem, OBJECT(xhci), "xhci", LEN_REGS);
+if (xhci->msix != ON_OFF_AUTO_OFF) {
+ret = msix_init(dev, xhci->numintrs,
+&xhci->mem, 0, OFF_MSIX_TABLE,
+&xhci->mem, 0, OFF_MSIX_PBA,
+0x90, &err);
+/* Any error other than -ENOTSUP(board's MSI support is broken)
+ * is a programming error */
+assert(!ret || ret == -ENOTSUP);
+if (ret && xhci->msix == ON_OFF_AUTO_ON) {
+/* Can't satisfy user's explicit msix=on request, fail */
+error_append_hint(&err, "You have to use msix=auto (default) or "
+"msix=off with this machine type.\n");
+/* No instance_finalize method, need to free the resource here */
+object_unref(OBJECT(&xhci->mem));
+error_propagate(errp, err);
+return;
+}
+assert(!err || xhci->msix == ON_OFF_AUTO_AUTO);
+/* With msix=auto, we fall back to MSI off silently */
+error_free(err);
+}
+
 memory_region_init_io(&xhci->mem_cap, OBJECT(xhci), &xhci_cap_ops, xhci,
   "capabilities", LEN_CAP);
 memory_region_init_io(&xhci->mem_oper, OBJECT(xhci), &xhci_oper_ops, xhci,
@@ -3701,17 +3725,6 @@ static void usb_xhci_realize(struct PCIDevice *dev, 
Error **errp)
 ret = pcie_endpoint_cap_init(dev, 0xa0);
 assert(ret >= 0);
 }
-
-if (xhci->msix != ON_OFF_AUTO_OFF) {
-/* TODO check for errors, and should fail when msix=on */
-ret = msix_init(dev, xhci->numintrs,
-&xhci->mem, 0, OFF_MSIX_TABLE,
-&xhci->mem, 0, OFF_MSIX_PBA,
-0x90, &err);
-if (ret) {
-error_report_err(err);
-}
-}
 }
 
 static void usb_xhci_exit(PCIDevice *dev)
-- 
2.1.0

[Qemu-devel] [PATCH v8 06/10] megasas: undo the overwrites of msi user configuration

2017-01-12 Thread Cao jin

Commit afea4e14 seems to have forgotten to undo the overwrites of the
user's msi configuration, which is inappropriate.

CC: Hannes Reinecke <h...@suse.de>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>

Reviewed-by: Markus Armbruster <arm...@redhat.com>
Reviewed-by: Hannes Reinecke <h...@suse.com>
Acked-by: Marcel Apfelbaum <mar...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/scsi/megasas.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index 14d6e0c6d565..c208d520c4df 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -2350,11 +2350,10 @@ static void megasas_scsi_realize(PCIDevice *dev, Error 
**errp)
 "msi=off with this machine type.\n");
 error_propagate(errp, err);
 return;
-} else if (ret) {
-/* With msi=auto, we fall back to MSI off silently */
-s->msi = ON_OFF_AUTO_OFF;
-error_free(err);
 }
+assert(!err || s->msi == ON_OFF_AUTO_AUTO);
+/* With msi=auto, we fall back to MSI off silently */
+error_free(err);
 }
 
 memory_region_init_io(>mmio_io, OBJECT(s), _mmio_ops, s,
-- 
2.1.0

[Qemu-devel] [PATCH v8 09/10] msi_init: convert assert to return -errno

2017-01-12 Thread Cao jin

According to the discussion:
http://lists.nongnu.org/archive/html/qemu-devel/2016-09/msg08215.html

Let the leaf function return a reasonable -errno, and let the caller
decide how to handle the return value.

Suggested-by: Markus Armbruster <arm...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>

Reviewed-by: Markus Armbruster <arm...@redhat.com>
Reviewed-by: Marcel Apfelbaum <mar...@redhat.com>
Acked-by: Marcel Apfelbaum <mar...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/pci/msi.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/pci/msi.c b/hw/pci/msi.c
index a87b2278a373..af3efbe525ce 100644
--- a/hw/pci/msi.c
+++ b/hw/pci/msi.c
@@ -201,9 +201,12 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
" 64bit %d mask %d\n",
offset, nr_vectors, msi64bit, msi_per_vector_mask);
 
-assert(!(nr_vectors & (nr_vectors - 1)));   /* power of 2 */
-assert(nr_vectors > 0);
-assert(nr_vectors <= PCI_MSI_VECTORS_MAX);
+/* vector sanity test: should in range 1 - 32, should be power of 2 */
+if (!is_power_of_2(nr_vectors) || nr_vectors > PCI_MSI_VECTORS_MAX) {
+error_setg(errp, "Invalid vector number: %d", nr_vectors);
+return -EINVAL;
+}
+
 /* the nr of MSI vectors is up to 32 */
 vectors_order = ctz32(nr_vectors);
 
-- 
2.1.0
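
As a standalone illustration of the check this patch introduces, the
sketch below validates a vector count and returns -EINVAL instead of
asserting. The names are hypothetical (sketch_ prefix); the limit of 32 is
taken from the "range 1 - 32" comment in the patch and is assumed to match
PCI_MSI_VECTORS_MAX. QEMU has its own is_power_of_2() helper; a local one
is defined here only to keep the example self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SKETCH_MSI_VECTORS_MAX 32   /* assumed to match PCI_MSI_VECTORS_MAX */

/* True for 1, 2, 4, 8, ...; false for 0 and for non powers of two. */
static bool sketch_is_power_of_2(unsigned value)
{
    return value && !(value & (value - 1));
}

/*
 * Leaf-level check: report the problem as -EINVAL and leave it to the
 * caller to decide whether that is fatal.
 */
static int sketch_check_msi_vectors(unsigned nr_vectors)
{
    if (!sketch_is_power_of_2(nr_vectors) ||
        nr_vectors > SKETCH_MSI_VECTORS_MAX) {
        return -EINVAL;
    }
    return 0;
}
```

A caller that considers a bad vector count a programming error can still
assert on the return value, which is the division of labour the commit
message argues for.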

[Qemu-devel] [PATCH v8 00/10] Convert msix_init() to error

2017-01-12 Thread Cao jin

Only a tiny modification in patch "megasas: remove unnecessary
megasas_use_msix()" to fix a megasas issue.

v8 changelog:
1. Reorder: place "megasas: remove unnecessary megasas_use_msix()"
   last, and fix the bug in it (described in detail in that patch);
   its R-b is dropped as well.
2. Add the Acked-by from Marcel to the first 9 patches; add the R-b from Markus
   to "hcd-xhci: check & correct param before using it".

Test:
1. make check ok
2. command line test for all affected devices, to make sure their realization
   is ok.
3. detailed test for megasas, hcd-xhci, vmxnet3.
   megasas: installing a linux distro works
   ./qemu-system-x86_64 --enable-kvm -m 1024
   -device megasas,id=scsi0,bus=pci.0
   -drive file=/xx/scsi-disk.img,if=none,id=drive-scsi0
   -device scsi-disk,bus=scsi0.0,channel=0,scsi-id=4,lun=0,drive=drive-scsi0,id=scsi0-4
   -cdrom /xx/Fedora-Server-DVD-x86_64-23.iso --monitor stdio

   hcd-xhci: partitioning usbstick.img, running mkfs and writing a file all work
   ./qemu-system-x86_64 -M q35 -m 1024 --enable-kvm
   -drive if=none,id=usbstick,file=/xx/usbstick.img
   -device nec-usb-xhci,id=usb,p2=8,p3=8,bus=pcie.0
   -device usb-storage,bus=usb.0,drive=usbstick --monitor stdio
   /xx/FedoraWorkStatsion23-x86_64.img

   vmxnet3: pinging another destination on the host's network works.
   No migration test, though, because I don't have a spare machine for now.
   ./qemu-system-x86_64 -M q35 -m 1024 --enable-kvm
   -netdev tap,id=mynet0 -device vmxnet3,netdev=mynet0
   --monitor stdio /home/pino/vm/FedoraWorkStatsion23-x86_64.img

CC: Jiri Pirko <j...@resnulli.us>
CC: Gerd Hoffmann <kra...@redhat.com>
CC: Dmitry Fleytman <dmi...@daynix.com>
CC: Jason Wang <jasow...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>
CC: Hannes Reinecke <h...@suse.de>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Alex Williamson <alex.william...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>

Cao jin (10):
  msix: Follow CODING_STYLE
  hcd-xhci: check & correct param before using it
  pci: Convert msix_init() to Error and fix callers to check it
  megasas: change behaviour of msix switch
  hcd-xhci: change behaviour of msix switch
  megasas: undo the overwrites of msi user configuration
  vmxnet3: fix reference leak issue
  vmxnet3: remove unnecessary internal msix flag
  msi_init: convert assert to return -errno
  megasas: remove unnecessary megasas_use_msix()

 hw/block/nvme.c|  5 +++-
 hw/misc/ivshmem.c  |  8 +++---
 hw/net/e1000e.c|  6 -
 hw/net/rocker/rocker.c |  7 -
 hw/net/vmxnet3.c   | 46 +++--
 hw/pci/msi.c   |  9 ---
 hw/pci/msix.c  | 42 +-
 hw/scsi/megasas.c  | 49 ---
 hw/usb/hcd-xhci.c  | 69 ++
 hw/vfio/pci.c  |  8 --
 hw/virtio/virtio-pci.c | 11 
 include/hw/pci/msix.h  |  5 ++--
 12 files changed, 164 insertions(+), 101 deletions(-)

-- 
2.1.0

[Qemu-devel] [PATCH v8 02/10] hcd-xhci: check & correct param before using it

2017-01-12 Thread Cao jin

usb_xhci_realize() corrects invalid values of property "intrs"
automatically, but the uncorrected value is passed to msi_init(),
which chokes on invalid values.  Delay that until after the
correction.

Resources allocated by usb_xhci_init() are leaked when msi_init()
fails.  Fix by calling it after msi_init().

CC: Gerd Hoffmann <kra...@redhat.com>
CC: Markus Armbruster <arm...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>
Reviewed-by: Markus Armbruster <arm...@redhat.com>
Acked-by: Marcel Apfelbaum <mar...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/usb/hcd-xhci.c | 37 ++---
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index 4acf0c6dd8c0..0ace273da472 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -3627,25 +3627,6 @@ static void usb_xhci_realize(struct PCIDevice *dev, Error **errp)
 dev->config[PCI_CACHE_LINE_SIZE] = 0x10;
 dev->config[0x60] = 0x30; /* release number */
 
-usb_xhci_init(xhci);
-
-if (xhci->msi != ON_OFF_AUTO_OFF) {
-ret = msi_init(dev, 0x70, xhci->numintrs, true, false, &err);
-/* Any error other than -ENOTSUP(board's MSI support is broken)
- * is a programming error */
-assert(!ret || ret == -ENOTSUP);
-if (ret && xhci->msi == ON_OFF_AUTO_ON) {
-/* Can't satisfy user's explicit msi=on request, fail */
-error_append_hint(&err, "You have to use msi=auto (default) or "
-"msi=off with this machine type.\n");
-error_propagate(errp, err);
-return;
-}
-assert(!err || xhci->msi == ON_OFF_AUTO_AUTO);
-/* With msi=auto, we fall back to MSI off silently */
-error_free(err);
-}
-
 if (xhci->numintrs > MAXINTRS) {
 xhci->numintrs = MAXINTRS;
 }
@@ -3667,6 +3648,24 @@ static void usb_xhci_realize(struct PCIDevice *dev, Error **errp)
 xhci->max_pstreams_mask = 0;
 }
 
+if (xhci->msi != ON_OFF_AUTO_OFF) {
+ret = msi_init(dev, 0x70, xhci->numintrs, true, false, &err);
+/* Any error other than -ENOTSUP(board's MSI support is broken)
+ * is a programming error */
+assert(!ret || ret == -ENOTSUP);
+if (ret && xhci->msi == ON_OFF_AUTO_ON) {
+/* Can't satisfy user's explicit msi=on request, fail */
+error_append_hint(&err, "You have to use msi=auto (default) or "
+"msi=off with this machine type.\n");
+error_propagate(errp, err);
+return;
+}
+assert(!err || xhci->msi == ON_OFF_AUTO_AUTO);
+/* With msi=auto, we fall back to MSI off silently */
+error_free(err);
+}
+
+usb_xhci_init(xhci);
 xhci->mfwrap_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, xhci_mfwrap_timer, 
xhci);
 
 memory_region_init(&xhci->mem, OBJECT(xhci), "xhci", LEN_REGS);
-- 
2.1.0

[Qemu-devel] [PATCH v8 01/10] msix: Follow CODING_STYLE

2017-01-12 Thread Cao jin

CC: Markus Armbruster <arm...@redhat.com>
CC: Marcel Apfelbaum <mar...@redhat.com>
CC: Michael S. Tsirkin <m...@redhat.com>

Reviewed-by: Markus Armbruster <arm...@redhat.com>
Acked-by: Marcel Apfelbaum <mar...@redhat.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/pci/msix.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index 0ec1cb14fc60..0cee631ecc55 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -447,8 +447,10 @@ void msix_notify(PCIDevice *dev, unsigned vector)
 {
 MSIMessage msg;
 
-if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
+if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector]) {
 return;
+}
+
 if (msix_is_masked(dev, vector)) {
 msix_set_pending(dev, vector);
 return;
@@ -483,8 +485,10 @@ void msix_reset(PCIDevice *dev)
 /* Mark vector as used. */
 int msix_vector_use(PCIDevice *dev, unsigned vector)
 {
-if (vector >= dev->msix_entries_nr)
+if (vector >= dev->msix_entries_nr) {
 return -EINVAL;
+}
+
 dev->msix_entry_used[vector]++;
 return 0;
 }
-- 
2.1.0

[Qemu-devel] [PATCH] doc/usb2: fix typo

2017-01-11 Thread Cao jin

Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 docs/usb2.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/usb2.txt b/docs/usb2.txt
index c7a445afcd55..b9e75480737c 100644
--- a/docs/usb2.txt
+++ b/docs/usb2.txt
@@ -19,7 +19,7 @@ the controller so the USB 2.0 bus gets a individual name, for 
example
 '-device usb-ehci,id=ehci".  This will give you a USB 2.0 bus named
 "ehci.0".
 
-I strongly recomment to also use -device to attach usb devices because
+I strongly recommend to also use -device to attach usb devices because
 you can specify the bus they should be attached to this way.  Here is
 a complete example:
 
-- 
2.1.0

Re: [Qemu-devel] [PATCH RFC] migration: set cpu throttle value by workload

2017-01-11 Thread Cao jin

Hi,
We have been waiting on this topic for a long time. We are interested in
improving migration performance, and we think this approach could help
under certain conditions such as heavy workloads, because the throttle
value is adjusted dynamically rather than by a fixed increment. Your
comments would be important to us; thanks in advance.
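
To make the idea concrete, here is a minimal C sketch of the calculation
the quoted RFC describes: take the pages dirtied between two bitmap syncs,
divide by the elapsed time and the RAM size, and map the result to a
throttle percentage. The function names, the per-mille scaling, the linear
mapping and the 20..99 clamp are all assumptions for illustration; the RFC
itself says its mapping is only a guess.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: share of guest RAM dirtied per second, in parts
 * per thousand. Returns 0 if no time has elapsed or ram_size is 0. */
static uint64_t dirty_rate_permille(uint64_t dirty_pages, uint64_t page_size,
                                    uint64_t elapsed_ms, uint64_t ram_size)
{
    uint64_t bytes_per_sec;

    if (elapsed_ms == 0 || ram_size == 0) {
        return 0;
    }
    /* Bytes dirtied per second over the last sync interval. */
    bytes_per_sec = dirty_pages * page_size * 1000 / elapsed_ms;
    return bytes_per_sec * 1000 / ram_size;
}

/* Hypothetical mapping (not from the RFC): linear in the dirty rate,
 * clamped to [20, 99] percent; a zero rate means no throttling. */
static int throttle_pct_from_rate(uint64_t rate_permille)
{
    const uint64_t min_pct = 20, max_pct = 99;
    uint64_t pct = rate_permille / 10;   /* 1000 per mille -> 100% */

    if (rate_permille == 0) {
        return 0;
    }
    if (pct < min_pct) {
        return (int)min_pct;
    }
    if (pct > max_pct) {
        return (int)max_pct;
    }
    return (int)pct;
}
```

For example, a guest with 2 GiB of RAM that dirties 262144 pages of 4 KiB
in one second has a rate of 500 per mille, which this sketch maps to a
50% throttle.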

-- 
Sincerely,
Cao jin

On 12/29/2016 05:16 PM, Chao Fan wrote:
> This RFC PATCH is my demo about the new feature, here is my POC mail:
> https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg00646.html
> 
> When migration_bitmap_sync executed, get the time and read bitmap to
> calculate how many dirty pages born between two sync.
> Use inst_dirty_pages / (time_now - time_prev) / ram_size to get
> inst_dirty_pages_rate. Then map from the inst_dirty_pages_rate
> to cpu throttle value. I have no idea how to map it. So I just do
> that in a simple way. The mapping way is just a guess and should
> be improved.
> 
> This is just a demo. There are more methods.
> 1.In another file, calculate the inst_dirty_pages_rate every second
>   or two seconds or another fixed time. Then set the cpu throttle
>   value according to the inst_dirty_pages_rate
> 2.When inst_dirty_pages_rate gets a threshold, begin cpu throttle
>   and set the throttle value.
> 
> Any comments will be welcome.
> 
> Signed-off-by: Chao Fan <fanc.f...@cn.fujitsu.com>
> ---
>  include/qemu/bitmap.h | 17 +
>  migration/ram.c   | 49 +
>  2 files changed, 66 insertions(+)
> 
> diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
> index 63ea2d0..dc99f9b 100644
> --- a/include/qemu/bitmap.h
> +++ b/include/qemu/bitmap.h
> @@ -235,4 +235,21 @@ static inline unsigned long *bitmap_zero_extend(unsigned 
> long *old,
>  return new;
>  }
>  
> +static inline unsigned long bitmap_weight(const unsigned long *src, long 
> nbits)
> +{
> +unsigned long i, count = 0, nlong = nbits / BITS_PER_LONG;
> +
> +if (small_nbits(nbits)) {
> +return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits));
> +}
> +for (i = 0; i < nlong; i++) {
> +count += hweight_long(src[i]);
> +}
> +if (nbits % BITS_PER_LONG) {
> +count += hweight_long(src[i] & BITMAP_LAST_WORD_MASK(nbits));
> +}
> +
> +return count;
> +}
> +
>  #endif /* BITMAP_H */
> diff --git a/migration/ram.c b/migration/ram.c
> index a1c8089..f96e3e3 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -44,6 +44,7 @@
>  #include "exec/ram_addr.h"
>  #include "qemu/rcu_queue.h"
>  #include "migration/colo.h"
> +#include "hw/boards.h"
>  
>  #ifdef DEBUG_MIGRATION_RAM
>  #define DPRINTF(fmt, ...) \
> @@ -599,6 +600,9 @@ static int64_t num_dirty_pages_period;
>  static uint64_t xbzrle_cache_miss_prev;
>  static uint64_t iterations_prev;
>  
> +static int64_t dirty_pages_time_prev;
> +static int64_t dirty_pages_time_now;
> +
>  static void migration_bitmap_sync_init(void)
>  {
>  start_time = 0;
> @@ -606,6 +610,49 @@ static void migration_bitmap_sync_init(void)
>  num_dirty_pages_period = 0;
>  xbzrle_cache_miss_prev = 0;
>  iterations_prev = 0;
> +
> +dirty_pages_time_prev = 0;
> +dirty_pages_time_now = 0;
> +}
> +
> +static void migration_inst_rate(void)
> +{
> +RAMBlock *block;
> +MigrationState *s = migrate_get_current();
> +int64_t inst_dirty_pages_rate, inst_dirty_pages = 0;
> +int64_t i;
> +unsigned long *num;
> +unsigned long len = 0;
> +
> +dirty_pages_time_now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +if (dirty_pages_time_prev != 0) {
> +rcu_read_lock();
> +DirtyMemoryBlocks *blocks = atomic_rcu_read(
> + &ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]);
> +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> +if (len == 0) {
> +len = block->offset;
> +}
> +len += block->used_length;
> +}
> +ram_addr_t idx = (len >> TARGET_PAGE_BITS) / DIRTY_MEMORY_BLOCK_SIZE;
> +if (((len >> TARGET_PAGE_BITS) % DIRTY_MEMORY_BLOCK_SIZE) != 0) {
> +idx++;
> +}
> +for (i = 0; i < idx; i++) {
> +num = blocks->blocks[i];
> +inst_dirty_pages += bitmap_weight(num, DIRTY_MEMORY_BLOCK_SIZE);
> +}
> +rcu_read_unlock();
> +
> +inst_dirty_pages_rate = inst_dirty_pages * TARGET_PAGE_SIZE *
> +1024 * 1024 * 1000 /
> +(dirty_pages_time_now - dirty_pages_time_pre

Re: [Qemu-devel] [PATCH v7 00/10] Convert msix_init() to error

2017-01-11 Thread Cao jin



On 01/10/2017 05:45 AM, Michael S. Tsirkin wrote:
> On Mon, Nov 14, 2016 at 03:25:30PM +0800, Cao jin wrote:
>> v7 changelog:
>> 1. fix the segfaut bug in patch 2. So drop the all the R-b of it,
>>please take a look, there is detailed description in the patch.
>> 2. add the R-b from Hannes Reinecke
>>
>> Test:
>> 1. make check: pass
>> 2. After applied all the patch, command line test for all the
>>affected devices, just make sure device realize process is ok,
>>no crash, but no further use of device.
> 
> Consider the megasas device for example, don't you
> need to test that the change actually does what
> it's intended to do?
> 

Thanks very much for the reminder. There is indeed a problem in the patch
"megasas: remove unnecessary megasas_use_msix()"; it is the only one that
breaks megasas. Sorry about that.

I installed a Linux distro to test megasas, as follows:

./qemu-system-x86_64 --enable-kvm -m 1024 -device
megasas,id=scsi0,bus=pci.0 -drive
file=/xx/xx/scsi-disk.img,if=none,id=drive-scsi0 -device
scsi-disk,bus=scsi0.0,channel=0,scsi-id=4,lun=0,drive=drive-scsi0,id=scsi0-4
-cdrom /xx/vm/Fedora-Server-DVD-x86_64-23.iso [-boot once=d] --monitor stdio

Sincerely,
Cao jin

Re: [Qemu-devel] [PATCH v2] vfio/pci: Support error recovery

2017-01-10 Thread Cao jin



On 01/10/2017 11:11 PM, Michael S. Tsirkin wrote:
> On Tue, Jan 10, 2017 at 07:46:17PM +0800, Cao jin wrote:
>>
>>
>> On 01/10/2017 07:04 AM, Michael S. Tsirkin wrote:
>>> On Sat, Dec 31, 2016 at 05:15:36PM +0800, Cao jin wrote:
>>>> Support serious device error recovery
>>>
>>> serious?
>>>
>>
>> Sorry for my poor vocabulary if it confuses people. I wanted to express
>> the meaning that: vfio-pci actually cannot do a real recovery for device
>> even if it provides the callbacks, it relies on the user to do a
>> effective(or word "serious"?) recovery.
>>
>> Welcome the amendment on the commit log.
> 
> It's up to Alex, maybe he's able to figure it all out from
> code, but the rest of us could benefit from a description
> of what the patch does from userspace point of view.
> 
> Also, is it a pre-requisite of the userspace patches you posted?
> 

Yes, it is.
-- 
Sincerely,
Cao jin

>>>>
>>>> Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
>>>> ---
>>>>  drivers/vfio/pci/vfio_pci.c | 70 
>>>> +++--
>>>>  drivers/vfio/pci/vfio_pci_private.h |  2 ++
>>>>  2 files changed, 70 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>>>> index 712a849..752af20 100644
>>>> --- a/drivers/vfio/pci/vfio_pci.c
>>>> +++ b/drivers/vfio/pci/vfio_pci.c
>>>> @@ -534,6 +534,15 @@ static long vfio_pci_ioctl(void *device_data,
>>>>  {
>>>>struct vfio_pci_device *vdev = device_data;
>>>>unsigned long minsz;
>>>> +  int ret;
>>>> +
>>>> +  if (vdev->aer_recovering && (cmd == VFIO_DEVICE_SET_IRQS ||
>>>> +  cmd == VFIO_DEVICE_RESET || cmd == VFIO_DEVICE_PCI_HOT_RESET)) {
>>>> +  ret = wait_for_completion_interruptible(
>>>> +  &vdev->aer_completion);
>>>
>>> don't split it like that.
>>>
>>>> +  if (ret)
>>>> +  return ret;
>>>> +  }
>>>>  
>>>>if (cmd == VFIO_DEVICE_GET_INFO) {
>>>>struct vfio_device_info info;
>>>> @@ -953,6 +962,15 @@ static ssize_t vfio_pci_rw(void *device_data, char 
>>>> __user *buf,
>>>>  {
>>>>unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
>>>>struct vfio_pci_device *vdev = device_data;
>>>> +  int ret;
>>>> +
>>>> +  /* block all kinds of access during host recovery */
>>>> +  if (vdev->aer_recovering) {
>>>> +  ret = wait_for_completion_interruptible(
>>>> +  &vdev->aer_completion);
>>>> +  if (ret)
>>>> +  return ret;
>>>> +  }
>>>>  
>>>>if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions)
>>>>return -EINVAL;
>>>> @@ -1117,6 +1135,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, 
>>>> const struct pci_device_id *id)
>>>>vdev->irq_type = VFIO_PCI_NUM_IRQS;
>>>>mutex_init(&vdev->igate);
>>>>spin_lock_init(&vdev->irqlock);
>>>> +  init_completion(&vdev->aer_completion);
>>>>  
>>>>ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
>>>>if (ret) {
>>>> @@ -1176,6 +1195,9 @@ static pci_ers_result_t 
>>>> vfio_pci_aer_err_detected(struct pci_dev *pdev,
>>>>  {
>>>>struct vfio_pci_device *vdev;
>>>>struct vfio_device *device;
>>>> +  u32 uncor_status;
>>>> +  unsigned int aer_cap_offset;
>>>> +  int ret;
>>>>  
>>>>device = vfio_device_get_from_dev(&pdev->dev);
>>>>if (device == NULL)
>>>> @@ -1187,10 +1209,29 @@ static pci_ers_result_t 
>>>> vfio_pci_aer_err_detected(struct pci_dev *pdev,
>>>>return PCI_ERS_RESULT_DISCONNECT;
>>>>}
>>>>  
>>>> +  /*
>>>> +   * get device's uncorrectable error status as soon as possible,
>>>
>>> should be "Get".
>>>
>>>> +   * and signal it to user space. The later we read it, the possibility
>>>> +   * the register value is mangled grows.
>>>> +   */
>>>> +  aer_cap_offset = pci_find_ext_capability(vdev->pdev, 
>>>> PCI_EXT_CAP_ID_ERR);
>>

Re: [Qemu-devel] [PATCH v2] vfio/pci: Support error recovery

2017-01-10 Thread Cao jin



On 01/10/2017 07:04 AM, Michael S. Tsirkin wrote:
> On Sat, Dec 31, 2016 at 05:15:36PM +0800, Cao jin wrote:
>> Support serious device error recovery
> 
> serious?
>

Sorry if my poor vocabulary confuses people. What I wanted to express is
that vfio-pci cannot actually perform a real recovery of the device, even
though it provides the callbacks; it relies on the user to do an
effective (or "serious"?) recovery.

Amendments to the commit log are welcome.

-- 
Sincerely,
Cao jin

>>
>> Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
>> ---
>>  drivers/vfio/pci/vfio_pci.c | 70 
>> +++--
>>  drivers/vfio/pci/vfio_pci_private.h |  2 ++
>>  2 files changed, 70 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index 712a849..752af20 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -534,6 +534,15 @@ static long vfio_pci_ioctl(void *device_data,
>>  {
>>  struct vfio_pci_device *vdev = device_data;
>>  unsigned long minsz;
>> +int ret;
>> +
>> +if (vdev->aer_recovering && (cmd == VFIO_DEVICE_SET_IRQS ||
>> +cmd == VFIO_DEVICE_RESET || cmd == VFIO_DEVICE_PCI_HOT_RESET)) {
>> +ret = wait_for_completion_interruptible(
>> +&vdev->aer_completion);
> 
> don't split it like that.
> 
>> +if (ret)
>> +return ret;
>> +}
>>  
>>  if (cmd == VFIO_DEVICE_GET_INFO) {
>>  struct vfio_device_info info;
>> @@ -953,6 +962,15 @@ static ssize_t vfio_pci_rw(void *device_data, char 
>> __user *buf,
>>  {
>>  unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
>>  struct vfio_pci_device *vdev = device_data;
>> +int ret;
>> +
>> +/* block all kinds of access during host recovery */
>> +if (vdev->aer_recovering) {
>> +ret = wait_for_completion_interruptible(
>> +&vdev->aer_completion);
>> +if (ret)
>> +return ret;
>> +}
>>  
>>  if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions)
>>  return -EINVAL;
>> @@ -1117,6 +1135,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const 
>> struct pci_device_id *id)
>>  vdev->irq_type = VFIO_PCI_NUM_IRQS;
>>  mutex_init(&vdev->igate);
>>  spin_lock_init(&vdev->irqlock);
>> +init_completion(&vdev->aer_completion);
>>  
>>  ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
>>  if (ret) {
>> @@ -1176,6 +1195,9 @@ static pci_ers_result_t 
>> vfio_pci_aer_err_detected(struct pci_dev *pdev,
>>  {
>>  struct vfio_pci_device *vdev;
>>  struct vfio_device *device;
>> +u32 uncor_status;
>> +unsigned int aer_cap_offset;
>> +int ret;
>>  
>>  device = vfio_device_get_from_dev(&pdev->dev);
>>  if (device == NULL)
>> @@ -1187,10 +1209,29 @@ static pci_ers_result_t 
>> vfio_pci_aer_err_detected(struct pci_dev *pdev,
>>  return PCI_ERS_RESULT_DISCONNECT;
>>  }
>>  
>> +/*
>> + * get device's uncorrectable error status as soon as possible,
> 
> should be "Get".
> 
>> + * and signal it to user space. The later we read it, the possibility
>> + * the register value is mangled grows.
>> + */
>> +aer_cap_offset = pci_find_ext_capability(vdev->pdev, 
>> PCI_EXT_CAP_ID_ERR);
>> +ret = pci_read_config_dword(vdev->pdev, aer_cap_offset +
>> +PCI_ERR_UNCOR_STATUS, &uncor_status);
>> +if (ret)
>> +return PCI_ERS_RESULT_DISCONNECT;
>> +
>> +pr_info("device %d got AER detect notification. uncorrectable error 
>> status = 0x%x\n", pdev->devfn, uncor_status);//to be removed
> 
> Pls drop this.
> 
>>  mutex_lock(&vdev->igate);
>>  
>> -if (vdev->err_trigger)
>> -eventfd_signal(vdev->err_trigger, 1);
>> +vdev->aer_recovering = true;
>> +reinit_completion(&vdev->aer_completion);
>> +
>> +if (vdev->err_trigger && uncor_status) {
>> +pr_info("device %d signal uncor status 0x%x to user",
>> +pdev->devfn, uncor_status);
>> +/* signal uncorrectable error status to user space */
>> +eventfd_signal(vdev->err_trigger, uncor_status);
>> +}
>

Re: [Qemu-devel] [PATCH] pcie: remove duplicate assertion

2017-01-10 Thread Cao jin



On 01/10/2017 06:37 AM, Michael S. Tsirkin wrote:
> On Fri, Dec 23, 2016 at 10:16:30AM +0800, Cao jin wrote:
>> "size >= 8" connote "size > 0"
>>
>> Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
> 
> Isn't the point to check for overflows?
> 

Makes sense. If the intent is to check for overflow, the following order
would make more sense:

assert(offset >= PCI_CONFIG_SPACE_SIZE);
assert(size >= 8);
assert(offset < offset + size);
assert(offset + size <= PCIE_CONFIG_SPACE_SIZE);

Otherwise, a size of 0 trips assert(offset < offset + size) first and
never reaches assert(size >= 8).
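
The ordering argument can be illustrated with a small standalone C sketch
that reports which check fails first when the checks run in the order
proposed above. The function name first_failed_check is hypothetical, and
32-bit unsigned arguments are an assumption made here so that the
wraparound check is well defined; this is not the QEMU code itself.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_CONFIG_SPACE_SIZE  0x100u   /* end of conventional config space */
#define PCIE_CONFIG_SPACE_SIZE 0x1000u  /* end of PCIe extended config space */

/* Returns 0 if the range is acceptable, otherwise the 1-based index of
 * the first check that fails, using the order proposed in this mail. */
static int first_failed_check(uint32_t offset, uint32_t size)
{
    if (!(offset >= PCI_CONFIG_SPACE_SIZE)) {
        return 1;   /* capability must live in extended config space */
    }
    if (!(size >= 8)) {
        return 2;   /* too small, including size == 0 */
    }
    if (!(offset < offset + size)) {
        return 3;   /* offset + size wrapped around */
    }
    if (!(offset + size <= PCIE_CONFIG_SPACE_SIZE)) {
        return 4;   /* runs past the end of config space */
    }
    return 0;
}
```

With this order a size of 0 is reported by the size check (2), while a
huge size that wraps the sum is reported by the overflow check (3).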
-- 
Sincerely,
Cao jin

>> ---
>>  hw/pci/pcie.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
>> index 39b10b852d91..f864c5cd5458 100644
>> --- a/hw/pci/pcie.c
>> +++ b/hw/pci/pcie.c
>> @@ -668,7 +668,6 @@ void pcie_add_capability(PCIDevice *dev,
>>  uint16_t next;
>>  
>>  assert(offset >= PCI_CONFIG_SPACE_SIZE);
>> -assert(offset < offset + size);
>>  assert(offset + size <= PCIE_CONFIG_SPACE_SIZE);
>>  assert(size >= 8);
>>  assert(pci_is_express(dev));
>> -- 
>> 2.1.0
>>
>>
> 
> 
>

Re: [Qemu-devel] [PATCH v4 2/2] pcie_aer: support configurable AER capa version

2017-01-09 Thread Cao jin



On 01/10/2017 11:27 AM, Michael S. Tsirkin wrote:
> On Wed, Dec 21, 2016 at 04:21:31PM +0800, Cao jin wrote:
>> From: Dou Liyang <douly.f...@cn.fujitsu.com>
>>
>> Now, AER capa version is fixed to v2, if assigned device isn't v2,
>> then this value will be inconsistent between guest and host
>>
>> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
>> Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
>> Reviewed-by: Michael S. Tsirkin <m...@redhat.com>
> 
> I assume this is good for AER work so I'll merge this,
> but these patches don't do anything by themselves
> in the future pls make this explicit in commit log.
> 

Thanks for the reminder; please amend the commit log if you want.

-- 
Sincerely,
Cao jin

>> ---
>>  hw/net/e1000e.c| 3 ++-
>>  hw/pci-bridge/ioh3420.c| 3 ++-
>>  hw/pci-bridge/xio3130_downstream.c | 3 ++-
>>  hw/pci-bridge/xio3130_upstream.c   | 3 ++-
>>  hw/pci/pcie_aer.c  | 6 +++---
>>  include/hw/pci/pcie_aer.h  | 4 ++--
>>  6 files changed, 13 insertions(+), 9 deletions(-)
>>
>> diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
>> index 89f96eb4a076..77a4b3e5bf9d 100644
>> --- a/hw/net/e1000e.c
>> +++ b/hw/net/e1000e.c
>> @@ -472,7 +472,8 @@ static void e1000e_pci_realize(PCIDevice *pci_dev, Error 
>> **errp)
>>  hw_error("Failed to initialize PM capability");
>>  }
>>  
>> -if (pcie_aer_init(pci_dev, e1000e_aer_offset, PCI_ERR_SIZEOF, NULL) < 
>> 0) {
>> +if (pcie_aer_init(pci_dev, PCI_ERR_VER, e1000e_aer_offset,
>> +  PCI_ERR_SIZEOF, NULL) < 0) {
>>  hw_error("Failed to initialize AER capability");
>>  }
>>  
>> diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
>> index 04180af79471..84b7946c3136 100644
>> --- a/hw/pci-bridge/ioh3420.c
>> +++ b/hw/pci-bridge/ioh3420.c
>> @@ -135,7 +135,8 @@ static int ioh3420_initfn(PCIDevice *d)
>>  goto err_pcie_cap;
>>  }
>>  
>> -rc = pcie_aer_init(d, IOH_EP_AER_OFFSET, PCI_ERR_SIZEOF, &err);
>> +rc = pcie_aer_init(d, PCI_ERR_VER, IOH_EP_AER_OFFSET,
>> +   PCI_ERR_SIZEOF, &err);
>>  if (rc < 0) {
>>  error_report_err(err);
>>  goto err;
>> diff --git a/hw/pci-bridge/xio3130_downstream.c 
>> b/hw/pci-bridge/xio3130_downstream.c
>> index 571334185b42..04b8e5b8479e 100644
>> --- a/hw/pci-bridge/xio3130_downstream.c
>> +++ b/hw/pci-bridge/xio3130_downstream.c
>> @@ -97,7 +97,8 @@ static int xio3130_downstream_initfn(PCIDevice *d)
>>  goto err_pcie_cap;
>>  }
>>  
>> -rc = pcie_aer_init(d, XIO3130_AER_OFFSET, PCI_ERR_SIZEOF, &err);
>> +rc = pcie_aer_init(d, PCI_ERR_VER, XIO3130_AER_OFFSET,
>> +   PCI_ERR_SIZEOF, &err);
>>  if (rc < 0) {
>>  error_report_err(err);
>>  goto err;
>> diff --git a/hw/pci-bridge/xio3130_upstream.c 
>> b/hw/pci-bridge/xio3130_upstream.c
>> index 94c16910069e..d1f59c883477 100644
>> --- a/hw/pci-bridge/xio3130_upstream.c
>> +++ b/hw/pci-bridge/xio3130_upstream.c
>> @@ -85,7 +85,8 @@ static int xio3130_upstream_initfn(PCIDevice *d)
>>  pcie_cap_flr_init(d);
>>  pcie_cap_deverr_init(d);
>>  
>> -rc = pcie_aer_init(d, XIO3130_AER_OFFSET, PCI_ERR_SIZEOF, &err);
>> +rc = pcie_aer_init(d, PCI_ERR_VER, XIO3130_AER_OFFSET,
>> +   PCI_ERR_SIZEOF, &err);
>>  if (rc < 0) {
>>  error_report_err(err);
>>  goto err;
>> diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
>> index 2a4bd5aef639..daf1f65427c2 100644
>> --- a/hw/pci/pcie_aer.c
>> +++ b/hw/pci/pcie_aer.c
>> @@ -97,10 +97,10 @@ static void aer_log_clear_all_err(PCIEAERLog *aer_log)
>>  aer_log->log_num = 0;
>>  }
>>  
>> -int pcie_aer_init(PCIDevice *dev, uint16_t offset, uint16_t size,
>> -  Error **errp)
>> +int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, uint16_t offset,
>> +  uint16_t size, Error **errp)
>>  {
>> -pcie_add_capability(dev, PCI_EXT_CAP_ID_ERR, PCI_ERR_VER,
>> +pcie_add_capability(dev, PCI_EXT_CAP_ID_ERR, cap_ver,
>>  offset, size);
>>  dev->exp.aer_cap = offset;
>>  
>> diff --git a/include/hw/pci/pcie_aer.h b/include/hw/pci/pcie_aer.h
>> index 5891b6816e85..526802bd312b 100644
>> --- a/include/hw/pci/pcie_aer.h
>> +++ b/include/hw/pci/pcie_aer.h
>> @@ -86,8 +86,8 @@ struct PCIEAERErr {
>>  
>>  extern const VMStateDescription vmstate_pcie_aer_log;
>>  
>> -int pcie_aer_init(PCIDevice *dev, uint16_t offset, uint16_t size,
>> -  Error **errp);
>> +int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, uint16_t offset,
>> +  uint16_t size, Error **errp);
>>  void pcie_aer_exit(PCIDevice *dev);
>>  void pcie_aer_write_config(PCIDevice *dev,
>> uint32_t addr, uint32_t val, int len);
>> -- 
>> 2.1.0
>>
>>
> 
> 
> .
>

[Qemu-devel] [PATCH v2] vfio/pci: Support error recovery

2016-12-31 Thread Cao jin

Support serious device error recovery

Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 drivers/vfio/pci/vfio_pci.c | 70 +++--
 drivers/vfio/pci/vfio_pci_private.h |  2 ++
 2 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 712a849..752af20 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -534,6 +534,15 @@ static long vfio_pci_ioctl(void *device_data,
 {
struct vfio_pci_device *vdev = device_data;
unsigned long minsz;
+   int ret;
+
+   if (vdev->aer_recovering && (cmd == VFIO_DEVICE_SET_IRQS ||
+   cmd == VFIO_DEVICE_RESET || cmd == VFIO_DEVICE_PCI_HOT_RESET)) {
+   ret = wait_for_completion_interruptible(
+   &vdev->aer_completion);
+   if (ret)
+   return ret;
+   }
 
if (cmd == VFIO_DEVICE_GET_INFO) {
struct vfio_device_info info;
@@ -953,6 +962,15 @@ static ssize_t vfio_pci_rw(void *device_data, char __user 
*buf,
 {
unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
struct vfio_pci_device *vdev = device_data;
+   int ret;
+
+   /* block all kinds of access during host recovery */
+   if (vdev->aer_recovering) {
+   ret = wait_for_completion_interruptible(
+   &vdev->aer_completion);
+   if (ret)
+   return ret;
+   }
 
if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions)
return -EINVAL;
@@ -1117,6 +1135,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const 
struct pci_device_id *id)
vdev->irq_type = VFIO_PCI_NUM_IRQS;
mutex_init(&vdev->igate);
spin_lock_init(&vdev->irqlock);
+   init_completion(&vdev->aer_completion);
 
ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
if (ret) {
@@ -1176,6 +1195,9 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct 
pci_dev *pdev,
 {
struct vfio_pci_device *vdev;
struct vfio_device *device;
+   u32 uncor_status;
+   unsigned int aer_cap_offset;
+   int ret;
 
device = vfio_device_get_from_dev(&pdev->dev);
if (device == NULL)
@@ -1187,10 +1209,29 @@ static pci_ers_result_t 
vfio_pci_aer_err_detected(struct pci_dev *pdev,
return PCI_ERS_RESULT_DISCONNECT;
}
 
+   /*
+* get device's uncorrectable error status as soon as possible,
+* and signal it to user space. The later we read it, the possibility
+* the register value is mangled grows.
+*/
+   aer_cap_offset = pci_find_ext_capability(vdev->pdev, 
PCI_EXT_CAP_ID_ERR);
+   ret = pci_read_config_dword(vdev->pdev, aer_cap_offset +
+PCI_ERR_UNCOR_STATUS, &uncor_status);
+if (ret)
+return PCI_ERS_RESULT_DISCONNECT;
+
+   pr_info("device %d got AER detect notification. uncorrectable error 
status = 0x%x\n", pdev->devfn, uncor_status);//to be removed
mutex_lock(&vdev->igate);
 
-   if (vdev->err_trigger)
-   eventfd_signal(vdev->err_trigger, 1);
+   vdev->aer_recovering = true;
+   reinit_completion(&vdev->aer_completion);
+
+   if (vdev->err_trigger && uncor_status) {
+   pr_info("device %d signal uncor status 0x%x to user",
+   pdev->devfn, uncor_status);
+   /* signal uncorrectable error status to user space */
+   eventfd_signal(vdev->err_trigger, uncor_status);
+}
 
mutex_unlock(&vdev->igate);
 
@@ -1199,8 +1240,33 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct 
pci_dev *pdev,
return PCI_ERS_RESULT_CAN_RECOVER;
 }
 
+static void vfio_pci_aer_resume(struct pci_dev *pdev)
+{
+   struct vfio_pci_device *vdev;
+   struct vfio_device *device;
+
+   device = vfio_device_get_from_dev(&pdev->dev);
+   if (device == NULL)
+   return;
+
+   vdev = vfio_device_data(device);
+   if (vdev == NULL) {
+   vfio_device_put(device);
+   return;
+   }
+
+   mutex_lock(&vdev->igate);
+   vdev->aer_recovering = false;
+   mutex_unlock(&vdev->igate);
+
+   complete_all(&vdev->aer_completion);
+
+   vfio_device_put(device);
+}
+
 static const struct pci_error_handlers vfio_err_handlers = {
.error_detected = vfio_pci_aer_err_detected,
+   .resume = vfio_pci_aer_resume,
 };
 
 static struct pci_driver vfio_pci_driver = {
diff --git a/drivers/vfio/pci/vfio_pci_private.h 
b/drivers/vfio/pci/vfio_pci_private.h
index 8a7d546..ba8471f 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -83,6 +83,8 @@ struct vfio_pci_device {
boolbardirty;
bool

[Qemu-devel] [PATCH RFC v11 4/4] vfio: add 'aer' property to expose aercap

2016-12-31 Thread Cao jin

From: Chen Fan <chen.fan.f...@cn.fujitsu.com>

Add an 'aer' property to let the user choose whether to expose the AER
capability. The AER feature is disabled by default, because only
non-fatal errors are supported for now.

Signed-off-by: Chen Fan <chen.fan.f...@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/vfio/pci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 9861f72..fc9db66 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3057,6 +3057,8 @@ static Property vfio_pci_dev_properties[] = {
 DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
sub_device_id, PCI_ANY_ID),
 DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
+DEFINE_PROP_BIT("aer", VFIOPCIDevice, features,
+VFIO_FEATURE_ENABLE_AER_BIT, false),
 /*
  * TODO - support passed fds... is this necessary?
  * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
-- 
1.8.3.1

[Qemu-devel] [PATCH RFC v11 2/4] vfio: new function to init aer cap for vfio device

2016-12-31 Thread Cao jin

From: Chen Fan <chen.fan.f...@cn.fujitsu.com>

Introduce a new function to initialize the AER capability registers
for a vfio-pci device.

Signed-off-by: Chen Fan <chen.fan.f...@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/vfio/pci.c | 87 +++
 hw/vfio/pci.h |  3 +++
 2 files changed, 85 insertions(+), 5 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index d7dbe0e..76a8ac3 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1851,18 +1851,81 @@ out:
 return 0;
 }
 
-static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
+static int vfio_setup_aer(VFIOPCIDevice *vdev, uint8_t cap_ver,
+  int pos, uint16_t size, Error **errp)
+{
+PCIDevice *pdev = &vdev->pdev;
+PCIDevice *dev_iter;
+uint8_t type;
+uint32_t errcap;
+
+/* In case the physical device has AER cap while user doesn't enable AER,
+ * still allocate the config space in the emulated device for AER */
+if (!(vdev->features & VFIO_FEATURE_ENABLE_AER)) {
+pcie_add_capability(pdev, PCI_EXT_CAP_ID_ERR,
+cap_ver, pos, size);
+return 0;
+}
+
+dev_iter = pci_bridge_get_device(pdev->bus);
+if (!dev_iter) {
+goto error;
+}
+
+while (dev_iter) {
+if (!pci_is_express(dev_iter)) {
+goto error;
+}
+
+type = pcie_cap_get_type(dev_iter);
+if ((type != PCI_EXP_TYPE_ROOT_PORT &&
+ type != PCI_EXP_TYPE_UPSTREAM &&
+ type != PCI_EXP_TYPE_DOWNSTREAM)) {
+goto error;
+}
+
+if (!dev_iter->exp.aer_cap) {
+goto error;
+}
+
+dev_iter = pci_bridge_get_device(dev_iter->bus);
+}
+
+errcap = vfio_pci_read_config(pdev, pos + PCI_ERR_CAP, 4);
+/*
+ * The ability to record multiple headers is depending on
+ * the state of the Multiple Header Recording Capable bit and
+ * enabled by the Multiple Header Recording Enable bit.
+ */
+if ((errcap & PCI_ERR_CAP_MHRC) &&
+(errcap & PCI_ERR_CAP_MHRE)) {
+pdev->exp.aer_log.log_max = PCIE_AER_LOG_MAX_DEFAULT;
+} else {
+pdev->exp.aer_log.log_max = 0;
+}
+
+pcie_cap_deverr_init(pdev);
+return pcie_aer_init(pdev, cap_ver, pos, size);
+
+error:
+error_setg(errp, "vfio: Unable to enable AER for device %s, parent bus "
+   "does not support AER signaling", vdev->vbasedev.name);
+return -1;
+}
+
+static int vfio_add_ext_cap(VFIOPCIDevice *vdev, Error **errp)
 {
 PCIDevice *pdev = &vdev->pdev;
 uint32_t header;
 uint16_t cap_id, next, size;
 uint8_t cap_ver;
 uint8_t *config;
+int ret = 0;
 
 /* Only add extended caps if we have them and the guest can see them */
 if (!pci_is_express(pdev) || !pci_bus_is_express(pdev->bus) ||
 !pci_get_long(pdev->config + PCI_CONFIG_SPACE_SIZE)) {
-return;
+return 0;
 }
 
 /*
@@ -1911,6 +1974,9 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
PCI_EXT_CAP_NEXT_MASK);
 
 switch (cap_id) {
+case PCI_EXT_CAP_ID_ERR:
+ret = vfio_setup_aer(vdev, cap_ver, next, size, errp);
+break;
 case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */
 case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function virtualization */
 trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id, next);
@@ -1919,6 +1985,9 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
 pcie_add_capability(pdev, cap_id, cap_ver, next, size);
 }
 
+if (ret) {
+goto out;
+}
 }
 
 /* Cleanup chain head ID if necessary */
@@ -1926,8 +1995,9 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
 pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0);
 }
 
+out:
 g_free(config);
-return;
+return ret;
 }
 
 static int vfio_add_capabilities(VFIOPCIDevice *vdev, Error **errp)
@@ -1945,8 +2015,8 @@ static int vfio_add_capabilities(VFIOPCIDevice *vdev, 
Error **errp)
 return ret;
 }
 
-vfio_add_ext_cap(vdev);
-return 0;
+ret = vfio_add_ext_cap(vdev, errp);
+return ret;
 }
 
 static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
@@ -2769,6 +2839,13 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 goto out_teardown;
 }
 
+if ((vdev->features & VFIO_FEATURE_ENABLE_AER) &&
+!pdev->exp.aer_cap) {
+error_setg(errp, "vfio: Unable to enable AER for device %s, device "
+   "does not support AER signaling", vdev->vbasedev.name);
+return;
+}
+
 if (vdev->vga) {
 vfio_vga_quirk_setup(vdev);
 }
diff --gi

[Qemu-devel] [PATCH RFC v11 3/4] vfio-pci: pass the aer error to guest

2016-12-31 Thread Cao jin

From: Chen Fan <chen.fan.f...@cn.fujitsu.com>

When an uncorrectable error happens on the physical device, the vfio_pci
driver signals the value of the uncorrectable error status register to the
corresponding QEMU vfio-pci device via the eventfd registered by this
device; the vfio-pci error eventfd handler is then invoked in the
event loop.

Construct the AER message and pass it to the root port; the root port
triggers an interrupt to signal the guest, and the guest driver then
performs the recovery.

Note: only recovery from non-fatal errors is supported for now; a fatal
error still results in a VM stop.
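
The decision the handler makes can be sketched in isolation as follows.
This is an illustrative C fragment, not the patch code: the names
aer_uncor_is_fatal, aer_decide and the enum are hypothetical, and the
status and severity values are passed in directly rather than read from
the eventfd and from config space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An uncorrectable error counts as fatal when any bit set in the
 * Uncorrectable Error Status register is also set in the Uncorrectable
 * Error Severity register. */
static bool aer_uncor_is_fatal(uint32_t uncor_status, uint32_t uncor_severity)
{
    return (uncor_status & uncor_severity) != 0;
}

/* Hypothetical outcome type for this sketch. */
enum aer_action {
    AER_ACTION_FORWARD_TO_GUEST,   /* send an AER message to the root port */
    AER_ACTION_STOP_VM             /* contain the error by stopping the VM */
};

/* Stop the VM if AER is not exposed to the guest or the error is fatal;
 * otherwise forward the non-fatal error to the guest. */
static enum aer_action aer_decide(bool aer_enabled, uint32_t uncor_status,
                                  uint32_t uncor_severity)
{
    if (!aer_enabled || aer_uncor_is_fatal(uncor_status, uncor_severity)) {
        return AER_ACTION_STOP_VM;
    }
    return AER_ACTION_FORWARD_TO_GUEST;
}
```

Only the combination of an enabled 'aer' feature and a non-fatal status
leads to forwarding; every other case falls back to stopping the VM.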

Signed-off-by: Chen Fan <chen.fan.f...@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/vfio/pci.c | 50 ++
 1 file changed, 42 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 76a8ac3..9861f72 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2470,21 +2470,55 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
 static void vfio_err_notifier_handler(void *opaque)
 {
 VFIOPCIDevice *vdev = opaque;
+PCIDevice *dev = &vdev->pdev;
+PCIEAERMsg msg = {
+.severity = 0,
+.source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
+};
+int len;
+uint64_t uncor_status;
+
+/* Read uncorrectable error status from driver */
+len = read(vdev->err_notifier.rfd, &uncor_status, sizeof(uncor_status));
+if (len != sizeof(uncor_status)) {
+error_report("vfio-pci: uncor error status reading returns"
+ " invalid number of bytes: %d", len);
+return; //Or goto stop?
+}
+
+if (!(vdev->features & VFIO_FEATURE_ENABLE_AER)) {
+goto stop;
+}
+
+/* Populate the aer msg and send it to root port */
+if (dev->exp.aer_cap) {
+uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
+bool isfatal = uncor_status &
+   pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
+
+   if (isfatal) {
+   goto stop;
+   }
+
+msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
+ PCI_ERR_ROOT_CMD_NONFATAL_EN;
 
-if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
+error_report("vfio-pci device %d sending AER to root port. uncor"
+ " status = 0x%"PRIx64, dev->devfn, uncor_status);
+pcie_aer_msg(dev, &msg);
 return;
 }
 
+stop:
 /*
- * TBD. Retrieve the error details and decide what action
- * needs to be taken. One of the actions could be to pass
- * the error to the guest and have the guest driver recover
- * from the error. This requires that PCIe capabilities be
- * exposed to the guest. For now, we just terminate the
- * guest to contain the error.
+ * Terminate the guest in case of
+ * 1. AER capability is not exposed to guest.
+ * 2. AER capability is exposed, but error is fatal, only non-fatal
+ * error is handled now.
  */
 
-error_report("%s(%s) Unrecoverable error detected. Please collect any data possible and then kill the guest", __func__, vdev->vbasedev.name);
+error_report("%s(%s) fatal error detected. Please collect any data"
+" possible and then kill the guest", __func__, vdev->vbasedev.name);
 
 vm_stop(RUN_STATE_INTERNAL_ERROR);
 }
-- 
1.8.3.1
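
The fatal/non-fatal decision in the handler above rests on two AER registers: a bit that is set in both the Uncorrectable Error Status and the Uncorrectable Error Severity register marks a fatal error, and the source ID carries the bus number in bits 15:8 and devfn in bits 7:0. A minimal standalone sketch of just that arithmetic follows; the helper names are hypothetical and are not QEMU functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of QEMU or VFIO. */

/*
 * An uncorrectable error is fatal when its status bit is also set in the
 * severity register; a status bit with a clear severity bit is non-fatal.
 */
static bool aer_uncor_is_fatal(uint32_t uncor_status, uint32_t uncor_sever)
{
    return (uncor_status & uncor_sever) != 0;
}

/* Source ID layout: bus number in bits 15:8, devfn in bits 7:0. */
static uint16_t aer_source_id(uint8_t bus_num, uint8_t devfn)
{
    return (uint16_t)(((unsigned)bus_num << 8) | devfn);
}
```

With a severity mask of 0x10, a status of 0x10 would be classified as fatal, while a status of 0x20 would be non-fatal and therefore eligible for forwarding to the guest under this patch.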

[Qemu-devel] [PATCH v11 1/4] pcie_aer: support configurable AER capa version

2016-12-31 Thread Cao jin

From: Dou Liyang <douly.f...@cn.fujitsu.com>

Currently the AER capa version is fixed to v2. If the assigned device is
actually v1, this value will be inconsistent between guest and host.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
---
 hw/net/e1000e.c| 3 ++-
 hw/pci-bridge/ioh3420.c| 2 +-
 hw/pci-bridge/xio3130_downstream.c | 2 +-
 hw/pci-bridge/xio3130_upstream.c   | 2 +-
 hw/pci/pcie_aer.c  | 5 +++--
 include/hw/pci/pcie_aer.h  | 3 ++-
 6 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index 4994e1c..66de849 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -472,7 +472,8 @@ static void e1000e_pci_realize(PCIDevice *pci_dev, Error **errp)
 hw_error("Failed to initialize PM capability");
 }
 
-if (pcie_aer_init(pci_dev, e1000e_aer_offset, PCI_ERR_SIZEOF) < 0) {
+if (pcie_aer_init(pci_dev, PCI_ERR_VER, e1000e_aer_offset,
+  PCI_ERR_SIZEOF) < 0) {
 hw_error("Failed to initialize AER capability");
 }
 
diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
index c8b5ac4..d70784c 100644
--- a/hw/pci-bridge/ioh3420.c
+++ b/hw/pci-bridge/ioh3420.c
@@ -135,7 +135,7 @@ static int ioh3420_initfn(PCIDevice *d)
 goto err_pcie_cap;
 }
 
-rc = pcie_aer_init(d, IOH_EP_AER_OFFSET, PCI_ERR_SIZEOF);
+rc = pcie_aer_init(d, PCI_ERR_VER, IOH_EP_AER_OFFSET, PCI_ERR_SIZEOF);
 if (rc < 0) {
 goto err;
 }
diff --git a/hw/pci-bridge/xio3130_downstream.c b/hw/pci-bridge/xio3130_downstream.c
index cef6e13..5d1ce01 100644
--- a/hw/pci-bridge/xio3130_downstream.c
+++ b/hw/pci-bridge/xio3130_downstream.c
@@ -97,7 +97,7 @@ static int xio3130_downstream_initfn(PCIDevice *d)
 goto err_pcie_cap;
 }
 
-rc = pcie_aer_init(d, XIO3130_AER_OFFSET, PCI_ERR_SIZEOF);
+rc = pcie_aer_init(d, PCI_ERR_VER, XIO3130_AER_OFFSET, PCI_ERR_SIZEOF);
 if (rc < 0) {
 goto err;
 }
diff --git a/hw/pci-bridge/xio3130_upstream.c b/hw/pci-bridge/xio3130_upstream.c
index 4ad0440..3819964 100644
--- a/hw/pci-bridge/xio3130_upstream.c
+++ b/hw/pci-bridge/xio3130_upstream.c
@@ -85,7 +85,7 @@ static int xio3130_upstream_initfn(PCIDevice *d)
 pcie_cap_flr_init(d);
 pcie_cap_deverr_init(d);
 
-rc = pcie_aer_init(d, XIO3130_AER_OFFSET, PCI_ERR_SIZEOF);
+rc = pcie_aer_init(d, PCI_ERR_VER, XIO3130_AER_OFFSET, PCI_ERR_SIZEOF);
 if (rc < 0) {
 goto err;
 }
diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
index 048ce6a..ac47f34 100644
--- a/hw/pci/pcie_aer.c
+++ b/hw/pci/pcie_aer.c
@@ -96,11 +96,12 @@ static void aer_log_clear_all_err(PCIEAERLog *aer_log)
 aer_log->log_num = 0;
 }
 
-int pcie_aer_init(PCIDevice *dev, uint16_t offset, uint16_t size)
+int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, uint16_t offset,
+  uint16_t size)
 {
 PCIExpressDevice *exp;
 
-pcie_add_capability(dev, PCI_EXT_CAP_ID_ERR, PCI_ERR_VER,
+pcie_add_capability(dev, PCI_EXT_CAP_ID_ERR, cap_ver,
 offset, size);
 exp = &dev->exp;
 exp->aer_cap = offset;
diff --git a/include/hw/pci/pcie_aer.h b/include/hw/pci/pcie_aer.h
index c2ee4e2..c373591 100644
--- a/include/hw/pci/pcie_aer.h
+++ b/include/hw/pci/pcie_aer.h
@@ -87,7 +87,8 @@ struct PCIEAERErr {
 
 extern const VMStateDescription vmstate_pcie_aer_log;
 
-int pcie_aer_init(PCIDevice *dev, uint16_t offset, uint16_t size);
+int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, uint16_t offset,
+  uint16_t size);
 void pcie_aer_exit(PCIDevice *dev);
 void pcie_aer_write_config(PCIDevice *dev,
uint32_t addr, uint32_t val, int len);
-- 
1.8.3.1
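
For context on what the new cap_ver argument ends up controlling: a PCIe extended capability header is one 32-bit word with the capability ID in bits 15:0, the capability version in bits 19:16, and the next-capability offset in bits 31:20. The sketch below packs and unpacks that word; the function names are made up for illustration and this is not how QEMU's pcie_add_capability() is written.

```c
#include <stdint.h>

/* AER extended capability ID as defined by the PCIe specification. */
#define EXT_CAP_ID_AER 0x0001u

/* Hypothetical helper: pack ID, version and next offset into a header. */
static uint32_t ext_cap_header(uint16_t id, uint8_t ver, uint16_t next)
{
    return (uint32_t)id |
           ((uint32_t)(ver & 0xfu) << 16) |      /* 4-bit version field */
           ((uint32_t)(next & 0xffcu) << 20);    /* dword-aligned offset */
}

/* Hypothetical helper: extract the version field from a header. */
static uint8_t ext_cap_version(uint32_t header)
{
    return (uint8_t)((header >> 16) & 0xfu);
}
```

A v1 device and a v2 device differ only in the version nibble, which is why a hard-coded version can disagree with what the host device reports.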

[Qemu-devel] [PATCH RFC v11 0/4] vfio-pci: pass non-fatal error to guest

2016-12-31 Thread Cao jin

As the previous discussion suggested, we can take a step back and handle
non-fatal errors first. This makes the patchset much thinner, because we can
drop all the patches related to configuration restrictions.

FYI: patch 1 has been cherry-picked into another series and is waiting to be
merged first, so this patchset can't be compiled on your host.

v11 changelog:
1. Drop a bunch of code that checks the configuration.
2. Modify patch 3 to handle non-fatal errors only; a fatal error still
   results in vm stop.
   The suggestion "add another eventfd to distinguish fatal & non-fatal
   error" is not adopted, because first, the user can already distinguish
   them from the uncorrectable error status alone; second, for backward
   compatibility, e.g. an old user that handles both errors relies on the
   current error eventfd.

Test:
Tested with an Intel 82576 NIC, which has 2 functions: function 1 has a cable
connected to the gateway, function 0 has no link. Tested in 4 scenarios:
1. assign only function 1 to one vm; function 0 has no user
2. assign both functions to one vm, fully complying with the previous
   configuration restrictions
3. assign both functions to one vm, under different virtual buses
4. assign the functions to 2 different vms

The test steps are the same as v10: assign an IP to function 1, add route info,
and ping the gateway. The results meet expectations. However, the unstable
hardware often emits fatal errors, and I still don't know why. The igb driver
in the guest also seems to have a bug: after pinging the gateway for a while,
even if I do nothing, it shows "Destination Host Unreachable" after many
successful pings. But obviously, neither of these is related to this patchset.


Chen Fan (3):
  vfio: new function to init aer cap for vfio device
  vfio-pci: pass the aer error to guest
  vfio: add 'aer' property to expose aercap

Dou Liyang (1):
  pcie_aer: support configurable AER capa version

 hw/net/e1000e.c|   3 +-
 hw/pci-bridge/ioh3420.c|   2 +-
 hw/pci-bridge/xio3130_downstream.c |   2 +-
 hw/pci-bridge/xio3130_upstream.c   |   2 +-
 hw/pci/pcie_aer.c  |   5 +-
 hw/vfio/pci.c  | 139 +
 hw/vfio/pci.h  |   3 +
 include/hw/pci/pcie_aer.h  |   3 +-
 8 files changed, 139 insertions(+), 20 deletions(-)

-- 
1.8.3.1

Re: [Qemu-devel] [RFC]virtio-blk: add disk-name device property

2016-12-29 Thread Cao jin

As far as I know, this is not a good way to submit a patch. You need to read
the guidelines first: http://wiki.qemu.org/Contribute/SubmitAPatch

-- 
Sincerely,
Cao jin

On 12/30/2016 10:41 AM, Junkang Fu wrote:
>>From 74e913fc41ea98d1dde692175f1e3fb6729342aa Mon Sep 17 00:00:00 2001
> From: "junkang.fjk" <junkang@alibaba-inc.com>
> Date: Wed, 24 Aug 2016 19:36:53 +0800
> Subject: [PATCH] virtio-blk: add disk-name device property
> 
> The current virtio-blk disk name (e.g. /dev/vdb) has nothing to do with the
> target dev name specified in the libvirt xml file. For example, we may get
> disk name /dev/vdb in the VM while the target dev specified in the libvirt
> xml is vdc. This makes it a little troublesome to work out the relationship
> between the disk name in the VM and a name outside the VM, for example in
> the control board of public cloud service providers. I suggest that QEMU add
> a VIRTIO_BLK_F_DISK_NAME feature; with VIRTIO_BLK_F_DISK_NAME capable QEMU
> and virtio-blk frontend drivers, the disk name in the VM can be specified
> as follows:
> -device virtio-blk-pci,disk-name=vdabc
> 
> ---
>  hw/block/virtio-blk.c   | 5 +
>  include/hw/virtio/virtio-blk.h  | 1 +
>  include/standard-headers/linux/virtio_blk.h | 6 ++
>  3 files changed, 12 insertions(+)
> 
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 331d766..4039fb9 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -716,6 +716,8 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
>  blkcfg.alignment_offset = 0;
>  blkcfg.wce = blk_enable_write_cache(s->blk);
>  virtio_stw_p(vdev, &blkcfg.num_queues, s->conf.num_queues);
> +if (s->disk_name)
> +strncpy((char *)blkcfg.disk_name, s->disk_name, DISK_NAME_LEN);
>  memcpy(config, &blkcfg, sizeof(struct virtio_blk_config));
>  }
> @@ -740,6 +742,8 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, uint64_t features,
>  virtio_add_feature(&features, VIRTIO_BLK_F_GEOMETRY);
>  virtio_add_feature(&features, VIRTIO_BLK_F_TOPOLOGY);
>  virtio_add_feature(&features, VIRTIO_BLK_F_BLK_SIZE);
> +virtio_add_feature(&features, VIRTIO_BLK_F_DISK_NAME);
> +
>  if (virtio_has_feature(features, VIRTIO_F_VERSION_1)) {
>  if (s->conf.scsi) {
>  error_setg(errp, "Please set scsi=off for virtio-blk devices in order to use virtio 1.0");
> @@ -970,6 +974,7 @@ static Property virtio_blk_properties[] = {
>  DEFINE_PROP_BIT("request-merging", VirtIOBlock, conf.request_merging, 0,
>  true),
>  DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues, 1),
> +DEFINE_PROP_STRING("disk-name", VirtIOBlock, disk_name),
>  DEFINE_PROP_END_OF_LIST(),
>  };
> 
> diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
> index 180bd8d..003e810 100644
> --- a/include/hw/virtio/virtio-blk.h
> +++ b/include/hw/virtio/virtio-blk.h
> @@ -56,6 +56,7 @@ typedef struct VirtIOBlock {
>  bool dataplane_disabled;
>  bool dataplane_started;
>  struct VirtIOBlockDataPlane *dataplane;
> +char *disk_name;
>  } VirtIOBlock;
> 
>  typedef struct VirtIOBlockReq {
> diff --git a/include/standard-headers/linux/virtio_blk.h b/include/standard-headers/linux/virtio_blk.h
> index ab16ec5..1f5d89d 100644
> --- a/include/standard-headers/linux/virtio_blk.h
> +++ b/include/standard-headers/linux/virtio_blk.h
> @@ -38,6 +38,7 @@
>  #define VIRTIO_BLK_F_BLK_SIZE  6   /* Block size of disk is available*/
>  #define VIRTIO_BLK_F_TOPOLOGY  10  /* Topology information is available */
>  #define VIRTIO_BLK_F_MQ 12  /* support more than one vq */
> +#define VIRTIO_BLK_F_DISK_NAME  13  /* specify /dev/xxx name */
> 
>  /* Legacy feature bits */
>  #ifndef VIRTIO_BLK_NO_LEGACY
> @@ -51,6 +52,9 @@
> 
>  #define VIRTIO_BLK_ID_BYTES 20  /* ID string length */
> 
> +/* macro defined in kernel genhd.h */
> +#define DISK_NAME_LEN 32
> +
>  struct virtio_blk_config {
> /* The capacity (in 512-byte sectors). */
> uint64_t capacity;
> @@ -84,6 +88,8 @@ struct virtio_blk_config {
> 
> /* number of vqs, only available when VIRTIO_BLK_F_MQ is set */
> uint16_t num_queues;
> +
> +   uint8_t disk_name[DISK_NAME_LEN];
>  } QEMU_PACKED;
> 
>  /*
> --
> 1.9.4
> 
>
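
One detail worth keeping in mind when reading the quoted patch: strncpy() does not write a terminating NUL when the source string is DISK_NAME_LEN bytes or longer, so a consumer of the fixed-size disk_name field cannot assume termination. The sketch below shows one possible way to fill such a field defensively; copy_disk_name() is a hypothetical helper, not something from the patch or from QEMU.

```c
#include <stdint.h>
#include <string.h>

#define DISK_NAME_LEN 32

/*
 * Hypothetical helper: copy a name into a fixed-size config field.
 * Returns 0 on success, or -1 (leaving dst untouched) when the name,
 * including its terminating NUL, would not fit.
 */
static int copy_disk_name(uint8_t dst[DISK_NAME_LEN], const char *name)
{
    size_t len = strlen(name);

    if (len >= DISK_NAME_LEN) {
        return -1;
    }
    memset(dst, 0, DISK_NAME_LEN);  /* zero padding also terminates the name */
    memcpy(dst, name, len);
    return 0;
}
```

Rejecting an over-long name is only one design choice; silently truncating to DISK_NAME_LEN - 1 bytes would be another.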

[Qemu-devel] [PATCH v2] doc/pcie: correct command line examples

2016-12-28 Thread Cao jin

Nit picking: for multi-function PCI Express Root Ports, the 'addr'
property is mandatory; 'slot' is optional because it defaults to 0; and
'chassis' is mandatory for the 2nd & 3rd root ports because it defaults
to 0 too.

Bonus: fix a typo(2->3)
Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
Reviewed-by: Marcel Apfelbaum <mar...@redhat.com>
---
 docs/pcie.txt | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/pcie.txt b/docs/pcie.txt
index 9fb20aaed9f4..5bada24a15ab 100644
--- a/docs/pcie.txt
+++ b/docs/pcie.txt
@@ -110,18 +110,18 @@ Plug only PCI Express devices into PCI Express Ports.
   -device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z]  \
   -device <dev>,bus=root_port1
 2.2.2 Using multi-function PCI Express Root Ports:
-  -device ioh3420,id=root_port1,multifunction=on,chassis=x,slot=y[,bus=pcie.0][,addr=z.0] \
-  -device ioh3420,id=root_port2,chassis=x1,slot=y1[,bus=pcie.0][,addr=z.1] \
-  -device ioh3420,id=root_port3,chassis=x2,slot=y2[,bus=pcie.0][,addr=z.2] \
-2.2.2 Plugging a PCI Express device into a Switch:
+  -device ioh3420,id=root_port1,multifunction=on,chassis=x,addr=z.0[,slot=y][,bus=pcie.0] \
+  -device ioh3420,id=root_port2,chassis=x1,addr=z.1[,slot=y1][,bus=pcie.0] \
+  -device ioh3420,id=root_port3,chassis=x2,addr=z.2[,slot=y2][,bus=pcie.0] \
+2.2.3 Plugging a PCI Express device into a Switch:
   -device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z]  \
   -device x3130-upstream,id=upstream_port1,bus=root_port1[,addr=x]  \
   -device xio3130-downstream,id=downstream_port1,bus=upstream_port1,chassis=x1,slot=y1[,addr=z1]]  \
   -device <dev>,bus=downstream_port1
 
 Notes:
-  - (slot, chassis) pair is mandatory and must be
- unique for each PCI Express Root Port.
+  - (slot, chassis) pair is mandatory and must be unique for each
+PCI Express Root Port. slot defaults to 0 when not specified.
   - 'addr' parameter can be 0 for all the examples above.
 
 
-- 
2.1.0
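
As background for the addr=z.N notation in the examples above: z is the PCI device number on the bus and N is the function number, and the two are conventionally packed into a single devfn byte (5 bits of device, 3 bits of function). The sketch below shows that packing; the helper names are hypothetical, and the device number here is unrelated to the root port's 'slot' property.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; device is 0..31, function 0..7. */

/* Pack a device number and function number into a devfn byte. */
static uint8_t devfn_make(uint8_t device, uint8_t function)
{
    return (uint8_t)(((device & 0x1fu) << 3) | (function & 0x07u));
}

/* Extract the device number (upper 5 bits) from a devfn byte. */
static uint8_t devfn_device(uint8_t devfn)
{
    return (devfn >> 3) & 0x1fu;
}

/* Extract the function number (lower 3 bits) from a devfn byte. */
static uint8_t devfn_function(uint8_t devfn)
{
    return devfn & 0x07u;
}
```

Three root ports at addr=3.0, 3.1 and 3.2 would therefore share device number 3 and differ only in the low three bits, which is what makes them functions of one multi-function device.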

Re: [Qemu-devel] [PATCH] doc/pcie: correct command line examples

2016-12-28 Thread Cao jin



On 12/28/2016 11:21 PM, Andrew Jones wrote:
> On Wed, Dec 28, 2016 at 03:24:30PM +0200, Marcel Apfelbaum wrote:
>> On 12/27/2016 09:40 AM, Cao jin wrote:
>>> Nit picking: Multi-function PCI Express Root Ports should mean that
>>> 'addr' property is mandatory, and slot is optional because it is default
>>> to 0, and 'chassis' is mandatory for 2nd & 3rd root port because it is
>>> default to 0 too.
>>>
>>> Bonus: fix a typo(2->3)
>>> Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
>>> ---
>>>  docs/pcie.txt | 12 ++--
>>>  1 file changed, 6 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/docs/pcie.txt b/docs/pcie.txt
>>> index 9fb20aaed9f4..54f05eaa71dc 100644
>>> --- a/docs/pcie.txt
>>> +++ b/docs/pcie.txt
>>> @@ -110,18 +110,18 @@ Plug only PCI Express devices into PCI Express Ports.
>>>    -device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z]  \
>>>    -device <dev>,bus=root_port1
>>>  2.2.2 Using multi-function PCI Express Root Ports:
>>> -  -device ioh3420,id=root_port1,multifunction=on,chassis=x,slot=y[,bus=pcie.0][,addr=z.0] \
>>> -  -device ioh3420,id=root_port2,chassis=x1,slot=y1[,bus=pcie.0][,addr=z.1] \
>>> -  -device ioh3420,id=root_port3,chassis=x2,slot=y2[,bus=pcie.0][,addr=z.2] \
>>> -2.2.2 Plugging a PCI Express device into a Switch:
>>> +  -device ioh3420,id=root_port1,multifunction=on,chassis=x,addr=z.0[,slot=y][,bus=pcie.0] \
>>> +  -device ioh3420,id=root_port2,chassis=x1,addr=z.1[,slot=y1][,bus=pcie.0] \
>>> +  -device ioh3420,id=root_port3,chassis=x2,addr=z.2[,slot=y2][,bus=pcie.0] \
>>> +2.2.3 Plugging a PCI Express device into a Switch:
>>>    -device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z]  \
>>>    -device x3130-upstream,id=upstream_port1,bus=root_port1[,addr=x]  \
>>>    -device xio3130-downstream,id=downstream_port1,bus=upstream_port1,chassis=x1,slot=y1[,addr=z1]]  \
>>>    -device <dev>,bus=downstream_port1
>>>
>>>  Notes:
>>> -  - (slot, chassis) pair is mandatory and must be
>>> - unique for each PCI Express Root Port.
>>> +  - (slot, chassis) pair is mandatory and must be unique for each
>>> +PCI Express Root Port. slot is default to 0 when doesn't specify it.
> 
> Please rewrite last sentence as
> 
>  slot defaults to 0 when not specified.

Thanks for pointing it out, v2 is on the way.

-- 
Sincerely,
Cao jin

> 
>>>- 'addr' parameter can be 0 for all the examples above.
>>>
>>>
>>>
>>
>> Reviewed-by: Marcel Apfelbaum <mar...@redhat.com>
>>
>> Thanks,
>> Marcel
>>
> 
> Thanks,
> drew
> 
> 
> .
>

Re: [Qemu-devel] vfio/pci: guest error recovery proposal

2016-12-27 Thread Cao jin



On 12/16/2016 07:02 AM, Michael S. Tsirkin wrote:
> 
>>  1) We need to do the right thing for the guest, I don't think we
>> should be presuming that different reset types are equivalent,
>> leaving gaps where we expect the guest/host to do a reset and don't
>> follow through on other reset requests, and we need to notify the
>> guest immediately for the error.
>>  2) We need to do the right thing for the host, that means we should
>> not give the user the opportunity to leave a device in a state
>> where we haven't at least performed a bus reset on link error (this
>> may be our current state and if so we should fix it).
> 
> Ok so here is a concrete proposal for improving guest device error
> recovery (1).  This is not trying to fix current bugs for 2, but
> also does not lock us into not fixing them.
> 
> I'll write up proposal for (2) but I feel we can't properly
> fix host without fixing (1) first and without breaking compatibility.
> 
> Background:
> 
> non-fatal errors:
> 
> - These errors are due to data link problems.
>   The problem is that a transaction was lost, so driver and device are
>   out of sync. Device reset is in theory enough to recover from these,
>   in practice some drivers might try to do link level reset instead.
> 
> 
> fatal errors:
> 
> - These errors are due to physical problems.
>   The problem is that a transaction was lost, so driver and device are
>   out of sync. Link reset might be necessary to recover from these,
>   sometimes device reset might be enough for very simple devices.
>   If a link above the device reports errors, device might have gone away,
>   link reset is the only thing that might bring it back.
> 
> current behaviour:
> 
> - vfio will always report that it recovered function from an error.
> - whether link reset will trigger depends on whether any other
>   function on the same link has a host driver that reports an error.
> - also, if there's a host driver that can't handle errors,
>   link reset will never trigger
> 
> 
> proposed enhancement:
> 
> 1- allow userspace to request reporting non fatal/fatal errors separately
> 2- report errors on monitor as events as well
> 3- forward correct error type to guest
> 4- set link error flag in userspace (this is optional, used for 5 below)
> 5- if guest requests link reset, and error flag is set,
>   stop vm (I hope we can distinguish this
>   from resets that happen on reboot here.
>   if yes we might not need error flag in 4 above)
> 

Hi,

I have a question about vm stop on fatal error.
Recently, while testing my patches, I often saw a fatal error (Malformed TLP
Status) occur, which disturbed my tests. So I am wondering: why is vm stop
a better choice than qdev_unplug? Although we tell the user "Please
collect any data possible and then kill the guest", I still don't know
how to save any data. For example, if the user is editing a document,
a vm_stop caused by a device fatal error will destroy the user's work.

-- 
Sincerely,
Cao jin
> 
> Results:
> The advantage of this is that we don't need to manage any state at all.
> Most drivers will handle non fatal errors by FLR and will recover fine.
> Drivers that attempt link reset will get vmstop which is not
> worse than what we have now.
> 
> I don't see how this can break any reasonable configuration
> that is not already broken, but we might want a flag
> to suppress aer reports to guest and just do vmstop
> unconditionally.
> Alternatively, management can pause vm itself when it sees the error.
> 
> 
> Pls remember to Cc qemu list on discussion, not just kvm.
>
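
Steps 3 to 5 of the proposed enhancement quoted above can be read as a small state machine. The sketch below is one possible interpretation, assuming the link error flag is set on any error reported by the host and never cleared; all type and function names are hypothetical and none of them exist in QEMU or VFIO.

```c
#include <stdbool.h>

/* Hypothetical names throughout; this only models the quoted proposal. */
enum recovery_action {
    ACTION_FORWARD_NONFATAL,  /* inject a non-fatal AER message into the guest */
    ACTION_FORWARD_FATAL,     /* inject a fatal AER message into the guest */
    ACTION_ALLOW_RESET,       /* let the guest's link reset proceed */
    ACTION_STOP_VM            /* pause the guest to contain the error */
};

struct recovery_state {
    bool link_error;  /* step 4: set once the host has reported an error */
};

/* Steps 3 and 4: forward the error with its real severity and remember it. */
static enum recovery_action on_host_error(struct recovery_state *s, bool fatal)
{
    s->link_error = true;
    return fatal ? ACTION_FORWARD_FATAL : ACTION_FORWARD_NONFATAL;
}

/* Step 5: a guest-requested link reset after an error stops the VM. */
static enum recovery_action on_guest_link_reset(const struct recovery_state *s)
{
    return s->link_error ? ACTION_STOP_VM : ACTION_ALLOW_RESET;
}
```

Under this reading, a guest driver that recovers from a non-fatal error with FLR never reaches the vm stop path, while a driver that escalates to a link reset does.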
