Re: [PATCH v1 7/7] pseries/setup: Add Initialization of VF Bars

2017-12-20 Thread Juan Alvarez
On 12/19/17 12:38 AM, Alexey Kardashevskiy wrote:

> On 19/12/17 06:29, Juan Alvarez wrote:
>> This is PF only path. Yes either we have a root returned otherwise
>> will fall back to iomem_resource.
> You have removed context from my response, do not do that please.

My apologies. I will not do that.

>
> When will you have root and when you won't? imho it should always be either
> one or another.
>
Yes you are correct. The resource is carved out of a different mmio 
space and will never be passed in the assigned-addresses property in 
the device node of PF.

We will remove that function call, conditional check and set root accordingly.

>> On 12/18/17 1:21 AM, Alexey Kardashevskiy wrote:
>>> @dev here is a VF, right? I am not familiar with powervn much but from what
>>> I see - the devices are sitting on a root bus of their own PHB and they all
>>> either have a root returned from pci_find_parent_resource() or none of them
>>> has a root and will fall back to _resource, or both cases are 
>>> possible?
>
- Juan 



Re: [PATCH v2 2/7] powerpc/kernel: Add uevents in EEH error/resume

2017-12-20 Thread Juan Alvarez
On 12/19/17 12:27 AM, Benjamin Herrenschmidt wrote:

> On Mon, 2017-12-18 at 22:50 -0600, Bjorn Helgaas wrote:
>> [+cc Keith, Gabriele, Dongdong]
>>
>> On Mon, Dec 18, 2017 at 04:38:03PM -0600, Bryant G. Ly wrote:
>>> Devices can go offline when EEH is reported. This patch adds
>>> a change to the kernel object and lets udev know of error.
>>> When device resumes a change is also set reporting device as
>>> online. Therefore, EEH events are better propagated to user
>>> space for devices in powerpc arch.
>> I'm on vacation and can't review this in detail, but I wonder if you
>> can compare this with the uevents we emit for DPC, AER, and hotplug
>> events (if any).  I hope we don't end up with userspace having to be
>> aware of the differences between EEH, DPC, AER, etc.
>>
>>> From a very quick look, I only see a few uevents even mentioned in
>> drivers/pci: KOBJ_ADD in __pci_hp_register() and KOBJ_CHANGE in the
>> SR-IOV code.  I'm worried that we're missing some important uevents in
>> the PCI core.  That's not an argument against what you're doing here;
>> it just would be nice to fill in any missing pieces in the core also,
>> and hopefully make them consistent with these EEH events.
> We also need to be careful about what specific EEH activity we are
> talking about, and if we bring into the picture things like DPDK, it
> gets even more murky...
>
> The basic way EEH is supposed to work for recovery (minus all sort of
> implementation nasties which hopefully Russell and Sam are trying to
> cleanup and fix) is that either:
>
>   - The driver of the device has recovery callbacks, in which
> case the driver participates in the recovery process, the device
> doesn't "go away" (though it shouldn't be accessed during that process
> by other entities, userspace originated config space could be a problem
> and needs to be blocked...). The recovery typically involves a reset of
> the device but in sync with the driver.
>
>   - The driver doesn't have the callbacks. In this case, we
> simulate an unplug, reset the device, and replug.
>
> So it makes sense for the second case to emit the same uevents as a
> normal PCI(e) hotplug.
>
> For the former case I'm less sure Do we really need userspace to be
> notified ? If yes, what for precisely ?

In pSeries SR-IOV environment the management console might need to apply
certain configuration changes to the PF driver after it has been recovered
and before the VF drivers are allowed to resume their recovery path.
I could not think of another way to notify user space of these events.
I made this assumption because I saw there were no uevents added when 
the device goes offline and come back online in EEH code. It was my 
intention to make the event as generic as possible in EEH component,
therefore, making this change independent of pSeries SR-IOV.

- Juan



Re: [PATCH v2 2/7] powerpc/kernel: Add uevents in EEH error/resume

2017-12-20 Thread Juan Alvarez
On 12/18/17 10:59 PM, Russell Currey wrote:

> On Mon, 2017-12-18 at 22:50 -0600, Bjorn Helgaas wrote:
>> [+cc Keith, Gabriele, Dongdong]
>>
>> On Mon, Dec 18, 2017 at 04:38:03PM -0600, Bryant G. Ly wrote:
>>> Devices can go offline when EEH is reported. This patch adds
>>> a change to the kernel object and lets udev know of error.
>>> When device resumes a change is also set reporting device as
>>> online. Therefore, EEH events are better propagated to user
>>> space for devices in powerpc arch.
>> I'm on vacation and can't review this in detail, but I wonder if you
>> can compare this with the uevents we emit for DPC, AER, and hotplug
>> events (if any).  I hope we don't end up with userspace having to be
>> aware of the differences between EEH, DPC, AER, etc.
>>
>> From a very quick look, I only see a few uevents even mentioned in
>> drivers/pci: KOBJ_ADD in __pci_hp_register() and KOBJ_CHANGE in the
>> SR-IOV code.  I'm worried that we're missing some important uevents
>> in
>> the PCI core.  

The only place where I see the KOBJ_REMOVE being used is when the device is 
removed in pci_destroy_dev -> device_del whic will be called implicitly
in permanent failure path of EEH code

>> That's not an argument against what you're doing here;
>> it just would be nice to fill in any missing pieces in the core also,
>> and hopefully make them consistent with these EEH events.
> I don't think this needs to be particularly complex, could we get away
> with events for when devices do the following?
>
> - begin recovery
> - successfully recover
> - fail recovery

If there are no objections in the on going review of this patch 
I can change them to these names:

  - BEGIN_RECOVERY
  - SUCCESSFUL_RECOVERY
  - FAILED_RECOVERY

>
> It might be worthwhile sorting out some consistent, non-EEH-specific
> naming, and then other device error recovery systems can do the same
> later.
>
Do you have a more consistent naming in mind for these events? 

- Juan



Re: [PATCH v1 7/7] pseries/setup: Add Initialization of VF Bars

2017-12-18 Thread Juan Alvarez
This is PF only path. Yes either we have a root returned otherwise
will fall back to iomem_resource.


On 12/18/17 1:21 AM, Alexey Kardashevskiy wrote:
> @dev here is a VF, right? I am not familiar with powervn much but from what
> I see - the devices are sitting on a root bus of their own PHB and they all
> either have a root returned from pci_find_parent_resource() or none of them
> has a root and will fall back to _resource, or both cases are possible?



Re: [PATCH v1 4/7] powerpc/kernel Add EEH operations to notify resume

2017-12-18 Thread Juan Alvarez
Yes, way less. So our current design only supports less than or equal to
256 VFs per PF. That is 256*2 bytes.


On 12/17/17 11:02 PM, Alexey Kardashevskiy wrote:
> It is kind of assumed that the number of VFs is always less than 2048 minus
> rtas call parameters (RTAS_DATA_BUF_SIZE==4096 now)?



Re: [PATCH v1 3/7] platforms/pseries: Set eeh_pe of EEH_PE_VF type

2017-12-18 Thread Juan Alvarez
Here we need to set the config_addr as PHYP (platform) 
does not enable the PE until the PE is bound to a VM, 
reason why we disable VF autoprobe.


On 12/17/17 10:34 PM, Alexey Kardashevskiy wrote:
> powernv does this from eeh_ops::probe, and so does pseries_eeh_probe(), do
> you still need this here?



Re: [PATCH v3 2/2] pseries/eeh: Add Pseries pcibios_bus_add_device

2017-10-18 Thread Juan Alvarez
On 10/17/17 8:36 PM, Alexey Kardashevskiy wrote:
> PowerNV KVM guest is a pseries machine so this code will execute there.
>
The configure sriov path will fail and not enable sriov if resources are
not met. I.e. the IOV Bar is not set in PF IOV Resources, which in this
case gets assigned by firmware.

We have separated the calls to put PowerNV and PSeries as machine dependent
calls. Furthermore, we are adding device node properties in the device tree to 
identify if
this is an SR-IOV slot on Phyp Pseries platform. Verification will be in place 
to
distinguish between platforms that support SR-IOV in PSeries.
>> Which is the reason for
>> the separation of calls to the machine dependent stuff.
>>> We also use the pseries platform when running under KVM.
>>>
>>> cheers
>>>
>> If anyone plans to enable SR-IOV on Pseries platform, firmware must provide 
>> the
>> resources to enable the VFs and map them with system resources.
> This is what the PowerNV platform does.
Again, we have separated the machine dependent calls to different platforms. In 
our
case we don't use opal and our dependent on phyp to associate resources.
>
>> A new version
>> of the PAPR Document will be added to document these system resources.
> The guest simply gets yet another PCI device, how is IOV different here?
>
> In regard of EEH, the API does not change afaik, it is up to the hypervisor
> (KVM+QEMU) to handle IOV case correctly.

I don't understand your question entirely, can you rephrase?




Re: [PATCH v3 2/2] pseries/eeh: Add Pseries pcibios_bus_add_device

2017-10-17 Thread Juan Alvarez

On 10/17/17 1:52 PM, Bjorn Helgaas wrote:
> Right.  But that patch isn't really on the path to inclusion (mainly
> because it's test framework and doesn't fix a bug or add support for
> new hardware or features).
I was not aware of this decision and this will cause changes to this patch.
>
> I'm suggesting that maybe there should be a generic way to prevent
> binding to VF devices that works for everybody and doesn't require any
> arch-specific code at all.

The patch that you have suggested in kernel 4.12 is also a generic way.

https://marc.info/?l=linux-kernel=149002335105804=2

Perhaps we can use the same constructs that this patch uses at our level. 
Nonetheless,
that will take a rework to this patch and possibly an export of a function to 
set drivers_autoprobe
globally. I know that was frowned upon on first email thread. Let me know if 
this is an
acceptable solution.
>> So like existing way of enabling SRIOV we still rely on the PF driver to
>> enable VFs - but in this case the attachment phase is done via a user
>> action via a management console in our case (novalink or hmc) triggered
>> event that will essentially act like a hotplug.
>>
>> So in the fine details of that user triggered action the system
>> firmware will bind the VFs, allowing resources to be allocated to
>> the VF.  - Which essentially does all the attaching as we know it
>> today but is managed by PHYP not by the kernel.
> What exactly does "firmware binding the VFs" mean?  I guess this must
> mean assigning a VF to a partition, injecting a hotplug add event to
> that partition, and making the VF visible in config space?
 Firmware basically adds a device node to the device tree as defined
 in the (PAPR) Power Architecture Platform Requirements document.
>>> From the point of view of the kernel that consumes this device tree
>>> and owns the VF, I guess this looks like a hot-add.  
>> Correct, this is intended. We want to minimize change and only focus
>> on configure SR-IOV path in the PF on PSeries platform.
>>> It would be nice
>>> if this could be exposed to that kernel by having the firmware inject
>>> a normal PCIe hotplug interrupt, but I'm guessing it's not that
>>> simple.
>> I don't understand entirely your suggestion. However, in the case of
>> interrupts PHYP owns all the resources and will be associated accordingly
>> with a partitionable endpoint (PE).
> Assume you have a Linux kernel, and PHYP assigns a VF to it.  What
> happens in that Linux kernel? 

When  kernel boots it goes through the lists of buses in the device tree. For 
each (physical,virtual)
bus the kernel reads the list of devices. After resources are read and assigned 
it will probe the devices.

If the kernel is up and running then this will start its flow through the dlpar 
add bus code.
Which then will read the device node and same operation as boot will entail of 
going
through the list of devices under the bus.

One should note that dynamically created VFs in configure SR-IOV path will have 
is_virtfn
set and others do not, therefore pcibios_bus_add_device will only be exercised 
in certain
cases.
>  How does it discover this new device?
This is treated as a normal hot plug operation for the Operating System that
the device gets assigned which can be  IBMi, AIX or Linux in this environment.
>
> I'm suggesting that it would be cool if that kernel received a hotplug
> interrupt from the Downstream Port leading to the new VF, and the
> pciehp driver could read the Slot Status register and see that
> "Presence Detect Changed" was set.  If that happened, the pciehp
> driver should automatically enumerate the new device and there
> wouldn't be much if any powerpc-specific code in the path.
To re-state Steve's earlier comment, PHYP will enumerate the virtual pci bus 
and is not
directly correlated to the real PCI bus that the PF is on. We have no control 
of the pci
bus enumeration for the device node added to the device tree.

Juan J. Alvarez



Re: [PATCH v3 2/2] pseries/eeh: Add Pseries pcibios_bus_add_device

2017-10-17 Thread Juan Alvarez
On 10/17/17 11:26 AM, Bjorn Helgaas wrote:
> To pop back up to the top of the stack, I think the main point of this
> patch is to keep from binding a driver to the VFs in the kernel that
> set PCI_SRIOV_CTRL_VFE.  This patch does that by setting
> pdev->match_driver to -1 in powerpc-specific code.
Correct, to add to your comment, we will only add to the PSeries platform
 (in PowerPC)  configure SR-IOV path.
> I think all systems, not just powerpc, want to prevent this VF driver
> binding, so I hope there's a generic solution that everybody can use.
The patch that we made this change dependent on:

https://patchwork.kernel.org/patch/9882915/

Is a generic form of not binding a pci device to a device driver
and can be used in another environment if needed.
>
 So like existing way of enabling SRIOV we still rely on the PF driver to
 enable VFs - but in this case the attachment phase is done via a user
 action via a management console in our case (novalink or hmc) triggered
 event that will essentially act like a hotplug.

 So in the fine details of that user triggered action the system
 firmware will bind the VFs, allowing resources to be allocated to
 the VF.  - Which essentially does all the attaching as we know it
 today but is managed by PHYP not by the kernel.
>>> What exactly does "firmware binding the VFs" mean?  I guess this must
>>> mean assigning a VF to a partition, injecting a hotplug add event to
>>> that partition, and making the VF visible in config space?
>> Firmware basically adds a device node to the device tree as defined
>> in the (PAPR) Power Architecture Platform Requirements document.
> From the point of view of the kernel that consumes this device tree
> and owns the VF, I guess this looks like a hot-add.  
Correct, this is intended. We want to minimize change and only focus
on configure SR-IOV path in the PF on PSeries platform.
> It would be nice
> if this could be exposed to that kernel by having the firmware inject
> a normal PCIe hotplug interrupt, but I'm guessing it's not that
> simple.
I don't understand entirely your suggestion. However, in the case of
interrupts PHYP owns all the resources and will be associated accordingly
with a partitionable endpoint (PE).
> But if I understand correctly, this patch series isn't concerned with
> that kernel; it's concerned with the kernel that owns the PF and sets
> PCI_SRIOV_CTRL_VFE.
Correct, we will enable SR-IOV in the PF through the Linux kernel and map 
system resources
for the new VFs in powerpc platform specific kernel code. Furthermore, not 
binding
the device drivers for the VFs that get mapped within the configure SR-IOV path
is a requirement for this PSeries SR-IOV environment.

Juan J. Alvarez



Re: [PATCH v3 2/2] pseries/eeh: Add Pseries pcibios_bus_add_device

2017-10-17 Thread Juan Alvarez

On 10/17/17 8:51 AM, Bjorn Helgaas wrote:
> This is where I get confused.  I guess the Linux that sets
> PCI_SRIOV_CTRL_VFE to enable the VFs can also perform config accesses
> to the VFs, since it can enumerate them and build pci_dev structs for
> them, right?

Correct, we are not changing anything related to how the VF gets initialized
in the kernel.
>
> And the Linux in the "Hosting Partition" is a guest that cannot see a
> VF until a management console attaches the VF to the Hosting
> Partition?  

Correct, this is how PHYP does normal adapter assignment.
> I'm not a VFIO or KVM expert but that sounds vaguely like
> what they would do when assigning a VF to a guest.
I do not know the specifics on VFIO and KVM. However, in this
case there is no KVM running on the Linux LPAR. PHYP owns all
the system resources.
>
>> So like existing way of enabling SRIOV we still rely on the PF driver to
>> enable VFs - but in this case the attachment phase is done via a user
>> action via a management console in our case (novalink or hmc) triggered
>> event that will essentially act like a hotplug.
>>
>> So in the fine details of that user triggered action the system
>> firmware will bind the VFs, allowing resources to be allocated to
>> the VF.  - Which essentially does all the attaching as we know it
>> today but is managed by PHYP not by the kernel.
> What exactly does "firmware binding the VFs" mean?  I guess this must
> mean assigning a VF to a partition, injecting a hotplug add event to
> that partition, and making the VF visible in config space?
Firmware basically adds a device node to the device tree as defined
in the (PAPR) Power Architecture Platform Requirements document.

Juan



Re: [PATCH v3 2/2] pseries/eeh: Add Pseries pcibios_bus_add_device

2017-10-17 Thread Juan Alvarez
On 10/17/17 8:51 AM, Bjorn Helgaas wrote:

> This is where I get confused.  I guess the Linux that sets
> PCI_SRIOV_CTRL_VFE to enable the VFs can also perform config accesses
> to the VFs, since it can enumerate them and build pci_dev structs for
> them, right?
Correct, we are not changing anything related to how the VF gets initialized
in the Linux kernel.
>
> And the Linux in the "Hosting Partition" is a guest that cannot see a
> VF until a management console attaches the VF to the Hosting
> Partition? 
Correct, this is how PHYP does normal adapter assignment.

>  I'm not a VFIO or KVM expert but that sounds vaguely like
> what they would do when assigning a VF to a guest.
I do not know the specifics on VFIO and KVM. However, in this
case there is no KVM running on the Linux (LPAR) logical partition.
PHYP owns all the system resources.
>
>> So like existing way of enabling SRIOV we still rely on the PF driver to
>> enable VFs - but in this case the attachment phase is done via a user
>> action via a management console in our case (novalink or hmc) triggered
>> event that will essentially act like a hotplug.
>>
>> So in the fine details of that user triggered action the system
>> firmware will bind the VFs, allowing resources to be allocated to
>> the VF.  - Which essentially does all the attaching as we know it
>> today but is managed by PHYP not by the kernel.
> What exactly does "firmware binding the VFs" mean?  I guess this must
> mean assigning a VF to a partition, injecting a hotplug add event to
> that partition, and making the VF visible in config space?
Firmware basically adds a device node to the device tree as defined
in the (PAPR) Power Architecture Platform Requirements document.

Juan J. Alvarez