Re: [PATCH v2 2/2] powerpc: hotplug driver bridge support

2024-05-20 Thread Oliver O'Halloran
On Fri, May 17, 2024 at 9:15 PM krishna kumar  wrote:
>
> > Uh, if I'm reading this right it looks like your "slot" C5 is actually
> > the PCIe switch's internal bus which is definitely not hot pluggable.
>
> It's a hotplug slot. Please see the snippet below:
>
> :~$ sudo lspci -vvv -s 0004:02:00.0 | grep --color HotPlug
>  SltCap:AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise-
> :~$
>
> :~$ sudo lspci -vvv -s 0004:02:01.0 | grep --color HotPlug
> SltCap:AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise-
> :~$
>
> :~$ sudo lspci -vvv -s 0004:02:02.0 | grep --color HotPlug
> SltCap:AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise-
> :~$
>
> :~$ sudo lspci -vvv -s 0004:02:03.0 | grep --color HotPlug
> SltCap:AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise-
> :~$

All this is showing is that the switch downstream ports on bus 0004:02
have a slot capability. I already know that (see what I said
previously about physical links). The fact the downstream ports have a
slot capability also has absolutely nothing to do with anything I was
saying. Look at the lspci output for 0004:01:00.0 which is the
switch's upstream port. The upstream port device will not have a slot
capability because it's a bridge into the virtual PCI bus that is
internal to the switch.

> It seems like your explanation about the missing 0004:01:00.0 may be
> correct and could be due to a firmware bug. However, the scope of this
> patch does not relate to this issue. Additionally, if it starts with
> 0004:01:00.0 to 0004:01:03.0, the behavior of hot-unplug and hot-plug
> operations will remain inconsistent. This patch aims to address the
> inconsistent behavior of hot-unplug and hot-plug.
>
> *snip*
>
> > It might be worth adding some logic to pnv_php to verify the PCI
> > bridge upstream of the slot actually has the PCIe slot capability to
> > guard against this problem.
>
> We can have a look at this problem in another patch.

The point of this series is to fix the behaviour of pnv_php, is it
not? Powering off a PCI(e) slot is supposed to render it safe to
remove the card  in that slot. Currently if you "power off" C5, the
kernel is still going to have active references to the switch's
upstream port device (at 0004:01:00.0) and the switch management
function (at 0004:01:00.1). If the kernel has active references to PCI
devices physically located in the slot we supposedly powered off, then
the hotplug driver isn't doing its job. The asymmetry between hot add
and removal that you're trying to fix here is a side effect of the
fact that pnv_php is advertising the wrong thing as a slot. I think
you should stop pnv_php from advertising something as a slot when it's
not actually a slot because that's the root of all your problems.

> We wanted to handle the more generic case and did not want to be confined to
> only one device assumption. We want to fix the current inconsistent behavior
> more generically.

Right, as I said above I don't think handing the more generic case is
actually required if pnv_php is doing its job properly. It doesn't
hurt though.

> Regarding the fix, the fix is obvious:

really?

> We have to traverse
> and find the bridge ports from DT and invoke  pci_scan_slot() on them. This 
> will
> discover and create the entry for bridge ports (0004:02:00.0 to 0004:02:00.3 
> on
> the given bus- 0004:02). There is already an existing function, 
> pci_scan_bridge()
> which is doing invocation of pci_scan_slot () for the devices behind the 
> bridge,
> in this case for  SAS device. So eventually, we are doing a scan of all the 
> entities
> behind the slot.

I already read your patch so I'm not sure why you feel the need to
re-describe it in tedious detail.

> Would you like me to combine the non-bridge and bridge cases into one? I can 
> attempt
> to do this. Hopefully, if we incorporate the iterate sibling logic case 
> correctly,
> we may not need to maintain these two separate cases for bridge and 
> non-bridge. I
> will attempt this, and if it works, I will include it in the next patch. 
> Thanks.

Yes, do that.

Also, do not post HTML emails to linux development lists. It breaks
plain text inline quoting which makes your messages annoying to reply
to. Some linux development lists will also silently drop HTML emails.
Please talk to the other LTC engineers about how to set up your mail
client to send plain text emails to avoid these problems in the
future.

Oliver


Re: [PATCH v2 2/2] powerpc: hotplug driver bridge support

2024-05-16 Thread Oliver O'Halloran
On Tue, May 14, 2024 at 11:54 PM Krishna Kumar  wrote:
>
> There is an issue with the hotplug operation when it's done on the
> bridge/switch slot. The bridge-port and devices behind the bridge, which
> become offline by hot-unplug operation, don't get hot-plugged/enabled by
> doing hot-plug operation on that slot. Only the first port of the bridge
> gets enabled and the remaining port/devices remain unplugged. The hot
> plug/unplug operation is done by the hotplug driver
> (drivers/pci/hotplug/pnv_php.c).
>
> Root Cause Analysis: This behavior is due to missing code for the DPC
> switch/bridge.

I don't see anything touching DPC in this series?

> *snip*
>
> Command for reproducing the issue :
>
> For hot unplug/disable - echo 0 > /sys/bus/pci/slots/C5/power
> For hot plug/enable -echo 1 > /sys/bus/pci/slots/C5/power
>
> where C5 is slot associated with bridge.
>
> Scenario/Tests:
> Output of lspci -nn before test is given below. This snippet contains
> devices used for testing on Powernv machine.
>
> 0004:02:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:02:01.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:02:02.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:02:03.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:08:00.0 Serial Attached SCSI controller [0107]:
> Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01)
> 0004:09:00.0 Serial Attached SCSI controller [0107]:
> Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01)
>
> Output of lspci -tv before test is as follows:
>
> # lspci -tv
>  +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]--
>  |   |   +-01.0-[08]00.0  Broadcom / 
> LSI SAS3216 PCI-Express Fusion-MPT SAS-3
>  |   |   +-02.0-[09]00.0  Broadcom / 
> LSI SAS3216 PCI-Express Fusion-MPT SAS-3
>  |   |   \-03.0-[0a-0e]--
>  |   \-00.1  PMC-Sierra Inc. Device 4052
>
> C5(bridge) and C6(End Point) slot address are as below:
> # cat /sys/bus/pci/slots/C5/address
> 0004:02:00
> # cat /sys/bus/pci/slots/C6/address
> 0004:09:00

Uh, if I'm reading this right it looks like your "slot" C5 is actually
the PCIe switch's internal bus which is definitely not hot pluggable.
I find it helps to look at the PCI topology in terms of where the
physical PCIe links are. Here we've got:

- A link between the PHB (0004:00:00.0) and the switch upstream port
(0004:01:00.0)
- A link from switch downstream port 0 (0004:02:00.0) to nothing
- A link from switch downstream port 1 (0004:02:01.0) to a SAS card
- A link from switch downstream port 2 (0004:02:02.0) to a SAS card
- A link from switch downstream port 2 (0004:02:03.0) to nothing

Note that there's no PCIe link between the switch upstream port
(0004:01:00.0) and the downstream ports on bus 0004:02. The connection
between those is invisible to us because it's custom bus logic
internal to the PCIe switch ASIC. What I think has happened here is
that system firmware has supplied bad PCIe slot information to OPAL
which has resulted in pnv_php advertising a slot in the wrong place.
Assuming this following the usual IBM convention I'd expect the bridge
device for C5 to be the PHB's root port and the bus should be 0004:01.
It might be worth adding some logic to pnv_php to verify the PCI
bridge upstream of the slot actually has the PCIe slot capability to
guard against this problem.

> Hot-unplug operation on slot associated with bridge:
> # echo 0 > /sys/bus/pci/slots/C5/power
> # lspci -tv
>  +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--
>  |   \-00.1  PMC-Sierra Inc. Device 4052

Yep, "powering off" C5 doesn't remove the upstream port device. This
would create problems if you physically removed the card from C5 since
the kernel would assume the switch device is still present.

> *snip*


> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
> index 38561d6a2079..bea612759832 100644
> --- a/arch/powerpc/kernel/pci_dn.c
> +++ b/arch/powerpc/kernel/pci_dn.c
> @@ -493,4 +493,36 @@ static void pci_dev_pdn_setup(struct pci_dev *pdev)
> pdn = pci_get_pdn(pdev);
> pdev->dev.archdata.pci_data = pdn;
>  }
> +
> +void pci_traverse_sibling_nodes_and_scan_slot(struct device_node *start, 
> struct pci_bus *bus)
> +{
> +   struct device_node *dn;
> +   int slotno;
> +
> +   u32 class = 0;
> +
> +   if (!of_property_read_u32(start->child, "class-code", )) {
> +   /* Call of pci_scan_slot for non-bridge/EP case */
> +   if (!((class >> 8) == PCI_CLASS_BRIDGE_PCI)) {
> +   slotno = PCI_SLOT(PCI_DN(start->child)->devfn);
> +   pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
> +   return;
> +   }
> +   }
> +
> +   /* Iterate all siblings */
> +   

Re: PowerNV PCIe hotplug support?

2024-01-11 Thread Oliver O'Halloran
On Thu, Dec 28, 2023 at 4:49 PM Timothy Pearson
 wrote:
>
> Hrmm, potentially related, I'm getting a kernel oops when I try to hot unplug 
> the entire PCIe switch from the PHB4:

IIRC there's hard to fix races in the pci core around removal of pci
slot devices. Maybe someone fixed those since I tried last, but I'd
just avoid doing that if I were you.


Re: PowerNV PCIe hotplug support?

2024-01-11 Thread Oliver O'Halloran
On Thu, Dec 28, 2023 at 3:16 PM Timothy Pearson
 wrote:
>
> I've been evaluating some new options for our POWER9-based hardware in the 
> NVMe space, and would like some clarification on the current status of PCIe 
> hotplug for the PowerNV platforms.
>
> From what I understand, the pnv_php driver provides the basic hotplug 
> functionality on PowerNV.  What I'm not clear on is to what extent this is 
> intended to flow downstream to attached PCIe switches.

I did a bunch of work on NVMe hotplug back in the day and it worked
fine then. Most of that work was done with Gen3 PLX switches though
which are considerably dumber than the newer ones though.

> I have a test setup here that consists of a PMC 8533 switch and several 
> downstream NVMe drives, with the switch attached directly to the PHB4 root 
> port.  After loading the pnv_php module, I can disconnect the downstream NVMe 
> devices by either using echo 0 on /sys/bus/pcu/slots/Snnn/power, or by 
> doing a physical surprise unplug, however nothing I try can induce a newly 
> plugged device to train and be detected on the bus.  Even trying a echo 0 and 
> subsequent echo 1 to /sys/bus/pcu/slots/Snnn/power only results in the 
> device going offline, there seems to be no way to bring the device back 
> online short of a reboot.
>
> Hotplug of other devices connected directly to the PHB4 appears to work 
> properly (I can online and offline via the power node); the issue seems to be 
> restricted to downstream devices connected to the (theoretically hotplug 
> capable) PMC 8533 switch.

I'd suspect either the PCIe interrupt is not being delivered for some
reason (EEH might be isolating the PCIe switch port?) or the removal
is triggering downstream port containment on the switch. Check the
port capability status with lspci. IIRC pnv_php doesn't know anything
about DPC so you might need to have skiboot disable that by default to
keep the kernel happy.

> Is this the intended behavior for downstream (non-IBM) PCIe ports?  Raptor 
> can provide resources to assist in a fix if needed, but I would like to 
> understand if this is a bug or an unimplemented feature first, and if the 
> latter what the main issues are likely to be in implementation.

It *should* work and the WARN()/BUG() spew you're seeing are bugs that
just need fixing. That said, hotplug on PNV is a headache for a bunch
of reasons most of which are due to EEH. Something I was working
towards before I left IBM was refactoring how EEH worked internally so
we could eliminate the need for pnv_php and go back to using the
standard pcieport driver. Unfortunately, that's a big job and I can't
even remember how much of that work actually made it upstream.


Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-26 Thread Oliver O'Halloran
On Wed, Sep 27, 2023 at 9:03 AM Bjorn Helgaas  wrote:
>
> On Fri, Sep 22, 2023 at 10:46:36AM +0800, Shuai Xue wrote:
> > ...
>
> > Actually, this is a question from my colleague from firmware team.
> > The original question is that:
> >
> > "Should I set CPER_SEV_FATAL for Generic Error Status Block when a
> > PCIe fatal error is detected? If set, kernel will always panic.
> > Otherwise, kernel will always not panic."
> >
> > So I pull a question about desired behavior of Linux kernel first :)
> > From the perspective of the kernel, CPER_SEV_FATAL for Generic Error
> > Status Block is not reasonable. The kernel will attempt to recover
> > Fatal errors, although recovery may fail.
>
> I don't know the semantics of CPER_SEV_FATAL or why it's there.
> With CPER, we have *two* error severities: a "native" one defined by
> the PCIe spec and another defined by the platform via CPER.
>
> I speculate that the reason for the CPER severity could be to provide
> a severity for error sources that don't have a "native" severity like
> AER does, or for the vendor to force the OS to restart (for
> CPER_SEV_FATAL, anyway) in cases where it might not otherwise.
>
> In the native case, we only have the PCIe severity and don't have the
> CPER severity at all, and I suspect that unless there's uncontained
> data corruption, we would rather handle even the most severe PCIe
> fatal error by disabling the specific device(s) instead of panicking
> and restarting the whole machine.

>From a user's point of view disabling a device is often worse than a
reboot. If you get a fatal error from a system's only network card
then disabling the card may result in the system being uncontactable
until someone manually recovers it. Similarly, if the disk hosting the
root filesystem disappears the system may not crash immediately (most
of what it needs will be in page cache), but there's no guarantee that
it can do anything useful in that state. In both cases forcing a
reboot will probably bring the system back into a usable state.

> So for PCIe errors, I'm not sure setting CPER_SEV_FATAL is beneficial
> unless the platform wants to force the OS to panic, e.g., maybe the
> platform knows about data corruption and/or the vendor wants the OS to
> panic as part of a reliability story.

The PCIe spec is (intentionally?) vague about the causes of fatal
errors. For all we know a device is reporting one because the embedded
OS it was running crashed and as a side effect it's been DMAing junk
into system memory for the past hour. If you know something about the
device in question maybe you can make those assumptions, but in
general it's impossible to assess the actual severity of an error.
Even in the case of a noisy link causing bit flips (it's possible,
LCRC is only 16bit and ECEC is optional) if we get corruption of the
address bits of the TLP header then the DMA might have overwritten
something important to the OS. From a hardware vendor's point of view
just forcing a reboot makes a lot of sense.

Paranoia aside, in a lot of cases PCI device errors are nothing major
and can be resolved by just restarting the device. However, there's no
way for generic kernel code to make that assessment and we probably
shouldn't have the kernel guess. I'd say the safest option would be to
punt that decision to userspace and provide some way to whitelist
devices that we can ignore errors from. I'm not familiar enough with
the ACPI to know if we have enough details in the notification it
sends to actually implement that though.

Oliver


Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-24 Thread Oliver O'Halloran
On Fri, Sep 22, 2023 at 8:23 AM David Laight  wrote:
>
> > It would be nice if they worked the same, but I suspect that vendors
> > may rely on the fact that CPER_SEV_FATAL forces a restart/panic as
> > part of their system integrity story.
>
> The file system errors created by a panic (especially an NMI panic)
> could easily be more problematic than a failed PCIe data transfer.
> Evan a read that returned ~0u - which can be checked for.
>
> Panicking a system that is converting TDM telephony to RTP for the
> 911 emergency service because a PCIe cable/riser connecting one of the
> TDM board has become loose doesn't seem ideal.

For kernel native AER the default reaction to errors is
reset-and-reinit which probably isn't much better for your case.
Sounds like you would want a knob to suppress everything except error
reporting so you can handle it in userspace?

> (Or because the TDM board's fpga has decided it isn't going to respond
> to any accesses until the BARs are setup again...)
>
> The system can carry on with some TDM connections disabled - but that
> is ok because they are all duplicated in case a cable gets cuit.

Well that's a relief :)

Oliver


Re: [RFC 0/3] Asynchronous EEH recovery

2023-06-12 Thread Oliver O'Halloran
On Tue, Jun 13, 2023 at 11:44 AM Ganesh Goudar  wrote:
>
> Hi,
>
> EEH recovery is currently serialized and these patches shorten
> the time taken for EEH recovery by making the recovery to run
> in parallel. The original author of these patches is Sam Bobroff,
> I have rebased and tested these patches.
>
> On powervm with 64 VFs from same PHB,  I see approximately 48%
> reduction in time taken in EEH recovery.
>
> On powernv with 9 network cards, Where 2 cards installed on one
> PHB and 1 card on each of the rest of the PHBs, Providing 20 PFs
> in total. I see approximately 33% reduction in time taken in EEH
> recovery.
>
> These patches were originally posted as separate RFCs by Sam, And
> I rebased and posted these patches almost a year back, I stopped
> pursuing these patches as I was not able test this on powernv, Due
> to the issues in drivers of cards I was testing this on, Which are
> now resolved. Since I am re-posting this after long time, Posting
> this as a fresh RFC, Please comment.

What changes have you made since the last time you posted this series?
If the patches are the same then the comments I posted last time still
apply.


Re: [RFC 0/3] Asynchronous EEH recovery

2022-08-17 Thread Oliver O'Halloran
On Tue, Aug 16, 2022 at 1:29 PM Ganesh Goudar  wrote:
>
> Hi,
>
> EEH reocvery is currently serialized and these patches shorten
> the time taken for EEH recovery by making the recovery to run
> in parallel. The original author of these patches is Sam Bobroff,
> I have rebased and tested these patches.
>
> On powervm with 64 VFs and I see approximately 48% reduction
> in time taken in EEH recovery,

Can you elaborate on how you're testing this? A VM with 64 separate
VFs on 64 separate PHBs I guess? How are you injecting errors?

> Yet to be tested on powernv.

If you're not testing on powernv you might as well not be testing. EEH
support on pseries is relatively straightforward since managing PCI
bridges, etc is all hidden away in the hypervisor. Most of the things
about EEH which give me headaches simply don't happen on pseries since
the PCI topology seen by the guest is flat.

> These patches were originally posted as separate RFCs, I think
> posting them as single series would be more helpful, I know the
> patches are too big, I will try to logically divide in next
> iterations.
>
> Thanks
>
> Ganesh Goudar (3):
>   powerpc/eeh: Synchronization for safety

I didn't pick that patch up after Sam left for a reason. I don't
recall the exact details, but I think I decided that the lock being
added had the same (or mostly the same) scope as the pci rescan lock.
All modifications to the EEH PE tree have to occur with the rescan
lock held since the per-device EEH setup occurs while configuring the
pci_dev (similarly the teardown happens when we remove the pci_dev
from its bus). I don't think that was true when Same wrote the patch
originally though since it pre-dates my big EEH init re-work.

It's possible (probable even!) I got that wrong, or that it was
something I was trying to make true through re-works rather than an
accurate assessment of the current state of things. I'll try page some
of this back into my brain later. I think I left some comments on
Sam's original RFC patch which might still be valid.

>   powerpc/eeh: Asynchronous recovery

I remember not hating the idea, but I think the implementation leaves
a lot to be desired. If you only really care about pseries then just
moving to a model where each PHB has its own recovery thread would get
you most of the benefit without needing to turn the already
complicated recovery path into an async trainwreck. IIRC Sam's
motivation for that patch was to make recovery faster for powernv
systems when an error was injected on a PF with a lot of child VFs.
Certain drivers took forever to recover each VF (might have been mlx5)
so doing it in parallel would have sped things up considerably.

Oliver


Re: [PATCH] MAINTAINERS: Remove myself as EEH maintainer

2022-08-11 Thread Oliver O'Halloran
On Thu, Aug 11, 2022 at 4:22 PM Michael Ellerman  wrote:
>
> Russell Currey  writes:
> > I haven't touched EEH in a long time I don't have much knowledge of the
> > subsystem at this point either, so it's misleading to have me as a
> > maintainer.
>
> Thank you for your service.
>
> > I remain grateful to Oliver for picking up my slack over the years.
>
> Ack.
>
> But I wonder if he is still happy being listed as the only maintainer.
> Given the status is "Supported" that means "Someone is actually paid to
> look after this" - and I suspect Oracle are probably not paying him to
> do that?

I'm still happy to field questions and/or give reviews occasionally if
needed, but yeah I don't have the time, hardware, or inclination to do
any actual maintenance. IIRC Mahesh was supposed to take over
supporting EEH after I left IBM. If he's still around he should
probably be listed as a maintainer.


Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains

2022-07-28 Thread Oliver O'Halloran
On Fri, Jul 29, 2022 at 12:21 PM Alexey Kardashevskiy  wrote:
>
> *snip*
>
> About this. If a platform has a concept of explicit DMA windows (2 or
> more), is it one domain with 2 windows or 2 domains with one window each?
>
> If it is 2 windows, iommu_domain_ops misses windows manipulation
> callbacks (I vaguely remember it being there for embedded PPC64 but
> cannot find it quickly).
>
> If it is 1 window per a domain, then can a device be attached to 2
> domains at least in theory (I suspect not)?
>
> On server POWER CPUs, each DMA window is backed by an independent IOMMU
> page table. (reminder) A window is a bus address range where devices are
> allowed to DMA to/from ;)

I've always thought of windows as being entries to a top-level "iommu
page table" for the device / domain. The fact each window is backed by
a separate IOMMU page table shouldn't really be relevant outside the
arch/platform.


Re: [PATCH] powerpc/powernv/pci: Drop VF MPS fixup

2022-05-19 Thread Oliver O'Halloran
On Thu, May 19, 2022 at 10:38 PM Michael Ellerman  wrote:
>
> Christophe Leroy  writes:
> > Le 02/09/2020 à 05:51, Oliver O'Halloran a écrit :
> >> The MPS field in the VF config space is marked as reserved in current
> >> versions of the SR-IOV spec. In other words, this fixup doesn't do
> >> anything.
> >>
> >> Signed-off-by: Oliver O'Halloran 
> >
> > A lot of cleanup patches from Oliver were merged in Septembre 2020 but
> > not this one.
> >
> > Any reason ?
>
> It wasn't clear to me that it's safe to remove. The commit that added it
> seemed to think it was important.
>
> The fact that it's out-of-spec doesn't mean we don't have some hardware
> somewhere that relies on that.

There is no hardware that depends on it. It was added in response to a
bug report on the IBM internal bugzilla about virtual functions not
reporting the same MPS as the physical function in the output of
lspci. This is by design since MPS is a property that is only relevant
to the PF. There was a corresponding patch to skiboot to intercept
writes to the MPS field of VFs which was used to fake a writable MPS
field in firmware. I removed that hack in 2019
(https://github.com/open-power/skiboot/commit/22057f868f3b2b1fd02647a738f6da0858b5eb6c)
since it was pointless and was causing other problems. There's no real
reason to keep this code around IMO.


Re: [PATCH] powerpc/eeh: Delay slot presence check once driver is notified about the pci error.

2021-11-24 Thread Oliver O'Halloran
On Wed, Nov 24, 2021 at 12:05 AM Mahesh Salgaonkar  wrote:
>
> *snip*
>
> This causes the EEH handler to get stuck for ~6
> seconds before it could notify that the pci error has been detected and
> stop any active operations. Hence with running I/O traffic, during this 6
> seconds, the network driver continues its operation and hits a timeout
> (netdev watchdog).On timeouts, network driver go into ffdc capture mode
> and reset path assuming the PCI device is in fatal condition. This causes
> EEH recovery to fail and sometimes it leads to system hang or crash.

Whatever is causing that crash is the real issue IMO. PCI error
reporting is fundamentally asynchronous and the driver always has to
tolerate some amount of latency between the error occuring and being
reported. Six seconds is admittedly an eternity, but it should not
cause a system crash under any circumstances. Printing a warning due
to a timeout is annoying, but it's not the end of the world.


Re: [PATCH] powerpc/eeh: Delay slot presence check once driver is notified about the pci error.

2021-11-24 Thread Oliver O'Halloran
On Wed, Nov 24, 2021 at 7:45 PM Mahesh J Salgaonkar
 wrote:
>
> No it doesn't. We will still do a presence check before the recovery
> process starts. This patch moves the check after notifying the driver to
> stop active I/O operations. If a presence check finds the device isn't
> present, we will skip the EEH recovery. However, on a surprise hotplug,
> the user will see the EEH messages on the console before it finds there
> is nothing to recover.

Suppressing the spurious EEH messages was part of why I added that
check in the first place. If you want to defer the presence check
until later you should move the stack trace printing, etc to after
we've confirmed there are still devices present. Considering the
motivation for this patch is to avoid spurious warnings from the
driver I don't think printing spurious EEH messages is much of an
improvement.

The other option would be returning an error from the pseries hotplug
driver. IIRC that's what pnv_php / OPAL does if the PHB is fenced and
we can't check the slot presence state.


[PATCH] pci: Rename pcibios_add_device to match

2021-09-13 Thread Oliver O'Halloran
The general convention for pcibios_* hooks is that they're named after
the corresponding pci_* function they provide a hook for. The exception
is pcibios_add_device() which provides a hook for pci_device_add(). This
has been irritating me for years so rename it.

Also, remove the export of the microblaze version. The only caller
must be compiled as a built-in so there's no reason for the export.

Signed-off-by: Oliver O'Halloran 
---
 arch/microblaze/pci/pci-common.c   | 3 +--
 arch/powerpc/kernel/pci-common.c   | 2 +-
 arch/powerpc/platforms/powernv/pci-sriov.c | 2 +-
 arch/s390/pci/pci.c| 2 +-
 arch/sparc/kernel/pci.c| 2 +-
 arch/x86/pci/common.c  | 2 +-
 drivers/pci/pci.c  | 4 ++--
 drivers/pci/probe.c| 4 ++--
 include/linux/pci.h| 2 +-
 9 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c
index 557585f1be41..622a4867f9e9 100644
--- a/arch/microblaze/pci/pci-common.c
+++ b/arch/microblaze/pci/pci-common.c
@@ -587,13 +587,12 @@ static void pcibios_fixup_resources(struct pci_dev *dev)
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID, pcibios_fixup_resources);
 
-int pcibios_add_device(struct pci_dev *dev)
+int pcibios_device_add(struct pci_dev *dev)
 {
dev->irq = of_irq_parse_and_map_pci(dev, 0, 0);
 
return 0;
 }
-EXPORT_SYMBOL(pcibios_add_device);
 
 /*
  * Reparent resource children of pr that conflict with res
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index c3573430919d..6749905932f4 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1059,7 +1059,7 @@ void pcibios_bus_add_device(struct pci_dev *dev)
ppc_md.pcibios_bus_add_device(dev);
 }
 
-int pcibios_add_device(struct pci_dev *dev)
+int pcibios_device_add(struct pci_dev *dev)
 {
struct irq_domain *d;
 
diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c 
b/arch/powerpc/platforms/powernv/pci-sriov.c
index 28aac933a439..486c2937b159 100644
--- a/arch/powerpc/platforms/powernv/pci-sriov.c
+++ b/arch/powerpc/platforms/powernv/pci-sriov.c
@@ -54,7 +54,7 @@
  * to "new_size", calculated above. Implementing this is a convoluted process
  * which requires several hooks in the PCI core:
  *
- * 1. In pcibios_add_device() we call pnv_pci_ioda_fixup_iov().
+ * 1. In pcibios_device_add() we call pnv_pci_ioda_fixup_iov().
  *
  *At this point the device has been probed and the device's BARs are sized,
  *but no resource allocations have been done. The SR-IOV BARs are sized
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index e7e6788d75a8..ded3321b7208 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -561,7 +561,7 @@ static void zpci_cleanup_bus_resources(struct zpci_dev 
*zdev)
zdev->has_resources = 0;
 }
 
-int pcibios_add_device(struct pci_dev *pdev)
+int pcibios_device_add(struct pci_dev *pdev)
 {
struct zpci_dev *zdev = to_zpci(pdev);
struct resource *res;
diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c
index 9c2b720bfd20..31b0c1983286 100644
--- a/arch/sparc/kernel/pci.c
+++ b/arch/sparc/kernel/pci.c
@@ -1010,7 +1010,7 @@ void pcibios_set_master(struct pci_dev *dev)
 }
 
 #ifdef CONFIG_PCI_IOV
-int pcibios_add_device(struct pci_dev *dev)
+int pcibios_device_add(struct pci_dev *dev)
 {
struct pci_dev *pdev;
 
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 3507f456fcd0..9e1e6b8d8876 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -632,7 +632,7 @@ static void set_dev_domain_options(struct pci_dev *pdev)
pdev->hotplug_user_indicators = 1;
 }
 
-int pcibios_add_device(struct pci_dev *dev)
+int pcibios_device_add(struct pci_dev *dev)
 {
struct pci_setup_rom *rom;
struct irq_domain *msidom;
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index ce2ab62b64cf..c63598c1cdd8 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2091,14 +2091,14 @@ void pcim_pin_device(struct pci_dev *pdev)
 EXPORT_SYMBOL(pcim_pin_device);
 
 /*
- * pcibios_add_device - provide arch specific hooks when adding device dev
+ * pcibios_device_add - provide arch specific hooks when adding device dev
  * @dev: the PCI device being added
  *
  * Permits the platform to provide architecture specific functionality when
  * devices are added. This is the default implementation. Architecture
  * implementations can override this.
  */
-int __weak pcibios_add_device(struct pci_dev *dev)
+int __weak pcibios_device_add(struct pci_dev *dev)
 {
return 0;
 }
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index d9fc02a71baa..2ba43b6adf31 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2450,7 +2450,7 @@ static struct irq_domain *pci_dev_msi_domain(s

Re: [PATCH 1/1] powerpc: Drop superfluous pci_dev_is_added() calls

2021-09-13 Thread Oliver O'Halloran
On Sat, Sep 11, 2021 at 9:09 PM Michael Ellerman  wrote:
>
> Niklas Schnelle  writes:
> > On powerpc, pci_dev_is_added() is called as part of SR-IOV fixups
> > that are done under pcibios_add_device() which in turn is only called in
> > pci_device_add() whih is called when a PCI device is scanned.
>
> Thanks for cleaning this up for us.
>
> > Now pci_dev_assign_added() is called in pci_bus_add_device() which is
> > only called after scanning the device. Thus pci_dev_is_added() is always
> > false and can be dropped.
>
> My only query is whether we can pin down when that changed.
>
> Oliver said:
>
>   The use of pci_dev_is_added() in arch/powerpc was because in the past
>   pci_bus_add_device() could be called before pci_device_add(). That was
>   fixed a while ago so It should be safe to remove those calls now.
>
> I trawled back through the history a bit but I can't remember/find which
> commit changed that, Oliver can you remember?

Yeah, on closer inspection that never happened. The re-ordering I was
thinking of was when the boot-time BAR assignments were moved in
3f068aae7a95 so they'd always occur before pci_bus_add_device() was
called. I think I got that change mixed up with commit 30d87ef8b38d
("powerpc/pci: Fix pcibios_setup_device() ordering") which moved some
of what what pcibios_device_add() did into pcibios_bus_add_device() to
harmonise the hot and coldplug paths.

As far as I can tell the pci_dev_is_added() check has been pointless
since the code was added in 6e628c7d33d9 ("powerpc/powernv: Reserve
additional space for IOV BAR according to the number of total_pe").
Even back then pci_device_add() was called first in both the normal
and OF based PCI probing paths so there's no circumstance where that
code would see the added flag set.

That patch was part of the PowerNV SRIOV support series which went
through quite a few iterations. My best guess is that check might have
been needed in an earlier version and was just carried forward until
it got merged. I didn't dig too deeply into the history though.

Reviewed-by: Oliver O'Halloran 


Re: [PATCH 0/5] s390/pci: automatic error recovery

2021-09-07 Thread Oliver O'Halloran
On Tue, Sep 7, 2021 at 10:21 PM Niklas Schnelle  wrote:
>
> On Tue, 2021-09-07 at 10:45 +0200, Niklas Schnelle wrote:
> > On Tue, 2021-09-07 at 12:04 +1000, Oliver O'Halloran wrote:
> > > On Mon, Sep 6, 2021 at 7:49 PM Niklas Schnelle  
> > > wrote:
> > > > Patch 3 I already sent separately resulting in the discussion below but 
> > > > without
> > > > a final conclusion.
> > > >
> > > > https://lore.kernel.org/lkml/20210720150145.640727-1-schne...@linux.ibm.com/
> > > >
> > > > I believe even though there were some doubts about the use of
> > > > pci_dev_is_added() by arch code the existing uses as well as the use in 
> > > > the
> > > > final patch of this series warrant this export.
> > >
> > > The use of pci_dev_is_added() in arch/powerpc was because in the past
> > > pci_bus_add_device() could be called before pci_device_add(). That was
> > > fixed a while ago so It should be safe to remove those calls now.
> >
> > Hmm, ok that confirms Bjorns suspicion and explains how it came to be.
> > I can certainly sent a patch for that. This would then leave only the
> > existing use in s390 which I added because of a dead lock prevention
> > and explained here:
> > https://lore.kernel.org/lkml/87d15d5eead35c9eaa667958d057cf4a81a8bf13.ca...@linux.ibm.com/
> >
> > Plus the need to use it in the recovery code of this series. I think in
> > the EEH code the need for a similar check is alleviated by the checks
> > in the beginning of
> > arch/powerpc/kernel/eeh_driver.c:eeh_handle_normal_event() especially
> > eeh_slot_presence_check() which checks presence via the hotplug slot.
> > I guess we could use our own state tracking in a similar way but felt
> > like pci_dev_is_added() is the more logical choice.

The slot check is mainly there to prevent attempts to "recover"
devices that have been surprise removed (i.e NVMe hot-unplug). The
actual recovery process operates off the eeh_pe tree which is frozen
in place when an error is detected. If a pci_dev is added or removed
it's not really a problem since those are only ever looked at when
notifying drivers which is done with the rescan_remove lock held. That
said, I wouldn't really encourage anyone to follow the EEH model since
it's pretty byzantine.

> Looking into this again, I think we actually can't easily track this
> state ourselves outside struct pci_dev. The reason for this is that
> when e.g. arch/s390/pci/pci_sysfs.c:recover_store() removes the struct
> pci_dev and scans it again the new struct pci_dev re-uses the same
> struct zpci_dev because from a platform point of view the PCI device
> was never removed but only disabled and re-enabled. Thus we can only
> distinguish the stale struct pci_dev by looking at things stored in
> struct pci_dev itself.

IMO the real problem is removing and re-adding the pci_dev. I think
it's something that's done largely because the PCI core doesn't really
provide any better mechanism for getting a device back into a
known-good state so it's abused to implement error recovery. This is
something that's always annoyed me since it conflates recovery with
hotplug. After a hot-(un)plug we might have a different device or no
device. In the recovery case we expect to start and end with the same
device. Why not apply the same logic to the pci_dev?

Something I was tinkering with before I left IBM was re-working the
way EEH handles recovering devices that don't have a driver with error
handling callbacks to something like:

1. unbind the driver
2. pci_save_state()
3. do the reset
4. pci_restore_state()
5. re-bind the driver

That would allow keeping the pci_dev around and let me delete a pile
of confusing code which handles binding the eeh_dev to the new
pci_dev. The obvious problem with that approach is the assumption the
device is functional enough to allow saving the config space, but I
don't think that's a deal breaker. We could stash a copy of the device
state before we allow drivers to attach and use that to restore the
device after the reset. The end result would be the same known-good
state that we'd get after a re-scan.

> That said, I think for the recovery case we might be able to drop the
> pci_dev_is_added() and rely on pdev->driver != NULL which we check
> anyway and that should catch any PCI device that was already removed.

Would that work if there was an error on a device without a driver
bound? If you're just trying to stop races between recovery and device
removal then pci_dev_is_added() is probably the right tool for the
job. Trying to substitute it with a proxy seems like a bad idea.


Re: [PATCH] pci/hotplug/pnv-php: Remove probable double put

2021-09-07 Thread Oliver O'Halloran
On Wed, Sep 8, 2021 at 8:02 AM Tyrel Datwyler  wrote:
>
> On 9/7/21 1:59 AM, Xu Wang wrote:
> > Device node iterators put the previous value of the index variable,
> > so an explicit put causes a double put.
> >
> > Signed-off-by: Xu Wang 
> > ---
> >  drivers/pci/hotplug/pnv_php.c | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
> > index 04565162a449..ed4d1a2c3f22 100644
> > --- a/drivers/pci/hotplug/pnv_php.c
> > +++ b/drivers/pci/hotplug/pnv_php.c
> > @@ -158,7 +158,6 @@ static void pnv_php_detach_device_nodes(struct 
> > device_node *parent)
> >   for_each_child_of_node(parent, dn) {
> >   pnv_php_detach_device_nodes(dn);
> >
> > - of_node_put(dn);
> >   of_detach_node(dn);
>
> Are you sure this is a double put? This looks to me like its meant to drive 
> tear
> down of the device by putting a long term reference and not the short term get
> that is part of the iterator.

Yeah, the put is there is to drop the initial ref so the node can be
released. It might be worth adding a comment.


Re: [PATCH 0/5] s390/pci: automatic error recovery

2021-09-06 Thread Oliver O'Halloran
On Mon, Sep 6, 2021 at 7:49 PM Niklas Schnelle  wrote:
>
> Patch 3 I already sent separately resulting in the discussion below but 
> without
> a final conclusion.
>
> https://lore.kernel.org/lkml/20210720150145.640727-1-schne...@linux.ibm.com/
>
> I believe even though there were some doubts about the use of
> pci_dev_is_added() by arch code the existing uses as well as the use in the
> final patch of this series warrant this export.

The use of pci_dev_is_added() in arch/powerpc was because in the past
pci_bus_add_device() could be called before pci_device_add(). That was
fixed a while ago so It should be safe to remove those calls now.

> Patch 4 "PCI: Export pci_dev_lock()" is basically an extension to commit
> e3a9b1212b9d ("PCI: Export pci_dev_trylock() and pci_dev_unlock()") which
> already exported pci_dev_trylock(). In the final patch we make use of
> pci_dev_lock() to wait for any other exclusive uses of the pdev to be finished
> before starting recovery.

Hmm, I noticed the EEH
(arch/powerpc/kernel/eeh_driver.c:eeh_pe_report_edev())  and the
generic PCIe error recovery code (see
drivers/pci/pcie/err.c:report_error_detected()) only call
device_lock() before entering the driver's error handling callbacks. I
wonder if they should be using pci_dev_lock() instead. The only real
difference is that pci_dev_lock() will also block user space from
accessing the device's config space while error recovery is in
progress which seems sensible enough.


Re: [PATCH] powerpc/chrp: Revert "Move PHB discovery" and "Make hydra_init() static"

2021-07-18 Thread Oliver O'Halloran
On Sun, Jul 18, 2021 at 2:14 AM Guenter Roeck  wrote:
>
> On Sun, Jul 18, 2021 at 01:54:23AM +1000, Oliver O'Halloran wrote:
> > On Sat, Jul 17, 2021 at 8:12 AM Guenter Roeck  wrote:
> > >
> > > This patch reverts commit 407d418f2fd4 ("powerpc/chrp: Move PHB
> > > discovery") and commit 9634afa67bfd ("powerpc/chrp: Make hydra_init()
> > > static").
> > >
> > > Running the upstream kernel on Qemu's brand new "pegasos2" emulation
> > > results in a variety of backtraces such as
> >
> > ...and actually using it appears to require both manually enabling it
> > in the qemu config and finding a random bios blob that is no longer
> > distributed by the manufacturer. Cool.
> >
> That is absolutely wrong. vof.bin provided by qemu in the linux root
> directory works just fine, plus chrp32_defconfig minus SMP.

It looks like I forgot to actually merge the current HEAD into my
local qemu branch which was a few weeks old. VOF support was merged on
the 13th so I didn't see it. My apologies.

Oliver


Re: [PATCH] powerpc/chrp: Revert "Move PHB discovery" and "Make hydra_init() static"

2021-07-18 Thread Oliver O'Halloran
On Sun, Jul 18, 2021 at 2:24 AM Guenter Roeck  wrote:
>
> On Sat, Jul 17, 2021 at 05:57:50PM +0200, Christophe Leroy wrote:
> > Guenter Roeck  a écrit :
> >
> > > This patch reverts commit 407d418f2fd4 ("powerpc/chrp: Move PHB
> > > discovery") and commit 9634afa67bfd ("powerpc/chrp: Make hydra_init()
> > > static").
> > >
> > > Running the upstream kernel on Qemu's brand new "pegasos2" emulation
> > > results in a variety of backtraces such as
> > >
> > > Kernel attempted to write user page (a1) - exploit attempt? (uid: 0)
> > > [ cut here ]
> > > Bug: Write fault blocked by KUAP!
> > > WARNING: CPU: 0 PID: 0 at arch/powerpc/mm/fault.c:230
> > > do_page_fault+0x4f4/0x920
> > > CPU: 0 PID: 0 Comm: swapper Not tainted 5.13.2 #40
> > > NIP:  c0021824 LR: c0021824 CTR: 
> > > REGS: c1085d50 TRAP: 0700   Not tainted  (5.13.2)
> > > MSR:  00021032   CR: 24042254  XER: 
> > >
> > > GPR00: c0021824 c1085e10 c0f8c520 0021 3fffefff c1085c60 c1085c58
> > > 
> > > GPR08: 1032   c0ffb3ec 44042254  
> > > 0004
> > > GPR16:   00c4 00d0 0188c6e0 01006000 0001
> > > 40b14000
> > > GPR24: c0ec000c 0300 0200  4200 00a1 
> > > c1085e60
> > > NIP [c0021824] do_page_fault+0x4f4/0x920
> > > LR [c0021824] do_page_fault+0x4f4/0x920
> > > Call Trace:
> > > [c1085e10] [c0021824] do_page_fault+0x4f4/0x920 (unreliable)
> > > [c1085e50] [c0004254] DataAccess_virt+0xd4/0xe4
> > >
> > > and the system fails to boot. Bisect points to commit 407d418f2fd4
> > > ("powerpc/chrp: Move PHB discovery"). Reverting this patch together with
> > > commit 9634afa67bfd ("powerpc/chrp: Make hydra_init() static") fixes
> > > the problem.
> >
> > Isn't there more than that in the backtrace ? If there is a fault blocked by
> > Kuap, it means there is a fault. It should be visible in the traces.
> >
> > Should we fix the problem instead of reverting the commit that made the
> > problem visible ?
> >
>
> I do not think the patch reverted here made the problem visible. I am
> quite sure that it introduced it. AFAIS the problem is that the new code
> initializes and remaps PCI much later, after it is being used.

Right. The bug is that on 32bit platforms the PHB setup also maps one
of the PHB's IO space as "ISA IO space" as a side effect. There's a
handful of platforms (pegasos2 is one) which use an i8259 interrupt
controller and configuring that requires access to IO / ISA space. The
KUAP faults we're setting are because isa_io_base is still set to zero
so outb() and friends are accessing the zero page.

I don't think there's any real reason why we need to have PCI fully
set up to handle that situation. A few platforms already have early
fixup code which parses the DT directly rather than using the fields
of pci_controller (which are parsed from the DT anyway) and I'm pretty
sure we can do something similar.

> Also, the
> patch introducing the problem was never tested on real hardware (it even
> says so in the patch comments). That by itself seems to be quite
> problematic for such an invasive patch, and makes me wonder if some of
> the other PHB discovery related patches introduced similar problems.

The legacy platforms are maintained on a best-effort basis. Ellerman's
CI farm covers most of the powerpc CPU types, but there's no real way
to test the bulk of the platforms in the tree since most of the
hardware is currently in landfill.

> Anyway, I do not use or have that hardware. I was just playing with the
> latest version of qemu and ended up tracking down why its brand new
> pegasos2 emulation no longer boots with the latest Linux kernel.
> I personally don't care too much if ppc/chrp support is broken in the
> upstream kernel or not. Please take this patch as problem report,
> and feel free to do with it whatever you like, including ignoring it.

Problem reports are fine and appreciated. I'd be less cranky if you
included the kernel config you used in the initial report since I
wasted an hour of my saturday trying to replicate it with various
kernel configs that had SMP enabled since that's what the
chrp_defconfig uses.

Oliver


Re: [PATCH] powerpc/chrp: Revert "Move PHB discovery" and "Make hydra_init() static"

2021-07-17 Thread Oliver O'Halloran
On Sat, Jul 17, 2021 at 8:12 AM Guenter Roeck  wrote:
>
> This patch reverts commit 407d418f2fd4 ("powerpc/chrp: Move PHB
> discovery") and commit 9634afa67bfd ("powerpc/chrp: Make hydra_init()
> static").
>
> Running the upstream kernel on Qemu's brand new "pegasos2" emulation
> results in a variety of backtraces such as

...and actually using it appears to require both manually enabling it
in the qemu config and finding a random bios blob that is no longer
distributed by the manufacturer. Cool.

> Kernel attempted to write user page (a1) - exploit attempt? (uid: 0)
> [ cut here ]
> Bug: Write fault blocked by KUAP!
> WARNING: CPU: 0 PID: 0 at arch/powerpc/mm/fault.c:230 
> do_page_fault+0x4f4/0x920
> CPU: 0 PID: 0 Comm: swapper Not tainted 5.13.2 #40
> NIP:  c0021824 LR: c0021824 CTR: 
> REGS: c1085d50 TRAP: 0700   Not tainted  (5.13.2)
> MSR:  00021032   CR: 24042254  XER: 
>
> GPR00: c0021824 c1085e10 c0f8c520 0021 3fffefff c1085c60 c1085c58 
> GPR08: 1032   c0ffb3ec 44042254   0004
> GPR16:   00c4 00d0 0188c6e0 01006000 0001 40b14000
> GPR24: c0ec000c 0300 0200  4200 00a1  c1085e60
> NIP [c0021824] do_page_fault+0x4f4/0x920
> LR [c0021824] do_page_fault+0x4f4/0x920
> Call Trace:
> [c1085e10] [c0021824] do_page_fault+0x4f4/0x920 (unreliable)
> [c1085e50] [c0004254] DataAccess_virt+0xd4/0xe4
>
> and the system fails to boot. Bisect points to commit 407d418f2fd4
> ("powerpc/chrp: Move PHB discovery"). Reverting this patch together with
> commit 9634afa67bfd ("powerpc/chrp: Make hydra_init() static") fixes
> the problem.

The rationale for adding ppc_md.discover_phbs() and shifting all the
platforms over to using it is in commit 5537fcb319d0 ("powerpc/pci:
Add ppc_md.discover_phbs()"). I'd rather not go back to having random
platforms doing their PCI init before the kernel has setup the page
allocator. You need to either debug the problem fully, or provide
enough replication details so that someone who isn't invested in
emulating ancient hardware (i.e. me) with enough information to
actually replicate the problem.

Oliver


Re: [PATCH] Documentation: PCI: pci-error-recovery: rearrange the general sequence

2021-06-18 Thread Oliver O'Halloran
On Fri, Jun 18, 2021 at 4:05 PM Wesley Sheng  wrote:
>
> Reset_link() callback function was called before mmio_enabled() in
> pcie_do_recovery() function actually, so rearrange the general
> sequence betwen step 2 and step 3 accordingly.

I don't think this is true in all cases. If pcie_do_recovery() is
called with state==pci_channel_io_normal (i.e. non-fatal AER) the link
won't be reset. EEH (ppc PCI error recovery thing) also uses
.mmio_enabled() as described.


Re: [PATCH] powerpc: Fix kernel-jump address for ppc64 wrapper boot

2021-06-08 Thread Oliver O'Halloran
On Tue, Jun 8, 2021 at 4:33 PM He Ying  wrote:
>
> Hello,
>
> 在 2021/6/8 13:26, Oliver O'Halloran 写道:
> > On Fri, Jun 4, 2021 at 7:39 PM He Ying  wrote:
> >>  From "64-bit PowerPC ELF Application Binary Interface Supplement 1.9",
> >> we know that the value of a function pointer in a language like C is
> >> the address of the function descriptor and the first doubleword
> >> of the function descriptor contains the address of the entry point
> >> of the function.
> >>
> >> So, when we want to jump to an address (e.g. addr) to execute for
> >> PPC-elf64abi, we should assign the address of addr *NOT* addr itself
> >> to the function pointer or system will jump to the wrong address.
> > How have you tested this?
>
> I tested ppc64-elf big-endian. I changed the Kconfig so that ppc64
> big-endian
>
> selects PPC64_WRAPPER_BOOT. I used qemu to run the cuImage and found
>
> the problem. It made me confused. By applying this patch, I found it works.
>
> I thought it works for ppc64le too. So I upstream this patch.
>
> >
> > IIRC the 64bit wrapper is only used for ppc64le builds. For that case
> > the current code is work because the LE ABI (ABIv2) doesn't use
> > function descriptors. I think even for a BE kernel we need the current
> > behaviour because the vmlinux's entry point is screwed up (i.e.
> > doesn't point a descriptor) and tools in the wild (probably kexec)
> > expect it to be screwed up.
>
> Yes, you're right. PPC64_WRAPPER_BOOT is only used for ppc64le builds
> currently.
>
> LE ABI (ABI v2) doesn't use function descriptors. Is that right? I don't
> test that. If so,
>
> this patch should be dropped. But why does ppc64 have different ABIs? So
> strange.

Yeah, it is strange. When LE support was added the toolchain team took
the opportunity to revamp the ABI since BE and LE binaries were never
going to be compatible. IIRC there is a slight performance advantage
to using v2 since function descriptors added an extra load when
performing a non-local function call. I think.

> If the wrapper is built to ppc64be, my patch is tested right. The entry
> point in the ELF
>
> header is always right so you can assign the header->e_entry to the
> function pointer
>
> and then jump to the entry by calling the function. But in the ppc
> wrapper, the address
>
> is intialized to 0 or malloced to be an address later. In this
> situation, I think my patch
>
> should be right for ppc64be.

Yeah maybe it's fine. I just have some memories of running into some
bizzare edge case at some point. It might have been the entrypoint of
the zImage rather than the vmlinux which had (has?) that problem.


Re: [PATCH] powerpc: Fix kernel-jump address for ppc64 wrapper boot

2021-06-07 Thread Oliver O'Halloran
On Fri, Jun 4, 2021 at 7:39 PM He Ying  wrote:
>
> From "64-bit PowerPC ELF Application Binary Interface Supplement 1.9",
> we know that the value of a function pointer in a language like C is
> the address of the function descriptor and the first doubleword
> of the function descriptor contains the address of the entry point
> of the function.
>
> So, when we want to jump to an address (e.g. addr) to execute for
> PPC-elf64abi, we should assign the address of addr *NOT* addr itself
> to the function pointer or system will jump to the wrong address.

How have you tested this?

IIRC the 64bit wrapper is only used for ppc64le builds. For that case
the current code is work because the LE ABI (ABIv2) doesn't use
function descriptors. I think even for a BE kernel we need the current
behaviour because the vmlinux's entry point is screwed up (i.e.
doesn't point a descriptor) and tools in the wild (probably kexec)
expect it to be screwed up.

ABIv2 (LE) reference:
https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture


Re: [PATCH] powerpc/eeh: skip slot presence check when PE is temporarily unavailable.

2021-05-06 Thread Oliver O'Halloran
On Fri, May 7, 2021 at 3:43 AM Mahesh Salgaonkar  wrote:
>
> When certain PHB HW failure causes phyp to recover PHB, it marks the PE
> state as temporarily unavailable. In this case, per PAPR, rtas call
> ibm,read-slot-reset-state2 returns a PE state as temporarily unavailable(5)
> and OS has to wait until that recovery is complete. During this state the
> slot presence check 'get-sensor-state(dr-entity-sense)' returns as DR
> connector empty which leads to assumption that the device has been
> hot-removed. This results into no EEH recovery on this device and it stays
> in failed state forever.
>
> This patch fixes this issue by skipping slot presence check only if device
> PE state is temporarily unavailable(5).
>
> Signed-off-by: Mahesh Salgaonkar 
> ---
> * snip*
>
> /*
>  * It should be corner case that the parent PE has been
> diff --git a/arch/powerpc/kernel/eeh_driver.c 
> b/arch/powerpc/kernel/eeh_driver.c
> index 3eff6a4888e79..a0913768f33de 100644
> --- a/arch/powerpc/kernel/eeh_driver.c
> +++ b/arch/powerpc/kernel/eeh_driver.c
> @@ -851,6 +851,17 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
> return;
> }
>
> +   /*
> +* When PE's state is temporarily unavailable, the slot
> +* presence check returns as DR connector empty.

That sounds like a bug in either RTAS or the hotplug slot driver (or
both). The presence check is there largely to filter out events that
we can guarantee are not recoverable (i.e. surprise hot-unplug). In
every other case (especially if we can't determine the state) we
should be going down the recovery path. If the hotplug slot driver is
incorrectly reporting the card has been removed then you should be
fixing the slot driver.

> +* to assumption that the device is hot-removed and causes EEH
> +* recovery to stop leaving the device in failed state forever.
> +* Hence skip the slot presence check if PE's state is
> +* temporarily unavailable and go down EEH recovery path.
> +*/
> +   if (pe->state & EEH_PE_TEMP_UNAVAIL)
> +   goto skip_slot_presence_check;

There's a time-of-check-vs-time-of-use error here. You're setting this
flag at the point of detection, but there can be a significant lag
time between when an EEH is initially detected and when it's handled
by the recovery thread (usually due to other events being recovered).
Transitioning the PE into and out of PE_TEMP_UNAVAILABLE is handled
autonomously by the hypervisor so by the time we get to recovery the
PE may be back into a normal state where the slot check works fine.


Re: [PATCH] powerpc/powernv/pci: remove dead code from !CONFIG_EEH

2021-04-22 Thread Oliver O'Halloran
On Fri, Apr 23, 2021 at 9:09 AM Daniel Axtens  wrote:
>
> Hi Nick,
>
> > While looking at -Wundef warnings, the #if CONFIG_EEH stood out as a
> > possible candidate to convert to #ifdef CONFIG_EEH, but it seems that
> > based on Kconfig dependencies it's not possible to build this file
> > without CONFIG_EEH enabled.
>
> This seemed odd to me, but I think you're right:
>
> arch/powerpc/platforms/Kconfig contains:
>
> config EEH
> bool
> depends on (PPC_POWERNV || PPC_PSERIES) && PCI
> default y
>
> It's not configurable from e.g. make menuconfig because there's no prompt.
> You can attempt to explicitly disable it with e.g. `scripts/config -d EEH`
> but then something like `make oldconfig` will silently re-enable it for
> you.
>
> It's been forced on since commit e49f7a9997c6 ("powerpc/pseries: Rivet
> CONFIG_EEH for pSeries platform") in 2012 which fixed it for
> pseries. That moved out from pseries to pseries + powernv later on.
>
> There are other cleanups in the same vein that could be made, from the
> Makefile (which has files only built with CONFIG_EEH) through to other
> source files. It looks like there's one `#ifdef CONFIG_EEH` in
> arch/powerpc/platforms/powernv/pci-ioda.c that could be pulled out, for
> example.
>
> I think it's probably worth trying to rip out all of those in one patch?

The change in commit e49f7a9997c6 ("powerpc/pseries: Rivet CONFIG_EEH
for pSeries platform") never should have been made.

There's no inherent reason why EEH needs to be enabled and forcing it
on is (IMO) a large part of why EEH support is the byzantine
clusterfuck that it is. One of the things I was working towards was
allowing pseries and powernv to be built with !CONFIG_EEH since that
would help define a clearer boundary between what is "eeh support" and
what is required to support PCI on the platform. Pseries is
particularly bad for this since PAPR says the RTAS calls needed to do
a PCI bus reset are part of the EEH extension, but there's non-EEH
reasons why you might want to use those RTAS calls. The PHB reset that
we do when entering a kdump kernel is a good example since that uses
the same RTAS calls, but it has nothing to do with the EEH recovery
machinery enabled by CONFIG_EEH.

I was looking into that largely because people were considering using
OPAL for microwatt platforms. Breaking the assumption that
powernv==EEH support is one of the few bits of work required to enable
that, but even if you don't go down that road I think everyone would
be better off if you kept a degree of separation between the two.


Re: [PATCH v2] powerpc/eeh: Fix EEH handling for hugepages in ioremap space.

2021-04-12 Thread Oliver O'Halloran
On Mon, Apr 12, 2021 at 5:52 PM Mahesh Salgaonkar  wrote:
>
> During the EEH MMIO error checking, the current implementation fails to map
> the (virtual) MMIO address back to the pci device on radix with hugepage
> mappings for I/O. This results into failure to dispatch EEH event with no
> recovery even when EEH capability has been enabled on the device.
>
> eeh_check_failure(token)# token = virtual MMIO address
>   addr = eeh_token_to_phys(token);
>   edev = eeh_addr_cache_get_dev(addr);
>   if (!edev)
> return 0;
>   eeh_dev_check_failure(edev);  <= Dispatch the EEH event
>
> In case of hugepage mappings, eeh_token_to_phys() has a bug in virt -> phys
> translation that results in wrong physical address, which is then passed to
> eeh_addr_cache_get_dev() to match it against cached pci I/O address ranges
> to get to a PCI device. Hence, it fails to find a match and the EEH event
> never gets dispatched leaving the device in failed state.
>
> The commit 33439620680be ("powerpc/eeh: Handle hugepages in ioremap space")
> introduced following logic to translate virt to phys for hugepage mappings:
>
> eeh_token_to_phys():
> +   pa = pte_pfn(*ptep);
> +
> +   /* On radix we can do hugepage mappings for io, so handle that */
> +   if (hugepage_shift) {
> +   pa <<= hugepage_shift;  <= This is wrong
> +   pa |= token & ((1ul << hugepage_shift) - 1);
> +   }

I think I vaguely remember thinking "is this right?" at the time.
Apparently not!

Reviewed-by: Oliver O'Halloran 


It would probably be a good idea to add a debugfs interface to help
with testing the address translation. Maybe something like:

echo  > /sys/kernel/debug/powerpc/eeh_addr_check

Then in the kernel:

struct resource *r = lookup_resource(mmio_addr);
void *virt = ioremap_resource(r);
ret = eeh_check_failure(virt);
iounmap(virt)

return ret;

A little tedious, but then you can write a selftest :)

Oliver


Re: [PASEMI] Nemo board doesn't boot anymore because of moving pas_pci_init

2021-02-23 Thread Oliver O'Halloran
On Wed, Feb 24, 2021 at 11:55 AM Michael Ellerman  wrote:
>
> Olof Johansson  writes:
> > Hi,
> >
> > On Tue, Feb 23, 2021 at 1:43 PM Christian Zigotzky
> >  wrote:
> >>
> >> Hello,
> >>
> >> The Nemo board [1] with a P.A. Semi PA6T SoC doesn't boot anymore
> >> because of moving "pas_pci_init" to the device tree adoption [2] in the
> >> latest PowerPC updates 5.12-1 [3].
> >>
> >> Unfortunately the Nemo board doesn't have it in its device tree. I
> >> reverted this commit and after that the Nemo board boots without any
> >> problems.
> >>
> >> What do you think about this ifdef?
> >>
> >> #ifdef CONFIG_PPC_PASEMI_NEMO
> >>  /*
> >>   * Check for the Nemo motherboard here, if we are running on one
> >>   * then pas_pci_init()
> >>   */
> >>  if (of_machine_is_compatible("pasemi,nemo")) {
> >>  pas_pci_init();
> >>  }
> >> #endif
> >
> > This is not a proper fix for the problem. Someone will need to debug
> > what on the pas_pci_init() codepath still needs to happen early in the
> > boot, even if the main PCI setup happens later.
>
> I looked but don't see anything 100% obvious.
>
> Possibly it's the call to isa_bridge_find_early()?

Looks like it. I think the problem stems from the use of the PIO
helpers (mainly outb()) in i8259_init() which is called from
nemo_init_IRQ(). The PIO helpers require the ISA space to be mapped
and io_isa_base to be set since they take a PIO register address
rather than an MMIO address. It looks like there's a few other legacy
embedded platforms that might have the same problem.

I guess the real fix would be to decouple the ISA base address
discovery from the PHB discovery. That should be doable since it's all
discovered via DT anyway and we only support one ISA address range,
but it's a bit of work.


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.12-1 tag

2021-02-22 Thread Oliver O'Halloran
On Tue, Feb 23, 2021 at 9:44 AM Linus Torvalds
 wrote:
>
> On Mon, Feb 22, 2021 at 4:06 AM Michael Ellerman  wrote:
> >
> > Please pull powerpc updates for 5.12.
>
> Pulled. However:
>
> >  mode change 100755 => 100644 
> > tools/testing/selftests/powerpc/eeh/eeh-functions.sh
> >  create mode 100755 tools/testing/selftests/powerpc/eeh/eeh-vf-aware.sh
> >  create mode 100755 tools/testing/selftests/powerpc/eeh/eeh-vf-unaware.sh
>
> Somebody is being confused.
>
> Why create two new shell scripts with the proper executable bit, and
> then remove the executable bit from an existing one?
>
> That just seems very inconsistent.

eeh-function.sh just provides some helper functions for the other
scripts and doesn't do anything when executed directly. I thought
making it non-executable made sense.

>
>  Linus


Re: [PATCH] arch:powerpc simple_write_to_buffer return check

2021-02-04 Thread Oliver O'Halloran
On Fri, Feb 5, 2021 at 5:17 AM Mayank Suman  wrote:
>
> Signed-off-by: Mayank Suman 

commit messages aren't optional

> ---
>  arch/powerpc/kernel/eeh.c| 8 
>  arch/powerpc/platforms/powernv/eeh-powernv.c | 4 ++--
>  2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 813713c9120c..2dbe1558a71f 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1628,8 +1628,8 @@ static ssize_t eeh_force_recover_write(struct file 
> *filp,
> char buf[20];
> int ret;
>
> -   ret = simple_write_to_buffer(buf, sizeof(buf), ppos, user_buf, count);
> -   if (!ret)
> +   ret = simple_write_to_buffer(buf, sizeof(buf)-1, ppos, user_buf, 
> count);

We should probably be zeroing the buffer. Reading to sizeof(buf) - 1
is done in a few places to guarantee that the string is nul
terminated, but without the preceeding memset() that isn't actually
guaranteed.

> +   if (ret <= 0)
> return -EFAULT;

EFAULT is supposed to be returned when the user supplies a buffer to
write(2) which is outside their address space. I figured letting the
sscanf() in the next step fail if the user passes writes a zero-length
buffer and returning EINVAL made more sense. That said, the exact
semantics around zero length writes are pretty handwavy so I guess
this isn't wrong, but I don't think it's better either.

> /*
> @@ -1696,7 +1696,7 @@ static ssize_t eeh_dev_check_write(struct file *filp,
>
> memset(buf, 0, sizeof(buf));
> ret = simple_write_to_buffer(buf, sizeof(buf)-1, ppos, user_buf, 
> count);
> -   if (!ret)
> +   if (ret <= 0)
> return -EFAULT;
>
> ret = sscanf(buf, "%x:%x:%x.%x", , , , );
> @@ -1836,7 +1836,7 @@ static ssize_t eeh_dev_break_write(struct file *filp,
>
> memset(buf, 0, sizeof(buf));
> ret = simple_write_to_buffer(buf, sizeof(buf)-1, ppos, user_buf, 
> count);
> -   if (!ret)
> +   if (ret <= 0)
> return -EFAULT;
>
> ret = sscanf(buf, "%x:%x:%x.%x", , , , );
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
> b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index 89e22c460ebf..36ed2b8f7375 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -76,8 +76,8 @@ static ssize_t pnv_eeh_ei_write(struct file *filp,
> return -ENXIO;
>
> /* Copy over argument buffer */
> -   ret = simple_write_to_buffer(buf, sizeof(buf), ppos, user_buf, count);
> -   if (!ret)
> +   ret = simple_write_to_buffer(buf, sizeof(buf)-1, ppos, user_buf, 
> count);
> +   if (ret <= 0)
> return -EFAULT;
>
> /* Retrieve parameters */
> --
> 2.30.0
>


Re: [PATCH] powerpc/eeh: remove unneeded semicolon

2021-02-01 Thread Oliver O'Halloran
On Tue, Feb 2, 2021 at 2:21 PM Yang Li  wrote:
>
> Eliminate the following coccicheck warning:
> ./arch/powerpc/kernel/eeh.c:782:2-3: Unneeded semicolon

Doesn't appear to be a load-bearing semicolon. It's hard to tell with EEH.

Reviewed-by: Oliver O'Halloran 

>
> Reported-by: Abaci Robot 
> Signed-off-by: Yang Li 
> ---
>  arch/powerpc/kernel/eeh.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 813713c..02c058d 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -779,7 +779,7 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, 
> enum pcie_reset_state stat
> default:
> eeh_pe_state_clear(pe, EEH_PE_ISOLATED | EEH_PE_CFG_BLOCKED, 
> true);
> return -EINVAL;
> -   };
> +   }
>
> return 0;
>  }
> --
> 1.8.3.1
>


Re: [PATCH 3/3] selftests/powerpc: Add VF recovery tests

2020-11-24 Thread Oliver O'Halloran
On Tue, Nov 24, 2020 at 9:14 PM Frederic Barrat  wrote:
>
> Is it possible to run those tests on pseries? I haven't managed to set
> up a LPAR with a physical function which would let me enable a virtual
> function. All I could do is assign a virtual function to a LPAR. When
> assigning a physical function to the LPAR, enabling a virtual function
> fails because of missing properties in the device tree, so it looks like
> the hypervisor doesn't support it (?).
>
> Same story on qemu.
>
>Fred

IIRC having the guest manage SR-IOV was a half-baked feature that
never made it into a production phyp build. I never managed to get any
real documentation from the phyp folks about how it worked either. As
far as I can tell it's pretty similar to what we do on PowerNV with
the PE configuration being handled by h-call rather than OPAL call.
The main difference would be in how EEH freezes are handled and I know
there's *something* going on there, but I never really understood it
due to the lack of documentation.

I've been tempted to rip out all that crap a few times, but never
really got around to it. There was also some noises about implementing
support for guest managed SRIOV in pseries qemu, but I'm not sure what
ever happened to that.

Oliver


[PATCH 2/2] powerpc/eeh: Add a debugfs interface to check if a driver supports recovery

2020-11-02 Thread Oliver O'Halloran
If a PCI device's current driver implements the error handling callbacks
EEH can use them to recover the device after an error occurs. For devices
without the error handling callbacks we recover them by removing the device
and re-scanning it so the PCI core puts the device back into a known good
state.

Currently there's no way for userspace to determine if the driver supports
recovery or not which makes it difficult to write automated tests for EEH.
This patch addressing that by adding a debugfs interface for querying if
a specific device can be recovered or not.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/kernel/eeh.c | 50 +++
 1 file changed, 50 insertions(+)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index f9182ff57804..cd60bc1c8701 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1868,6 +1868,53 @@ static const struct file_operations eeh_dev_break_fops = 
{
.read   = eeh_debugfs_dev_usage,
 };
 
+static ssize_t eeh_dev_can_recover(struct file *filp,
+  const char __user *user_buf,
+  size_t count, loff_t *ppos)
+{
+   struct pci_driver *drv;
+   struct pci_dev *pdev;
+   size_t ret;
+
+   pdev = eeh_debug_lookup_pdev(filp, user_buf, count, ppos);
+   if (IS_ERR(pdev))
+   return PTR_ERR(pdev);
+
+   /*
+* In order for error recovery to work the driver needs to implement
+* .error_detected(), so it can quiesce IO to the device, and
+* .slot_reset() so it can re-initialise the device after a reset.
+*
+* Ideally they'd implement .resume() too, but some drivers which
+* we need to support (notably IPR) don't so I guess we can tolerate
+* that.
+*
+* .mmio_enabled() is mostly there as a work-around for devices which
+* take forever to re-init after a hot reset. Implementing that is
+* strictly optional.
+*/
+   drv = pci_dev_driver(pdev);
+   if (drv &&
+   drv->err_handler &&
+   drv->err_handler->error_detected &&
+   drv->err_handler->slot_reset) {
+   ret = count;
+   } else {
+   ret = -EOPNOTSUPP;
+   }
+
+   pci_dev_put(pdev);
+
+   return ret;
+}
+
+static const struct file_operations eeh_dev_can_recover_fops = {
+   .open   = simple_open,
+   .llseek = no_llseek,
+   .write  = eeh_dev_can_recover,
+   .read   = eeh_debugfs_dev_usage,
+};
+
 #endif
 
 static int __init eeh_init_proc(void)
@@ -1892,6 +1939,9 @@ static int __init eeh_init_proc(void)
debugfs_create_file_unsafe("eeh_force_recover", 0600,
powerpc_debugfs_root, NULL,
_force_recover_fops);
+   debugfs_create_file_unsafe("eeh_dev_can_recover", 0600,
+   powerpc_debugfs_root, NULL,
+   _dev_can_recover_fops);
eeh_cache_debugfs_init();
 #endif
}
-- 
2.26.2



[PATCH 1/2] powerpc/eeh: Rework pci_dev lookup in debugfs attributes

2020-11-02 Thread Oliver O'Halloran
Pull the string -> pci_dev lookup stuff into a helper function. No functional 
change.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/kernel/eeh.c | 71 ---
 1 file changed, 37 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 813713c9120c..f9182ff57804 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1596,6 +1596,35 @@ static int proc_eeh_show(struct seq_file *m, void *v)
 }
 
 #ifdef CONFIG_DEBUG_FS
+
+
+static struct pci_dev *eeh_debug_lookup_pdev(struct file *filp,
+const char __user *user_buf,
+size_t count, loff_t *ppos)
+{
+   uint32_t domain, bus, dev, fn;
+   struct pci_dev *pdev;
+   char buf[20];
+   int ret;
+
+   memset(buf, 0, sizeof(buf));
+   ret = simple_write_to_buffer(buf, sizeof(buf)-1, ppos, user_buf, count);
+   if (!ret)
+   return ERR_PTR(-EFAULT);
+
+   ret = sscanf(buf, "%x:%x:%x.%x", , , , );
+   if (ret != 4) {
+   pr_err("%s: expected 4 args, got %d\n", __func__, ret);
+   return ERR_PTR(-EINVAL);
+   }
+
+   pdev = pci_get_domain_bus_and_slot(domain, bus, (dev << 3) | fn);
+   if (!pdev)
+   return ERR_PTR(-ENODEV);
+
+   return pdev;
+}
+
 static int eeh_enable_dbgfs_set(void *data, u64 val)
 {
if (val)
@@ -1688,26 +1717,13 @@ static ssize_t eeh_dev_check_write(struct file *filp,
const char __user *user_buf,
size_t count, loff_t *ppos)
 {
-   uint32_t domain, bus, dev, fn;
struct pci_dev *pdev;
struct eeh_dev *edev;
-   char buf[20];
int ret;
 
-   memset(buf, 0, sizeof(buf));
-   ret = simple_write_to_buffer(buf, sizeof(buf)-1, ppos, user_buf, count);
-   if (!ret)
-   return -EFAULT;
-
-   ret = sscanf(buf, "%x:%x:%x.%x", , , , );
-   if (ret != 4) {
-   pr_err("%s: expected 4 args, got %d\n", __func__, ret);
-   return -EINVAL;
-   }
-
-   pdev = pci_get_domain_bus_and_slot(domain, bus, (dev << 3) | fn);
-   if (!pdev)
-   return -ENODEV;
+   pdev = eeh_debug_lookup_pdev(filp, user_buf, count, ppos);
+   if (IS_ERR(pdev))
+   return PTR_ERR(pdev);
 
edev = pci_dev_to_eeh_dev(pdev);
if (!edev) {
@@ -1717,8 +1733,8 @@ static ssize_t eeh_dev_check_write(struct file *filp,
}
 
ret = eeh_dev_check_failure(edev);
-   pci_info(pdev, "eeh_dev_check_failure(%04x:%02x:%02x.%01x) = %d\n",
-   domain, bus, dev, fn, ret);
+   pci_info(pdev, "eeh_dev_check_failure(%s) = %d\n",
+   pci_name(pdev), ret);
 
pci_dev_put(pdev);
 
@@ -1829,25 +1845,12 @@ static ssize_t eeh_dev_break_write(struct file *filp,
const char __user *user_buf,
size_t count, loff_t *ppos)
 {
-   uint32_t domain, bus, dev, fn;
struct pci_dev *pdev;
-   char buf[20];
int ret;
 
-   memset(buf, 0, sizeof(buf));
-   ret = simple_write_to_buffer(buf, sizeof(buf)-1, ppos, user_buf, count);
-   if (!ret)
-   return -EFAULT;
-
-   ret = sscanf(buf, "%x:%x:%x.%x", , , , );
-   if (ret != 4) {
-   pr_err("%s: expected 4 args, got %d\n", __func__, ret);
-   return -EINVAL;
-   }
-
-   pdev = pci_get_domain_bus_and_slot(domain, bus, (dev << 3) | fn);
-   if (!pdev)
-   return -ENODEV;
+   pdev = eeh_debug_lookup_pdev(filp, user_buf, count, ppos);
+   if (IS_ERR(pdev))
+   return PTR_ERR(pdev);
 
ret = eeh_debugfs_break_device(pdev);
pci_dev_put(pdev);
-- 
2.26.2



[PATCH 3/3] selftests/powerpc: Add VF recovery tests

2020-11-02 Thread Oliver O'Halloran
The basic EEH test ignores VFs since we the way the eeh_dev_break debugfs
interface works means that if multiple VFs are enabled we may cause errors
on all them them. However, we can work around that by only enabling a
single VF at a time.

This patch adds some infrastructure for finding SR-IOV capable devices and
enabling / disabling VFs so we can exercise the VF specific EEH recovery
paths. Two new tests are added, one for testing EEH aware devices and one
for EEH un-aware VFs.

Signed-off-by: Oliver O'Halloran 
---
 .../selftests/powerpc/eeh/eeh-functions.sh| 108 ++
 .../selftests/powerpc/eeh/eeh-vf-aware.sh |  45 
 .../selftests/powerpc/eeh/eeh-vf-unaware.sh   |  35 ++
 3 files changed, 188 insertions(+)
 create mode 100755 tools/testing/selftests/powerpc/eeh/eeh-vf-aware.sh
 create mode 100755 tools/testing/selftests/powerpc/eeh/eeh-vf-unaware.sh

diff --git a/tools/testing/selftests/powerpc/eeh/eeh-functions.sh 
b/tools/testing/selftests/powerpc/eeh/eeh-functions.sh
index 32e5b7fbf18a..70daa3925dcb 100644
--- a/tools/testing/selftests/powerpc/eeh/eeh-functions.sh
+++ b/tools/testing/selftests/powerpc/eeh/eeh-functions.sh
@@ -135,3 +135,111 @@ eeh_one_dev() {
return 0;
 }
 
+eeh_has_driver() {
+   test -e /sys/bus/pci/devices/$1/driver;
+   return $?
+}
+
+eeh_can_recover() {
+   # we'll get an IO error if the device's current driver doesn't support
+   # error recovery
+   echo $1 > '/sys/kernel/debug/powerpc/eeh_dev_can_recover' 2>/dev/null
+
+   return $?
+}
+
+eeh_find_all_pfs() {
+   devices=""
+
+   # SR-IOV on pseries requires hypervisor support, so check for that
+   is_pseries=""
+   if grep -q pSeries /proc/cpuinfo ; then
+   if [ ! -f /proc/device-tree/rtas/ibm,open-sriov-allow-unfreeze 
] ||
+  [ ! -f /proc/device-tree/rtas/ibm,open-sriov-map-pe-number ] 
; then
+   return 1;
+   fi
+
+   is_pseries="true"
+   fi
+
+   for dev in `ls -1 /sys/bus/pci/devices/` ; do
+   sysfs="/sys/bus/pci/devices/$dev"
+   if [ ! -e "$sysfs/sriov_numvfs" ] ; then
+   continue
+   fi
+
+   # skip unsupported PFs on pseries
+   if [ -z "$is_pseries" ] &&
+  [ ! -f "$sysfs/of_node/ibm,is-open-sriov-pf" ] &&
+  [ ! -f "$sysfs/of_node/ibm,open-sriov-vf-bar-info" ] ; then
+   continue;
+   fi
+
+   # no driver, no vfs
+   if ! eeh_has_driver $dev ; then
+   continue
+   fi
+
+   devices="$devices $dev"
+   done
+
+   if [ -z "$devices" ] ; then
+   return 1;
+   fi
+
+   echo $devices
+   return 0;
+}
+
+# attempts to enable one VF on each PF so we can do VF specific tests.
+# stdout: list of enabled VFs, one per line
+# return code: 0 if vfs are found, 1 otherwise
+eeh_enable_vfs() {
+   pf_list="$(eeh_find_all_pfs)"
+
+   vfs=0
+   for dev in $pf_list ; do
+   pf_sysfs="/sys/bus/pci/devices/$dev"
+
+   # make sure we have a single VF
+   echo 0 > "$pf_sysfs/sriov_numvfs"
+   echo 1 > "$pf_sysfs/sriov_numvfs"
+   if [ "$?" != 0 ] ; then
+   log "Unable to enable VFs on $pf, skipping"
+   continue;
+   fi
+
+   vf="$(basename $(realpath "$pf_sysfs/virtfn0"))"
+   if [ $? != 0 ] ; then
+   log "unable to find enabled vf on $pf"
+   echo 0 > "$pf_sysfs/sriov_numvfs"
+   continue;
+   fi
+
+   if ! eeh_can_break $vf ; then
+   log "skipping "
+
+   echo 0 > "$pf_sysfs/sriov_numvfs"
+   continue;
+   fi
+
+   vfs="$((vfs + 1))"
+   echo $vf
+   done
+
+   test "$vfs" != 0
+   return $?
+}
+
+eeh_disable_vfs() {
+   pf_list="$(eeh_find_all_pfs)"
+   if [ -z "$pf_list" ] ; then
+   return 1;
+   fi
+
+   for dev in $pf_list ; do
+   echo 0 > "/sys/bus/pci/devices/$dev/sriov_numvfs"
+   done
+
+   return 0;
+}
diff --git a/tools/testing/selftests/powerpc/eeh/eeh-vf-aware.sh 
b/tools/testing/selftests/powerpc/eeh/eeh-vf-aware.sh
new file mode 100755
index ..874c11953bb6
--- /dev/null
+++ b/tools/testing/selftests/powerpc/eeh/eeh-vf-aware.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0-only
+

[PATCH 2/3] selftests/powerpc: Use stderr for debug messages in eeh-functions

2020-11-02 Thread Oliver O'Halloran
We want to use stdout to return lists of devices, etc so log debug / status
messages to stderr rather than stdout.

Signed-off-by: Oliver O'Halloran 
---
 .../selftests/powerpc/eeh/eeh-functions.sh| 20 +++
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/powerpc/eeh/eeh-functions.sh 
b/tools/testing/selftests/powerpc/eeh/eeh-functions.sh
index 9b1bcc1fd4ad..32e5b7fbf18a 100644
--- a/tools/testing/selftests/powerpc/eeh/eeh-functions.sh
+++ b/tools/testing/selftests/powerpc/eeh/eeh-functions.sh
@@ -3,6 +3,10 @@
 
 export KSELFTESTS_SKIP=4
 
+log() {
+   echo >/dev/stderr $*
+}
+
 pe_ok() {
local dev="$1"
local path="/sys/bus/pci/devices/$dev/eeh_pe_state"
@@ -49,7 +53,7 @@ eeh_test_prep() {
 
if [ ! -e "/sys/kernel/debug/powerpc/eeh_dev_check" ] && \
   [ ! -e "/sys/kernel/debug/powerpc/eeh_dev_break" ] ; then
-   echo "debugfs EEH testing files are missing. Is debugfs 
mounted?"
+   log "debugfs EEH testing files are missing. Is debugfs mounted?"
exit $KSELFTESTS_SKIP;
fi
 
@@ -61,7 +65,7 @@ eeh_test_prep() {
 eeh_can_break() {
# skip bridges since we can't recover them (yet...)
if [ -e "/sys/bus/pci/devices/$dev/pci_bus" ] ; then
-   echo "$dev, Skipped: bridge"
+   log "$dev, Skipped: bridge"
return 1;
fi
 
@@ -70,7 +74,7 @@ eeh_can_break() {
# it the system will generally go down. We should probably fix that
# at some point
if [ "ahci" = "$(basename $(realpath 
/sys/bus/pci/devices/$dev/driver))" ] ; then
-   echo "$dev, Skipped: ahci doesn't support recovery"
+   log "$dev, Skipped: ahci doesn't support recovery"
return 1;
fi
 
@@ -80,7 +84,7 @@ eeh_can_break() {
# result in the recovery failing and the device being marked as
# failed.
if ! pe_ok $dev ; then
-   echo "$dev, Skipped: Bad initial PE state"
+   log "$dev, Skipped: Bad initial PE state"
return 1;
fi
 
@@ -94,7 +98,7 @@ eeh_one_dev() {
# testing so check that the argument is a well-formed sysfs device
# name.
if ! test -e /sys/bus/pci/devices/$dev/ ; then
-   echo "Error: '$dev' must be a sysfs device name (:BB:DD.F)"
+   log "Error: '$dev' must be a sysfs device name (:BB:DD.F)"
return 1;
fi
 
@@ -118,16 +122,16 @@ eeh_one_dev() {
if pe_ok $dev ; then
break;
fi
-   echo "$dev, waited $i/${max_wait}"
+   log "$dev, waited $i/${max_wait}"
sleep 1
done
 
if ! pe_ok $dev ; then
-   echo "$dev, Failed to recover!"
+   log "$dev, Failed to recover!"
return 1;
fi
 
-   echo "$dev, Recovered after $i seconds"
+   log "$dev, Recovered after $i seconds"
return 0;
 }
 
-- 
2.26.2



[PATCH 1/3] selftests/powerpc: Hoist helper code out of eeh-basic

2020-11-02 Thread Oliver O'Halloran
Hoist some of the useful test environment checking and prep code into
eeh-functions.sh so they can be reused in other tests.

Signed-off-by: Oliver O'Halloran 
---
 .../selftests/powerpc/eeh/eeh-basic.sh| 39 ++-
 .../selftests/powerpc/eeh/eeh-functions.sh| 48 +++
 2 files changed, 51 insertions(+), 36 deletions(-)
 mode change 100755 => 100644 
tools/testing/selftests/powerpc/eeh/eeh-functions.sh

diff --git a/tools/testing/selftests/powerpc/eeh/eeh-basic.sh 
b/tools/testing/selftests/powerpc/eeh/eeh-basic.sh
index 0d783e1065c8..16d00555f13e 100755
--- a/tools/testing/selftests/powerpc/eeh/eeh-basic.sh
+++ b/tools/testing/selftests/powerpc/eeh/eeh-basic.sh
@@ -1,28 +1,13 @@
 #!/bin/sh
 # SPDX-License-Identifier: GPL-2.0-only
 
-KSELFTESTS_SKIP=4
-
 . ./eeh-functions.sh
 
-if ! eeh_supported ; then
-   echo "EEH not supported on this system, skipping"
-   exit $KSELFTESTS_SKIP;
-fi
-
-if [ ! -e "/sys/kernel/debug/powerpc/eeh_dev_check" ] && \
-   [ ! -e "/sys/kernel/debug/powerpc/eeh_dev_break" ] ; then
-   echo "debugfs EEH testing files are missing. Is debugfs mounted?"
-   exit $KSELFTESTS_SKIP;
-fi
+eeh_test_prep # NB: may exit
 
 pre_lspci=`mktemp`
 lspci > $pre_lspci
 
-# Bump the max freeze count to something absurd so we don't
-# trip over it while breaking things.
-echo 5000 > /sys/kernel/debug/powerpc/eeh_max_freezes
-
 # record the devices that we break in here. Assuming everything
 # goes to plan we should get them back once the recover process
 # is finished.
@@ -30,34 +15,16 @@ devices=""
 
 # Build up a list of candidate devices.
 for dev in `ls -1 /sys/bus/pci/devices/ | grep '\.0$'` ; do
-   # skip bridges since we can't recover them (yet...)
-   if [ -e "/sys/bus/pci/devices/$dev/pci_bus" ] ; then
-   echo "$dev, Skipped: bridge"
+   if ! eeh_can_break $dev ; then
continue;
fi
 
-   # Skip VFs for now since we don't have a reliable way
-   # to break them.
+   # Skip VFs for now since we don't have a reliable way to break them.
if [ -e "/sys/bus/pci/devices/$dev/physfn" ] ; then
echo "$dev, Skipped: virtfn"
continue;
fi
 
-   if [ "ahci" = "$(basename $(realpath 
/sys/bus/pci/devices/$dev/driver))" ] ; then
-   echo "$dev, Skipped: ahci doesn't support recovery"
-   continue
-   fi
-
-   # Don't inject errosr into an already-frozen PE. This happens with
-   # PEs that contain multiple PCI devices (e.g. multi-function cards)
-   # and injecting new errors during the recovery process will probably
-   # result in the recovery failing and the device being marked as
-   # failed.
-   if ! pe_ok $dev ; then
-   echo "$dev, Skipped: Bad initial PE state"
-   continue;
-   fi
-
echo "$dev, Added"
 
# Add to this list of device to check
diff --git a/tools/testing/selftests/powerpc/eeh/eeh-functions.sh 
b/tools/testing/selftests/powerpc/eeh/eeh-functions.sh
old mode 100755
new mode 100644
index 00dc32c0ed75..9b1bcc1fd4ad
--- a/tools/testing/selftests/powerpc/eeh/eeh-functions.sh
+++ b/tools/testing/selftests/powerpc/eeh/eeh-functions.sh
@@ -1,6 +1,8 @@
 #!/bin/sh
 # SPDX-License-Identifier: GPL-2.0-only
 
+export KSELFTESTS_SKIP=4
+
 pe_ok() {
local dev="$1"
local path="/sys/bus/pci/devices/$dev/eeh_pe_state"
@@ -39,6 +41,52 @@ eeh_supported() {
grep -q 'EEH Subsystem is enabled' /proc/powerpc/eeh
 }
 
+eeh_test_prep() {
+   if ! eeh_supported ; then
+   echo "EEH not supported on this system, skipping"
+   exit $KSELFTESTS_SKIP;
+   fi
+
+   if [ ! -e "/sys/kernel/debug/powerpc/eeh_dev_check" ] && \
+  [ ! -e "/sys/kernel/debug/powerpc/eeh_dev_break" ] ; then
+   echo "debugfs EEH testing files are missing. Is debugfs 
mounted?"
+   exit $KSELFTESTS_SKIP;
+   fi
+
+   # Bump the max freeze count to something absurd so we don't
+   # trip over it while breaking things.
+   echo 5000 > /sys/kernel/debug/powerpc/eeh_max_freezes
+}
+
+eeh_can_break() {
+   # skip bridges since we can't recover them (yet...)
+   if [ -e "/sys/bus/pci/devices/$dev/pci_bus" ] ; then
+   echo "$dev, Skipped: bridge"
+   return 1;
+   fi
+
+   # The ahci driver doesn't support error recovery. If the ahci device
+   # happens to be hosting the root filesystem, and then we go and break
+   # it the system will generally go down. We should probably fix that
+   # at some point
+   if [ "ahci" = "$(basename $(realpath 
/sys/bus/pci/devices/$d

Re: [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed

2020-11-02 Thread Oliver O'Halloran
On Tue, Nov 3, 2020 at 1:39 AM Cédric Le Goater  wrote:
>
> On 10/14/20 4:55 AM, Alexey Kardashevskiy wrote:
> >
> > How do you remove PHBs exactly? There is no such thing in the powernv 
> > platform, I thought someone added this and you are fixing it but no. PHBs 
> > on powernv are created at the boot time and there is no way to remove them, 
> > you can only try removing all the bridges.
>
> yes. I noticed that later when proposing the fix for the double
> free.
>
> > So what exactly are you doing?
>
> What you just said above, with the commands :
>
>   echo 1 >  /sys/devices/pci0031\:00/0031\:00\:00.0/remove
>   echo 1 >  /sys/devices/pci0031\:00/pci_bus/0031\:00/rescan

Right, so that'll remove the root port device (and Bus 01 beneath it),
but the PHB itself is still there. If it was removed the root bus
would also disappear.


[PATCH 18/18] powerpc/powermac: Move PHB discovery

2020-11-02 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
compile tested with pmac32_defconfig and g5_defconfig
---
 arch/powerpc/platforms/powermac/setup.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powermac/setup.c 
b/arch/powerpc/platforms/powermac/setup.c
index 2e2cc0c75d87..86aee3f2483f 100644
--- a/arch/powerpc/platforms/powermac/setup.c
+++ b/arch/powerpc/platforms/powermac/setup.c
@@ -298,9 +298,6 @@ static void __init pmac_setup_arch(void)
of_node_put(ic);
}
 
-   /* Lookup PCI hosts */
-   pmac_pci_init();
-
 #ifdef CONFIG_PPC32
ohare_init();
l2cr_init();
@@ -600,6 +597,7 @@ define_machine(powermac) {
.name   = "PowerMac",
.probe  = pmac_probe,
.setup_arch = pmac_setup_arch,
+   .discover_phbs  = pmac_pci_init,
.show_cpuinfo   = pmac_show_cpuinfo,
.init_IRQ   = pmac_pic_init,
.get_irq= NULL, /* changed later */
-- 
2.26.2



[PATCH 17/18] powerpc/pasemi: Move PHB discovery

2020-11-02 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
compile tested with pasemi_defconfig
---
 arch/powerpc/platforms/pasemi/setup.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pasemi/setup.c 
b/arch/powerpc/platforms/pasemi/setup.c
index b612474f8f8e..376797eb7894 100644
--- a/arch/powerpc/platforms/pasemi/setup.c
+++ b/arch/powerpc/platforms/pasemi/setup.c
@@ -144,8 +144,6 @@ static void __init pas_setup_arch(void)
/* Setup SMP callback */
smp_ops = _smp_ops;
 #endif
-   /* Lookup PCI hosts */
-   pas_pci_init();
 
/* Remap SDC register for doing reset */
/* XXXOJN This should maybe come out of the device tree */
@@ -446,6 +444,7 @@ define_machine(pasemi) {
.name   = "PA Semi PWRficient",
.probe  = pas_probe,
.setup_arch = pas_setup_arch,
+   .discover_phbs  = pas_pci_init,
.init_IRQ   = pas_init_IRQ,
.get_irq= mpic_get_irq,
.restart= pas_restart,
-- 
2.26.2



[PATCH 16/18] powerpc/embedded6xx/mve5100: Move PHB discovery

2020-11-02 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
compile tested with mvme5100_defconfig
---
 arch/powerpc/platforms/embedded6xx/mvme5100.c   | 13 -
 arch/powerpc/platforms/embedded6xx/storcenter.c |  8 ++--
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/embedded6xx/mvme5100.c 
b/arch/powerpc/platforms/embedded6xx/mvme5100.c
index 1cd488daa0bf..c06a0490d157 100644
--- a/arch/powerpc/platforms/embedded6xx/mvme5100.c
+++ b/arch/powerpc/platforms/embedded6xx/mvme5100.c
@@ -154,17 +154,19 @@ static const struct of_device_id mvme5100_of_bus_ids[] 
__initconst = {
  */
 static void __init mvme5100_setup_arch(void)
 {
-   struct device_node *np;
-
if (ppc_md.progress)
ppc_md.progress("mvme5100_setup_arch()", 0);
 
-   for_each_compatible_node(np, "pci", "hawk-pci")
-   mvme5100_add_bridge(np);
-
restart = ioremap(BOARD_MODRST_REG, 4);
 }
 
+static void __init mvme5100_setup_pci(void)
+{
+   struct device_node *np;
+
+   for_each_compatible_node(np, "pci", "hawk-pci")
+   mvme5100_add_bridge(np);
+}
 
 static void mvme5100_show_cpuinfo(struct seq_file *m)
 {
@@ -205,6 +207,7 @@ define_machine(mvme5100) {
.name   = "MVME5100",
.probe  = mvme5100_probe,
.setup_arch = mvme5100_setup_arch,
+   .discover_phbs  = mvme5100_setup_pci,
.init_IRQ   = mvme5100_pic_init,
.show_cpuinfo   = mvme5100_show_cpuinfo,
.get_irq= mpic_get_irq,
diff --git a/arch/powerpc/platforms/embedded6xx/storcenter.c 
b/arch/powerpc/platforms/embedded6xx/storcenter.c
index e346ddcef45e..e188b90f7016 100644
--- a/arch/powerpc/platforms/embedded6xx/storcenter.c
+++ b/arch/powerpc/platforms/embedded6xx/storcenter.c
@@ -65,14 +65,17 @@ static int __init storcenter_add_bridge(struct device_node 
*dev)
 }
 
 static void __init storcenter_setup_arch(void)
+{
+   printk(KERN_INFO "IOMEGA StorCenter\n");
+}
+
+static void __init storcenter_setup_pci(void)
 {
struct device_node *np;
 
/* Lookup PCI host bridges */
for_each_compatible_node(np, "pci", "mpc10x-pci")
storcenter_add_bridge(np);
-
-   printk(KERN_INFO "IOMEGA StorCenter\n");
 }
 
 /*
@@ -117,6 +120,7 @@ define_machine(storcenter){
.name   = "IOMEGA StorCenter",
.probe  = storcenter_probe,
.setup_arch = storcenter_setup_arch,
+   .discover_phbs  = storcenter_setup_pci,
.init_IRQ   = storcenter_init_IRQ,
.get_irq= mpic_get_irq,
.restart= storcenter_restart,
-- 
2.26.2



[PATCH 15/18] powerpc/embedded6xx/mpc7448: Move PHB discovery

2020-11-02 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
compile tested with mpc7448_hpc2_defconfig
---
 arch/powerpc/platforms/embedded6xx/mpc7448_hpc2.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/embedded6xx/mpc7448_hpc2.c 
b/arch/powerpc/platforms/embedded6xx/mpc7448_hpc2.c
index b95c3380d2b5..5565647dc879 100644
--- a/arch/powerpc/platforms/embedded6xx/mpc7448_hpc2.c
+++ b/arch/powerpc/platforms/embedded6xx/mpc7448_hpc2.c
@@ -58,16 +58,14 @@ int mpc7448_hpc2_exclude_device(struct pci_controller *hose,
return PCIBIOS_SUCCESSFUL;
 }
 
-static void __init mpc7448_hpc2_setup_arch(void)
+static void __init mpc7448_hpc2_setup_pci(void)
 {
+#ifdef CONFIG_PCI
struct device_node *np;
if (ppc_md.progress)
-   ppc_md.progress("mpc7448_hpc2_setup_arch():set_bridge", 0);
-
-   tsi108_csr_vir_base = get_vir_csrbase();
+   ppc_md.progress("mpc7448_hpc2_setup_pci():set_bridge", 0);
 
/* setup PCI host bridge */
-#ifdef CONFIG_PCI
for_each_compatible_node(np, "pci", "tsi108-pci")
tsi108_setup_pci(np, MPC7448HPC2_PCI_CFG_PHYS, 0);
 
@@ -75,6 +73,11 @@ static void __init mpc7448_hpc2_setup_arch(void)
if (ppc_md.progress)
ppc_md.progress("tsi108: resources set", 0x100);
 #endif
+}
+
+static void __init mpc7448_hpc2_setup_arch(void)
+{
+   tsi108_csr_vir_base = get_vir_csrbase();
 
printk(KERN_INFO "MPC7448HPC2 (TAIGA) Platform\n");
printk(KERN_INFO
@@ -181,6 +184,7 @@ define_machine(mpc7448_hpc2){
.name   = "MPC7448 HPC2",
.probe  = mpc7448_hpc2_probe,
.setup_arch = mpc7448_hpc2_setup_arch,
+   .discover_phbs  = mpc7448_hpc2_setup_pci,
.init_IRQ   = mpc7448_hpc2_init_IRQ,
.show_cpuinfo   = mpc7448_hpc2_show_cpuinfo,
.get_irq= mpic_get_irq,
-- 
2.26.2



[PATCH 14/18] powerpc/embedded6xx/linkstation: Move PHB discovery

2020-11-02 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
compile tested with linkstation_defconfig
---
 arch/powerpc/platforms/embedded6xx/linkstation.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/embedded6xx/linkstation.c 
b/arch/powerpc/platforms/embedded6xx/linkstation.c
index f514d5d28cd4..eb8342e7f84e 100644
--- a/arch/powerpc/platforms/embedded6xx/linkstation.c
+++ b/arch/powerpc/platforms/embedded6xx/linkstation.c
@@ -63,15 +63,18 @@ static int __init linkstation_add_bridge(struct device_node 
*dev)
 }
 
 static void __init linkstation_setup_arch(void)
+{
+   printk(KERN_INFO "BUFFALO Network Attached Storage Series\n");
+   printk(KERN_INFO "(C) 2002-2005 BUFFALO INC.\n");
+}
+
+static void __init linkstation_setup_pci(void)
 {
struct device_node *np;
 
/* Lookup PCI host bridges */
for_each_compatible_node(np, "pci", "mpc10x-pci")
linkstation_add_bridge(np);
-
-   printk(KERN_INFO "BUFFALO Network Attached Storage Series\n");
-   printk(KERN_INFO "(C) 2002-2005 BUFFALO INC.\n");
 }
 
 /*
@@ -153,6 +156,7 @@ define_machine(linkstation){
.name   = "Buffalo Linkstation",
.probe  = linkstation_probe,
.setup_arch = linkstation_setup_arch,
+   .discover_phbs  = linkstation_setup_pci,
.init_IRQ   = linkstation_init_IRQ,
.show_cpuinfo   = linkstation_show_cpuinfo,
.get_irq= mpic_get_irq,
-- 
2.26.2



[PATCH 13/18] powerpc/embedded6xx/holly: Move PHB discovery

2020-11-02 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
compile tested with holly_defconfig
---
 arch/powerpc/platforms/embedded6xx/holly.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/embedded6xx/holly.c 
b/arch/powerpc/platforms/embedded6xx/holly.c
index d8f2e2c737bb..53065d564161 100644
--- a/arch/powerpc/platforms/embedded6xx/holly.c
+++ b/arch/powerpc/platforms/embedded6xx/holly.c
@@ -108,15 +108,13 @@ static void holly_remap_bridge(void)
tsi108_write_reg(TSI108_PCI_P2O_BAR2, 0x0);
 }
 
-static void __init holly_setup_arch(void)
+static void __init holly_init_pci(void)
 {
struct device_node *np;
 
if (ppc_md.progress)
ppc_md.progress("holly_setup_arch():set_bridge", 0);
 
-   tsi108_csr_vir_base = get_vir_csrbase();
-
/* setup PCI host bridge */
holly_remap_bridge();
 
@@ -127,6 +125,11 @@ static void __init holly_setup_arch(void)
ppc_md.pci_exclude_device = holly_exclude_device;
if (ppc_md.progress)
ppc_md.progress("tsi108: resources set", 0x100);
+}
+
+static void __init holly_setup_arch(void)
+{
+   tsi108_csr_vir_base = get_vir_csrbase();
 
printk(KERN_INFO "PPC750GX/CL Platform\n");
 }
@@ -259,6 +262,7 @@ define_machine(holly){
.name   = "PPC750 GX/CL TSI",
.probe  = holly_probe,
.setup_arch = holly_setup_arch,
+   .discover_phbs  = holly_init_pci,
.init_IRQ   = holly_init_IRQ,
.show_cpuinfo   = holly_show_cpuinfo,
.get_irq= mpic_get_irq,
-- 
2.26.2



[PATCH 12/18] powerpc/chrp: Move PHB discovery

2020-11-02 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
compile tested with chrp32_defconfig
---
 arch/powerpc/platforms/chrp/pci.c   |  8 
 arch/powerpc/platforms/chrp/setup.c | 12 +---
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/chrp/pci.c 
b/arch/powerpc/platforms/chrp/pci.c
index b2c2bf35b76c..8c421dc78b28 100644
--- a/arch/powerpc/platforms/chrp/pci.c
+++ b/arch/powerpc/platforms/chrp/pci.c
@@ -314,6 +314,14 @@ chrp_find_bridges(void)
}
}
of_node_put(root);
+
+   /*
+*  "Temporary" fixes for PCI devices.
+*  -- Geert
+*/
+   hydra_init();   /* Mac I/O */
+
+   pci_create_OF_bus_map();
 }
 
 /* SL82C105 IDE Control/Status Register */
diff --git a/arch/powerpc/platforms/chrp/setup.c 
b/arch/powerpc/platforms/chrp/setup.c
index c45435aa5e36..3cfc382841e5 100644
--- a/arch/powerpc/platforms/chrp/setup.c
+++ b/arch/powerpc/platforms/chrp/setup.c
@@ -334,22 +334,11 @@ static void __init chrp_setup_arch(void)
/* On pegasos, enable the L2 cache if not already done by OF */
pegasos_set_l2cr();
 
-   /* Lookup PCI host bridges */
-   chrp_find_bridges();
-
-   /*
-*  Temporary fixes for PCI devices.
-*  -- Geert
-*/
-   hydra_init();   /* Mac I/O */
-
/*
 *  Fix the Super I/O configuration
 */
sio_init();
 
-   pci_create_OF_bus_map();
-
/*
 * Print the banner, then scroll down so boot progress
 * can be printed.  -- Cort
@@ -582,6 +571,7 @@ define_machine(chrp) {
.name   = "CHRP",
.probe  = chrp_probe,
.setup_arch = chrp_setup_arch,
+   .discover_phbs  = chrp_find_bridges,
.init   = chrp_init2,
.show_cpuinfo   = chrp_show_cpuinfo,
.init_IRQ   = chrp_init_IRQ,
-- 
2.26.2



[PATCH 11/18] powerpc/amigaone: Move PHB discovery

2020-11-02 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
compile tested with amigaone_defconfig
---
 arch/powerpc/platforms/amigaone/setup.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/amigaone/setup.c 
b/arch/powerpc/platforms/amigaone/setup.c
index f5d0bf999759..b25ddf39dd43 100644
--- a/arch/powerpc/platforms/amigaone/setup.c
+++ b/arch/powerpc/platforms/amigaone/setup.c
@@ -65,6 +65,12 @@ static int __init amigaone_add_bridge(struct device_node 
*dev)
 }
 
 void __init amigaone_setup_arch(void)
+{
+   if (ppc_md.progress)
+   ppc_md.progress("Linux/PPC "UTS_RELEASE"\n", 0);
+}
+
+void __init amigaone_discover_phbs(void)
 {
struct device_node *np;
int phb = -ENODEV;
@@ -74,9 +80,6 @@ void __init amigaone_setup_arch(void)
phb = amigaone_add_bridge(np);
 
BUG_ON(phb != 0);
-
-   if (ppc_md.progress)
-   ppc_md.progress("Linux/PPC "UTS_RELEASE"\n", 0);
 }
 
 void __init amigaone_init_IRQ(void)
@@ -159,6 +162,7 @@ define_machine(amigaone) {
.name   = "AmigaOne",
.probe  = amigaone_probe,
.setup_arch = amigaone_setup_arch,
+   .discover_phbs  = amigaone_discover_phbs,
.show_cpuinfo   = amigaone_show_cpuinfo,
.init_IRQ   = amigaone_init_IRQ,
.restart= amigaone_restart,
-- 
2.26.2



[PATCH 10/18] powerpc/83xx: Move PHB discovery

2020-11-02 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
compile tested with mpc83xx_defconfig
---
 arch/powerpc/platforms/83xx/asp834x.c | 1 +
 arch/powerpc/platforms/83xx/km83xx.c  | 1 +
 arch/powerpc/platforms/83xx/misc.c| 2 --
 arch/powerpc/platforms/83xx/mpc830x_rdb.c | 1 +
 arch/powerpc/platforms/83xx/mpc831x_rdb.c | 1 +
 arch/powerpc/platforms/83xx/mpc832x_mds.c | 1 +
 arch/powerpc/platforms/83xx/mpc832x_rdb.c | 1 +
 arch/powerpc/platforms/83xx/mpc834x_itx.c | 1 +
 arch/powerpc/platforms/83xx/mpc834x_mds.c | 1 +
 arch/powerpc/platforms/83xx/mpc836x_mds.c | 1 +
 arch/powerpc/platforms/83xx/mpc836x_rdk.c | 1 +
 arch/powerpc/platforms/83xx/mpc837x_mds.c | 1 +
 arch/powerpc/platforms/83xx/mpc837x_rdb.c | 1 +
 13 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/83xx/asp834x.c 
b/arch/powerpc/platforms/83xx/asp834x.c
index 28474876f41b..68061c2a57c1 100644
--- a/arch/powerpc/platforms/83xx/asp834x.c
+++ b/arch/powerpc/platforms/83xx/asp834x.c
@@ -44,6 +44,7 @@ define_machine(asp834x) {
.name   = "ASP8347E",
.probe  = asp834x_probe,
.setup_arch = asp834x_setup_arch,
+   .discover_phbs  = mpc83xx_setup_pci,
.init_IRQ   = mpc83xx_ipic_init_IRQ,
.get_irq= ipic_get_irq,
.restart= mpc83xx_restart,
diff --git a/arch/powerpc/platforms/83xx/km83xx.c 
b/arch/powerpc/platforms/83xx/km83xx.c
index bcdc2c203ec9..108e1e4d2683 100644
--- a/arch/powerpc/platforms/83xx/km83xx.c
+++ b/arch/powerpc/platforms/83xx/km83xx.c
@@ -180,6 +180,7 @@ define_machine(mpc83xx_km) {
.name   = "mpc83xx-km-platform",
.probe  = mpc83xx_km_probe,
.setup_arch = mpc83xx_km_setup_arch,
+   .discover_phbs  = mpc83xx_setup_pci,
.init_IRQ   = mpc83xx_ipic_init_IRQ,
.get_irq= ipic_get_irq,
.restart= mpc83xx_restart,
diff --git a/arch/powerpc/platforms/83xx/misc.c 
b/arch/powerpc/platforms/83xx/misc.c
index a952e91db3ee..3285dabcf923 100644
--- a/arch/powerpc/platforms/83xx/misc.c
+++ b/arch/powerpc/platforms/83xx/misc.c
@@ -132,8 +132,6 @@ void __init mpc83xx_setup_arch(void)
setbat(-1, va, immrbase, immrsize, PAGE_KERNEL_NCG);
update_bats();
}
-
-   mpc83xx_setup_pci();
 }
 
 int machine_check_83xx(struct pt_regs *regs)
diff --git a/arch/powerpc/platforms/83xx/mpc830x_rdb.c 
b/arch/powerpc/platforms/83xx/mpc830x_rdb.c
index 51426e88ec67..956d4389effa 100644
--- a/arch/powerpc/platforms/83xx/mpc830x_rdb.c
+++ b/arch/powerpc/platforms/83xx/mpc830x_rdb.c
@@ -48,6 +48,7 @@ define_machine(mpc830x_rdb) {
.name   = "MPC830x RDB",
.probe  = mpc830x_rdb_probe,
.setup_arch = mpc830x_rdb_setup_arch,
+   .discover_phbs  = mpc83xx_setup_pci,
.init_IRQ   = mpc83xx_ipic_init_IRQ,
.get_irq= ipic_get_irq,
.restart= mpc83xx_restart,
diff --git a/arch/powerpc/platforms/83xx/mpc831x_rdb.c 
b/arch/powerpc/platforms/83xx/mpc831x_rdb.c
index 5ccd57a48492..3b578f080e3b 100644
--- a/arch/powerpc/platforms/83xx/mpc831x_rdb.c
+++ b/arch/powerpc/platforms/83xx/mpc831x_rdb.c
@@ -48,6 +48,7 @@ define_machine(mpc831x_rdb) {
.name   = "MPC831x RDB",
.probe  = mpc831x_rdb_probe,
.setup_arch = mpc831x_rdb_setup_arch,
+   .discover_phbs  = mpc83xx_setup_pci,
.init_IRQ   = mpc83xx_ipic_init_IRQ,
.get_irq= ipic_get_irq,
.restart= mpc83xx_restart,
diff --git a/arch/powerpc/platforms/83xx/mpc832x_mds.c 
b/arch/powerpc/platforms/83xx/mpc832x_mds.c
index 6fa5402ebf20..850d566ef900 100644
--- a/arch/powerpc/platforms/83xx/mpc832x_mds.c
+++ b/arch/powerpc/platforms/83xx/mpc832x_mds.c
@@ -101,6 +101,7 @@ define_machine(mpc832x_mds) {
.name   = "MPC832x MDS",
.probe  = mpc832x_sys_probe,
.setup_arch = mpc832x_sys_setup_arch,
+   .discover_phbs  = mpc83xx_setup_pci,
.init_IRQ   = mpc83xx_ipic_init_IRQ,
.get_irq= ipic_get_irq,
.restart= mpc83xx_restart,
diff --git a/arch/powerpc/platforms/83xx/mpc832x_rdb.c 
b/arch/powerpc/platforms/83xx/mpc832x_rdb.c
index 622c625d5ce4..b6133a237a70 100644
--- a/arch/powerpc/platforms/83xx/mpc832x_rdb.c
+++ b/arch/powerpc/platforms/83xx/mpc832x_rdb.c
@@ -219,6 +219,7 @@ define_machine(mpc832x_rdb) {
.name   = "MPC832x RDB",
.probe  = mpc832x_rdb_probe,
.setup_arch = mpc832x_rdb_setup_arch,
+   .discover_phbs  = mpc83xx_setup_pci,
.init_IRQ   = mpc83xx_ipic_init_IRQ,
.get_irq= ipic_get_irq,
.restart= mpc83xx_restar

[PATCH 09/18] powerpc/82xx/*: Move PHB discovery

2020-11-02 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
compile tested with pq2fads_defconfig
---
 arch/powerpc/platforms/82xx/mpc8272_ads.c | 2 +-
 arch/powerpc/platforms/82xx/pq2fads.c | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/82xx/mpc8272_ads.c 
b/arch/powerpc/platforms/82xx/mpc8272_ads.c
index 3fe1a6593280..0b5b9dec16d5 100644
--- a/arch/powerpc/platforms/82xx/mpc8272_ads.c
+++ b/arch/powerpc/platforms/82xx/mpc8272_ads.c
@@ -171,7 +171,6 @@ static void __init mpc8272_ads_setup_arch(void)
iounmap(bcsr);
 
init_ioports();
-   pq2_init_pci();
 
if (ppc_md.progress)
ppc_md.progress("mpc8272_ads_setup_arch(), finish", 0);
@@ -205,6 +204,7 @@ define_machine(mpc8272_ads)
.name = "Freescale MPC8272 ADS",
.probe = mpc8272_ads_probe,
.setup_arch = mpc8272_ads_setup_arch,
+   .discover_phbs = pq2_init_pci,
.init_IRQ = mpc8272_ads_pic_init,
.get_irq = cpm2_get_irq,
.calibrate_decr = generic_calibrate_decr,
diff --git a/arch/powerpc/platforms/82xx/pq2fads.c 
b/arch/powerpc/platforms/82xx/pq2fads.c
index a74082140718..ac9113d524af 100644
--- a/arch/powerpc/platforms/82xx/pq2fads.c
+++ b/arch/powerpc/platforms/82xx/pq2fads.c
@@ -150,8 +150,6 @@ static void __init pq2fads_setup_arch(void)
/* Enable external IRQs */
clrbits32(_immr->im_siu_conf.siu_82xx.sc_siumcr, 0x0c00);
 
-   pq2_init_pci();
-
if (ppc_md.progress)
ppc_md.progress("pq2fads_setup_arch(), finish", 0);
 }
@@ -184,6 +182,7 @@ define_machine(pq2fads)
.name = "Freescale PQ2FADS",
.probe = pq2fads_probe,
.setup_arch = pq2fads_setup_arch,
+   .discover_phbs = pq2_init_pci,
.init_IRQ = pq2fads_pic_init,
.get_irq = cpm2_get_irq,
.calibrate_decr = generic_calibrate_decr,
-- 
2.26.2



[PATCH 08/18] powerpc/52xx/mpc5200_simple: Move PHB discovery

2020-11-02 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/52xx/mpc5200_simple.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/52xx/mpc5200_simple.c 
b/arch/powerpc/platforms/52xx/mpc5200_simple.c
index 2d01e9b2e779..b9f5675b0a1d 100644
--- a/arch/powerpc/platforms/52xx/mpc5200_simple.c
+++ b/arch/powerpc/platforms/52xx/mpc5200_simple.c
@@ -40,8 +40,6 @@ static void __init mpc5200_simple_setup_arch(void)
 
/* Some mpc5200 & mpc5200b related configuration */
mpc5200_setup_xlb_arbiter();
-
-   mpc52xx_setup_pci();
 }
 
 /* list of the supported boards */
@@ -73,6 +71,7 @@ define_machine(mpc5200_simple_platform) {
.name   = "mpc5200-simple-platform",
.probe  = mpc5200_simple_probe,
.setup_arch = mpc5200_simple_setup_arch,
+   .discover_phbs  = mpc52xx_setup_pci,
.init   = mpc52xx_declare_of_platform_devices,
.init_IRQ   = mpc52xx_init_irq,
.get_irq= mpc52xx_get_irq,
-- 
2.26.2



[PATCH 07/18] powerpc/52xx/media5200: Move PHB discovery

2020-11-02 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/52xx/media5200.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/52xx/media5200.c 
b/arch/powerpc/platforms/52xx/media5200.c
index 07c5bc4ed0b5..efb8bdecbcc7 100644
--- a/arch/powerpc/platforms/52xx/media5200.c
+++ b/arch/powerpc/platforms/52xx/media5200.c
@@ -202,8 +202,6 @@ static void __init media5200_setup_arch(void)
/* Some mpc5200 & mpc5200b related configuration */
mpc5200_setup_xlb_arbiter();
 
-   mpc52xx_setup_pci();
-
np = of_find_matching_node(NULL, mpc5200_gpio_ids);
gpio = of_iomap(np, 0);
of_node_put(np);
@@ -244,6 +242,7 @@ define_machine(media5200_platform) {
.name   = "media5200-platform",
.probe  = media5200_probe,
.setup_arch = media5200_setup_arch,
+   .discover_phbs  = mpc52xx_setup_pci,
.init   = mpc52xx_declare_of_platform_devices,
.init_IRQ   = media5200_init_irq,
.get_irq= mpc52xx_get_irq,
-- 
2.26.2



[PATCH 06/18] powerpc/52xx/lite5200: Move PHB discovery

2020-11-02 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
compile tested with 52xx/lite5200b_defconfig
---
 arch/powerpc/platforms/52xx/lite5200.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/52xx/lite5200.c 
b/arch/powerpc/platforms/52xx/lite5200.c
index 3181aac08225..04cc97397095 100644
--- a/arch/powerpc/platforms/52xx/lite5200.c
+++ b/arch/powerpc/platforms/52xx/lite5200.c
@@ -165,8 +165,6 @@ static void __init lite5200_setup_arch(void)
mpc52xx_suspend.board_resume_finish = lite5200_resume_finish;
lite5200_pm_init();
 #endif
-
-   mpc52xx_setup_pci();
 }
 
 static const char * const board[] __initconst = {
@@ -187,6 +185,7 @@ define_machine(lite5200) {
.name   = "lite5200",
.probe  = lite5200_probe,
.setup_arch = lite5200_setup_arch,
+   .discover_phbs  = mpc52xx_setup_pci,
.init   = mpc52xx_declare_of_platform_devices,
.init_IRQ   = mpc52xx_init_irq,
.get_irq= mpc52xx_get_irq,
-- 
2.26.2



[PATCH 05/18] powerpc/52xx/efika: Move PHB discovery

2020-11-02 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
compile tested with mpc5200_defconfig
---
 arch/powerpc/platforms/52xx/efika.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/52xx/efika.c 
b/arch/powerpc/platforms/52xx/efika.c
index 4514a6f7458a..3b7d70d71692 100644
--- a/arch/powerpc/platforms/52xx/efika.c
+++ b/arch/powerpc/platforms/52xx/efika.c
@@ -185,8 +185,6 @@ static void __init efika_setup_arch(void)
/* Map important registers from the internal memory map */
mpc52xx_map_common_devices();
 
-   efika_pcisetup();
-
 #ifdef CONFIG_PM
mpc52xx_suspend.board_suspend_prepare = efika_suspend_prepare;
mpc52xx_pm_init();
@@ -218,6 +216,7 @@ define_machine(efika)
.name   = EFIKA_PLATFORM_NAME,
.probe  = efika_probe,
.setup_arch = efika_setup_arch,
+   .discover_phbs  = efika_pcisetup,
.init   = mpc52xx_declare_of_platform_devices,
.show_cpuinfo   = efika_show_cpuinfo,
.init_IRQ   = mpc52xx_init_irq,
-- 
2.26.2



[PATCH 04/18] powerpc/512x: Move PHB discovery

2020-11-02 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
only compile tested
---
 arch/powerpc/platforms/512x/mpc5121_ads.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/512x/mpc5121_ads.c 
b/arch/powerpc/platforms/512x/mpc5121_ads.c
index 6303fbfc4e4f..9d030c2e0004 100644
--- a/arch/powerpc/platforms/512x/mpc5121_ads.c
+++ b/arch/powerpc/platforms/512x/mpc5121_ads.c
@@ -24,21 +24,23 @@
 
 static void __init mpc5121_ads_setup_arch(void)
 {
-#ifdef CONFIG_PCI
-   struct device_node *np;
-#endif
printk(KERN_INFO "MPC5121 ADS board from Freescale Semiconductor\n");
/*
 * cpld regs are needed early
 */
mpc5121_ads_cpld_map();
 
+   mpc512x_setup_arch();
+}
+
+static void __init mpc5121_ads_setup_pci(void)
+{
 #ifdef CONFIG_PCI
+   struct device_node *np;
+
for_each_compatible_node(np, "pci", "fsl,mpc5121-pci")
mpc83xx_add_bridge(np);
 #endif
-
-   mpc512x_setup_arch();
 }
 
 static void __init mpc5121_ads_init_IRQ(void)
@@ -64,6 +66,7 @@ define_machine(mpc5121_ads) {
.name   = "MPC5121 ADS",
.probe  = mpc5121_ads_probe,
.setup_arch = mpc5121_ads_setup_arch,
+   .discover_phbs  = mpc5121_ads_setup_pci,
.init   = mpc512x_init,
.init_IRQ   = mpc5121_ads_init_IRQ,
.get_irq= ipic_get_irq,
-- 
2.26.2



[PATCH 03/18] powerpc/maple: Move PHB discovery

2020-11-02 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/maple/setup.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/maple/setup.c 
b/arch/powerpc/platforms/maple/setup.c
index f7e66a2005b4..4e9ad5bf3efb 100644
--- a/arch/powerpc/platforms/maple/setup.c
+++ b/arch/powerpc/platforms/maple/setup.c
@@ -179,9 +179,6 @@ static void __init maple_setup_arch(void)
 #ifdef CONFIG_SMP
smp_ops = _smp_ops;
 #endif
-   /* Lookup PCI hosts */
-   maple_pci_init();
-
maple_use_rtas_reboot_and_halt_if_present();
 
printk(KERN_DEBUG "Using native/NAP idle loop\n");
@@ -351,6 +348,7 @@ define_machine(maple) {
.name   = "Maple",
.probe  = maple_probe,
.setup_arch = maple_setup_arch,
+   .discover_phbs  = maple_pci_init,
.init_IRQ   = maple_init_IRQ,
.pci_irq_fixup  = maple_pci_irq_fixup,
.pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
-- 
2.26.2



[PATCH 02/18] powerpc/{powernv,pseries}: Move PHB discovery

2020-11-02 Thread Oliver O'Halloran
Make powernv and pseries use ppc_mc.discover_phbs. These two platforms need
to be done together because they both depends on pci_dn's being created
from the DT. The pci_dn contains a pointer to the relevant pci_controller
so they need to be created after the pci_controller structures are
available, but before  and before PCI devices are scanned. Currently this
ordering is provided by initcalls and the sequence is:

1. PHBs are discovered (setup_arch) (early boot, pre-initcalls)
2. pci_dn are created from the unflattended DT (core initcall)
3. PHBs are scanned pcibios_init() (subsys initcall)

The new ppc_md.discover_phbs() function is also a core_initcall so we can't
guarantee ordering between the creations of pci_controllers and the
creation of pci_dn's which require a pci_controller. We could use the
postcore, or core_sync initcall levels, but it's cleaner to just move the
pci_dn setup into the per-PHB inits which occur inside of .discover_phb()
for these platforms. This brings the boot-time path in line with the PHB
hotplug path that is used for pseries DLPAR operations too.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/kernel/pci_dn.c  | 22 --
 arch/powerpc/platforms/powernv/pci-ioda.c |  3 +++
 arch/powerpc/platforms/powernv/setup.c|  4 +---
 arch/powerpc/platforms/pseries/setup.c|  7 +--
 4 files changed, 9 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index 54e240597fd9..61571ae23953 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -481,28 +481,6 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
pci_traverse_device_nodes(dn, add_pdn, phb);
 }
 
-/** 
- * pci_devs_phb_init - Initialize phbs and pci devs under them.
- * 
- * This routine walks over all phb's (pci-host bridges) on the
- * system, and sets up assorted pci-related structures 
- * (including pci info in the device node structs) for each
- * pci device found underneath.  This routine runs once,
- * early in the boot sequence.
- */
-static int __init pci_devs_phb_init(void)
-{
-   struct pci_controller *phb, *tmp;
-
-   /* This must be done first so the device nodes have valid pci info! */
-   list_for_each_entry_safe(phb, tmp, _list, list_node)
-   pci_devs_phb_init_dynamic(phb);
-
-   return 0;
-}
-
-core_initcall(pci_devs_phb_init);
-
 static void pci_dev_pdn_setup(struct pci_dev *pdev)
 {
struct pci_dn *pdn;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 2b4ceb5e6ce4..d6815f03fee3 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3176,6 +3176,9 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
/* Remove M64 resource if we can't configure it successfully */
if (!phb->init_m64 || phb->init_m64(phb))
hose->mem_resources[1].flags = 0;
+
+   /* create pci_dn's for DT nodes under this PHB */
+   pci_devs_phb_init_dynamic(hose);
 }
 
 void __init pnv_pci_init_ioda2_phb(struct device_node *np)
diff --git a/arch/powerpc/platforms/powernv/setup.c 
b/arch/powerpc/platforms/powernv/setup.c
index 9acaa0f131b9..92f5fa827909 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -162,9 +162,6 @@ static void __init pnv_setup_arch(void)
/* Initialize SMP */
pnv_smp_init();
 
-   /* Setup PCI */
-   pnv_pci_init();
-
/* Setup RTC and NVRAM callbacks */
if (firmware_has_feature(FW_FEATURE_OPAL))
opal_nvram_init();
@@ -524,6 +521,7 @@ define_machine(powernv) {
.init_IRQ   = pnv_init_IRQ,
.show_cpuinfo   = pnv_show_cpuinfo,
.get_proc_freq  = pnv_get_proc_freq,
+   .discover_phbs  = pnv_pci_init,
.progress   = pnv_progress,
.machine_shutdown   = pnv_shutdown,
.power_save = NULL,
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 633c45ec406d..e88b30d4b6cd 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -463,7 +463,7 @@ void pseries_little_endian_exceptions(void)
 }
 #endif
 
-static void __init find_and_init_phbs(void)
+static void __init pSeries_discover_phbs(void)
 {
struct device_node *node;
struct pci_controller *phb;
@@ -481,6 +481,9 @@ static void __init find_and_init_phbs(void)
pci_process_bridge_OF_ranges(phb, node, 0);
isa_bridge_find_early(phb);
phb->controller_ops = pseries_pci_controller_ops;
+
+   /* create pci_dn's for DT nodes under this PHB */
+   pci_devs_phb_init_dynamic(phb);
}
 
of_node_put(root);
@@ -777,7 +780,6 @@ static void __init pSeries_set

[PATCH 01/18] powerpc/pci: Add ppc_md.discover_phbs()

2020-11-02 Thread Oliver O'Halloran
On many powerpc platforms the discovery and initalisation of
pci_controllers (PHBs) happens inside of setup_arch(). This is very early
in boot (pre-initcalls) and means that we're initialising the PHB long
before many basic kernel services (slab allocator, debugfs, a real ioremap)
are available.

On PowerNV this causes an additional problem since we map the PHB registers
with ioremap(). As of commit d538aadc2718 ("powerpc/ioremap: warn on early
use of ioremap()") a warning is printed because we're using the "incorrect"
API to setup and MMIO mapping in searly boot. The kernel does provide
early_ioremap(), but that is not intended to create long-lived MMIO
mappings and a seperate warning is printed by generic code if
early_ioremap() mappings are "leaked."

This is all fixable with dumb hacks like using early_ioremap() to setup
the initial mapping then replacing it with a real ioremap later on in
boot, but it does raise the question: Why the hell are we setting up the
PHB's this early in boot?

The old and wise claim it's due to "hysterical rasins." Aside from amused
grapes there doesn't appear to be any real reason to maintain the current
behaviour. Already most of the newer embedded platforms perform PHB
discovery in an arch_initcall and between the end of setup_arch() and the
start of initcalls none of the generic kernel code does anything PCI
related. On powerpc scanning PHBs occurs in a subsys_initcall so it should
be possible to move the PHB discovery to a core, postcore or arch initcall.

This patch adds the ppc_md.discover_phbs hook and a core_initcall stub that
calls it. The core_initcalls are the earliest to be called so this will
any possibly issues with dependency between initcalls. This isn't just an
academic issue either since on pseries and PowerNV EEH init occurs in an
arch_initcall and depends on the pci_controllers being available, similarly
the creation of pci_dns occurs at core_initcall_sync (i.e. between core and
postcore initcalls). These problems need to be addressed seperately.

Cc: Paul Mackerras 
Cc: Christophe Leroy 
Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/include/asm/machdep.h |  3 +++
 arch/powerpc/kernel/pci-common.c   | 10 ++
 2 files changed, 13 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index 475687f24f4a..d319160d790c 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -59,6 +59,9 @@ struct machdep_calls {
int (*pcibios_root_bridge_prepare)(struct pci_host_bridge
*bridge);
 
+   /* finds all the pci_controllers present at boot */
+   void(*discover_phbs)(void);
+
/* To setup PHBs when using automatic OF platform driver for PCI */
int (*pci_setup_phb)(struct pci_controller *host);
 
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index be108616a721..6265e7d1c697 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1625,3 +1625,13 @@ static void fixup_hide_host_resource_fsl(struct pci_dev 
*dev)
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MOTOROLA, PCI_ANY_ID, 
fixup_hide_host_resource_fsl);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_FREESCALE, PCI_ANY_ID, 
fixup_hide_host_resource_fsl);
+
+
+int __init discover_phbs(void)
+{
+   if (ppc_md.discover_phbs)
+   ppc_md.discover_phbs();
+
+   return 0;
+}
+core_initcall(discover_phbs);
-- 
2.26.2



Re: [PATCH] powerpc/eeh_cache: Fix a possible debugfs deadlock

2020-10-28 Thread Oliver O'Halloran
On Thu, Oct 29, 2020 at 2:27 AM Qian Cai  wrote:
>
> Lockdep complains that a possible deadlock below in
> eeh_addr_cache_show() because it is acquiring a lock with IRQ enabled,
> but eeh_addr_cache_insert_dev() needs to acquire the same lock with IRQ
> disabled. Let's just make eeh_addr_cache_show() acquire the lock with
> IRQ disabled as well.
>
> CPU0CPU1
> 
>lock(_io_addr_cache_root.piar_lock);
> local_irq_disable();
> lock(>lock);
> lock(_io_addr_cache_root.piar_lock);
>
>  lock(>lock);
>
>   *** DEADLOCK ***
>
>   lock_acquire+0x140/0x5f0
>   _raw_spin_lock_irqsave+0x64/0xb0
>   eeh_addr_cache_insert_dev+0x48/0x390
>   eeh_probe_device+0xb8/0x1a0
>   pnv_pcibios_bus_add_device+0x3c/0x80
>   pcibios_bus_add_device+0x118/0x290
>   pci_bus_add_device+0x28/0xe0
>   pci_bus_add_devices+0x54/0xb0
>   pcibios_init+0xc4/0x124
>   do_one_initcall+0xac/0x528
>   kernel_init_freeable+0x35c/0x3fc
>   kernel_init+0x24/0x148
>   ret_from_kernel_thread+0x5c/0x80
>
>   lock_acquire+0x140/0x5f0
>   _raw_spin_lock+0x4c/0x70
>   eeh_addr_cache_show+0x38/0x110
>   seq_read+0x1a0/0x660
>   vfs_read+0xc8/0x1f0
>   ksys_read+0x74/0x130
>   system_call_exception+0xf8/0x1d0
>   system_call_common+0xe8/0x218
>
> Fixes: 5ca85ae6318d ("powerpc/eeh_cache: Add a way to dump the EEH address 
> cache")
> Signed-off-by: Qian Cai 

Good catch,

Reviewed-by: Oliver O'Halloran 


[PATCH] powerpc/eeh: Fix eeh_dev_check_failure() for PE#0

2020-10-21 Thread Oliver O'Halloran
In commit 269e583357df ("powerpc/eeh: Delete eeh_pe->config_addr") the
following simplification was made:

-   if (!pe->addr && !pe->config_addr) {
+   if (!pe->addr) {
eeh_stats.no_cfg_addr++;
return 0;
}

This introduced a bug which causes EEH checking to be skipped for devices
in PE#0.

Before the change above the check would always pass since atleast one of
the two PE addresses would be non-zero in all circumstances. On PowerNV
pe->config_addr was the be the BDFN of the first device added to the PE.
The zero BDFN is reserved for the PHB's root port, but this is fine since
for obscure platform reasons the root port is never assigned to PE#0.

Similarly, on pseries pe->addr has always been non-zero for the reasons
outlined in commit 42de19d5ef71 ("powerpc/pseries/eeh: Allow zero to be a
valid PE configuration address").

We can fix the problem by deleting the block entirely The original
purpose of this test was to avoid performing EEH checks on devices there
were not on an EEH capable bus. In modern Linux the edev->pe pointer will
be NULL for devices that are not on an EEH capable bus. The code block
immediately above this one already checks for the edev->pe == NULL case
so this test (new and old) is entirely redundant.

Ideally we'd delete eeh_stats.no_cfg_addr too since nothing increments it
any more. Unfortunately, that information is exposed via /proc/powerpc/eeh
which means it's technically ABI. We could make it hard-coded, but that's
a change for another patch.

Fixes: 269e583357df ("powerpc/eeh: Delete eeh_pe->config_addr")
Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/kernel/eeh.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 4da880759a8b..7dd03d47693f 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -470,11 +470,6 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
return 0;
}
 
-   if (!pe->addr) {
-   eeh_stats.no_cfg_addr++;
-   return 0;
-   }
-
/*
 * On PowerNV platform, we might already have fenced PHB
 * there and we need take care of that firstly.
-- 
2.26.2



Re: [PATCH -next] Revert "powerpc/pci: unmap legacy INTx interrupts when a PHB is removed"

2020-10-14 Thread Oliver O'Halloran
On Thu, Oct 15, 2020 at 5:28 AM Qian Cai  wrote:
>
> This reverts commit 3a3181e16fbde752007759f8759d25e0ff1fc425 which
> causes memory corruptions on POWER9 NV.

I was going to post this along with a fix for Cedric's original bug,
but I can do that separately so:

Acked-by: Oliver O'Halloran 


[PATCH] selftests/powerpc: Fix eeh-basic.sh exit codes

2020-10-13 Thread Oliver O'Halloran
The kselftests test running infrastructure expects tests to finish with an
exit code of 4 if the test decided it should be skipped. Currently
eeh-basic.sh exits with the number of devices that failed to recover, so if
four devices didn't recover we'll report a skip instead of a fail.

Fix this by checking if the return code is non-zero and report success
and failure by returning 0 or 1 respectively. For the cases where should
actually skip return 4.

Fixes: 85d86c8aa52e ("selftests/powerpc: Add basic EEH selftest")
Signed-off-by: Oliver O'Halloran 
---
 tools/testing/selftests/powerpc/eeh/eeh-basic.sh | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/powerpc/eeh/eeh-basic.sh 
b/tools/testing/selftests/powerpc/eeh/eeh-basic.sh
index 8a8d0f456946..0d783e1065c8 100755
--- a/tools/testing/selftests/powerpc/eeh/eeh-basic.sh
+++ b/tools/testing/selftests/powerpc/eeh/eeh-basic.sh
@@ -1,17 +1,19 @@
 #!/bin/sh
 # SPDX-License-Identifier: GPL-2.0-only
 
+KSELFTESTS_SKIP=4
+
 . ./eeh-functions.sh
 
 if ! eeh_supported ; then
echo "EEH not supported on this system, skipping"
-   exit 0;
+   exit $KSELFTESTS_SKIP;
 fi
 
 if [ ! -e "/sys/kernel/debug/powerpc/eeh_dev_check" ] && \
[ ! -e "/sys/kernel/debug/powerpc/eeh_dev_break" ] ; then
echo "debugfs EEH testing files are missing. Is debugfs mounted?"
-   exit 1;
+   exit $KSELFTESTS_SKIP;
 fi
 
 pre_lspci=`mktemp`
@@ -84,4 +86,5 @@ echo "$failed devices failed to recover ($dev_count tested)"
 lspci | diff -u $pre_lspci -
 rm -f $pre_lspci
 
-exit $failed
+test "$failed" == 0
+exit $?
-- 
2.26.2



Re: [PATCH] powerpc/pci: Fix PHB removal/rescan on PowerNV

2020-10-07 Thread Oliver O'Halloran
On Fri, Sep 25, 2020 at 7:23 PM Cédric Le Goater  wrote:
>
> To fix an issue with PHB hotplug on pSeries machine (HPT/XIVE), commit
> 3a3181e16fbd introduced a PPC specific pcibios_remove_bus() routine to
> clear all interrupt mappings when the bus is removed. This routine
> frees an array allocated in pcibios_scan_phb().
>
> This broke PHB hotplug on PowerNV because, when a PHB is removed and
> re-scanned through sysfs, the PCI layer un-assigns and re-assigns
> resources to the PHB but does not destroy and recreate the PCI
> controller structure. Since pcibios_remove_bus() does not clear the
> 'irq_map' array pointer, a second removal of the PHB will try to free
> the array a second time and corrupt memory.

"PHB hotplug" and "hot-plugging devices under a PHB" are different
things. What you're saying here doesn't make a whole lot of sense to
me unless you're conflating the two. The distinction is important
since on pseries we can use DLPAR to add and remove actual PHBs (i.e.
the pci_controller) at runtime, but there's no corresponding mechanism
on PowerNV.

> Free the 'irq_map' array in pcibios_free_controller() to fix
> corruption and clear interrupt mapping after it has been
> disposed. This to avoid filling up the array with successive
> remove/rescan of a bus.

Even with this patch I think we're still broken. With this patch
applied the init path is something like:

per-phb init:
allocate phb->irq_map array
per-bus init:
nothing
per-device init:
pcibios_bus_add_device()
   pci_read_irq_line()
pci_irq_map_register(pci_dev, virq)
   *record the device's interrupt in phb->irq_map*

And the teardown path:

per-device teardown:
nothing
per-bus teardown:
pcibios_remove_bus()
pci_irq_map_dispose()
*walk phb->irq_map and dispose of each mapped interrupt*
per-phb teardown:
free(phb->irq_map)

There's a lot of asymmetry here, which is a problem in itself, but the
real issue is that when removing *any* pci_bus under a PHB we dispose
of the LSI\ for *every* device under that PHB. Not good.

Ideally we should be fixing this by having the per-device teardown
handle disposing the mapping. Unfortunately, there's no pcibios hook
that's called when removing a pci_dev. However, we can register a bus
notifier which will be called when the pci_dev is removed from its bus
and we already do that for the per-device EEH teardown and to handle
IOMMU TCE invalidation when the device is removed.


[PATCH 2/2] powerpc/pseries/eeh: Fix use of uninitialised variable

2020-10-06 Thread Oliver O'Halloran
If the RTAS call to query the PE address for a device fails we jump the
err: label where an error message is printed along with the return code.
However, the printed return code is from the "ret" variable which isn't set
at that point since we assigned the result to "addr" instead. Fix this by
consistently using the "ret" variable for the result of the RTAS call
helpers an dropping the "addr" local variable"

Fixes: 98ba956f6a38 ("powerpc/pseries/eeh: Rework device EEH PE determination")
Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/pseries/eeh_pseries.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
b/arch/powerpc/platforms/pseries/eeh_pseries.c
index d8fd5f7b2143..cf024fa37bda 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -363,7 +363,6 @@ void pseries_eeh_init_edev(struct pci_dn *pdn)
 {
struct eeh_pe pe, *parent;
struct eeh_dev *edev;
-   int addr;
u32 pcie_flags;
int ret;
 
@@ -422,8 +421,8 @@ void pseries_eeh_init_edev(struct pci_dn *pdn)
}
 
/* first up, find the pe_config_addr for the PE containing the device */
-   addr = pseries_eeh_get_pe_config_addr(pdn);
-   if (addr < 0) {
+   ret = pseries_eeh_get_pe_config_addr(pdn);
+   if (ret < 0) {
eeh_edev_dbg(edev, "Unable to find pe_config_addr\n");
goto err;
}
@@ -431,7 +430,7 @@ void pseries_eeh_init_edev(struct pci_dn *pdn)
/* Try enable EEH on the fake PE */
memset(, 0, sizeof(struct eeh_pe));
pe.phb = pdn->phb;
-   pe.addr = addr;
+   pe.addr = ret;
 
eeh_edev_dbg(edev, "Enabling EEH on device\n");
ret = eeh_ops->set_option(, EEH_OPT_ENABLE);
@@ -440,7 +439,7 @@ void pseries_eeh_init_edev(struct pci_dn *pdn)
goto err;
}
 
-   edev->pe_config_addr = addr;
+   edev->pe_config_addr = pe.addr;
 
eeh_add_flag(EEH_ENABLED);
 
-- 
2.26.2



[PATCH 1/2] powerpc/eeh: Delete eeh_pe->config_addr

2020-10-06 Thread Oliver O'Halloran
The eeh_pe->config_addr field was supposed to be removed in
commit 35d64734b643 ("powerpc/eeh: Clean up PE addressing") which made it
largely unused. Finish the job.

Signed-off-by: Oliver O'Halloran 
---
mpe: the Fixes SHA is from ppc/next, fold it into that if you want.
---
 arch/powerpc/include/asm/eeh.h | 1 -
 arch/powerpc/kernel/eeh.c  | 2 +-
 arch/powerpc/kernel/eeh_pe.c   | 4 ++--
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index dd6a4ac6c713..b1a5bba2e0b9 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -73,7 +73,6 @@ struct pci_dn;
 struct eeh_pe {
int type;   /* PE type: PHB/Bus/Device  */
int state;  /* PE EEH dependent mode*/
-   int config_addr;/* Traditional PCI address  */
int addr;   /* PE configuration address */
struct pci_controller *phb; /* Associated PHB   */
struct pci_bus *bus;/* Top PCI bus for bus PE   */
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 87de8b798b2d..0e160dffcb86 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -466,7 +466,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
return 0;
}
 
-   if (!pe->addr && !pe->config_addr) {
+   if (!pe->addr) {
eeh_stats.no_cfg_addr++;
return 0;
}
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 61b7d4019051..845e024321d4 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -354,8 +354,8 @@ int eeh_pe_tree_insert(struct eeh_dev *edev, struct eeh_pe 
*new_pe_parent)
pr_err("%s: out of memory!\n", __func__);
return -ENOMEM;
}
-   pe->addr= edev->pe_config_addr;
-   pe->config_addr = edev->bdfn;
+
+   pe->addr = edev->pe_config_addr;
 
/*
 * Put the new EEH PE into hierarchy tree. If the parent
-- 
2.26.2



Re: [PATCH v2] powernv/elog: Fix the race while processing OPAL error log event.

2020-10-05 Thread Oliver O'Halloran
On Mon, Oct 5, 2020 at 3:12 PM Mahesh Salgaonkar  wrote:
>
> Every error log reported by OPAL is exported to userspace through a sysfs
> interface and notified using kobject_uevent(). The userspace daemon
> (opal_errd) then reads the error log and acknowledges it error log is saved
> safely to disk. Once acknowledged the kernel removes the respective sysfs
> file entry causing respective resources getting released including kobject.
>
> However there are chances where user daemon may already be scanning elog
> entries while new sysfs elog entry is being created by kernel. User daemon
> may read this new entry and ack it even before kernel can notify userspace
> about it through kobject_uevent() call. If that happens then we have a
> potential race between elog_ack_store->kobject_put() and kobject_uevent
> which can lead to use-after-free issue of a kernfs object resulting into a
> kernel crash. This patch fixes this race by protecting a sysfs file
> creation/notification by holding an additional reference count on kobject
> until we safely send kobject_uevent().
>
> Reported-by: Oliver O'Halloran 
> Signed-off-by: Mahesh Salgaonkar 
> Signed-off-by: Aneesh Kumar K.V 
> ---
> Change in v2:
> - Instead of mutex and use extra reference count on kobject to avoid the
>   race.
> ---
>  arch/powerpc/platforms/powernv/opal-elog.c |   15 +++
>  1 file changed, 15 insertions(+)
>
> diff --git a/arch/powerpc/platforms/powernv/opal-elog.c 
> b/arch/powerpc/platforms/powernv/opal-elog.c
> index 62ef7ad995da..230f102e87c0 100644
> --- a/arch/powerpc/platforms/powernv/opal-elog.c
> +++ b/arch/powerpc/platforms/powernv/opal-elog.c
> @@ -222,13 +222,28 @@ static struct elog_obj *create_elog_obj(uint64_t id, 
> size_t size, uint64_t type)
> return NULL;
> }
>
> +   /*
> +* As soon as sysfs file for this elog is created/activated there is
> +* chance opal_errd daemon might read and acknowledge this elog before
> +* kobject_uevent() is called. If that happens then we have a 
> potential
> +* race between elog_ack_store->kobject_put() and kobject_uevent which
> +* leads to use-after-free issue of a kernfs object resulting into
> +* kernel crash. To avoid this race take an additional reference count
> +* on kobject until we safely send kobject_uevent().
> +*/
> +
> +   kobject_get(>kobj);  /* extra reference count */
> rc = sysfs_create_bin_file(>kobj, >raw_attr);
> if (rc) {
> +   kobject_put(>kobj);
> +   /* Drop the extra reference count  */
> kobject_put(>kobj);
>     return NULL;
> }
>
> kobject_uevent(>kobj, KOBJ_ADD);
> +   /* Drop the extra reference count  */
> +   kobject_put(>kobj);

Makes sense,

Reviewed-by: Oliver O'Halloran 

>
> return elog;

Does the returned value actually get used anywhere? We'd have a
similar use-after-free problem if it does. This should probably return
an error code rather than the object itself.


Re: [PATCH v2] powernv/elog: Fix the race while processing OPAL error log event.

2020-10-05 Thread Oliver O'Halloran
On Mon, Oct 5, 2020 at 11:07 PM Ananth N Mavinakayanahalli
 wrote:
>
> On 10/5/20 9:42 AM, Mahesh Salgaonkar wrote:
> > Every error log reported by OPAL is exported to userspace through a sysfs
> > interface and notified using kobject_uevent(). The userspace daemon
> > (opal_errd) then reads the error log and acknowledges it error log is saved
> > safely to disk. Once acknowledged the kernel removes the respective sysfs
> > file entry causing respective resources getting released including kobject.
> >
> > However there are chances where user daemon may already be scanning elog
> > entries while new sysfs elog entry is being created by kernel. User daemon
> > may read this new entry and ack it even before kernel can notify userspace
> > about it through kobject_uevent() call. If that happens then we have a
> > potential race between elog_ack_store->kobject_put() and kobject_uevent
> > which can lead to use-after-free issue of a kernfs object resulting into a
> > kernel crash. This patch fixes this race by protecting a sysfs file
> > creation/notification by holding an additional reference count on kobject
> > until we safely send kobject_uevent().
> >
> > Reported-by: Oliver O'Halloran 
> > Signed-off-by: Mahesh Salgaonkar 
> > Signed-off-by: Aneesh Kumar K.V 
>
> cc stable?

+1


Re: [PATCH] rpadlpar_io:Add MODULE_DESCRIPTION entries to kernel modules

2020-09-28 Thread Oliver O'Halloran
On Tue, Sep 29, 2020 at 6:50 AM Tyrel Datwyler  wrote:
>
> On 9/23/20 11:41 PM, Oliver O'Halloran wrote:
> > On Thu, Sep 24, 2020 at 3:15 PM Mamatha Inamdar
> >  wrote:
> >>
> >> This patch adds a brief MODULE_DESCRIPTION to rpadlpar_io kernel modules
> >> (descriptions taken from Kconfig file)
> >>
> >> Signed-off-by: Mamatha Inamdar 
> >> ---
> >>  drivers/pci/hotplug/rpadlpar_core.c |1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/drivers/pci/hotplug/rpadlpar_core.c 
> >> b/drivers/pci/hotplug/rpadlpar_core.c
> >> index f979b70..bac65ed 100644
> >> --- a/drivers/pci/hotplug/rpadlpar_core.c
> >> +++ b/drivers/pci/hotplug/rpadlpar_core.c
> >> @@ -478,3 +478,4 @@ static void __exit rpadlpar_io_exit(void)
> >>  module_init(rpadlpar_io_init);
> >>  module_exit(rpadlpar_io_exit);
> >>  MODULE_LICENSE("GPL");
> >> +MODULE_DESCRIPTION("RPA Dynamic Logical Partitioning driver for I/O 
> >> slots");
> >
> > RPA as a spec was superseded by PAPR in the early 2000s. Can we rename
> > this already?
>
> I seem to recall Michael and I discussed the naming briefly when I added the
> maintainer entries for the drivers and that the PAPR acronym is almost as
> meaningless to most as the original RPA. While, IBM no longer uses the term
> pseries for Power hardware marketing it is the defacto platform identifier in
> the Linux kernel tree for what we would call PAPR compliant. All in all I have
> no problem with renaming, but maybe we should consider pseries_dlpar or even
> simpler ibmdlpar.

I'm not too bothered by what we call it so long as it's consistent
with *something* else in the tree. Using pseries rather than ibm as a
prefix would probably be better since the legacy ibmphp driver is in
the same directory.


Re: [PATCH] rpadlpar_io:Add MODULE_DESCRIPTION entries to kernel modules

2020-09-27 Thread Oliver O'Halloran
On Sat, Sep 26, 2020 at 5:43 AM Bjorn Helgaas  wrote:
>
> On Thu, Sep 24, 2020 at 04:41:39PM +1000, Oliver O'Halloran wrote:
> > On Thu, Sep 24, 2020 at 3:15 PM Mamatha Inamdar
> >  wrote:
> > >
> > > This patch adds a brief MODULE_DESCRIPTION to rpadlpar_io kernel modules
> > > (descriptions taken from Kconfig file)
> > >
> > > Signed-off-by: Mamatha Inamdar 
> > > ---
> > >  drivers/pci/hotplug/rpadlpar_core.c |1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/drivers/pci/hotplug/rpadlpar_core.c 
> > > b/drivers/pci/hotplug/rpadlpar_core.c
> > > index f979b70..bac65ed 100644
> > > --- a/drivers/pci/hotplug/rpadlpar_core.c
> > > +++ b/drivers/pci/hotplug/rpadlpar_core.c
> > > @@ -478,3 +478,4 @@ static void __exit rpadlpar_io_exit(void)
> > >  module_init(rpadlpar_io_init);
> > >  module_exit(rpadlpar_io_exit);
> > >  MODULE_LICENSE("GPL");
> > > +MODULE_DESCRIPTION("RPA Dynamic Logical Partitioning driver for I/O 
> > > slots");
> >
> > RPA as a spec was superseded by PAPR in the early 2000s. Can we rename
> > this already?
> >
> > The only potential problem I can see is scripts doing: modprobe
> > rpadlpar_io or similar
> >
> > However, we should be able to fix that with a module alias.
>
> Is MODULE_DESCRIPTION() connected with how modprobe works?

I don't think so. The description is just there as an FYI.

> If this patch just improves documentation, without breaking users of
> modprobe, I'm fine with it, even if it would be nice to rename to PAPR
> or something in the future.

Right, the change in this patch is just a documentation fix and
shouldn't cause any problems.

I was suggesting renaming the module itself since the term "RPA" is
only used in this hotplug driver and some of the corresponding PHB add
/ remove handling in arch/powerpc/platforms/pseries/. We can make that
change in a follow up though.

> But, please use "git log --oneline drivers/pci/hotplug/rpadlpar*" and
> match the style, and also look through the rest of drivers/pci/ to see
> if we should do the same thing to any other modules.
>
> Bjorn


Re: [PATCH] rpadlpar_io:Add MODULE_DESCRIPTION entries to kernel modules

2020-09-24 Thread Oliver O'Halloran
On Thu, Sep 24, 2020 at 3:15 PM Mamatha Inamdar
 wrote:
>
> This patch adds a brief MODULE_DESCRIPTION to rpadlpar_io kernel modules
> (descriptions taken from Kconfig file)
>
> Signed-off-by: Mamatha Inamdar 
> ---
>  drivers/pci/hotplug/rpadlpar_core.c |1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/pci/hotplug/rpadlpar_core.c 
> b/drivers/pci/hotplug/rpadlpar_core.c
> index f979b70..bac65ed 100644
> --- a/drivers/pci/hotplug/rpadlpar_core.c
> +++ b/drivers/pci/hotplug/rpadlpar_core.c
> @@ -478,3 +478,4 @@ static void __exit rpadlpar_io_exit(void)
>  module_init(rpadlpar_io_init);
>  module_exit(rpadlpar_io_exit);
>  MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("RPA Dynamic Logical Partitioning driver for I/O slots");

RPA as a spec was superseded by PAPR in the early 2000s. Can we rename
this already?

The only potential problem I can see is scripts doing: modprobe
rpadlpar_io or similar

However, we should be able to fix that with a module alias.

Oliver


[RFC PATCH 18/18] powerpc/powermac: Move PHB discovery

2020-09-24 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
compile tested with pmac32_defconfig and g5_defconfig
---
 arch/powerpc/platforms/powermac/setup.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powermac/setup.c 
b/arch/powerpc/platforms/powermac/setup.c
index f002b0fa69b8..0f8669139a21 100644
--- a/arch/powerpc/platforms/powermac/setup.c
+++ b/arch/powerpc/platforms/powermac/setup.c
@@ -298,9 +298,6 @@ static void __init pmac_setup_arch(void)
of_node_put(ic);
}
 
-   /* Lookup PCI hosts */
-   pmac_pci_init();
-
 #ifdef CONFIG_PPC32
ohare_init();
l2cr_init();
@@ -600,6 +597,7 @@ define_machine(powermac) {
.name   = "PowerMac",
.probe  = pmac_probe,
.setup_arch = pmac_setup_arch,
+   .discover_phbs  = pmac_pci_init,
.show_cpuinfo   = pmac_show_cpuinfo,
.init_IRQ   = pmac_pic_init,
.get_irq= NULL, /* changed later */
-- 
2.26.2



[RFC PATCH 17/18] powerpc/pasemi: Move PHB discovery

2020-09-24 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
compile tested with pasemi_defconfig
---
 arch/powerpc/platforms/pasemi/setup.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pasemi/setup.c 
b/arch/powerpc/platforms/pasemi/setup.c
index b612474f8f8e..376797eb7894 100644
--- a/arch/powerpc/platforms/pasemi/setup.c
+++ b/arch/powerpc/platforms/pasemi/setup.c
@@ -144,8 +144,6 @@ static void __init pas_setup_arch(void)
/* Setup SMP callback */
smp_ops = _smp_ops;
 #endif
-   /* Lookup PCI hosts */
-   pas_pci_init();
 
/* Remap SDC register for doing reset */
/* XXXOJN This should maybe come out of the device tree */
@@ -446,6 +444,7 @@ define_machine(pasemi) {
.name   = "PA Semi PWRficient",
.probe  = pas_probe,
.setup_arch = pas_setup_arch,
+   .discover_phbs  = pas_pci_init,
.init_IRQ   = pas_init_IRQ,
.get_irq= mpic_get_irq,
.restart= pas_restart,
-- 
2.26.2



[RFC PATCH 16/18] powerpc/embedded6xx/mve5100: Move PHB discovery

2020-09-24 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
compile tested with mvme5100_defconfig
---
 arch/powerpc/platforms/embedded6xx/mvme5100.c   | 13 -
 arch/powerpc/platforms/embedded6xx/storcenter.c |  8 ++--
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/embedded6xx/mvme5100.c 
b/arch/powerpc/platforms/embedded6xx/mvme5100.c
index 1cd488daa0bf..c06a0490d157 100644
--- a/arch/powerpc/platforms/embedded6xx/mvme5100.c
+++ b/arch/powerpc/platforms/embedded6xx/mvme5100.c
@@ -154,17 +154,19 @@ static const struct of_device_id mvme5100_of_bus_ids[] 
__initconst = {
  */
 static void __init mvme5100_setup_arch(void)
 {
-   struct device_node *np;
-
if (ppc_md.progress)
ppc_md.progress("mvme5100_setup_arch()", 0);
 
-   for_each_compatible_node(np, "pci", "hawk-pci")
-   mvme5100_add_bridge(np);
-
restart = ioremap(BOARD_MODRST_REG, 4);
 }
 
+static void __init mvme5100_setup_pci(void)
+{
+   struct device_node *np;
+
+   for_each_compatible_node(np, "pci", "hawk-pci")
+   mvme5100_add_bridge(np);
+}
 
 static void mvme5100_show_cpuinfo(struct seq_file *m)
 {
@@ -205,6 +207,7 @@ define_machine(mvme5100) {
.name   = "MVME5100",
.probe  = mvme5100_probe,
.setup_arch = mvme5100_setup_arch,
+   .discover_phbs  = mvme5100_setup_pci,
.init_IRQ   = mvme5100_pic_init,
.show_cpuinfo   = mvme5100_show_cpuinfo,
.get_irq= mpic_get_irq,
diff --git a/arch/powerpc/platforms/embedded6xx/storcenter.c 
b/arch/powerpc/platforms/embedded6xx/storcenter.c
index ed1914dd34bb..e8c5de54b0e1 100644
--- a/arch/powerpc/platforms/embedded6xx/storcenter.c
+++ b/arch/powerpc/platforms/embedded6xx/storcenter.c
@@ -65,14 +65,17 @@ static int __init storcenter_add_bridge(struct device_node 
*dev)
 }
 
 static void __init storcenter_setup_arch(void)
+{
+   printk(KERN_INFO "IOMEGA StorCenter\n");
+}
+
+static void __init storcenter_setup_pci(void)
 {
struct device_node *np;
 
/* Lookup PCI host bridges */
for_each_compatible_node(np, "pci", "mpc10x-pci")
storcenter_add_bridge(np);
-
-   printk(KERN_INFO "IOMEGA StorCenter\n");
 }
 
 /*
@@ -116,6 +119,7 @@ define_machine(storcenter){
.name   = "IOMEGA StorCenter",
.probe  = storcenter_probe,
.setup_arch = storcenter_setup_arch,
+   .discover_phbs  = storcenter_setup_pci,
.init_IRQ   = storcenter_init_IRQ,
.get_irq= mpic_get_irq,
.restart= storcenter_restart,
-- 
2.26.2



[RFC PATCH 15/18] powerpc/embedded6xx/mpc7448: Move PHB discovery

2020-09-24 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
compile tested with mpc7448_hpc2_defconfig
---
 arch/powerpc/platforms/embedded6xx/mpc7448_hpc2.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/embedded6xx/mpc7448_hpc2.c 
b/arch/powerpc/platforms/embedded6xx/mpc7448_hpc2.c
index 15437abe1f6d..20b727584e40 100644
--- a/arch/powerpc/platforms/embedded6xx/mpc7448_hpc2.c
+++ b/arch/powerpc/platforms/embedded6xx/mpc7448_hpc2.c
@@ -58,16 +58,14 @@ int mpc7448_hpc2_exclude_device(struct pci_controller *hose,
return PCIBIOS_SUCCESSFUL;
 }
 
-static void __init mpc7448_hpc2_setup_arch(void)
+static void __init mpc7448_hpc2_setup_pci(void)
 {
+#ifdef CONFIG_PCI
struct device_node *np;
if (ppc_md.progress)
-   ppc_md.progress("mpc7448_hpc2_setup_arch():set_bridge", 0);
-
-   tsi108_csr_vir_base = get_vir_csrbase();
+   ppc_md.progress("mpc7448_hpc2_setup_pci():set_bridge", 0);
 
/* setup PCI host bridge */
-#ifdef CONFIG_PCI
for_each_compatible_node(np, "pci", "tsi108-pci")
tsi108_setup_pci(np, MPC7448HPC2_PCI_CFG_PHYS, 0);
 
@@ -75,6 +73,11 @@ static void __init mpc7448_hpc2_setup_arch(void)
if (ppc_md.progress)
ppc_md.progress("tsi108: resources set", 0x100);
 #endif
+}
+
+static void __init mpc7448_hpc2_setup_arch(void)
+{
+   tsi108_csr_vir_base = get_vir_csrbase();
 
printk(KERN_INFO "MPC7448HPC2 (TAIGA) Platform\n");
printk(KERN_INFO
@@ -180,6 +183,7 @@ define_machine(mpc7448_hpc2){
.name   = "MPC7448 HPC2",
.probe  = mpc7448_hpc2_probe,
.setup_arch = mpc7448_hpc2_setup_arch,
+   .discover_phbs  = mpc7448_hpc2_setup_pci,
.init_IRQ   = mpc7448_hpc2_init_IRQ,
.show_cpuinfo   = mpc7448_hpc2_show_cpuinfo,
.get_irq= mpic_get_irq,
-- 
2.26.2



[RFC PATCH 13/18] powerpc/embedded6xx/holly: Move PHB discovery

2020-09-24 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
compile tested with holly_defconfig
---
 arch/powerpc/platforms/embedded6xx/holly.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/embedded6xx/holly.c 
b/arch/powerpc/platforms/embedded6xx/holly.c
index d8f2e2c737bb..53065d564161 100644
--- a/arch/powerpc/platforms/embedded6xx/holly.c
+++ b/arch/powerpc/platforms/embedded6xx/holly.c
@@ -108,15 +108,13 @@ static void holly_remap_bridge(void)
tsi108_write_reg(TSI108_PCI_P2O_BAR2, 0x0);
 }
 
-static void __init holly_setup_arch(void)
+static void __init holly_init_pci(void)
 {
struct device_node *np;
 
if (ppc_md.progress)
ppc_md.progress("holly_setup_arch():set_bridge", 0);
 
-   tsi108_csr_vir_base = get_vir_csrbase();
-
/* setup PCI host bridge */
holly_remap_bridge();
 
@@ -127,6 +125,11 @@ static void __init holly_setup_arch(void)
ppc_md.pci_exclude_device = holly_exclude_device;
if (ppc_md.progress)
ppc_md.progress("tsi108: resources set", 0x100);
+}
+
+static void __init holly_setup_arch(void)
+{
+   tsi108_csr_vir_base = get_vir_csrbase();
 
printk(KERN_INFO "PPC750GX/CL Platform\n");
 }
@@ -259,6 +262,7 @@ define_machine(holly){
.name   = "PPC750 GX/CL TSI",
.probe  = holly_probe,
.setup_arch = holly_setup_arch,
+   .discover_phbs  = holly_init_pci,
.init_IRQ   = holly_init_IRQ,
.show_cpuinfo   = holly_show_cpuinfo,
.get_irq= mpic_get_irq,
-- 
2.26.2



[RFC PATCH 14/18] powerpc/embedded6xx/linkstation: Move PHB discovery

2020-09-24 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
compile tested with linkstation_defconfig
---
 arch/powerpc/platforms/embedded6xx/linkstation.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/embedded6xx/linkstation.c 
b/arch/powerpc/platforms/embedded6xx/linkstation.c
index f514d5d28cd4..eb8342e7f84e 100644
--- a/arch/powerpc/platforms/embedded6xx/linkstation.c
+++ b/arch/powerpc/platforms/embedded6xx/linkstation.c
@@ -63,15 +63,18 @@ static int __init linkstation_add_bridge(struct device_node 
*dev)
 }
 
 static void __init linkstation_setup_arch(void)
+{
+   printk(KERN_INFO "BUFFALO Network Attached Storage Series\n");
+   printk(KERN_INFO "(C) 2002-2005 BUFFALO INC.\n");
+}
+
+static void __init linkstation_setup_pci(void)
 {
struct device_node *np;
 
/* Lookup PCI host bridges */
for_each_compatible_node(np, "pci", "mpc10x-pci")
linkstation_add_bridge(np);
-
-   printk(KERN_INFO "BUFFALO Network Attached Storage Series\n");
-   printk(KERN_INFO "(C) 2002-2005 BUFFALO INC.\n");
 }
 
 /*
@@ -153,6 +156,7 @@ define_machine(linkstation){
.name   = "Buffalo Linkstation",
.probe  = linkstation_probe,
.setup_arch = linkstation_setup_arch,
+   .discover_phbs  = linkstation_setup_pci,
.init_IRQ   = linkstation_init_IRQ,
.show_cpuinfo   = linkstation_show_cpuinfo,
.get_irq= mpic_get_irq,
-- 
2.26.2



[RFC PATCH 12/18] powerpc/chrp: Move PHB discovery

2020-09-24 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
compile tested with chrp32_defconfig
---
 arch/powerpc/platforms/chrp/pci.c   |  8 
 arch/powerpc/platforms/chrp/setup.c | 12 +---
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/chrp/pci.c 
b/arch/powerpc/platforms/chrp/pci.c
index b2c2bf35b76c..8c421dc78b28 100644
--- a/arch/powerpc/platforms/chrp/pci.c
+++ b/arch/powerpc/platforms/chrp/pci.c
@@ -314,6 +314,14 @@ chrp_find_bridges(void)
}
}
of_node_put(root);
+
+   /*
+*  "Temporary" fixes for PCI devices.
+*  -- Geert
+*/
+   hydra_init();   /* Mac I/O */
+
+   pci_create_OF_bus_map();
 }
 
 /* SL82C105 IDE Control/Status Register */
diff --git a/arch/powerpc/platforms/chrp/setup.c 
b/arch/powerpc/platforms/chrp/setup.c
index c45435aa5e36..3cfc382841e5 100644
--- a/arch/powerpc/platforms/chrp/setup.c
+++ b/arch/powerpc/platforms/chrp/setup.c
@@ -334,22 +334,11 @@ static void __init chrp_setup_arch(void)
/* On pegasos, enable the L2 cache if not already done by OF */
pegasos_set_l2cr();
 
-   /* Lookup PCI host bridges */
-   chrp_find_bridges();
-
-   /*
-*  Temporary fixes for PCI devices.
-*  -- Geert
-*/
-   hydra_init();   /* Mac I/O */
-
/*
 *  Fix the Super I/O configuration
 */
sio_init();
 
-   pci_create_OF_bus_map();
-
/*
 * Print the banner, then scroll down so boot progress
 * can be printed.  -- Cort
@@ -582,6 +571,7 @@ define_machine(chrp) {
.name   = "CHRP",
.probe  = chrp_probe,
.setup_arch = chrp_setup_arch,
+   .discover_phbs  = chrp_find_bridges,
.init   = chrp_init2,
.show_cpuinfo   = chrp_show_cpuinfo,
.init_IRQ   = chrp_init_IRQ,
-- 
2.26.2



[RFC PATCH 11/18] powerpc/amigaone: Move PHB discovery

2020-09-24 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
compile tested with amigaone_defconfig
---
 arch/powerpc/platforms/amigaone/setup.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/amigaone/setup.c 
b/arch/powerpc/platforms/amigaone/setup.c
index f5d0bf999759..b25ddf39dd43 100644
--- a/arch/powerpc/platforms/amigaone/setup.c
+++ b/arch/powerpc/platforms/amigaone/setup.c
@@ -65,6 +65,12 @@ static int __init amigaone_add_bridge(struct device_node 
*dev)
 }
 
 void __init amigaone_setup_arch(void)
+{
+   if (ppc_md.progress)
+   ppc_md.progress("Linux/PPC "UTS_RELEASE"\n", 0);
+}
+
+void __init amigaone_discover_phbs(void)
 {
struct device_node *np;
int phb = -ENODEV;
@@ -74,9 +80,6 @@ void __init amigaone_setup_arch(void)
phb = amigaone_add_bridge(np);
 
BUG_ON(phb != 0);
-
-   if (ppc_md.progress)
-   ppc_md.progress("Linux/PPC "UTS_RELEASE"\n", 0);
 }
 
 void __init amigaone_init_IRQ(void)
@@ -159,6 +162,7 @@ define_machine(amigaone) {
.name   = "AmigaOne",
.probe  = amigaone_probe,
.setup_arch = amigaone_setup_arch,
+   .discover_phbs  = amigaone_discover_phbs,
.show_cpuinfo   = amigaone_show_cpuinfo,
.init_IRQ   = amigaone_init_IRQ,
.restart= amigaone_restart,
-- 
2.26.2



[RFC PATCH 10/18] powerpc/83xx: Move PHB discovery

2020-09-24 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
compile tested with mpc83xx_defconfig
---
 arch/powerpc/platforms/83xx/asp834x.c | 1 +
 arch/powerpc/platforms/83xx/km83xx.c  | 1 +
 arch/powerpc/platforms/83xx/misc.c| 2 --
 arch/powerpc/platforms/83xx/mpc830x_rdb.c | 1 +
 arch/powerpc/platforms/83xx/mpc831x_rdb.c | 1 +
 arch/powerpc/platforms/83xx/mpc832x_mds.c | 1 +
 arch/powerpc/platforms/83xx/mpc832x_rdb.c | 1 +
 arch/powerpc/platforms/83xx/mpc834x_itx.c | 1 +
 arch/powerpc/platforms/83xx/mpc834x_mds.c | 1 +
 arch/powerpc/platforms/83xx/mpc836x_mds.c | 1 +
 arch/powerpc/platforms/83xx/mpc836x_rdk.c | 1 +
 arch/powerpc/platforms/83xx/mpc837x_mds.c | 1 +
 arch/powerpc/platforms/83xx/mpc837x_rdb.c | 1 +
 13 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/83xx/asp834x.c 
b/arch/powerpc/platforms/83xx/asp834x.c
index 28474876f41b..68061c2a57c1 100644
--- a/arch/powerpc/platforms/83xx/asp834x.c
+++ b/arch/powerpc/platforms/83xx/asp834x.c
@@ -44,6 +44,7 @@ define_machine(asp834x) {
.name   = "ASP8347E",
.probe  = asp834x_probe,
.setup_arch = asp834x_setup_arch,
+   .discover_phbs  = mpc83xx_setup_pci,
.init_IRQ   = mpc83xx_ipic_init_IRQ,
.get_irq= ipic_get_irq,
.restart= mpc83xx_restart,
diff --git a/arch/powerpc/platforms/83xx/km83xx.c 
b/arch/powerpc/platforms/83xx/km83xx.c
index bcdc2c203ec9..108e1e4d2683 100644
--- a/arch/powerpc/platforms/83xx/km83xx.c
+++ b/arch/powerpc/platforms/83xx/km83xx.c
@@ -180,6 +180,7 @@ define_machine(mpc83xx_km) {
.name   = "mpc83xx-km-platform",
.probe  = mpc83xx_km_probe,
.setup_arch = mpc83xx_km_setup_arch,
+   .discover_phbs  = mpc83xx_setup_pci,
.init_IRQ   = mpc83xx_ipic_init_IRQ,
.get_irq= ipic_get_irq,
.restart= mpc83xx_restart,
diff --git a/arch/powerpc/platforms/83xx/misc.c 
b/arch/powerpc/platforms/83xx/misc.c
index a952e91db3ee..3285dabcf923 100644
--- a/arch/powerpc/platforms/83xx/misc.c
+++ b/arch/powerpc/platforms/83xx/misc.c
@@ -132,8 +132,6 @@ void __init mpc83xx_setup_arch(void)
setbat(-1, va, immrbase, immrsize, PAGE_KERNEL_NCG);
update_bats();
}
-
-   mpc83xx_setup_pci();
 }
 
 int machine_check_83xx(struct pt_regs *regs)
diff --git a/arch/powerpc/platforms/83xx/mpc830x_rdb.c 
b/arch/powerpc/platforms/83xx/mpc830x_rdb.c
index 51426e88ec67..956d4389effa 100644
--- a/arch/powerpc/platforms/83xx/mpc830x_rdb.c
+++ b/arch/powerpc/platforms/83xx/mpc830x_rdb.c
@@ -48,6 +48,7 @@ define_machine(mpc830x_rdb) {
.name   = "MPC830x RDB",
.probe  = mpc830x_rdb_probe,
.setup_arch = mpc830x_rdb_setup_arch,
+   .discover_phbs  = mpc83xx_setup_pci,
.init_IRQ   = mpc83xx_ipic_init_IRQ,
.get_irq= ipic_get_irq,
.restart= mpc83xx_restart,
diff --git a/arch/powerpc/platforms/83xx/mpc831x_rdb.c 
b/arch/powerpc/platforms/83xx/mpc831x_rdb.c
index 5ccd57a48492..3b578f080e3b 100644
--- a/arch/powerpc/platforms/83xx/mpc831x_rdb.c
+++ b/arch/powerpc/platforms/83xx/mpc831x_rdb.c
@@ -48,6 +48,7 @@ define_machine(mpc831x_rdb) {
.name   = "MPC831x RDB",
.probe  = mpc831x_rdb_probe,
.setup_arch = mpc831x_rdb_setup_arch,
+   .discover_phbs  = mpc83xx_setup_pci,
.init_IRQ   = mpc83xx_ipic_init_IRQ,
.get_irq= ipic_get_irq,
.restart= mpc83xx_restart,
diff --git a/arch/powerpc/platforms/83xx/mpc832x_mds.c 
b/arch/powerpc/platforms/83xx/mpc832x_mds.c
index 6fa5402ebf20..850d566ef900 100644
--- a/arch/powerpc/platforms/83xx/mpc832x_mds.c
+++ b/arch/powerpc/platforms/83xx/mpc832x_mds.c
@@ -101,6 +101,7 @@ define_machine(mpc832x_mds) {
.name   = "MPC832x MDS",
.probe  = mpc832x_sys_probe,
.setup_arch = mpc832x_sys_setup_arch,
+   .discover_phbs  = mpc83xx_setup_pci,
.init_IRQ   = mpc83xx_ipic_init_IRQ,
.get_irq= ipic_get_irq,
.restart= mpc83xx_restart,
diff --git a/arch/powerpc/platforms/83xx/mpc832x_rdb.c 
b/arch/powerpc/platforms/83xx/mpc832x_rdb.c
index 622c625d5ce4..b6133a237a70 100644
--- a/arch/powerpc/platforms/83xx/mpc832x_rdb.c
+++ b/arch/powerpc/platforms/83xx/mpc832x_rdb.c
@@ -219,6 +219,7 @@ define_machine(mpc832x_rdb) {
.name   = "MPC832x RDB",
.probe  = mpc832x_rdb_probe,
.setup_arch = mpc832x_rdb_setup_arch,
+   .discover_phbs  = mpc83xx_setup_pci,
.init_IRQ   = mpc83xx_ipic_init_IRQ,
.get_irq= ipic_get_irq,
.restart= mpc83xx_restar

[RFC PATCH 09/18] powerpc/82xx/*: Move PHB discovery

2020-09-24 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
compile tested with pq2fads_defconfig
---
 arch/powerpc/platforms/82xx/mpc8272_ads.c | 2 +-
 arch/powerpc/platforms/82xx/pq2fads.c | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/82xx/mpc8272_ads.c 
b/arch/powerpc/platforms/82xx/mpc8272_ads.c
index 3fe1a6593280..0b5b9dec16d5 100644
--- a/arch/powerpc/platforms/82xx/mpc8272_ads.c
+++ b/arch/powerpc/platforms/82xx/mpc8272_ads.c
@@ -171,7 +171,6 @@ static void __init mpc8272_ads_setup_arch(void)
iounmap(bcsr);
 
init_ioports();
-   pq2_init_pci();
 
if (ppc_md.progress)
ppc_md.progress("mpc8272_ads_setup_arch(), finish", 0);
@@ -205,6 +204,7 @@ define_machine(mpc8272_ads)
.name = "Freescale MPC8272 ADS",
.probe = mpc8272_ads_probe,
.setup_arch = mpc8272_ads_setup_arch,
+   .discover_phbs = pq2_init_pci,
.init_IRQ = mpc8272_ads_pic_init,
.get_irq = cpm2_get_irq,
.calibrate_decr = generic_calibrate_decr,
diff --git a/arch/powerpc/platforms/82xx/pq2fads.c 
b/arch/powerpc/platforms/82xx/pq2fads.c
index a74082140718..ac9113d524af 100644
--- a/arch/powerpc/platforms/82xx/pq2fads.c
+++ b/arch/powerpc/platforms/82xx/pq2fads.c
@@ -150,8 +150,6 @@ static void __init pq2fads_setup_arch(void)
/* Enable external IRQs */
clrbits32(_immr->im_siu_conf.siu_82xx.sc_siumcr, 0x0c00);
 
-   pq2_init_pci();
-
if (ppc_md.progress)
ppc_md.progress("pq2fads_setup_arch(), finish", 0);
 }
@@ -184,6 +182,7 @@ define_machine(pq2fads)
.name = "Freescale PQ2FADS",
.probe = pq2fads_probe,
.setup_arch = pq2fads_setup_arch,
+   .discover_phbs = pq2_init_pci,
.init_IRQ = pq2fads_pic_init,
.get_irq = cpm2_get_irq,
.calibrate_decr = generic_calibrate_decr,
-- 
2.26.2



[RFC PATCH 08/18] powerpc/52xx/mpc5200_simple: Move PHB discovery

2020-09-24 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/52xx/mpc5200_simple.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/52xx/mpc5200_simple.c 
b/arch/powerpc/platforms/52xx/mpc5200_simple.c
index 2d01e9b2e779..b9f5675b0a1d 100644
--- a/arch/powerpc/platforms/52xx/mpc5200_simple.c
+++ b/arch/powerpc/platforms/52xx/mpc5200_simple.c
@@ -40,8 +40,6 @@ static void __init mpc5200_simple_setup_arch(void)
 
/* Some mpc5200 & mpc5200b related configuration */
mpc5200_setup_xlb_arbiter();
-
-   mpc52xx_setup_pci();
 }
 
 /* list of the supported boards */
@@ -73,6 +71,7 @@ define_machine(mpc5200_simple_platform) {
.name   = "mpc5200-simple-platform",
.probe  = mpc5200_simple_probe,
.setup_arch = mpc5200_simple_setup_arch,
+   .discover_phbs  = mpc52xx_setup_pci,
.init   = mpc52xx_declare_of_platform_devices,
.init_IRQ   = mpc52xx_init_irq,
.get_irq= mpc52xx_get_irq,
-- 
2.26.2



[RFC PATCH 07/18] powerpc/52xx/media5200: Move PHB discovery

2020-09-24 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/52xx/media5200.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/52xx/media5200.c 
b/arch/powerpc/platforms/52xx/media5200.c
index 07c5bc4ed0b5..efb8bdecbcc7 100644
--- a/arch/powerpc/platforms/52xx/media5200.c
+++ b/arch/powerpc/platforms/52xx/media5200.c
@@ -202,8 +202,6 @@ static void __init media5200_setup_arch(void)
/* Some mpc5200 & mpc5200b related configuration */
mpc5200_setup_xlb_arbiter();
 
-   mpc52xx_setup_pci();
-
np = of_find_matching_node(NULL, mpc5200_gpio_ids);
gpio = of_iomap(np, 0);
of_node_put(np);
@@ -244,6 +242,7 @@ define_machine(media5200_platform) {
.name   = "media5200-platform",
.probe  = media5200_probe,
.setup_arch = media5200_setup_arch,
+   .discover_phbs  = mpc52xx_setup_pci,
.init   = mpc52xx_declare_of_platform_devices,
.init_IRQ   = media5200_init_irq,
.get_irq= mpc52xx_get_irq,
-- 
2.26.2



[RFC PATCH 06/18] powerpc/52xx/lite5200: Move PHB discovery

2020-09-24 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
compile tested with 52xx/lite5200b_defconfig
---
 arch/powerpc/platforms/52xx/lite5200.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/52xx/lite5200.c 
b/arch/powerpc/platforms/52xx/lite5200.c
index 3181aac08225..04cc97397095 100644
--- a/arch/powerpc/platforms/52xx/lite5200.c
+++ b/arch/powerpc/platforms/52xx/lite5200.c
@@ -165,8 +165,6 @@ static void __init lite5200_setup_arch(void)
mpc52xx_suspend.board_resume_finish = lite5200_resume_finish;
lite5200_pm_init();
 #endif
-
-   mpc52xx_setup_pci();
 }
 
 static const char * const board[] __initconst = {
@@ -187,6 +185,7 @@ define_machine(lite5200) {
.name   = "lite5200",
.probe  = lite5200_probe,
.setup_arch = lite5200_setup_arch,
+   .discover_phbs  = mpc52xx_setup_pci,
.init   = mpc52xx_declare_of_platform_devices,
.init_IRQ   = mpc52xx_init_irq,
.get_irq= mpc52xx_get_irq,
-- 
2.26.2



[RFC PATCH 05/18] powerpc/52xx/efika: Move PHB discovery

2020-09-24 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
compile tested with mpc5200_defconfig
---
 arch/powerpc/platforms/52xx/efika.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/52xx/efika.c 
b/arch/powerpc/platforms/52xx/efika.c
index 4514a6f7458a..3b7d70d71692 100644
--- a/arch/powerpc/platforms/52xx/efika.c
+++ b/arch/powerpc/platforms/52xx/efika.c
@@ -185,8 +185,6 @@ static void __init efika_setup_arch(void)
/* Map important registers from the internal memory map */
mpc52xx_map_common_devices();
 
-   efika_pcisetup();
-
 #ifdef CONFIG_PM
mpc52xx_suspend.board_suspend_prepare = efika_suspend_prepare;
mpc52xx_pm_init();
@@ -218,6 +216,7 @@ define_machine(efika)
.name   = EFIKA_PLATFORM_NAME,
.probe  = efika_probe,
.setup_arch = efika_setup_arch,
+   .discover_phbs  = efika_pcisetup,
.init   = mpc52xx_declare_of_platform_devices,
.show_cpuinfo   = efika_show_cpuinfo,
.init_IRQ   = mpc52xx_init_irq,
-- 
2.26.2



[RFC PATCH 04/18] powerpc/512x: Move PHB discovery

2020-09-24 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
only compile tested
---
 arch/powerpc/platforms/512x/mpc5121_ads.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/512x/mpc5121_ads.c 
b/arch/powerpc/platforms/512x/mpc5121_ads.c
index 6303fbfc4e4f..9d030c2e0004 100644
--- a/arch/powerpc/platforms/512x/mpc5121_ads.c
+++ b/arch/powerpc/platforms/512x/mpc5121_ads.c
@@ -24,21 +24,23 @@
 
 static void __init mpc5121_ads_setup_arch(void)
 {
-#ifdef CONFIG_PCI
-   struct device_node *np;
-#endif
printk(KERN_INFO "MPC5121 ADS board from Freescale Semiconductor\n");
/*
 * cpld regs are needed early
 */
mpc5121_ads_cpld_map();
 
+   mpc512x_setup_arch();
+}
+
+static void __init mpc5121_ads_setup_pci(void)
+{
 #ifdef CONFIG_PCI
+   struct device_node *np;
+
for_each_compatible_node(np, "pci", "fsl,mpc5121-pci")
mpc83xx_add_bridge(np);
 #endif
-
-   mpc512x_setup_arch();
 }
 
 static void __init mpc5121_ads_init_IRQ(void)
@@ -64,6 +66,7 @@ define_machine(mpc5121_ads) {
.name   = "MPC5121 ADS",
.probe  = mpc5121_ads_probe,
.setup_arch = mpc5121_ads_setup_arch,
+   .discover_phbs  = mpc5121_ads_setup_pci,
.init   = mpc512x_init,
.init_IRQ   = mpc5121_ads_init_IRQ,
.get_irq= ipic_get_irq,
-- 
2.26.2



[RFC PATCH 03/18] powerpc/maple: Move PHB discovery

2020-09-24 Thread Oliver O'Halloran
Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/maple/setup.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/maple/setup.c 
b/arch/powerpc/platforms/maple/setup.c
index f7e66a2005b4..4e9ad5bf3efb 100644
--- a/arch/powerpc/platforms/maple/setup.c
+++ b/arch/powerpc/platforms/maple/setup.c
@@ -179,9 +179,6 @@ static void __init maple_setup_arch(void)
 #ifdef CONFIG_SMP
smp_ops = _smp_ops;
 #endif
-   /* Lookup PCI hosts */
-   maple_pci_init();
-
maple_use_rtas_reboot_and_halt_if_present();
 
printk(KERN_DEBUG "Using native/NAP idle loop\n");
@@ -351,6 +348,7 @@ define_machine(maple) {
.name   = "Maple",
.probe  = maple_probe,
.setup_arch = maple_setup_arch,
+   .discover_phbs  = maple_pci_init,
.init_IRQ   = maple_init_IRQ,
.pci_irq_fixup  = maple_pci_irq_fixup,
.pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
-- 
2.26.2



[RFC PATCH 02/18] powerpc/{powernv,pseries}: Move PHB discovery

2020-09-24 Thread Oliver O'Halloran
Make powernv and pseries use ppc_mc.discover_phbs. These two platforms need
to be done together because they both depends on pci_dn's being created
from the DT. The pci_dn contains a pointer to the relevant pci_controller
so they need to be created after the pci_controller structures are
available, but before  and before PCI devices are scanned. Currently this
ordering is provided by initcalls and the sequence is:

1. PHBs are discovered (setup_arch) (early boot, pre-initcalls)
2. pci_dn are created from the unflattended DT (core initcall)
3. PHBs are scanned pcibios_init() (subsys initcall)

The new ppc_md.discover_phbs() function is also a core_initcall so we can't
guarantee ordering between the creations of pci_controllers and the
creation of pci_dn's which require a pci_controller. We could use the
postcore, or core_sync initcall levels, but it's cleaner to just move the
pci_dn setup into the per-PHB inits which occur inside of .discover_phb()
for these platforms. This brings the boot-time path in line with the PHB
hotplug path that is used for pseries DLPAR operations too.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/kernel/pci_dn.c  | 22 --
 arch/powerpc/platforms/powernv/pci-ioda.c |  3 +++
 arch/powerpc/platforms/powernv/setup.c|  4 +---
 arch/powerpc/platforms/pseries/setup.c|  7 +--
 4 files changed, 9 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index 54e240597fd9..61571ae23953 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -481,28 +481,6 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
pci_traverse_device_nodes(dn, add_pdn, phb);
 }
 
-/** 
- * pci_devs_phb_init - Initialize phbs and pci devs under them.
- * 
- * This routine walks over all phb's (pci-host bridges) on the
- * system, and sets up assorted pci-related structures 
- * (including pci info in the device node structs) for each
- * pci device found underneath.  This routine runs once,
- * early in the boot sequence.
- */
-static int __init pci_devs_phb_init(void)
-{
-   struct pci_controller *phb, *tmp;
-
-   /* This must be done first so the device nodes have valid pci info! */
-   list_for_each_entry_safe(phb, tmp, _list, list_node)
-   pci_devs_phb_init_dynamic(phb);
-
-   return 0;
-}
-
-core_initcall(pci_devs_phb_init);
-
 static void pci_dev_pdn_setup(struct pci_dev *pdev)
 {
struct pci_dn *pdn;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 023a4f987bb2..987654a08e2e 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3184,6 +3184,9 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
/* Remove M64 resource if we can't configure it successfully */
if (!phb->init_m64 || phb->init_m64(phb))
hose->mem_resources[1].flags = 0;
+
+   /* create pci_dn's for DT nodes under this PHB */
+   pci_devs_phb_init_dynamic(hose);
 }
 
 void __init pnv_pci_init_ioda2_phb(struct device_node *np)
diff --git a/arch/powerpc/platforms/powernv/setup.c 
b/arch/powerpc/platforms/powernv/setup.c
index 7fcb88623081..a4d6d41b1155 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -140,9 +140,6 @@ static void __init pnv_setup_arch(void)
/* Initialize SMP */
pnv_smp_init();
 
-   /* Setup PCI */
-   pnv_pci_init();
-
/* Setup RTC and NVRAM callbacks */
if (firmware_has_feature(FW_FEATURE_OPAL))
opal_nvram_init();
@@ -500,6 +497,7 @@ define_machine(powernv) {
.init_IRQ   = pnv_init_IRQ,
.show_cpuinfo   = pnv_show_cpuinfo,
.get_proc_freq  = pnv_get_proc_freq,
+   .discover_phbs  = pnv_pci_init,
.progress   = pnv_progress,
.machine_shutdown   = pnv_shutdown,
.power_save = NULL,
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 2f4ee0a90284..d0bf487fadf5 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -463,7 +463,7 @@ void pseries_little_endian_exceptions(void)
 }
 #endif
 
-static void __init find_and_init_phbs(void)
+static void __init pSeries_discover_phbs(void)
 {
struct device_node *node;
struct pci_controller *phb;
@@ -481,6 +481,9 @@ static void __init find_and_init_phbs(void)
pci_process_bridge_OF_ranges(phb, node, 0);
isa_bridge_find_early(phb);
phb->controller_ops = pseries_pci_controller_ops;
+
+   /* create pci_dn's for DT nodes under this PHB */
+   pci_devs_phb_init_dynamic(phb);
}
 
of_node_put(root);
@@ -771,7 +774,6 @@ static void __init pSeries_set

[RFC PATCH 01/18] powerpc/pci: Add ppc_md.discover_phbs()

2020-09-24 Thread Oliver O'Halloran
On many powerpc platforms the discovery and initalisation of
pci_controllers (PHBs) happens inside of setup_arch(). This is very early
in boot (pre-initcalls) and means that we're initialising the PHB long
before many basic kernel services (slab allocator, debugfs, a real ioremap)
are available.

On PowerNV this causes an additional problem since we map the PHB registers
with ioremap(). As of commit d538aadc2718 ("powerpc/ioremap: warn on early
use of ioremap()") a warning is printed because we're using the "incorrect"
API to setup and MMIO mapping in searly boot. The kernel does provide
early_ioremap(), but that is not intended to create long-lived MMIO
mappings and a seperate warning is printed by generic code if
early_ioremap() mappings are "leaked."

This is all fixable with dumb hacks like using early_ioremap() to setup
the initial mapping then replacing it with a real ioremap later on in
boot, but it does raise the question: Why the hell are we setting up the
PHB's this early in boot?

The old and wise claim it's due to "hysterical rasins." Aside from amused
grapes there doesn't appear to be any real reason to maintain the current
behaviour. Already most of the newer embedded platforms perform PHB
discovery in an arch_initcall and between the end of setup_arch() and the
start of initcalls none of the generic kernel code does anything PCI
related. On powerpc scanning PHBs occurs in a subsys_initcall so it should
be possible to move the PHB discovery to a core, postcore or arch initcall.

This patch adds the ppc_md.discover_phbs hook and a core_initcall stub that
calls it. The core_initcalls are the earliest to be called so this will
any possibly issues with dependency between initcalls. This isn't just an
academic issue either since on pseries and PowerNV EEH init occurs in an
arch_initcall and depends on the pci_controllers being available, similarly
the creation of pci_dns occurs at core_initcall_sync (i.e. between core and
postcore initcalls). These problems need to be addressed seperately.

Cc: Paul Mackerras 
Cc: Christophe Leroy 
Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/include/asm/machdep.h |  3 +++
 arch/powerpc/kernel/pci-common.c   | 10 ++
 2 files changed, 13 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index a90b892f0bfe..b732dd5722aa 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -59,6 +59,9 @@ struct machdep_calls {
int (*pcibios_root_bridge_prepare)(struct pci_host_bridge
*bridge);
 
+   /* finds all the pci_controllers present at boot */
+   void(*discover_phbs)(void);
+
/* To setup PHBs when using automatic OF platform driver for PCI */
int (*pci_setup_phb)(struct pci_controller *host);
 
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index be108616a721..6265e7d1c697 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1625,3 +1625,13 @@ static void fixup_hide_host_resource_fsl(struct pci_dev 
*dev)
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MOTOROLA, PCI_ANY_ID, 
fixup_hide_host_resource_fsl);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_FREESCALE, PCI_ANY_ID, 
fixup_hide_host_resource_fsl);
+
+
+int __init discover_phbs(void)
+{
+   if (ppc_md.discover_phbs)
+   ppc_md.discover_phbs();
+
+   return 0;
+}
+core_initcall(discover_phbs);
-- 
2.26.2



[PATCH v2 9/9] powerpc/eeh: Clean up PE addressing

2020-09-18 Thread Oliver O'Halloran
onfig_addr is invalid so there
is no situation where eeh_pe_get() will search for a PE based on the 3rd
argument. The check also means that we'll never insert a PE into the tree
where pe_config_addr is zero since EEH_VALID_PE_ZERO is never set on
pseries. All the users of the fallback address on pseries never actually
use the fallback and all the only caller that supplies something for the
config_addr argument to eeh_pe_get() never use it either. It's all dead
code.

This patch removes the fallback address from eeh_pe since nothing uses it.
Specificly, we do this by:

1) Removing pe->config_addr
2) Removing the EEH_VALID_PE_ZERO flag
3) Removing the fallback address argument to eeh_pe_get().
4) Removing all the checks for pe->addr being zero in the pseries EEH code.

This leaves us with PE's only being identified by what's in their pe->addr
field and the EEH core relying on the platform to ensure that eeh_dev's are
only inserted into the EEH tree if they're actually inside a PE.

No functional changes, I hope.

Signed-off-by: Oliver O'Halloran 
---
v2: re-wrote commit message
---
 arch/powerpc/include/asm/eeh.h   |  4 +-
 arch/powerpc/kernel/eeh.c|  2 +-
 arch/powerpc/kernel/eeh_pe.c | 46 +++-
 arch/powerpc/platforms/powernv/eeh-powernv.c | 16 ++-
 arch/powerpc/platforms/pseries/eeh_pseries.c | 42 +++---
 5 files changed, 17 insertions(+), 93 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 85030c05e67e..dd6a4ac6c713 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -27,7 +27,6 @@ struct pci_dn;
 #define EEH_FORCE_DISABLED 0x02/* EEH disabled  */
 #define EEH_PROBE_MODE_DEV 0x04/* From PCI device   */
 #define EEH_PROBE_MODE_DEVTREE 0x08/* From device tree  */
-#define EEH_VALID_PE_ZERO  0x10/* PE#0 is valid */
 #define EEH_ENABLE_IO_FOR_LOG  0x20/* Enable IO for log */
 #define EEH_EARLY_DUMP_LOG 0x40/* Dump log immediately  */
 
@@ -280,8 +279,7 @@ int eeh_phb_pe_create(struct pci_controller *phb);
 int eeh_wait_state(struct eeh_pe *pe, int max_wait);
 struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb);
 struct eeh_pe *eeh_pe_next(struct eeh_pe *pe, struct eeh_pe *root);
-struct eeh_pe *eeh_pe_get(struct pci_controller *phb,
- int pe_no, int config_addr);
+struct eeh_pe *eeh_pe_get(struct pci_controller *phb, int pe_no);
 int eeh_pe_tree_insert(struct eeh_dev *edev, struct eeh_pe *new_pe_parent);
 int eeh_pe_tree_remove(struct eeh_dev *edev);
 void eeh_pe_update_time_stamp(struct eeh_pe *pe);
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index c9e25cfce8f0..87de8b798b2d 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1657,7 +1657,7 @@ static ssize_t eeh_force_recover_write(struct file *filp,
return -ENODEV;
 
/* Retrieve PE */
-   pe = eeh_pe_get(hose, pe_no, 0);
+   pe = eeh_pe_get(hose, pe_no);
if (!pe)
return -ENODEV;
 
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index d273fdd5..61b7d4019051 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -251,43 +251,21 @@ void eeh_pe_dev_traverse(struct eeh_pe *root,
 
 /**
  * __eeh_pe_get - Check the PE address
- * @data: EEH PE
- * @flag: EEH device
  *
  * For one particular PE, it can be identified by PE address
  * or tranditional BDF address. BDF address is composed of
  * Bus/Device/Function number. The extra data referred by flag
  * indicates which type of address should be used.
  */
-struct eeh_pe_get_flag {
-   int pe_no;
-   int config_addr;
-};
-
 static void *__eeh_pe_get(struct eeh_pe *pe, void *flag)
 {
-   struct eeh_pe_get_flag *tmp = (struct eeh_pe_get_flag *) flag;
+   int *target_pe = flag;
 
-   /* Unexpected PHB PE */
+   /* PHB PEs are special and should be ignored */
if (pe->type & EEH_PE_PHB)
return NULL;
 
-   /*
-* We prefer PE address. For most cases, we should
-* have non-zero PE address
-*/
-   if (eeh_has_flag(EEH_VALID_PE_ZERO)) {
-   if (tmp->pe_no == pe->addr)
-   return pe;
-   } else {
-   if (tmp->pe_no &&
-   (tmp->pe_no == pe->addr))
-   return pe;
-   }
-
-   /* Try BDF address */
-   if (tmp->config_addr &&
-  (tmp->config_addr == pe->config_addr))
+   if (*target_pe == pe->addr)
return pe;
 
return NULL;
@@ -297,7 +275,6 @@ static void *__eeh_pe_get(struct eeh_pe *pe, void *flag)
  * eeh_pe_get - Search PE based on the given address
  * @phb: PCI controller

[PATCH v2 8/9] powerpc/pseries/eeh: Allow zero to be a valid PE configuration address

2020-09-18 Thread Oliver O'Halloran
There's no real reason why zero can't be a valid PE configuration address.
Under qemu each sPAPR PHB (i.e. EEH supporting) has the passed-though
devices on bus zero, so the PE address of bus :00 should be zero.

However, all previous versions of Linux will reject that, so Qemu at least
goes out of it's way to avoid it. The Qemu implementation of
ibm,get-config-addr-info2 RTAS has the following comment:

> /*
>  * We always have PE address of form "00BB0001". "BB"
>  * represents the bus number of PE's primary bus.
>  */

So qemu puts a one into the register portion of the PE's config_addr to
avoid it being zero. The whole is pretty silly considering that RTAS will
return a negative error code if it can't map the device's config_addr to a
PE.

This patch fixes Linux to treat zero as a valid PE address. This shouldn't
have any real effects due to the Qemu hack mentioned above. And the fact
that Linux EEH has worked historically on PowerVM means they never pass
through devices on bus zero so we would never see the problem there either.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/pseries/eeh_pseries.c | 38 +++-
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
b/arch/powerpc/platforms/pseries/eeh_pseries.c
index c2ecc0db2f94..e42c026392aa 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -87,21 +87,20 @@ void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
  * pseries_eeh_get_pe_config_addr - Find the pe_config_addr for a device
  * @pdn: pci_dn of the input device
  *
- * Retrieve the assocated config address. Actually, there're 2 RTAS
- * function calls dedicated for the purpose. We need implement
- * it through the new function and then the old one. Besides,
- * you should make sure the config address is figured out from
- * FDT node before calling the function.
+ * The EEH RTAS calls use a tuple consisting of: (buid_hi, buid_lo,
+ * pe_config_addr) as a handle to a given PE. This function finds the
+ * pe_config_addr based on the device's config addr.
  *
- * It's notable that zero'ed return value means invalid PE config
- * address.
+ * Keep in mind that the pe_config_addr *might* be numerically identical to the
+ * device's config addr, but the two are conceptually distinct.
+ *
+ * Returns the pe_config_addr, or a negative error code.
  */
 static int pseries_eeh_get_pe_config_addr(struct pci_dn *pdn)
 {
int config_addr = rtas_config_addr(pdn->busno, pdn->devfn, 0);
struct pci_controller *phb = pdn->phb;
-   int ret = 0;
-   int rets[3];
+   int ret, rets[3];
 
if (ibm_get_config_addr_info2 != RTAS_UNKNOWN_SERVICE) {
/*
@@ -112,16 +111,16 @@ static int pseries_eeh_get_pe_config_addr(struct pci_dn 
*pdn)
config_addr, BUID_HI(phb->buid),
BUID_LO(phb->buid), 1);
if (ret || (rets[0] == 0))
-   return 0;
+   return -ENOENT;
 
-   /* Retrieve the associated PE config address */
+   /* Retrieve the associated PE config address with function 0 */
ret = rtas_call(ibm_get_config_addr_info2, 4, 2, rets,
config_addr, BUID_HI(phb->buid),
BUID_LO(phb->buid), 0);
if (ret) {
pr_warn("%s: Failed to get address for PHB#%x-PE#%x\n",
__func__, phb->global_number, config_addr);
-   return 0;
+   return -ENXIO;
}
 
return rets[0];
@@ -134,13 +133,20 @@ static int pseries_eeh_get_pe_config_addr(struct pci_dn 
*pdn)
if (ret) {
pr_warn("%s: Failed to get address for PHB#%x-PE#%x\n",
__func__, phb->global_number, config_addr);
-   return 0;
+   return -ENXIO;
}
 
return rets[0];
}
 
-   return ret;
+   /*
+* PAPR does describe a process for finding the pe_config_addr that was
+* used before the ibm,get-config-addr-info calls were added. However,
+* I haven't found *any* systems that don't have that RTAS call
+* implemented. If you happen to find one that needs the old DT based
+* process, patches are welcome!
+*/
+   return -ENOENT;
 }
 
 /**
@@ -419,7 +425,7 @@ void pseries_eeh_init_edev(struct pci_dn *pdn)
 
/* first up, find the pe_config_addr for the PE containing the device */
addr = pseries_eeh_get_pe_config_addr(pdn);
-   if (addr == 0) {
+   if (addr < 0) {
eeh_edev_dbg(edev, "Unable to find pe_config_addr\n")

[PATCH v2 7/9] powerpc/pseries/eeh: Rework device EEH PE determination

2020-09-18 Thread Oliver O'Halloran
The process Linux uses for determining if a device supports EEH or not
appears to be at odds with what PAPR says the OS should be doing. The
current flow is something like:

1. Assume pe_config_addr is equal the the device's config_addr.
2. Attempt to enable EEH on that PE
3. Verify EEH was enabled (POWER4 bug workaround)
4. Try find the pe_config_addr using the ibm,get-config-addr-info2 RTAS
   call.
5. If that fails walk the pci_dn tree upwards trying to find a parent
   device with EEH support. If we find one then add the device to that PE.

The first major problem with this process is that we need the PE config
address in step 2) since its needs to be passed to the ibm,set-eeh-option
RTAS call when enabling EEH for th PE. We hack around this requirement in
by making the assumption in 1) and delay finding the actual PE address
until 4). This is fine if:

a) The PCI device is the 0th function, and
b) The device is on the PE's root bus.

Granted, the current sequence does appear to work on most systems even when
these conditions are false. At a guess PowerVM's RTAS has workarounds to
accommodate Linux's quirks or the RTAS call to enable EEH is treated as
no-op on most platforms since EEH is usually enabled by default. However,
what is currently implemented is a bit sketch and is downright confusing
since it doesn't match up with what what PAPR suggests we should be doing.

This patch re-works how we handle EEH init so that we find the PE config
address using the ibm,get-config-addr-info2 RTAS call first, then use the
found address to finish the EEH init process. It also drops the Power4
workaround since as of commit 471d7ff8b51b ("powerpc/64s: Remove POWER4
support") the kernel does not support running on a Power4 CPU so there's
no need to support the Power4 platform's quirks either. With the patch
applied the sequence is now:

1. Find the pe_config_addr from the device using the RTAS call.
2. Enable the PE.
3. Insert the edev into the tree and create an eeh_pe if needed.

The other change made here is ignoring unsupported devices entirely.
Currently the device's BARs are saved to the eeh_dev even if the device is
not part of an EEH PE. Not being part of a PE means that an EEH recovery
pass will never see that device so the saving the BARs is pointless.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/pseries/eeh_pseries.c | 57 
 1 file changed, 22 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 10303de3d8d5..c2ecc0db2f94 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -357,10 +357,10 @@ static struct eeh_pe *pseries_eeh_pe_get_parent(struct 
eeh_dev *edev)
  */
 void pseries_eeh_init_edev(struct pci_dn *pdn)
 {
+   struct eeh_pe pe, *parent;
struct eeh_dev *edev;
-   struct eeh_pe pe;
+   int addr;
u32 pcie_flags;
-   int enable = 0;
int ret;
 
if (WARN_ON_ONCE(!eeh_has_flag(EEH_PROBE_MODE_DEVTREE)))
@@ -417,51 +417,38 @@ void pseries_eeh_init_edev(struct pci_dn *pdn)
}
}
 
-   /* Initialize the fake PE */
+   /* first up, find the pe_config_addr for the PE containing the device */
+   addr = pseries_eeh_get_pe_config_addr(pdn);
+   if (addr == 0) {
+   eeh_edev_dbg(edev, "Unable to find pe_config_addr\n");
+   goto err;
+   }
+
+   /* Try enable EEH on the fake PE */
memset(, 0, sizeof(struct eeh_pe));
pe.phb = pdn->phb;
-   pe.config_addr = (pdn->busno << 16) | (pdn->devfn << 8);
+   pe.addr = addr;
 
-   /* Enable EEH on the device */
eeh_edev_dbg(edev, "Enabling EEH on device\n");
ret = eeh_ops->set_option(, EEH_OPT_ENABLE);
if (ret) {
eeh_edev_dbg(edev, "EEH failed to enable on device (code 
%d)\n", ret);
-   } else {
-   struct eeh_pe *parent;
+   goto err;
+   }
 
-   /* Retrieve PE address */
-   edev->pe_config_addr = pseries_eeh_get_pe_config_addr(pdn);
-   pe.addr = edev->pe_config_addr;
+   edev->pe_config_addr = addr;
 
-   /* Some older systems (Power4) allow the ibm,set-eeh-option
-* call to succeed even on nodes where EEH is not supported.
-* Verify support explicitly.
-*/
-   ret = eeh_ops->get_state(, NULL);
-   if (ret > 0 && ret != EEH_STATE_NOT_SUPPORT)
-   enable = 1;
+   eeh_add_flag(EEH_ENABLED);
 
-   /*
-* This device doesn't support EEH, but it may have an
-* EEH parent. In this case any error on the device will
-* freeze the PE of it's upstream bridge, so add

[PATCH v2 6/9] powerpc/pseries/eeh: Clean up pe_config_addr lookups

2020-09-18 Thread Oliver O'Halloran
De-duplicate, and fix up the comments, and make the prototype just take a
pci_dn since the job of the function is to return the pe_config_addr of the
PE which contains a given device.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/pseries/eeh_pseries.c | 80 +++-
 1 file changed, 11 insertions(+), 69 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
b/arch/powerpc/platforms/pseries/eeh_pseries.c
index b1561961c7ff..10303de3d8d5 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -33,8 +33,6 @@
 #include 
 #include 
 
-static int pseries_eeh_get_pe_addr(struct pci_dn *pdn);
-
 /* RTAS tokens */
 static int ibm_set_eeh_option;
 static int ibm_set_slot_reset;
@@ -86,7 +84,8 @@ void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
 
 
 /**
- * pseries_eeh_get_config_addr - Retrieve config address
+ * pseries_eeh_get_pe_config_addr - Find the pe_config_addr for a device
+ * @pdn: pci_dn of the input device
  *
  * Retrieve the assocated config address. Actually, there're 2 RTAS
  * function calls dedicated for the purpose. We need implement
@@ -97,16 +96,17 @@ void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
  * It's notable that zero'ed return value means invalid PE config
  * address.
  */
-static int pseries_eeh_get_config_addr(struct pci_controller *phb, int 
config_addr)
+static int pseries_eeh_get_pe_config_addr(struct pci_dn *pdn)
 {
+   int config_addr = rtas_config_addr(pdn->busno, pdn->devfn, 0);
+   struct pci_controller *phb = pdn->phb;
int ret = 0;
int rets[3];
 
if (ibm_get_config_addr_info2 != RTAS_UNKNOWN_SERVICE) {
/*
-* First of all, we need to make sure there has one PE
-* associated with the device. Otherwise, PE address is
-* meaningless.
+* First of all, use function 1 to determine if this device is
+* part of a PE or not. ret[0] being zero indicates it's not.
 */
ret = rtas_call(ibm_get_config_addr_info2, 4, 2, rets,
config_addr, BUID_HI(phb->buid),
@@ -431,7 +431,7 @@ void pseries_eeh_init_edev(struct pci_dn *pdn)
struct eeh_pe *parent;
 
/* Retrieve PE address */
-   edev->pe_config_addr = pseries_eeh_get_pe_addr(pdn);
+   edev->pe_config_addr = pseries_eeh_get_pe_config_addr(pdn);
pe.addr = edev->pe_config_addr;
 
/* Some older systems (Power4) allow the ibm,set-eeh-option
@@ -551,64 +551,6 @@ static int pseries_eeh_set_option(struct eeh_pe *pe, int 
option)
return ret;
 }
 
-/**
- * pseries_eeh_get_pe_addr - Retrieve PE address
- * @pe: EEH PE
- *
- * Retrieve the assocated PE address. Actually, there're 2 RTAS
- * function calls dedicated for the purpose. We need implement
- * it through the new function and then the old one. Besides,
- * you should make sure the config address is figured out from
- * FDT node before calling the function.
- *
- * It's notable that zero'ed return value means invalid PE config
- * address.
- */
-static int pseries_eeh_get_pe_addr(struct pci_dn *pdn)
-{
-   int config_addr = rtas_config_addr(pdn->busno, pdn->devfn, 0);
-   unsigned long buid = pdn->phb->buid;
-   int ret = 0;
-   int rets[3];
-
-   if (ibm_get_config_addr_info2 != RTAS_UNKNOWN_SERVICE) {
-   /*
-* First of all, we need to make sure there has one PE
-* associated with the device. Otherwise, PE address is
-* meaningless.
-*/
-   ret = rtas_call(ibm_get_config_addr_info2, 4, 2, rets,
-   config_addr, BUID_HI(buid), BUID_LO(buid), 1);
-   if (ret || (rets[0] == 0))
-   return 0;
-
-   /* Retrieve the associated PE config address */
-   ret = rtas_call(ibm_get_config_addr_info2, 4, 2, rets,
-   config_addr, BUID_HI(buid), BUID_LO(buid), 0);
-   if (ret) {
-   pr_warn("%s: Failed to get address for PHB#%x-PE#%x\n",
-   __func__, pdn->phb->global_number, config_addr);
-   return 0;
-   }
-
-   return rets[0];
-   }
-
-   if (ibm_get_config_addr_info != RTAS_UNKNOWN_SERVICE) {
-   ret = rtas_call(ibm_get_config_addr_info, 4, 2, rets,
-   config_addr, BUID_HI(buid), BUID_LO(buid), 0);
-   if (ret) {
-   pr_warn("%s: Failed to get address for PHB#%x-PE#%x\n",
-   __func__, pdn->phb->global_number, config_addr);
-   return 0;
-   }
-
-   return rets[0];
- 

[PATCH v2 5/9] powerpc/eeh: Move EEH initialisation to an arch initcall

2020-09-18 Thread Oliver O'Halloran
The initialisation of EEH mostly happens in a core_initcall_sync initcall,
followed by registering a bus notifier later on in an arch_initcall.
Anything involving initcall dependecies is mostly incomprehensible unless
you've spent a while staring at code so here's the full sequence:

ppc_md.setup_arch   <-- pci_controllers are created here

...time passes...

core_initcall   <-- pci_dns are created from DT nodes
core_initcall_sync  <-- platforms call eeh_init()
postcore_initcall   <-- PCI bus type is registered
postcore_initcall_sync
arch_initcall   <-- EEH pci_bus notifier registered
subsys_initcall <-- PHBs are scanned here

There's no real requirement to do the EEH setup at the core_initcall_sync
level. It just needs to be done after pci_dn's are created and before we
start scanning PHBs. Simplify the flow a bit by moving the platform EEH
inititalisation to an arch_initcall so we can fold the bus notifier
registration into eeh_init().

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/kernel/eeh.c| 64 ++--
 arch/powerpc/platforms/powernv/eeh-powernv.c |  2 +-
 arch/powerpc/platforms/pseries/eeh_pseries.c |  2 +-
 3 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 98faf139e676..c9e25cfce8f0 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -940,6 +940,30 @@ static struct notifier_block eeh_reboot_nb = {
.notifier_call = eeh_reboot_notifier,
 };
 
+static int eeh_device_notifier(struct notifier_block *nb,
+  unsigned long action, void *data)
+{
+   struct device *dev = data;
+
+   switch (action) {
+   /*
+* Note: It's not possible to perform EEH device addition (i.e.
+* {pseries,pnv}_pcibios_bus_add_device()) here because it depends on
+* the device's resources, which have not yet been set up.
+*/
+   case BUS_NOTIFY_DEL_DEVICE:
+   eeh_remove_device(to_pci_dev(dev));
+   break;
+   default:
+   break;
+   }
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block eeh_device_nb = {
+   .notifier_call = eeh_device_notifier,
+};
+
 /**
  * eeh_init - System wide EEH initialization
  *
@@ -960,7 +984,14 @@ int eeh_init(struct eeh_ops *ops)
/* Register reboot notifier */
ret = register_reboot_notifier(_reboot_nb);
if (ret) {
-   pr_warn("%s: Failed to register notifier (%d)\n",
+   pr_warn("%s: Failed to register reboot notifier (%d)\n",
+   __func__, ret);
+   return ret;
+   }
+
+   ret = bus_register_notifier(_bus_type, _device_nb);
+   if (ret) {
+   pr_warn("%s: Failed to register bus notifier (%d)\n",
__func__, ret);
return ret;
}
@@ -975,37 +1006,6 @@ int eeh_init(struct eeh_ops *ops)
return eeh_event_init();
 }
 
-static int eeh_device_notifier(struct notifier_block *nb,
-  unsigned long action, void *data)
-{
-   struct device *dev = data;
-
-   switch (action) {
-   /*
-* Note: It's not possible to perform EEH device addition (i.e.
-* {pseries,pnv}_pcibios_bus_add_device()) here because it depends on
-* the device's resources, which have not yet been set up.
-*/
-   case BUS_NOTIFY_DEL_DEVICE:
-   eeh_remove_device(to_pci_dev(dev));
-   break;
-   default:
-   break;
-   }
-   return NOTIFY_DONE;
-}
-
-static struct notifier_block eeh_device_nb = {
-   .notifier_call = eeh_device_notifier,
-};
-
-static __init int eeh_set_bus_notifier(void)
-{
-   bus_register_notifier(_bus_type, _device_nb);
-   return 0;
-}
-arch_initcall(eeh_set_bus_notifier);
-
 /**
  * eeh_probe_device() - Perform EEH initialization for the indicated pci device
  * @dev: pci device for which to set up EEH
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 8b0fb6c8b8d9..6d95e774bded 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -1703,4 +1703,4 @@ static int __init eeh_powernv_init(void)
 
return ret;
 }
-machine_core_initcall_sync(powernv, eeh_powernv_init);
+machine_arch_initcall(powernv, eeh_powernv_init);
diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
b/arch/powerpc/platforms/pseries/eeh_pseries.c
index fd328632..b1561961c7ff 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -989,4 +989,4 @@ static int __init eeh_pseries_init(void)
ret);
return ret;
 }
-machine_core_initcall_sync(pseries, eeh_pseries_init);
+machine_arch_initcall(pseries, eeh_pseries_init);
-- 
2.26.2



[PATCH v2 4/9] powerpc/eeh: Delete eeh_ops->init

2020-09-18 Thread Oliver O'Halloran
No longer used since the platforms perform their EEH initialisation before
calling eeh_init().

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/include/asm/eeh.h | 1 -
 arch/powerpc/kernel/eeh.c  | 8 
 2 files changed, 9 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 765bcf63edea..85030c05e67e 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -216,7 +216,6 @@ enum {
 
 struct eeh_ops {
char *name;
-   int (*init)(void);
struct eeh_dev *(*probe)(struct pci_dev *pdev);
int (*set_option)(struct eeh_pe *pe, int option);
int (*get_state)(struct eeh_pe *pe, int *delay);
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 28a0ea5d9faa..98faf139e676 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -965,14 +965,6 @@ int eeh_init(struct eeh_ops *ops)
return ret;
}
 
-   if (eeh_ops->init)
-   ret = eeh_ops->init();
-   if (ret) {
-   pr_warn("%s: platform EEH init failed (%d)\n",
-   __func__, ret);
-   return ret;
-   }
-
/* Initialize PHB PEs */
list_for_each_entry_safe(hose, tmp, _list, list_node)
eeh_phb_pe_create(hose);
-- 
2.26.2



[PATCH v2 3/9] powerpc/pseries: Stop using eeh_ops->init()

2020-09-18 Thread Oliver O'Halloran
Fold pseries_eeh_init() into eeh_pseries_init() rather than having
eeh_init() call it via eeh_ops->init(). It's simpler and it'll let us
delete eeh_ops.init.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/pseries/eeh_pseries.c | 155 +--
 1 file changed, 71 insertions(+), 84 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 3cc569e8b6d4..fd328632 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -239,88 +239,6 @@ static unsigned char slot_errbuf[RTAS_ERROR_LOG_MAX];
 static DEFINE_SPINLOCK(slot_errbuf_lock);
 static int eeh_error_buf_size;
 
-/**
- * pseries_eeh_init - EEH platform dependent initialization
- *
- * EEH platform dependent initialization on pseries.
- */
-static int pseries_eeh_init(void)
-{
-   struct pci_controller *phb;
-   struct pci_dn *pdn;
-   int addr, config_addr;
-
-   /* figure out EEH RTAS function call tokens */
-   ibm_set_eeh_option  = rtas_token("ibm,set-eeh-option");
-   ibm_set_slot_reset  = rtas_token("ibm,set-slot-reset");
-   ibm_read_slot_reset_state2  = 
rtas_token("ibm,read-slot-reset-state2");
-   ibm_read_slot_reset_state   = 
rtas_token("ibm,read-slot-reset-state");
-   ibm_slot_error_detail   = rtas_token("ibm,slot-error-detail");
-   ibm_get_config_addr_info2   = 
rtas_token("ibm,get-config-addr-info2");
-   ibm_get_config_addr_info= 
rtas_token("ibm,get-config-addr-info");
-   ibm_configure_pe= rtas_token("ibm,configure-pe");
-
-   /*
-* ibm,configure-pe and ibm,configure-bridge have the same semantics,
-* however ibm,configure-pe can be faster.  If we can't find
-* ibm,configure-pe then fall back to using ibm,configure-bridge.
-*/
-   if (ibm_configure_pe == RTAS_UNKNOWN_SERVICE)
-   ibm_configure_pe= rtas_token("ibm,configure-bridge");
-
-   /*
-* Necessary sanity check. We needn't check "get-config-addr-info"
-* and its variant since the old firmware probably support address
-* of domain/bus/slot/function for EEH RTAS operations.
-*/
-   if (ibm_set_eeh_option == RTAS_UNKNOWN_SERVICE  ||
-   ibm_set_slot_reset == RTAS_UNKNOWN_SERVICE  ||
-   (ibm_read_slot_reset_state2 == RTAS_UNKNOWN_SERVICE &&
-ibm_read_slot_reset_state == RTAS_UNKNOWN_SERVICE) ||
-   ibm_slot_error_detail == RTAS_UNKNOWN_SERVICE   ||
-   ibm_configure_pe == RTAS_UNKNOWN_SERVICE) {
-   pr_info("EEH functionality not supported\n");
-   return -EINVAL;
-   }
-
-   /* Initialize error log lock and size */
-   spin_lock_init(_errbuf_lock);
-   eeh_error_buf_size = rtas_token("rtas-error-log-max");
-   if (eeh_error_buf_size == RTAS_UNKNOWN_SERVICE) {
-   pr_info("%s: unknown EEH error log size\n",
-   __func__);
-   eeh_error_buf_size = 1024;
-   } else if (eeh_error_buf_size > RTAS_ERROR_LOG_MAX) {
-   pr_info("%s: EEH error log size %d exceeds the maximal %d\n",
-   __func__, eeh_error_buf_size, RTAS_ERROR_LOG_MAX);
-   eeh_error_buf_size = RTAS_ERROR_LOG_MAX;
-   }
-
-   /* Set EEH probe mode */
-   eeh_add_flag(EEH_PROBE_MODE_DEVTREE | EEH_ENABLE_IO_FOR_LOG);
-
-   /* Set EEH machine dependent code */
-   ppc_md.pcibios_bus_add_device = pseries_pcibios_bus_add_device;
-
-   if (is_kdump_kernel() || reset_devices) {
-   pr_info("Issue PHB reset ...\n");
-   list_for_each_entry(phb, _list, list_node) {
-   pdn = list_first_entry(_DN(phb->dn)->child_list, 
struct pci_dn, list);
-   addr = (pdn->busno << 16) | (pdn->devfn << 8);
-   config_addr = pseries_eeh_get_config_addr(phb, addr);
-   /* invalid PE config addr */
-   if (config_addr == 0)
-   continue;
-
-   pseries_eeh_phb_reset(phb, config_addr, 
EEH_RESET_FUNDAMENTAL);
-   pseries_eeh_phb_reset(phb, config_addr, 
EEH_RESET_DEACTIVATE);
-   pseries_eeh_phb_configure_bridge(phb, config_addr);
-   }
-   }
-
-   return 0;
-}
-
 static int pseries_eeh_cap_start(struct pci_dn *pdn)
 {
u32 status;
@@ -967,7 +885,6 @@ static int pseries_notify_resume(struct eeh_dev *edev)
 
 static struct eeh_ops pseries_eeh_ops = {
.name   = "pseries",
-   .init   = pseries_eeh_ini

[PATCH v2 2/9] powerpc/powernv: Stop using eeh_ops->init()

2020-09-18 Thread Oliver O'Halloran
Fold pnv_eeh_init() into eeh_powernv_init() rather than having eeh_init()
call it via eeh_ops->init(). It's simpler and it'll let us delete
eeh_ops.init.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 94 ++--
 1 file changed, 45 insertions(+), 49 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 3eb0f2439da8..8b0fb6c8b8d9 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -44,54 +44,6 @@ void pnv_pcibios_bus_add_device(struct pci_dev *pdev)
eeh_probe_device(pdev);
 }
 
-static int pnv_eeh_init(void)
-{
-   struct pci_controller *hose;
-   struct pnv_phb *phb;
-   int max_diag_size = PNV_PCI_DIAG_BUF_SIZE;
-
-   if (!firmware_has_feature(FW_FEATURE_OPAL)) {
-   pr_warn("%s: OPAL is required !\n",
-   __func__);
-   return -EINVAL;
-   }
-
-   /* Set probe mode */
-   eeh_add_flag(EEH_PROBE_MODE_DEV);
-
-   /*
-* P7IOC blocks PCI config access to frozen PE, but PHB3
-* doesn't do that. So we have to selectively enable I/O
-* prior to collecting error log.
-*/
-   list_for_each_entry(hose, _list, list_node) {
-   phb = hose->private_data;
-
-   if (phb->model == PNV_PHB_MODEL_P7IOC)
-   eeh_add_flag(EEH_ENABLE_IO_FOR_LOG);
-
-   if (phb->diag_data_size > max_diag_size)
-   max_diag_size = phb->diag_data_size;
-
-   /*
-* PE#0 should be regarded as valid by EEH core
-* if it's not the reserved one. Currently, we
-* have the reserved PE#255 and PE#127 for PHB3
-* and P7IOC separately. So we should regard
-* PE#0 as valid for PHB3 and P7IOC.
-*/
-   if (phb->ioda.reserved_pe_idx != 0)
-   eeh_add_flag(EEH_VALID_PE_ZERO);
-
-   break;
-   }
-
-   eeh_set_pe_aux_size(max_diag_size);
-   ppc_md.pcibios_bus_add_device = pnv_pcibios_bus_add_device;
-
-   return 0;
-}
-
 static irqreturn_t pnv_eeh_event(int irq, void *data)
 {
/*
@@ -1674,7 +1626,6 @@ static int pnv_eeh_restore_config(struct eeh_dev *edev)
 
 static struct eeh_ops pnv_eeh_ops = {
.name   = "powernv",
-   .init   = pnv_eeh_init,
.probe  = pnv_eeh_probe,
.set_option = pnv_eeh_set_option,
.get_state  = pnv_eeh_get_state,
@@ -1697,8 +1648,53 @@ static struct eeh_ops pnv_eeh_ops = {
  */
 static int __init eeh_powernv_init(void)
 {
+   int max_diag_size = PNV_PCI_DIAG_BUF_SIZE;
+   struct pci_controller *hose;
+   struct pnv_phb *phb;
int ret = -EINVAL;
 
+   if (!firmware_has_feature(FW_FEATURE_OPAL)) {
+   pr_warn("%s: OPAL is required !\n", __func__);
+   return -EINVAL;
+   }
+
+   /* Set probe mode */
+   eeh_add_flag(EEH_PROBE_MODE_DEV);
+
+   /*
+* P7IOC blocks PCI config access to frozen PE, but PHB3
+* doesn't do that. So we have to selectively enable I/O
+* prior to collecting error log.
+*/
+   list_for_each_entry(hose, _list, list_node) {
+   phb = hose->private_data;
+
+   if (phb->model == PNV_PHB_MODEL_P7IOC)
+   eeh_add_flag(EEH_ENABLE_IO_FOR_LOG);
+
+   if (phb->diag_data_size > max_diag_size)
+   max_diag_size = phb->diag_data_size;
+
+   /*
+* PE#0 should be regarded as valid by EEH core
+* if it's not the reserved one. Currently, we
+* have the reserved PE#255 and PE#127 for PHB3
+* and P7IOC separately. So we should regard
+* PE#0 as valid for PHB3 and P7IOC.
+*/
+   if (phb->ioda.reserved_pe_idx != 0)
+   eeh_add_flag(EEH_VALID_PE_ZERO);
+
+   break;
+   }
+
+   /*
+* eeh_init() allocates the eeh_pe and its aux data buf so the
+* size needs to be set before calling eeh_init().
+*/
+   eeh_set_pe_aux_size(max_diag_size);
+   ppc_md.pcibios_bus_add_device = pnv_pcibios_bus_add_device;
+
ret = eeh_init(_eeh_ops);
if (!ret)
pr_info("EEH: PowerNV platform initialized\n");
-- 
2.26.2



[PATCH v2 1/9] powerpc/eeh: Rework EEH initialisation

2020-09-18 Thread Oliver O'Halloran
Drop the EEH register / unregister ops thing and have the platform pass the
ops structure into eeh_init() directly. This takes one initcall out of the
EEH setup path and it means we're only doing EEH setup on the platforms
which actually support it. It's also less code and generally easier to
follow.

No functional changes.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/include/asm/eeh.h   |  3 +-
 arch/powerpc/kernel/eeh.c| 87 
 arch/powerpc/platforms/powernv/eeh-powernv.c |  4 +-
 arch/powerpc/platforms/pseries/eeh_pseries.c |  5 +-
 4 files changed, 21 insertions(+), 78 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index d5f369bcd130..765bcf63edea 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -295,8 +295,7 @@ const char *eeh_pe_loc_get(struct eeh_pe *pe);
 struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
 
 void eeh_show_enabled(void);
-int __init eeh_ops_register(struct eeh_ops *ops);
-int __exit eeh_ops_unregister(const char *name);
+int __init eeh_init(struct eeh_ops *ops);
 int eeh_check_failure(const volatile void __iomem *token);
 int eeh_dev_check_failure(struct eeh_dev *edev);
 void eeh_addr_cache_init(void);
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 94682382fc8c..28a0ea5d9faa 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -929,56 +929,6 @@ void eeh_save_bars(struct eeh_dev *edev)
edev->config_space[1] |= PCI_COMMAND_MASTER;
 }
 
-/**
- * eeh_ops_register - Register platform dependent EEH operations
- * @ops: platform dependent EEH operations
- *
- * Register the platform dependent EEH operation callback
- * functions. The platform should call this function before
- * any other EEH operations.
- */
-int __init eeh_ops_register(struct eeh_ops *ops)
-{
-   if (!ops->name) {
-   pr_warn("%s: Invalid EEH ops name for %p\n",
-   __func__, ops);
-   return -EINVAL;
-   }
-
-   if (eeh_ops && eeh_ops != ops) {
-   pr_warn("%s: EEH ops of platform %s already existing (%s)\n",
-   __func__, eeh_ops->name, ops->name);
-   return -EEXIST;
-   }
-
-   eeh_ops = ops;
-
-   return 0;
-}
-
-/**
- * eeh_ops_unregister - Unreigster platform dependent EEH operations
- * @name: name of EEH platform operations
- *
- * Unregister the platform dependent EEH operation callback
- * functions.
- */
-int __exit eeh_ops_unregister(const char *name)
-{
-   if (!name || !strlen(name)) {
-   pr_warn("%s: Invalid EEH ops name\n",
-   __func__);
-   return -EINVAL;
-   }
-
-   if (eeh_ops && !strcmp(eeh_ops->name, name)) {
-   eeh_ops = NULL;
-   return 0;
-   }
-
-   return -EEXIST;
-}
-
 static int eeh_reboot_notifier(struct notifier_block *nb,
   unsigned long action, void *unused)
 {
@@ -991,25 +941,22 @@ static struct notifier_block eeh_reboot_nb = {
 };
 
 /**
- * eeh_init - EEH initialization
- *
- * Initialize EEH by trying to enable it for all of the adapters in the system.
- * As a side effect we can determine here if eeh is supported at all.
- * Note that we leave EEH on so failed config cycles won't cause a machine
- * check.  If a user turns off EEH for a particular adapter they are really
- * telling Linux to ignore errors.  Some hardware (e.g. POWER5) won't
- * grant access to a slot if EEH isn't enabled, and so we always enable
- * EEH for all slots/all devices.
+ * eeh_init - System wide EEH initialization
  *
- * The eeh-force-off option disables EEH checking globally, for all slots.
- * Even if force-off is set, the EEH hardware is still enabled, so that
- * newer systems can boot.
+ * It's the platform's job to call this from an arch_initcall().
  */
-static int eeh_init(void)
+int eeh_init(struct eeh_ops *ops)
 {
struct pci_controller *hose, *tmp;
int ret = 0;
 
+   /* the platform should only initialise EEH once */
+   if (WARN_ON(eeh_ops))
+   return -EEXIST;
+   if (WARN_ON(!ops))
+   return -ENOENT;
+   eeh_ops = ops;
+
/* Register reboot notifier */
ret = register_reboot_notifier(_reboot_nb);
if (ret) {
@@ -1018,13 +965,13 @@ static int eeh_init(void)
return ret;
}
 
-   /* call platform initialization function */
-   if (!eeh_ops) {
-   pr_warn("%s: Platform EEH operation not found\n",
-   __func__);
-   return -EEXIST;
-   } else if ((ret = eeh_ops->init()))
+   if (eeh_ops->init)
+   ret = eeh_ops->init();
+   if (ret) {
+   pr_warn("%s: platform EEH init failed (%d)\n",
+  

Re: [PATCH 5/7] powerpc/powernv/pci: Fix W=1 compile warning

2020-09-11 Thread Oliver O'Halloran
On Fri, Sep 11, 2020 at 7:02 AM Cédric Le Goater  wrote:
>
>   CC  arch/powerpc/platforms/powernv/pci-ioda.o
> ../arch/powerpc/platforms/powernv/pci-ioda.c: In function 
> ‘pnv_ioda_configure_pe’:
> ../arch/powerpc/platforms/powernv/pci-ioda.c:897:18: error: variable ‘parent’ 
> set but not used [-Werror=unused-but-set-variable]
>   struct pci_dev *parent;
>       ^~
>
> Cc: Oliver O'Halloran 
> Signed-off-by: Cédric Le Goater 

Come to think of it a fix for this might already be in -next, see
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=193967=*

If not,

Reviewed-by: Oliver O'Halloran 


[PATCH 9/9] powerpc/eeh: Clean up PE addressing

2020-09-10 Thread Oliver O'Halloran
At some point in the distant past we only supported EEH on pseries. The
various EEH RTAS call use the "PE config address" as a handle to the PE
being manipulated so we need to find that address a PE.

There's three ways to determine the address of a PE starting from a device
inside of that PE: The old way, which requires traversing the DT until you
find a built-in device and finding its PE using the
ibm,read-slot-reset-state2 RTAS call. The new way, which requires using the
ibm,get-config-addr-info RTAS call to map the device address to a PE
address. And then there's the Linux way, which is "blindly assume the
device and PE addresses are the same." Naturally, PAPR doesn't recommend
(or even mention) that last one, but it's a technique
Linux on Power has used since the dawn of time.

Most (all?) modern pseries systems will provide the get-addr-info RTAS call
so Linux uses that for everything except for the initial ibm,eeh-set-option
RTAS call to enable EEH on the PE. The Linux way is still broken in that
case, but it seems to work so who maybe firmware have hacks to support it,
who knows. For systems that don't support the RTAS call we'll use the Linux
way as a fallback. For some reason we don't just use the fallback address
to initialise eeh_dev->addr and instead eeh_pe has two addresses and we'll
choose which one to use at runtime. This results in code that looks
something like:

config_addr = pe->config_addr;
if (pe->addr)
config_addr = pe->addr;

rtas_call(..., config_addr, ...);

In other words, if the result of the RTAS call is non-zero then Linux will
use that as the pe address. If not, it falls back to using the config_addr.
It's worth pointing out that both fields here used to be part of pci_dn and
ended up in eeh_pe after a while.

Storing both addresses in eeh_pe doesn't really make a whole lot of sense.
Why does the eeh_pe structure, which is platform independent, have two
addresses baked into it for the sake of a pseries platform quirk? Why
doesn't the pseries platform handle determining what the "correct" PE
address is during EEH initialisation for the device and just pass that
address to the EEH core? What does it even mean for a pe to have an
"address" if there's two of them?

There are no good answers to these questions; especially that last one. The
EEH core makes a token effort to support looking up a PE by either address
by having two arguments to eeh_pe_get(). However, a survey of all the
callers to eeh_pe_get() shows that all except one hard-code zero as the
config_addr argument. The only one that doesn't is in eeh_pe_tree_insert()a
which looks like this:

if (!eeh_has_flag(EEH_VALID_PE_ZERO) && !edev->pe_config_addr)
return -EINVAL;

pe = eeh_pe_get(hose, edev->pe_config_addr, edev->bdfn);

The third argument (config_addr) is only used if the second (pe->addr)
argument is invalid. In this case that would require edev->pe_config_addr
to be zero and the EEH_VALID_PE_ZERO flag to be unset. The preceding check
ensure that can never be true so there is no situation where eeh_pe_get()
will search for a PE with the specified pe->config_addr.

Similarly, on pseries the EEH_VALID_PE_ZERO flag isn't set so the check
above also ensures that there will never be a PE with pe->addr == 0. As a
result all the logic to choose whether we pass pe->config_addr or pe->addr
to an RTAS call is also dead code. The pe->config_addr will never be used
since pe->addr must be non-zero. Otherwise it wouldn't be in the PE tree.

This patch tries to clean up this mess by:

1) Removing pe->config_addr
2) Removing the EEH_VALID_PE_ZERO flag
3) Removing the fallback address argument to eeh_pe_get().
4) Removing all the checks for pe->addr being zero in the pseries EEH code.

This leaves us with PE's only being identified by what's in their pe->addr
field and relying on the platform to ensure that eeh_dev's are only
inserted into the EEH tree if they're actually inside a PE.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/include/asm/eeh.h   |  4 +-
 arch/powerpc/kernel/eeh.c|  2 +-
 arch/powerpc/kernel/eeh_pe.c | 46 +++-
 arch/powerpc/platforms/powernv/eeh-powernv.c | 16 ++-
 arch/powerpc/platforms/pseries/eeh_pseries.c | 42 +++---
 5 files changed, 17 insertions(+), 93 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 85030c05e67e..dd6a4ac6c713 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -27,7 +27,6 @@ struct pci_dn;
 #define EEH_FORCE_DISABLED 0x02/* EEH disabled  */
 #define EEH_PROBE_MODE_DEV 0x04/* From PCI device   */
 #define EEH_PROBE_MODE_DEVTREE 0x08/* From device tree  */
-#define EEH_V

[PATCH 8/9] powerpc/pseries/eeh: Allow zero to be a valid PE configuration address

2020-09-10 Thread Oliver O'Halloran
There's no real reason why zero can't be a valid PE configuration address.
Under qemu each sPAPR PHB (i.e. EEH supporting) has the passed-though
devices on bus zero, so the PE address of bus :00 should be zero.

However, all previous versions of Linux will reject that, so Qemu at least
goes out of it's way to avoid it. The Qemu implementation of
ibm,get-config-addr-info2 RTAS has the following comment:

> /*
>  * We always have PE address of form "00BB0001". "BB"
>  * represents the bus number of PE's primary bus.
>  */

So qemu puts a one into the register portion of the PE's config_addr to
avoid it being zero. The whole is pretty silly considering that RTAS will
return a negative error code if it can't map the device's config_addr to a
PE's.

Fix Linux to treat zero as a valid PE address. This shouldn't have any real
effects due to the Qemu hack mentioned above, and the fact that this has
worked historically on PowerVM means they never pass through devices on bus
zero.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/pseries/eeh_pseries.c | 38 +++-
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
b/arch/powerpc/platforms/pseries/eeh_pseries.c
index c2ecc0db2f94..e42c026392aa 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -87,21 +87,20 @@ void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
  * pseries_eeh_get_pe_config_addr - Find the pe_config_addr for a device
  * @pdn: pci_dn of the input device
  *
- * Retrieve the assocated config address. Actually, there're 2 RTAS
- * function calls dedicated for the purpose. We need implement
- * it through the new function and then the old one. Besides,
- * you should make sure the config address is figured out from
- * FDT node before calling the function.
+ * The EEH RTAS calls use a tuple consisting of: (buid_hi, buid_lo,
+ * pe_config_addr) as a handle to a given PE. This function finds the
+ * pe_config_addr based on the device's config addr.
  *
- * It's notable that zero'ed return value means invalid PE config
- * address.
+ * Keep in mind that the pe_config_addr *might* be numerically identical to the
+ * device's config addr, but the two are conceptually distinct.
+ *
+ * Returns the pe_config_addr, or a negative error code.
  */
 static int pseries_eeh_get_pe_config_addr(struct pci_dn *pdn)
 {
int config_addr = rtas_config_addr(pdn->busno, pdn->devfn, 0);
struct pci_controller *phb = pdn->phb;
-   int ret = 0;
-   int rets[3];
+   int ret, rets[3];
 
if (ibm_get_config_addr_info2 != RTAS_UNKNOWN_SERVICE) {
/*
@@ -112,16 +111,16 @@ static int pseries_eeh_get_pe_config_addr(struct pci_dn 
*pdn)
config_addr, BUID_HI(phb->buid),
BUID_LO(phb->buid), 1);
if (ret || (rets[0] == 0))
-   return 0;
+   return -ENOENT;
 
-   /* Retrieve the associated PE config address */
+   /* Retrieve the associated PE config address with function 0 */
ret = rtas_call(ibm_get_config_addr_info2, 4, 2, rets,
config_addr, BUID_HI(phb->buid),
BUID_LO(phb->buid), 0);
if (ret) {
pr_warn("%s: Failed to get address for PHB#%x-PE#%x\n",
__func__, phb->global_number, config_addr);
-   return 0;
+   return -ENXIO;
}
 
return rets[0];
@@ -134,13 +133,20 @@ static int pseries_eeh_get_pe_config_addr(struct pci_dn 
*pdn)
if (ret) {
pr_warn("%s: Failed to get address for PHB#%x-PE#%x\n",
__func__, phb->global_number, config_addr);
-   return 0;
+   return -ENXIO;
}
 
return rets[0];
}
 
-   return ret;
+   /*
+* PAPR does describe a process for finding the pe_config_addr that was
+* used before the ibm,get-config-addr-info calls were added. However,
+* I haven't found *any* systems that don't have that RTAS call
+* implemented. If you happen to find one that needs the old DT based
+* process, patches are welcome!
+*/
+   return -ENOENT;
 }
 
 /**
@@ -419,7 +425,7 @@ void pseries_eeh_init_edev(struct pci_dn *pdn)
 
/* first up, find the pe_config_addr for the PE containing the device */
addr = pseries_eeh_get_pe_config_addr(pdn);
-   if (addr == 0) {
+   if (addr < 0) {
eeh_edev_dbg(edev, "Unable to find pe_config_addr\n");
goto err;
}
@@ -

[PATCH 7/9] powerpc/pseries/eeh: Rework device EEH PE determination

2020-09-09 Thread Oliver O'Halloran
The process Linux uses for determining if a device supports EEH or not
appears to be at odds with what PAPR+ says the OS should be doing. The
current flow is something like:

1. Assume pe_config_addr is equal the the device's config_addr.
2. Attempt to enable EEH on that PE
3. Verify EEH was enabled (POWER4 bug workaround)
4. Try find the pe_config_addr using the ibm,get-config-addr-info2 RTAS
   call.
5. If that fails walk the pci_dn tree upwards trying to find a parent
   device with EEH support. If we find one then add the device to that PE.

The first major flaw with this is that in order to enable EEH on a PE we
need to know the PE's configuration address since that's an input to the
ibm,set-eeh-option RTAS call which is used to enable EEH for the PE. We
hack around that by assuming that the PE address is equal to the device's
RTAS config address with the register fields set to zero (see
rtas_config_addr()). This assumption happens to be valid if:

a) The PCI device is the 0th function, and
b) The device is on the PE's root bus.

However, it this does also appear to work for devices where these
conditions are not true. At a guess PowerVM's RTAS has some workarounds to
accommodate Linux's quirks. However, it's a bit sketch and the code is
confusing since it's not implementing what PAPR claims is the correct way.

This patch re-works how we handle EEH init so that we find the PE config
address using the ibm,get-config-addr-info2 RTAS call, then use that to
finish the EEH init process. It also drops the Power4 workaround since as
of commit 471d7ff8b51b ("powerpc/64s: Remove POWER4 support") the kernel
does not support running on a Power4 CPU.

1. Find the pe_config_addr using the RTAS call.
2. Enable the PE (if needed)
3. Insert the edev into the tree and create an eeh_pe if needed.

The other change made here is ignoring unsupported devices entirely.
Currently the device's BARs are saved to the eeh_dev even if the device is
not part of an EEH PE. Not being part of a PE means that an EEH recovery
pass will never see that device so the saving the BARs is pointless.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/pseries/eeh_pseries.c | 57 
 1 file changed, 22 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 10303de3d8d5..c2ecc0db2f94 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -357,10 +357,10 @@ static struct eeh_pe *pseries_eeh_pe_get_parent(struct 
eeh_dev *edev)
  */
 void pseries_eeh_init_edev(struct pci_dn *pdn)
 {
+   struct eeh_pe pe, *parent;
struct eeh_dev *edev;
-   struct eeh_pe pe;
+   int addr;
u32 pcie_flags;
-   int enable = 0;
int ret;
 
if (WARN_ON_ONCE(!eeh_has_flag(EEH_PROBE_MODE_DEVTREE)))
@@ -417,51 +417,38 @@ void pseries_eeh_init_edev(struct pci_dn *pdn)
}
}
 
-   /* Initialize the fake PE */
+   /* first up, find the pe_config_addr for the PE containing the device */
+   addr = pseries_eeh_get_pe_config_addr(pdn);
+   if (addr == 0) {
+   eeh_edev_dbg(edev, "Unable to find pe_config_addr\n");
+   goto err;
+   }
+
+   /* Try enable EEH on the fake PE */
memset(, 0, sizeof(struct eeh_pe));
pe.phb = pdn->phb;
-   pe.config_addr = (pdn->busno << 16) | (pdn->devfn << 8);
+   pe.addr = addr;
 
-   /* Enable EEH on the device */
eeh_edev_dbg(edev, "Enabling EEH on device\n");
ret = eeh_ops->set_option(, EEH_OPT_ENABLE);
if (ret) {
eeh_edev_dbg(edev, "EEH failed to enable on device (code 
%d)\n", ret);
-   } else {
-   struct eeh_pe *parent;
+   goto err;
+   }
 
-   /* Retrieve PE address */
-   edev->pe_config_addr = pseries_eeh_get_pe_config_addr(pdn);
-   pe.addr = edev->pe_config_addr;
+   edev->pe_config_addr = addr;
 
-   /* Some older systems (Power4) allow the ibm,set-eeh-option
-* call to succeed even on nodes where EEH is not supported.
-* Verify support explicitly.
-*/
-   ret = eeh_ops->get_state(, NULL);
-   if (ret > 0 && ret != EEH_STATE_NOT_SUPPORT)
-   enable = 1;
+   eeh_add_flag(EEH_ENABLED);
 
-   /*
-* This device doesn't support EEH, but it may have an
-* EEH parent. In this case any error on the device will
-* freeze the PE of it's upstream bridge, so added it to
-* the upstream PE.
-*/
-   parent = pseries_eeh_pe_get_parent(edev);
-   if (parent && !enable)
-   edev->pe_config

[PATCH 6/9] powerpc/pseries/eeh: Clean up pe_config_addr lookups

2020-09-09 Thread Oliver O'Halloran
De-duplicate, and fix up the comments, and make the prototype just take a
pci_dn since the job of the function is to return the pe_config_addr of the
PE which contains a given device.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/pseries/eeh_pseries.c | 80 +++-
 1 file changed, 11 insertions(+), 69 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
b/arch/powerpc/platforms/pseries/eeh_pseries.c
index b1561961c7ff..10303de3d8d5 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -33,8 +33,6 @@
 #include 
 #include 
 
-static int pseries_eeh_get_pe_addr(struct pci_dn *pdn);
-
 /* RTAS tokens */
 static int ibm_set_eeh_option;
 static int ibm_set_slot_reset;
@@ -86,7 +84,8 @@ void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
 
 
 /**
- * pseries_eeh_get_config_addr - Retrieve config address
+ * pseries_eeh_get_pe_config_addr - Find the pe_config_addr for a device
+ * @pdn: pci_dn of the input device
  *
  * Retrieve the assocated config address. Actually, there're 2 RTAS
  * function calls dedicated for the purpose. We need implement
@@ -97,16 +96,17 @@ void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
  * It's notable that zero'ed return value means invalid PE config
  * address.
  */
-static int pseries_eeh_get_config_addr(struct pci_controller *phb, int 
config_addr)
+static int pseries_eeh_get_pe_config_addr(struct pci_dn *pdn)
 {
+   int config_addr = rtas_config_addr(pdn->busno, pdn->devfn, 0);
+   struct pci_controller *phb = pdn->phb;
int ret = 0;
int rets[3];
 
if (ibm_get_config_addr_info2 != RTAS_UNKNOWN_SERVICE) {
/*
-* First of all, we need to make sure there has one PE
-* associated with the device. Otherwise, PE address is
-* meaningless.
+* First of all, use function 1 to determine if this device is
+* part of a PE or not. ret[0] being zero indicates it's not.
 */
ret = rtas_call(ibm_get_config_addr_info2, 4, 2, rets,
config_addr, BUID_HI(phb->buid),
@@ -431,7 +431,7 @@ void pseries_eeh_init_edev(struct pci_dn *pdn)
struct eeh_pe *parent;
 
/* Retrieve PE address */
-   edev->pe_config_addr = pseries_eeh_get_pe_addr(pdn);
+   edev->pe_config_addr = pseries_eeh_get_pe_config_addr(pdn);
pe.addr = edev->pe_config_addr;
 
/* Some older systems (Power4) allow the ibm,set-eeh-option
@@ -551,64 +551,6 @@ static int pseries_eeh_set_option(struct eeh_pe *pe, int 
option)
return ret;
 }
 
-/**
- * pseries_eeh_get_pe_addr - Retrieve PE address
- * @pe: EEH PE
- *
- * Retrieve the assocated PE address. Actually, there're 2 RTAS
- * function calls dedicated for the purpose. We need implement
- * it through the new function and then the old one. Besides,
- * you should make sure the config address is figured out from
- * FDT node before calling the function.
- *
- * It's notable that zero'ed return value means invalid PE config
- * address.
- */
-static int pseries_eeh_get_pe_addr(struct pci_dn *pdn)
-{
-   int config_addr = rtas_config_addr(pdn->busno, pdn->devfn, 0);
-   unsigned long buid = pdn->phb->buid;
-   int ret = 0;
-   int rets[3];
-
-   if (ibm_get_config_addr_info2 != RTAS_UNKNOWN_SERVICE) {
-   /*
-* First of all, we need to make sure there has one PE
-* associated with the device. Otherwise, PE address is
-* meaningless.
-*/
-   ret = rtas_call(ibm_get_config_addr_info2, 4, 2, rets,
-   config_addr, BUID_HI(buid), BUID_LO(buid), 1);
-   if (ret || (rets[0] == 0))
-   return 0;
-
-   /* Retrieve the associated PE config address */
-   ret = rtas_call(ibm_get_config_addr_info2, 4, 2, rets,
-   config_addr, BUID_HI(buid), BUID_LO(buid), 0);
-   if (ret) {
-   pr_warn("%s: Failed to get address for PHB#%x-PE#%x\n",
-   __func__, pdn->phb->global_number, config_addr);
-   return 0;
-   }
-
-   return rets[0];
-   }
-
-   if (ibm_get_config_addr_info != RTAS_UNKNOWN_SERVICE) {
-   ret = rtas_call(ibm_get_config_addr_info, 4, 2, rets,
-   config_addr, BUID_HI(buid), BUID_LO(buid), 0);
-   if (ret) {
-   pr_warn("%s: Failed to get address for PHB#%x-PE#%x\n",
-   __func__, pdn->phb->global_number, config_addr);
-   return 0;
-   }
-
-   return rets[0];
- 

  1   2   3   4   5   6   7   >