Re: [PATCH 03/15] powerpc/powernv/pci: Add explicit tracking of the DMA setup state

2020-07-15 Thread Oliver O'Halloran
On Wed, Jul 15, 2020 at 5:05 PM Cédric Le Goater  wrote:
>
> I could, but can we fix the issue below before I reboot? I don't have a
> console anymore on these boxes.
>
> Firmware is:
> *snip*

Do you know when that started happening? I don't think anything
console-related has changed in a very long time, but we probably
haven't tested it on P7 in even longer.


Re: [PATCH 03/15] powerpc/powernv/pci: Add explicit tracking of the DMA setup state

2020-07-15 Thread Cédric Le Goater
On 7/15/20 5:33 AM, Alexey Kardashevskiy wrote:
> 
> 
> On 15/07/2020 11:38, Oliver O'Halloran wrote:
>> On Tue, Jul 14, 2020 at 5:21 PM Alexey Kardashevskiy  wrote:
>>>
>>> On 14/07/2020 15:58, Oliver O'Halloran wrote:
 On Tue, Jul 14, 2020 at 3:37 PM Alexey Kardashevskiy  
 wrote:
>
> On 10/07/2020 15:23, Oliver O'Halloran wrote:
>> This also means the only remaining user of the old "DMA Weight" code is
>> the IODA1 DMA setup code that it was originally added for, which is good.
>
>
> Is ditching IODA1 in the plan? :)

 That or separating out the pci_controller_ops for IODA1 and IODA2 so
 we can stop any IODA2 specific changes from breaking it.
>>>
>>> Is IODA1 tested at all these days? Or is anyone running upstream
>>> kernels anywhere and shouting when it does not work on IODA1? Thanks,
>>
>> Cedric has a P7 with OPAL. That's probably the only one left though.
> 
> Has he tried these patches on that box? Or do we hope for the best here? :)

I could, but can we fix the issue below before I reboot? I don't have a
console anymore on these boxes.

Firmware is:

root@amure:~# dtc -I fs /proc/device-tree/ibm,opal/firmware/ -f
: ERROR (name_properties): /: "name" property is incorrect ("firmware" 
instead of base node name)
Warning: Input tree has errors, output forced
/dts-v1/;

/ {
git-id = "34b3400";
ml-version = [4d 4c 20 46 57 37 37 30 2e 32 30 20 46 57 37 37 30 2e 32 
30 20 46 57 37 37 30 2e 32 30];
compatible = "ibm,opal-firmware";
phandle = <0x4d>;
mi-version = <0x4d49205a 0x4c373730 0x5f303735 0x205a4c37 0x37305f30 
0x3735205a 0x4c373730 0x5f303735>;
linux,phandle = <0x4d>;
name = "firmware";
};
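
[The byte arrays above are plain ASCII: ml-version decodes to
"ML FW770.20 FW770.20 FW770.20" and mi-version to "MI ZL770_075
ZL770_075 ZL770_075", i.e. this machine is on the FW770.20 firmware
level.]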

I'd rather not change it if possible.


C.

[1.979581] ------------[ cut here ]------------
[1.979582] opal: OPAL_CONSOLE_FLUSH missing.
[1.979583] WARNING: CPU: 0 PID: 253 at 
arch/powerpc/platforms/powernv/opal.c:446 .__opal_flush_console+0xfc/0x110
[1.979584] Modules linked in: ipr(E+) ptp(E) usb_common(E) pps_core(E)
[1.979587] CPU: 0 PID: 253 Comm: udevadm Tainted: GE 
5.4.0-4-powerpc64 #1 Debian 5.4.19-1
[1.979588] NIP:  c00d10ec LR: c00d10e8 CTR: c0b13510
[1.979589] REGS: c381f130 TRAP: 0700   Tainted: GE  
(5.4.0-4-powerpc64 Debian 5.4.19-1)
[1.979590] MSR:  90021032   CR: 28002282  XER: 
2000
[1.979594] CFAR: c0157d2c IRQMASK: 3 
[1.979595] GPR00: c00d10e8 c381f3c0 c1618700 
0022 
[1.979598] GPR04: c0c95df2 0002 414c5f434f4e534f 
4c455f464c555348 
[1.979601] GPR08: 0003 0003 0001 
90001032 
[1.979604] GPR12: c00d0818 c182  
c14342a8 
[1.979607] GPR16: c173b850 c148b218 00011a2d5db8 
 
[1.979609] GPR20:  c4b50e00  
c173e208 
[1.979612] GPR24: c173bde8  c148b1d8 
c16620e0 
[1.979615] GPR28: c17f7c40   
 
[1.979618] NIP [c00d10ec] .__opal_flush_console+0xfc/0x110
[1.979618] LR [c00d10e8] .__opal_flush_console+0xf8/0x110
[1.979619] Call Trace:
[1.979620] [c381f3c0] [c00d10e8] 
.__opal_flush_console+0xf8/0x110 (unreliable)
[1.979621] [c381f450] [c00d1428] .opal_flush_chars+0x38/0xc0
[1.979623] [c381f4d0] [c07680a8] 
.hvc_console_print+0x188/0x2d0
[1.979624] [c381f5b0] [c01eff08] .console_unlock+0x348/0x720
[1.979625] [c381f6c0] [c01f268c] .vprintk_emit+0x27c/0x3a0
[1.979626] [c381f780] [c07af2f4] 
.dev_vprintk_emit+0x208/0x258
[1.979628] [c381f8e0] [c07af38c] .dev_printk_emit+0x48/0x58
[1.979629] [c381f950] [c07af748] ._dev_err+0x6c/0x9c
[1.979630] [c381fa00] [c07aaff8] .uevent_store+0x78/0x80
[1.979631] [c381fa90] [c07a8ce4] .dev_attr_store+0x64/0x90
[1.979633] [c381fb20] [c054becc] .sysfs_kf_write+0x7c/0xa0
[1.979634] [c381fbb0] [c054b294] 
.kernfs_fop_write+0x114/0x270
[1.979635] [c381fc50] [c0456b58] .__vfs_write+0x68/0xe0
[1.979636] [c381fce0] [c0457e44] .vfs_write+0xc4/0x270
[1.979638] [c381fd80] [c045adc4] .ksys_write+0x84/0x140
[1.979639] [c381fe20] [c000c050] system_call+0x5c/0x68
[1.979640] Instruction dump:
[1.979641] 3be0fffe 4bffb581 6000 4b90 6000 3c62ff68 3921 
3d42ffea 
[1.979644] 3863d6d0 992a9d98 48086be1 6000 <0fe0> 4b50 480867ad 
6000 
[1.979648] ---[ end trace 34198c4c2c15e0e2 ]---
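
[For context: that WARN fires from __opal_flush_console() in
arch/powerpc/platforms/powernv/opal.c when the firmware doesn't
advertise the OPAL_CONSOLE_FLUSH token. Roughly paraphrased from the
v5.4 source (see the tree for the exact code, which also handles
OPAL_BUSY_EVENT):

static s64 __opal_flush_console(uint32_t vtermno)
{
	if (!opal_check_token(OPAL_CONSOLE_FLUSH)) {
		__be64 evt;

		/* Firmware predates the flush call: warn once and
		 * fall back to flushing via the event poll path. */
		WARN_ONCE(1, "opal: OPAL_CONSOLE_FLUSH missing.\n");
		opal_poll_events(&evt);
		if (!(be64_to_cpu(evt) & OPAL_EVENT_CONSOLE_OUTPUT))
			return OPAL_SUCCESS;
		return OPAL_BUSY;
	}
	return opal_console_flush(vtermno);
}

So the trace indicates old firmware taking the fallback path rather
than a failure of the flush itself.]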


Re: [PATCH 03/15] powerpc/powernv/pci: Add explicit tracking of the DMA setup state

2020-07-14 Thread Alexey Kardashevskiy



On 15/07/2020 11:38, Oliver O'Halloran wrote:
> On Tue, Jul 14, 2020 at 5:21 PM Alexey Kardashevskiy  wrote:
>>
>> On 14/07/2020 15:58, Oliver O'Halloran wrote:
>>> On Tue, Jul 14, 2020 at 3:37 PM Alexey Kardashevskiy  wrote:

 On 10/07/2020 15:23, Oliver O'Halloran wrote:
> This also means the only remaining user of the old "DMA Weight" code is
> the IODA1 DMA setup code that it was originally added for, which is good.


 Is ditching IODA1 in the plan? :)
>>>
>>> That or separating out the pci_controller_ops for IODA1 and IODA2 so
>>> we can stop any IODA2 specific changes from breaking it.
>>
>> Is IODA1 tested at all these days? Or is anyone running upstream
>> kernels anywhere and shouting when it does not work on IODA1? Thanks,
> 
> Cedric has a P7 with OPAL. That's probably the only one left though.

Has he tried these patches on that box? Or do we hope for the best here? :)



-- 
Alexey


Re: [PATCH 03/15] powerpc/powernv/pci: Add explicit tracking of the DMA setup state

2020-07-14 Thread Oliver O'Halloran
On Tue, Jul 14, 2020 at 5:21 PM Alexey Kardashevskiy  wrote:
>
> On 14/07/2020 15:58, Oliver O'Halloran wrote:
> > On Tue, Jul 14, 2020 at 3:37 PM Alexey Kardashevskiy  wrote:
> >>
> >> On 10/07/2020 15:23, Oliver O'Halloran wrote:
> >>> This also means the only remaining user of the old "DMA Weight" code is
> >>> the IODA1 DMA setup code that it was originally added for, which is good.
> >>
> >>
> >> Is ditching IODA1 in the plan? :)
> >
> > That or separating out the pci_controller_ops for IODA1 and IODA2 so
> > we can stop any IODA2 specific changes from breaking it.
>
> Is IODA1 tested at all these days? Or is anyone running upstream
> kernels anywhere and shouting when it does not work on IODA1? Thanks,

Cedric has a P7 with OPAL. That's probably the only one left though.


Re: [PATCH 03/15] powerpc/powernv/pci: Add explicit tracking of the DMA setup state

2020-07-14 Thread Alexey Kardashevskiy



On 14/07/2020 17:21, Alexey Kardashevskiy wrote:
> 
> 
> On 14/07/2020 15:58, Oliver O'Halloran wrote:
>> On Tue, Jul 14, 2020 at 3:37 PM Alexey Kardashevskiy  wrote:
>>>
>>> On 10/07/2020 15:23, Oliver O'Halloran wrote:
 There's an optimisation in the PE setup which skips performing DMA
 setup for a PE if we only have bridges in a PE. The assumption being
 that only "real" devices will DMA to system memory, which is probably
 fair. However, if we start off with only bridge devices in a PE and then
 add a non-bridge device, the new device won't be able to use DMA because
 we never configured it.

 Fix this (admittedly pretty weird) edge case by tracking whether we've done
 the DMA setup for the PE or not. If a non-bridge device is added to the PE
 (via rescan or hotplug, or whatever) we can set up DMA on demand.
>>>
>>> So hotplug does not work on powernv then, right? I thought you tested it
>>> a while ago, or is this patch the result of that attempt? If it is, then
>>
>> It mostly works. It's just the really niche case of hot-plugging a bridge,
>> then later hot-plugging a device into the same bus, which wouldn't
>> work.
> 
> Don't you have to have a slot (which is a bridge) for hotplug in the
> first place, to hotplug the bridge?


As discussed elsewhere, I missed that it is a non-bridge device on the
same bus as a previously plugged bridge. Now it all makes sense, and:


Reviewed-by: Alexey Kardashevskiy 


-- 
Alexey


Re: [PATCH 03/15] powerpc/powernv/pci: Add explicit tracking of the DMA setup state

2020-07-14 Thread Alexey Kardashevskiy



On 14/07/2020 15:58, Oliver O'Halloran wrote:
> On Tue, Jul 14, 2020 at 3:37 PM Alexey Kardashevskiy  wrote:
>>
>> On 10/07/2020 15:23, Oliver O'Halloran wrote:
>>> There's an optimisation in the PE setup which skips performing DMA
>>> setup for a PE if we only have bridges in a PE. The assumption being
>>> that only "real" devices will DMA to system memory, which is probably
>>> fair. However, if we start off with only bridge devices in a PE and then
>>> add a non-bridge device, the new device won't be able to use DMA because
>>> we never configured it.
>>>
>>> Fix this (admittedly pretty weird) edge case by tracking whether we've done
>>> the DMA setup for the PE or not. If a non-bridge device is added to the PE
>>> (via rescan or hotplug, or whatever) we can set up DMA on demand.
>>
>> So hotplug does not work on powernv then, right? I thought you tested it
>> a while ago, or is this patch the result of that attempt? If it is, then
> 
> It mostly works. It's just the really niche case of hot-plugging a bridge,
> then later hot-plugging a device into the same bus, which wouldn't
> work.

Don't you have to have a slot (which is a bridge) for hotplug in the
first place, to hotplug the bridge?

> 
>> Reviewed-by: Alexey Kardashevskiy 
>>
>>
>>> This also means the only remaining user of the old "DMA Weight" code is
>>> the IODA1 DMA setup code that it was originally added for, which is good.
>>
>>
>> Is ditching IODA1 in the plan? :)
> 
> That or separating out the pci_controller_ops for IODA1 and IODA2 so
> we can stop any IODA2 specific changes from breaking it.

Is IODA1 tested at all these days? Or is anyone running upstream
kernels anywhere and shouting when it does not work on IODA1? Thanks,



> For the most
> part keeping around IODA1 support isn't hurting anyone, but I wanted
> to re-work how the BDFN->PE assignment works so that we'd delay
> assigning a BDFN to a PE until the device is probed. Right now when
> we're configuring the PE for a bus, we map all 256 devfns to that PE.
> This is mostly fine, but if you do a bus rescan and there's no device
> present we'll get a spurious EEH on that PE since the PHB sees that
> there's no device responding to the CFG cycle. We stop the spurious
> EEH freeze today by only allowing config cycles if we can find a
> pci_dn for that bdfn, but I want to get rid of pci_dn.
> 
> Mapping each BDFN to a PE after the device is probed is easy enough to
> do on PHB3 and above since the mapping is handled by an in-memory
> table which is indexed by the BDFN. Earlier PHBs (i.e. IODA1) use a
> table of bask & mask values which match on the BDFN, so assigning a
> whole bus at once is easy, but adding individual BDFNs is hard. It's
> still possible to do in the HW, but the way the OPAL API works makes
> it impossible.
> 
>>>
>>> Cc: Alexey Kardashevskiy 
>>> Signed-off-by: Oliver O'Halloran 
>>> ---
>>> Alexey, do we need to have the IOMMU API stuff set/clear this flag?
>>
>>
>> I'd say no, as that API only cares if a device is in a PE, and for those
>> the PE DMA setup optimization is skipped. Thanks,
> 
> Ok cool.
> 

-- 
Alexey


Re: [PATCH 03/15] powerpc/powernv/pci: Add explicit tracking of the DMA setup state

2020-07-13 Thread Oliver O'Halloran
On Tue, Jul 14, 2020 at 3:37 PM Alexey Kardashevskiy  wrote:
>
> On 10/07/2020 15:23, Oliver O'Halloran wrote:
> > There's an optimisation in the PE setup which skips performing DMA
> > setup for a PE if we only have bridges in a PE. The assumption being
> > that only "real" devices will DMA to system memory, which is probably
> > fair. However, if we start off with only bridge devices in a PE and then
> > add a non-bridge device, the new device won't be able to use DMA because
> > we never configured it.
> >
> > Fix this (admittedly pretty weird) edge case by tracking whether we've done
> > the DMA setup for the PE or not. If a non-bridge device is added to the PE
> > (via rescan or hotplug, or whatever) we can set up DMA on demand.
>
> So hotplug does not work on powernv then, right? I thought you tested it
> a while ago, or is this patch the result of that attempt? If it is, then

It mostly works. It's just the really niche case of hot-plugging a bridge,
then later hot-plugging a device into the same bus, which wouldn't
work.

> Reviewed-by: Alexey Kardashevskiy 
>
>
> > This also means the only remaining user of the old "DMA Weight" code is
> > the IODA1 DMA setup code that it was originally added for, which is good.
>
>
> Is ditching IODA1 in the plan? :)

That or separating out the pci_controller_ops for IODA1 and IODA2 so
we can stop any IODA2 specific changes from breaking it. For the most
part keeping around IODA1 support isn't hurting anyone, but I wanted
to re-work how the BDFN->PE assignment works so that we'd delay
assigning a BDFN to a PE until the device is probed. Right now when
we're configuring the PE for a bus, we map all 256 devfns to that PE.
This is mostly fine, but if you do a bus rescan and there's no device
present we'll get a spurious EEH on that PE since the PHB sees that
there's no device responding to the CFG cycle. We stop the spurious
EEH freeze today by only allowing config cycles if we can find a
pci_dn for that bdfn, but I want to get rid of pci_dn.

Mapping each BDFN to a PE after the device is probed is easy enough to
do on PHB3 and above since the mapping is handled by an in-memory
table which is indexed by the BDFN. Earlier PHBs (i.e. IODA1) use a
table of base & mask values which match on the BDFN, so assigning a
whole bus at once is easy, but adding individual BDFNs is hard. It's
still possible to do in the HW, but the way the OPAL API works makes
it impossible.
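
[A minimal sketch of the two schemes, for readers following along. All
names below are invented for illustration; they are not the kernel's or
OPAL's actual structures:

#include <stdint.h>

/* PHB3 and later: effectively a flat table with one entry per BDFN,
 * so any single function can be pointed at any PE after it is probed.
 * Real hardware also has a valid bit per entry; omitted here. */
static uint16_t rtt[65536];		/* bdfn -> PE number */

static void phb3_style_map(uint16_t bdfn, uint16_t pe)
{
	rtt[bdfn] = pe;			/* per-function granularity */
}

/* IODA1-style: a small table of base/mask pairs. An entry matches when
 * (bdfn & mask) == base, so mask 0xff00 maps a whole bus with one
 * entry, but mapping one extra function later costs a fresh entry. */
struct pelt_entry { uint16_t base, mask, pe; };
static struct pelt_entry pelt[16];
static int pelt_used;

static int ioda1_style_find(uint16_t bdfn)
{
	int i;

	for (i = 0; i < pelt_used; i++)
		if ((bdfn & pelt[i].mask) == pelt[i].base)
			return pelt[i].pe;
	return -1;			/* no PE assigned */
}

That's why bus-at-a-time assignment is the natural unit on IODA1, while
PHB3 and later can defer per-device assignment until probe time.]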

> >
> > Cc: Alexey Kardashevskiy 
> > Signed-off-by: Oliver O'Halloran 
> > ---
> > Alexey, do we need to have the IOMMU API stuff set/clear this flag?
>
>
> I'd say no, as that API only cares if a device is in a PE, and for those
> the PE DMA setup optimization is skipped. Thanks,

Ok cool.


Re: [PATCH 03/15] powerpc/powernv/pci: Add explicit tracking of the DMA setup state

2020-07-13 Thread Alexey Kardashevskiy



On 10/07/2020 15:23, Oliver O'Halloran wrote:
> There's an optimisation in the PE setup which skips performing DMA
> setup for a PE if we only have bridges in a PE. The assumption being
> that only "real" devices will DMA to system memory, which is probably
> fair. However, if we start off with only bridge devices in a PE and then
> add a non-bridge device, the new device won't be able to use DMA because
> we never configured it.
> 
> Fix this (admittedly pretty weird) edge case by tracking whether we've done
> the DMA setup for the PE or not. If a non-bridge device is added to the PE
> (via rescan or hotplug, or whatever) we can set up DMA on demand.

So hotplug does not work on powernv then, right? I thought you tested it
a while ago, or is this patch the result of that attempt? If it is, then

Reviewed-by: Alexey Kardashevskiy 


> This also means the only remaining user of the old "DMA Weight" code is
> the IODA1 DMA setup code that it was originally added for, which is good.


Is ditching IODA1 in the plan? :)

> 
> Cc: Alexey Kardashevskiy 
> Signed-off-by: Oliver O'Halloran 
> ---
> Alexey, do we need to have the IOMMU API stuff set/clear this flag?


I'd say no, as that API only cares if a device is in a PE, and for those
the PE DMA setup optimization is skipped. Thanks,




> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c | 48 ++-
>  arch/powerpc/platforms/powernv/pci.h  |  7 
>  2 files changed, 36 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index bfb40607aa0e..bb9c1cc60c33 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -141,6 +141,7 @@ static struct pnv_ioda_pe *pnv_ioda_init_pe(struct 
> pnv_phb *phb, int pe_no)
>  
>   phb->ioda.pe_array[pe_no].phb = phb;
>   phb->ioda.pe_array[pe_no].pe_number = pe_no;
> + phb->ioda.pe_array[pe_no].dma_setup_done = false;
>  
>   /*
>* Clear the PE frozen state as it might be put into frozen state
> @@ -1685,6 +1686,12 @@ static int pnv_pcibios_sriov_enable(struct pci_dev 
> *pdev, u16 num_vfs)
>  }
>  #endif /* CONFIG_PCI_IOV */
>  
> +static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
> +struct pnv_ioda_pe *pe);
> +
> +static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> +struct pnv_ioda_pe *pe);
> +
>  static void pnv_pci_ioda_dma_dev_setup(struct pci_dev *pdev)
>  {
>   struct pnv_phb *phb = pci_bus_to_pnvhb(pdev->bus);
> @@ -1713,6 +1720,24 @@ static void pnv_pci_ioda_dma_dev_setup(struct pci_dev 
> *pdev)
>   pci_info(pdev, "Added to existing PE#%x\n", pe->pe_number);
>   }
>  
> + /*
> +  * We assume that bridges *probably* don't need to do any DMA so we can
> +  * skip allocating a TCE table, etc unless we get a non-bridge device.
> +  */
> + if (!pe->dma_setup_done && !pci_is_bridge(pdev)) {
> + switch (phb->type) {
> + case PNV_PHB_IODA1:
> + pnv_pci_ioda1_setup_dma_pe(phb, pe);
> + break;
> + case PNV_PHB_IODA2:
> + pnv_pci_ioda2_setup_dma_pe(phb, pe);
> + break;
> + default:
> + pr_warn("%s: No DMA for PHB#%x (type %d)\n",
> + __func__, phb->hose->global_number, phb->type);
> + }
> + }
> +
>   if (pdn)
>   pdn->pe_number = pe->pe_number;
>   pe->device_count++;
> @@ -2222,6 +2247,7 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb
> *phb,
>   pe->table_group.tce32_size = tbl->it_size << tbl->it_page_shift;
>   iommu_init_table(tbl, phb->hose->node, 0, 0);
>  
> + pe->dma_setup_done = true;
>   return;
>   fail:
>   /* XXX Failure: Try to fallback to 64-bit only ? */
> @@ -2536,9 +2562,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb 
> *phb,
>  {
>   int64_t rc;
>  
> - if (!pnv_pci_ioda_pe_dma_weight(pe))
> - return;
> -
>   /* TVE #1 is selected by PCI address bit 59 */
>   pe->tce_bypass_base = 1ull << 59;
>  
> @@ -2563,6 +2586,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb 
> *phb,
>   iommu_register_group(&pe->table_group, phb->hose->global_number,
>pe->pe_number);
>  #endif
> + pe->dma_setup_done = true;
>  }
>  
>  int64_t pnv_opal_pci_msi_eoi(struct irq_chip *chip, unsigned int hw_irq)
> @@ -3136,7 +3160,6 @@ static void pnv_pci_fixup_bridge_resources(struct 
> pci_bus *bus,
>  
>  static void pnv_pci_configure_bus(struct pci_bus *bus)
>  {
> - struct pnv_phb *phb = pci_bus_to_pnvhb(bus);
>   struct pci_dev *bridge = bus->self;
>   struct pnv_ioda_pe *pe;
>   bool all = (bridge && pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE);
> @@ -3160,17 +3183,6 @@ static void pnv_pci_configure_bus(struct pci_bus *bus)

[PATCH 03/15] powerpc/powernv/pci: Add explicit tracking of the DMA setup state

2020-07-09 Thread Oliver O'Halloran
There's an optimisation in the PE setup which skips performing DMA
setup for a PE if we only have bridges in a PE. The assumption being
that only "real" devices will DMA to system memory, which is probably
fair. However, if we start off with only bridge devices in a PE and then
add a non-bridge device, the new device won't be able to use DMA because
we never configured it.

Fix this (admittedly pretty weird) edge case by tracking whether we've done
the DMA setup for the PE or not. If a non-bridge device is added to the PE
(via rescan or hotplug, or whatever) we can set up DMA on demand.
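
[Concretely: hot-plug a bridge and its PE is created with dma_setup_done
still false, since only bridges are present; if a non-bridge device later
shows up on that bus, pnv_pci_ioda_dma_dev_setup() sees !pe->dma_setup_done
and runs the IODA1/IODA2 DMA setup at that point, as in the hunks below.]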

This also means the only remaining user of the old "DMA Weight" code is
the IODA1 DMA setup code that it was originally added for, which is good.

Cc: Alexey Kardashevskiy 
Signed-off-by: Oliver O'Halloran 
---
Alexey, do we need to have the IOMMU API stuff set/clear this flag?
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 48 ++-
 arch/powerpc/platforms/powernv/pci.h  |  7 
 2 files changed, 36 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index bfb40607aa0e..bb9c1cc60c33 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -141,6 +141,7 @@ static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb 
*phb, int pe_no)
 
phb->ioda.pe_array[pe_no].phb = phb;
phb->ioda.pe_array[pe_no].pe_number = pe_no;
+   phb->ioda.pe_array[pe_no].dma_setup_done = false;
 
/*
 * Clear the PE frozen state as it might be put into frozen state
@@ -1685,6 +1686,12 @@ static int pnv_pcibios_sriov_enable(struct pci_dev 
*pdev, u16 num_vfs)
 }
 #endif /* CONFIG_PCI_IOV */
 
+static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
+  struct pnv_ioda_pe *pe);
+
+static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
+  struct pnv_ioda_pe *pe);
+
 static void pnv_pci_ioda_dma_dev_setup(struct pci_dev *pdev)
 {
struct pnv_phb *phb = pci_bus_to_pnvhb(pdev->bus);
@@ -1713,6 +1720,24 @@ static void pnv_pci_ioda_dma_dev_setup(struct pci_dev 
*pdev)
pci_info(pdev, "Added to existing PE#%x\n", pe->pe_number);
}
 
+   /*
+* We assume that bridges *probably* don't need to do any DMA so we can
+* skip allocating a TCE table, etc unless we get a non-bridge device.
+*/
+   if (!pe->dma_setup_done && !pci_is_bridge(pdev)) {
+   switch (phb->type) {
+   case PNV_PHB_IODA1:
+   pnv_pci_ioda1_setup_dma_pe(phb, pe);
+   break;
+   case PNV_PHB_IODA2:
+   pnv_pci_ioda2_setup_dma_pe(phb, pe);
+   break;
+   default:
+   pr_warn("%s: No DMA for PHB#%x (type %d)\n",
+   __func__, phb->hose->global_number, phb->type);
+   }
+   }
+
if (pdn)
pdn->pe_number = pe->pe_number;
pe->device_count++;
@@ -2222,6 +2247,7 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb
*phb,
pe->table_group.tce32_size = tbl->it_size << tbl->it_page_shift;
iommu_init_table(tbl, phb->hose->node, 0, 0);
 
+   pe->dma_setup_done = true;
return;
  fail:
/* XXX Failure: Try to fallback to 64-bit only ? */
@@ -2536,9 +2562,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb 
*phb,
 {
int64_t rc;
 
-   if (!pnv_pci_ioda_pe_dma_weight(pe))
-   return;
-
/* TVE #1 is selected by PCI address bit 59 */
pe->tce_bypass_base = 1ull << 59;
 
@@ -2563,6 +2586,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb 
*phb,
iommu_register_group(&pe->table_group, phb->hose->global_number,
 pe->pe_number);
 #endif
+   pe->dma_setup_done = true;
 }
 
 int64_t pnv_opal_pci_msi_eoi(struct irq_chip *chip, unsigned int hw_irq)
@@ -3136,7 +3160,6 @@ static void pnv_pci_fixup_bridge_resources(struct pci_bus 
*bus,
 
 static void pnv_pci_configure_bus(struct pci_bus *bus)
 {
-   struct pnv_phb *phb = pci_bus_to_pnvhb(bus);
struct pci_dev *bridge = bus->self;
struct pnv_ioda_pe *pe;
bool all = (bridge && pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE);
@@ -3160,17 +3183,6 @@ static void pnv_pci_configure_bus(struct pci_bus *bus)
return;
 
pnv_ioda_setup_pe_seg(pe);
-   switch (phb->type) {
-   case PNV_PHB_IODA1:
-   pnv_pci_ioda1_setup_dma_pe(phb, pe);
-   break;
-   case PNV_PHB_IODA2:
-   pnv_pci_ioda2_setup_dma_pe(phb, pe);
-   break;
-   default:
-   pr_warn("%s: No DMA for PHB#%x (type %d)\n",
-   __func__, phb->hose->global_number, phb->type);
-   }
 }
 
 static resource_size_t