Re: [PATCH v4] Kconfig: introduce HAS_IOPORT option and select it as necessary

2023-04-05 Thread Niklas Schnelle
On Thu, 2023-03-23 at 17:33 +0100, Niklas Schnelle wrote:
> We introduce a new HAS_IOPORT Kconfig option to indicate support for I/O
> Port access. In a future patch HAS_IOPORT=n will disable compilation of
> the I/O accessor functions inb()/outb() and friends on architectures
> which can not meaningfully support legacy I/O spaces such as s390.
> 
> The following architectures do not select HAS_IOPORT:
> 
> * ARC
> * C-SKY
> * Hexagon
> * Nios II
> * OpenRISC
> * s390
> * User-Mode Linux
> * Xtensa
> 
> All other architectures select HAS_IOPORT at least conditionally.
> 
> The "depends on" relations on HAS_IOPORT in drivers as well as ifdefs
> for HAS_IOPORT specific sections will be added in subsequent patches on
> a per subsystem basis.
> 
> Co-developed-by: Arnd Bergmann 
> Signed-off-by: Arnd Bergmann 
> Acked-by: Johannes Berg  # for ARCH=um
> Acked-by: Geert Uytterhoeven 
> Signed-off-by: Niklas Schnelle 
> ---
> Note: This patch is the initial patch of a larger series[0]. This patch
> introduces the HAS_IOPORT config option while the rest of the series adds
> driver dependencies and the final patch removes inb() / outb() and friends on
> platforms that don't support them. 
> 
> Thus each of the per-subsystem patches is independent from each other but
> depends on this patch while the final patch depends on the whole series. Thus
> splitting this initial patch off allows the per-subsytem HAS_IOPORT dependency
> addition be merged separately via different trees without breaking the build.
> 
> [0] 
> https://lore.kernel.org/lkml/20230314121216.413434-1-schne...@linux.ibm.com/
> 
> Changes since v3:
> - List archs without HAS_IOPORT in commit message (Arnd)
> - Select HAS_IOPORT for LoongArch (Arnd)
> - Use "select HAS_IOPORT if (E)ISA || .." instead of a "depends on" for (E)ISA
>   for m68k and parisc
> - Select HAS_IOPORT with config GSC on parisc (Arnd)
> - Drop "depends on HAS_IOPORT" for um's config ISA (Johannes)
> - Drop "depends on HAS_IOPORT" for config ISA on x86 and parisc where it is
>   always selected (Arnd)
> 

Gentle ping. As far as I can tell this hasn't been picked to any tree
sp far but also hasn't seen complains so I'm wondering if I should send
a new version of the combined series of this patch plus the added
HAS_IOPORT dependencies per subsystem or wait until this is picked up.

Thanks,
Niklas



[PATCH v4] Kconfig: introduce HAS_IOPORT option and select it as necessary

2023-03-23 Thread Niklas Schnelle
We introduce a new HAS_IOPORT Kconfig option to indicate support for I/O
Port access. In a future patch HAS_IOPORT=n will disable compilation of
the I/O accessor functions inb()/outb() and friends on architectures
which can not meaningfully support legacy I/O spaces such as s390.

The following architectures do not select HAS_IOPORT:

* ARC
* C-SKY
* Hexagon
* Nios II
* OpenRISC
* s390
* User-Mode Linux
* Xtensa

All other architectures select HAS_IOPORT at least conditionally.

The "depends on" relations on HAS_IOPORT in drivers as well as ifdefs
for HAS_IOPORT specific sections will be added in subsequent patches on
a per subsystem basis.

Co-developed-by: Arnd Bergmann 
Signed-off-by: Arnd Bergmann 
Acked-by: Johannes Berg  # for ARCH=um
Acked-by: Geert Uytterhoeven 
Signed-off-by: Niklas Schnelle 
---
Note: This patch is the initial patch of a larger series[0]. This patch
introduces the HAS_IOPORT config option while the rest of the series adds
driver dependencies and the final patch removes inb() / outb() and friends on
platforms that don't support them. 

Thus each of the per-subsystem patches is independent from each other but
depends on this patch while the final patch depends on the whole series. Thus
splitting this initial patch off allows the per-subsytem HAS_IOPORT dependency
addition be merged separately via different trees without breaking the build.

[0] https://lore.kernel.org/lkml/20230314121216.413434-1-schne...@linux.ibm.com/

Changes since v3:
- List archs without HAS_IOPORT in commit message (Arnd)
- Select HAS_IOPORT for LoongArch (Arnd)
- Use "select HAS_IOPORT if (E)ISA || .." instead of a "depends on" for (E)ISA
  for m68k and parisc
- Select HAS_IOPORT with config GSC on parisc (Arnd)
- Drop "depends on HAS_IOPORT" for um's config ISA (Johannes)
- Drop "depends on HAS_IOPORT" for config ISA on x86 and parisc where it is
  always selected (Arnd)

 arch/alpha/Kconfig  | 1 +
 arch/arm/Kconfig| 1 +
 arch/arm64/Kconfig  | 1 +
 arch/ia64/Kconfig   | 1 +
 arch/loongarch/Kconfig  | 1 +
 arch/m68k/Kconfig   | 1 +
 arch/microblaze/Kconfig | 1 +
 arch/mips/Kconfig   | 1 +
 arch/parisc/Kconfig | 1 +
 arch/powerpc/Kconfig| 1 +
 arch/riscv/Kconfig  | 1 +
 arch/sh/Kconfig | 1 +
 arch/sparc/Kconfig  | 1 +
 arch/x86/Kconfig| 1 +
 drivers/bus/Kconfig | 2 +-
 drivers/parisc/Kconfig  | 1 +
 lib/Kconfig | 4 
 17 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 780d4673c3ca..a5c2b1aa46b0 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -27,6 +27,7 @@ config ALPHA
select AUDIT_ARCH
select GENERIC_CPU_VULNERABILITIES
select GENERIC_SMP_IDLE_THREAD
+   select HAS_IOPORT
select HAVE_ARCH_AUDITSYSCALL
select HAVE_MOD_ARCH_SPECIFIC
select MODULES_USE_ELF_RELA
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e24a9820e12f..4acb5bc4b52a 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -70,6 +70,7 @@ config ARM
select GENERIC_SCHED_CLOCK
select GENERIC_SMP_IDLE_THREAD
select HARDIRQS_SW_RESEND
+   select HAS_IOPORT
select HAVE_ARCH_AUDITSYSCALL if AEABI && !OABI_COMPAT
select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1023e896d46b..b740019c4aee 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -145,6 +145,7 @@ config ARM64
select GENERIC_GETTIMEOFDAY
select GENERIC_VDSO_TIME_NS
select HARDIRQS_SW_RESEND
+   select HAS_IOPORT
select HAVE_MOVE_PMD
select HAVE_MOVE_PUD
select HAVE_PCI
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index d7e4a24e8644..2e13ec8263b9 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -25,6 +25,7 @@ config IA64
select PCI_DOMAINS if PCI
select PCI_MSI
select PCI_SYSCALL if PCI
+   select HAS_IOPORT
select HAVE_ASM_MODVERSIONS
select HAVE_UNSTABLE_SCHED_CLOCK
select HAVE_EXIT_THREAD
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 7fd51257e0ed..e1615dfb5437 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -80,6 +80,7 @@ config LOONGARCH
select GENERIC_SMP_IDLE_THREAD
select GENERIC_TIME_VSYSCALL
select GPIOLIB
+   select HAS_IOPORT
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_MMAP_RND_BITS if MMU
select HAVE_ARCH_SECCOMP_FILTER
diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index 82154952e574..40198a1ebe27 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -18,6 +18,7 @@ config M68K
select GENERIC_CPU_DEVICES
select GENERIC_IOMAP
select GENERI

Re: [PATCH v3 01/38] Kconfig: introduce HAS_IOPORT option and select it as necessary

2023-03-23 Thread Niklas Schnelle
On Tue, 2023-03-14 at 13:37 +0100, Johannes Berg wrote:
> On Tue, 2023-03-14 at 13:11 +0100, Niklas Schnelle wrote:
> > --- a/arch/um/Kconfig
> > +++ b/arch/um/Kconfig
> > @@ -56,6 +56,7 @@ config NO_IOPORT_MAP
> >  
> >  config ISA
> > bool
> > +   depends on HAS_IOPORT
> > 
> 
> config ISA here is already unselectable, and nothing ever does "select
> ISA" (only in some other architectures), so is there much point in this?
> 
> I'm not even sure why this exists at all.

You're right there's not much point and I dropped this for v4. I agree
that probably the whole "config ISA" could be removed if it's always
false anyway but that seems out of scope for this patch.

> 
> But anyway, adding a dependency to a always-false symbol doesn't make it
> less always-false :-)
> 
> Acked-by: Johannes Berg  # for ARCH=um

Thanks

> 
> 
> Certainly will be nice to get rid of this cruft for architectures that
> don't have it.
> 
> johannes

Yes, also, for s390 the broken NULL + port number access in the generic
inb()/outb() currently causes the only remaining clang warning on
defconfig builds.


Re: [PATCH v3 01/38] Kconfig: introduce HAS_IOPORT option and select it as necessary

2023-03-14 Thread Niklas Schnelle
On Tue, 2023-03-14 at 13:11 +0100, Niklas Schnelle wrote:
> We introduce a new HAS_IOPORT Kconfig option to indicate support for I/O
> Port access. In a future patch HAS_IOPORT=n will disable compilation of
> the I/O accessor functions inb()/outb() and friends on architectures
> which can not meaningfully support legacy I/O spaces such as s390. Also
> add dependencies on HAS_IOPORT for the ISA and HAVE_EISA config options
> as these busses always go along with HAS_IOPORT.
> 
> The "depends on" relations on HAS_IOPORT in drivers as well as ifdefs
> for HAS_IOPORT specific sections will be added in subsequent patches on
> a per subsystem basis.
> 
> Co-developed-by: Arnd Bergmann 
> Signed-off-by: Niklas Schnelle 
> ---
> 

@Arnd, I swear I asked you and then added Signed-off-bys for all these
Co-developed-bys as suggested by checkpatch. Sadly that must have been
during my failed attempt of converting to b4 prep / b4 send before
sending this last Friday and then it got lost. It almost worked and is
a very nice work flow except that b4 currently can only use a single
list of To/Cc fields and for this treewide series that would probably
hit mail server limits. Added it now.

Thanks,
Niklas



Re: [PATCH v3 01/38] Kconfig: introduce HAS_IOPORT option and select it as necessary

2023-03-14 Thread Niklas Schnelle
On Tue, 2023-03-14 at 14:29 +0100, Arnd Bergmann wrote:
> On Tue, Mar 14, 2023, at 13:11, Niklas Schnelle wrote:
> > We introduce a new HAS_IOPORT Kconfig option to indicate support for I/O
> > Port access. In a future patch HAS_IOPORT=n will disable compilation of
> > the I/O accessor functions inb()/outb() and friends on architectures
> > which can not meaningfully support legacy I/O spaces such as s390. Also
> > add dependencies on HAS_IOPORT for the ISA and HAVE_EISA config options
> > as these busses always go along with HAS_IOPORT.
> > 
> > The "depends on" relations on HAS_IOPORT in drivers as well as ifdefs
> > for HAS_IOPORT specific sections will be added in subsequent patches on
> > a per subsystem basis.
> 
> I think it would be helpful to enumerate which architectures
> do not get HAS_IOPORT added, as they will be affected more.
> 
> > Co-developed-by: Arnd Bergmann 
> > Signed-off-by: Niklas Schnelle 
> 
> If there are no objections, I could send this first patch for the
> asm-generic tree as a preparation for 6.3, so we are able to merge
> the other patches through subsystem maintainer tree for 6.4.
> 
> arch/loongarch/ will now also need to select HAS_IOPORT
> uncontitionally, this architecture was added after you
> sent v2.

Ah right. Added "select HAS_IOPORT" for LoongArch.

> 
> > diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
> > index a98940e64243..5eeacc72e4da 100644
> > --- a/arch/parisc/Kconfig
> > +++ b/arch/parisc/Kconfig
> > @@ -47,6 +47,7 @@ config PARISC
> > select MODULES_USE_ELF_RELA
> > select CLONE_BACKWARDS
> > select TTY # Needed for pdc_cons.c
> > +   select HAS_IOPORT if PCI
> 
> It's also needed for EISA and I think you should select it
> from CONFIG_GSC in drivers/parisc/Kconfig for this purpose.
> 
> This could also be 'select HAS_IOPORT if PCI || EISA', but
> that would require removing the 'depends on HAS_IOPORT'
> under drivers/eisa/.

I did use "select HAS_IOPORT if PCI || ISA || ATARI_ROM_ISA" in m68k so
I think ideally we would handle both in the same way. I don't have a
strong preference but I think the "select HAS_IOPORT if ..." puts it
all in a single place which is nice. Also I think this would make it
more similar architectures with unconditional HAS_IOPORT that thus
don't need "depends on HAS_IOPORT" in their "config ISA" either. As
also pointed by your comment below for x86. So will try to go this
route.

> 
> > select HAVE_DEBUG_STACKOVERFLOW
> > select HAVE_ARCH_AUDITSYSCALL
> > select HAVE_ARCH_HASH
> > @@ -131,6 +132,7 @@ config STACKTRACE_SUPPORT
> > 
> >  config ISA_DMA_API
> > bool
> > +   depends on HAS_IOPORT
> > 
> 
> This line is not really needed since there is no way to
> enable ISA_DMA_API.

Removed

> 
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index a6c4407d3ec8..f7de646c074a 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -188,6 +188,7 @@ config PPC
> > select GENERIC_SMP_IDLE_THREAD
> > select GENERIC_TIME_VSYSCALL
> > select GENERIC_VDSO_TIME_NS
> > +   select HAS_IOPORT   if PCI
> > select HAVE_ARCH_AUDITSYSCALL
> > select HAVE_ARCH_HUGE_VMALLOC   if HAVE_ARCH_HUGE_VMAP
> > select HAVE_ARCH_HUGE_VMAP  if PPC_RADIX_MMU || PPC_8xx
> > @@ -1070,7 +1071,6 @@ menu "Bus options"
> > 
> >  config ISA
> > bool "Support for ISA-bus hardware"
> > -   depends on PPC_CHRP
> > select PPC_I8259
> > help
> >   Find out whether you have ISA slots on your motherboard.  ISA is the
> 
> This line looks wrong, I think we should keep that dependency.
> Did you get a circular dependency if you leave it in?

I don't recall why this was removed. I guess it happened when I was
experimenting with adding "depends on HAS_IOPORT" for the ISA config
options but that ultimately lead to circular dependencies, must have
messed up when removing this here.

> 
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index a825bf031f49..634dd42532f3 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -162,6 +162,7 @@ config X86
> > select GUP_GET_PXX_LOW_HIGH if X86_PAE
> > select HARDIRQS_SW_RESEND
> > select HARDLOCKUP_CHECK_TIMESTAMP   if X86_64
> > +   select HAS_IOPORT
> > select HAVE_ACPI_APEI   if ACPI
> > select HAVE_ACPI_APEI_NMI   if ACPI
> > select HAVE_ALIGNED_STRUCT_PAG

[PATCH v3 01/38] Kconfig: introduce HAS_IOPORT option and select it as necessary

2023-03-14 Thread Niklas Schnelle
We introduce a new HAS_IOPORT Kconfig option to indicate support for I/O
Port access. In a future patch HAS_IOPORT=n will disable compilation of
the I/O accessor functions inb()/outb() and friends on architectures
which can not meaningfully support legacy I/O spaces such as s390. Also
add dependencies on HAS_IOPORT for the ISA and HAVE_EISA config options
as these busses always go along with HAS_IOPORT.

The "depends on" relations on HAS_IOPORT in drivers as well as ifdefs
for HAS_IOPORT specific sections will be added in subsequent patches on
a per subsystem basis.

Co-developed-by: Arnd Bergmann 
Signed-off-by: Niklas Schnelle 
---
 arch/alpha/Kconfig  | 1 +
 arch/arm/Kconfig| 1 +
 arch/arm64/Kconfig  | 1 +
 arch/ia64/Kconfig   | 1 +
 arch/m68k/Kconfig   | 1 +
 arch/microblaze/Kconfig | 1 +
 arch/mips/Kconfig   | 2 ++
 arch/parisc/Kconfig | 2 ++
 arch/powerpc/Kconfig| 2 +-
 arch/riscv/Kconfig  | 1 +
 arch/sh/Kconfig | 1 +
 arch/sparc/Kconfig  | 1 +
 arch/um/Kconfig | 1 +
 arch/x86/Kconfig| 2 ++
 drivers/bus/Kconfig | 2 +-
 drivers/eisa/Kconfig| 1 +
 lib/Kconfig | 4 
 lib/Kconfig.kgdb| 3 ++-
 18 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 780d4673c3ca..a5c2b1aa46b0 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -27,6 +27,7 @@ config ALPHA
select AUDIT_ARCH
select GENERIC_CPU_VULNERABILITIES
select GENERIC_SMP_IDLE_THREAD
+   select HAS_IOPORT
select HAVE_ARCH_AUDITSYSCALL
select HAVE_MOD_ARCH_SPECIFIC
select MODULES_USE_ELF_RELA
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e24a9820e12f..4acb5bc4b52a 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -70,6 +70,7 @@ config ARM
select GENERIC_SCHED_CLOCK
select GENERIC_SMP_IDLE_THREAD
select HARDIRQS_SW_RESEND
+   select HAS_IOPORT
select HAVE_ARCH_AUDITSYSCALL if AEABI && !OABI_COMPAT
select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1023e896d46b..b740019c4aee 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -145,6 +145,7 @@ config ARM64
select GENERIC_GETTIMEOFDAY
select GENERIC_VDSO_TIME_NS
select HARDIRQS_SW_RESEND
+   select HAS_IOPORT
select HAVE_MOVE_PMD
select HAVE_MOVE_PUD
select HAVE_PCI
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index d7e4a24e8644..2e13ec8263b9 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -25,6 +25,7 @@ config IA64
select PCI_DOMAINS if PCI
select PCI_MSI
select PCI_SYSCALL if PCI
+   select HAS_IOPORT
select HAVE_ASM_MODVERSIONS
select HAVE_UNSTABLE_SCHED_CLOCK
select HAVE_EXIT_THREAD
diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index 82154952e574..40198a1ebe27 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -18,6 +18,7 @@ config M68K
select GENERIC_CPU_DEVICES
select GENERIC_IOMAP
select GENERIC_IRQ_SHOW
+   select HAS_IOPORT if PCI || ISA || ATARI_ROM_ISA
select HAVE_ARCH_SECCOMP
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ASM_MODVERSIONS
diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig
index cc88af6fa7a4..211f338d6235 100644
--- a/arch/microblaze/Kconfig
+++ b/arch/microblaze/Kconfig
@@ -21,6 +21,7 @@ config MICROBLAZE
select GENERIC_IRQ_SHOW
select GENERIC_PCI_IOMAP
select GENERIC_SCHED_CLOCK
+   select HAS_IOPORT if PCI
select HAVE_ARCH_HASH
select HAVE_ARCH_KGDB
select HAVE_ARCH_SECCOMP
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index e2f3ca73f40d..64760fcd7b52 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -47,6 +47,7 @@ config MIPS
select GENERIC_SMP_IDLE_THREAD
select GENERIC_TIME_VSYSCALL
select GUP_GET_PXX_LOW_HIGH if CPU_MIPS32 && PHYS_ADDR_T_64BIT
+   select HAS_IOPORT if !NO_IOPORT_MAP
select HAVE_ARCH_COMPILER_H
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_KGDB if MIPS_FP_SUPPORT
@@ -3120,6 +3121,7 @@ config PCI_DRIVERS_LEGACY
 # users to choose the right thing ...
 #
 config ISA
+   depends on HAS_IOPORT
bool
 
 config TC
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index a98940e64243..5eeacc72e4da 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -47,6 +47,7 @@ config PARISC
select MODULES_USE_ELF_RELA
select CLONE_BACKWARDS
select TTY # Needed for pdc_cons.c
+   select HAS_IOPORT if PCI
select HAVE_DEBUG_STACKOVERFLOW
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_HASH
@@ -131,6 +132,7 @@ config STA

Re: [RFC v2 01/39] Kconfig: introduce HAS_IOPORT option and select it as necessary

2022-05-06 Thread Niklas Schnelle
On Fri, 2022-05-06 at 14:53 +0200, Arnd Bergmann wrote:
> On Fri, May 6, 2022 at 2:27 PM Maciej W. Rozycki  wrote:
> > On Fri, 6 May 2022, Arnd Bergmann wrote:
> > 
> > > >  If this is PCI/PCIe indeed, then an I/O access is just a different bit
> > > > pattern put on the bus/in the TLP in the address phase.  So what is 
> > > > there
> > > > inherent to the s390 architecture that prevents that different bit 
> > > > pattern
> > > > from being used?
> > > 
> > > The hardware design for PCI on s390 is very different from any other
> > > architecture, and more abstract. Rather than implementing MMIO register
> > > access as pointer dereference, this is a separate CPU instruction that
> > > takes a device/bar plus offset as arguments rather than a pointer, and
> > > Linux encodes this back into a fake __iomem token.
> > 
> >  OK, that seems to me like a reasonable and quite a clean design (on the
> > hardware side).
> > 
> >  So what happens if the instruction is given an I/O rather than memory BAR
> > as the relevant argument?  Is the address space indicator bit (bit #0)
> > simply ignored or what?
> 
> Not sure. My best guess is that it would actually work as you'd expect,
> but is deliberately left out of the architecture specification so they don't
> have to to validate the correctness.  Note that only a small number of
> PCIe cards are actually supported by IBM, and I think the firmware
> only passes devices to the OS if they are whitelisted.
> 
> Arnd

Yes, though in Linux we do try hard to work with whatever is plugged
in. We did benefit from this in the past working with a new NIC from a
different vendor with 0 additional changes. Also you can use vfio-pci
to pass-through arbitrary PCI devices to a QEMU emulating s390x.



Re: [RFC v2 01/39] Kconfig: introduce HAS_IOPORT option and select it as necessary

2022-05-06 Thread Niklas Schnelle
On Fri, 2022-05-06 at 13:27 +0100, Maciej W. Rozycki wrote:
> On Fri, 6 May 2022, Arnd Bergmann wrote:
> 
> > >  If this is PCI/PCIe indeed, then an I/O access is just a different bit
> > > pattern put on the bus/in the TLP in the address phase.  So what is there
> > > inherent to the s390 architecture that prevents that different bit pattern
> > > from being used?
> > 
> > The hardware design for PCI on s390 is very different from any other
> > architecture, and more abstract. Rather than implementing MMIO register
> > access as pointer dereference, this is a separate CPU instruction that
> > takes a device/bar plus offset as arguments rather than a pointer, and
> > Linux encodes this back into a fake __iomem token.
> 
>  OK, that seems to me like a reasonable and quite a clean design (on the 
> hardware side).
> 
>  So what happens if the instruction is given an I/O rather than memory BAR 
> as the relevant argument?  Is the address space indicator bit (bit #0) 
> simply ignored or what?

See my answer to Arnd for some more background but there simply isn't a
way to formulate an I/O access. In the old style PCI instructions the
BAR number and the function handle are put in a register before the
access. BAR number 15 is used to access config space. If there is no
BAR for that number the instruction fails with a non-zero CC.

> 
> > >  But that has nothing to do with the presence or absence of any specific
> > > processor instructions.  It's just a limitation of bus glue.  So I guess
> > > it's just that all PCI/PCIe glue logic implementations for s390 have such
> > > a limitation, right?
> > 
> > There are separate instructions for PCI memory and config space, but
> > no instructions for I/O space, or for non-PCI MMIO that it could be mapped
> > into.
> 
>  The PCI configuration space was retrofitted into x86 systems (and is 
> accessed in an awkward manner with them), but with a new design such a 
> clean approach is most welcome IMHO.  Thank you for your explanation.
> 
>   Maciej

Well our design is a retrofit too considering s390x is a direct
decendent of IBM System/360 which one could argue to have been the
first ISA. But yes as PCI support was only added with PCIe and with a
machine level hypervisor already in place we do get shielded a lot from
the gritty details.



Re: [RFC v2 01/39] Kconfig: introduce HAS_IOPORT option and select it as necessary

2022-05-06 Thread Niklas Schnelle
On Fri, 2022-05-06 at 13:33 +0200, Arnd Bergmann wrote:
> On Fri, May 6, 2022 at 12:20 PM Maciej W. Rozycki  wrote:
> > On Thu, 5 May 2022, Arnd Bergmann wrote:
> >  I think I'm missing something here.  IIUC we're talking about a PCI/PCIe
> > bus used with s390 hardware, right?
> > 
> >  (It has to be PCI/PCIe, because other than x86/IA-64 host buses there are
> > only PCI/PCIe and EISA/ISA buses out there that define I/O access cycles
> > and EISA/ISA have long been obsoleted except perhaps from some niche use.)
> > 
> >  If this is PCI/PCIe indeed, then an I/O access is just a different bit
> > pattern put on the bus/in the TLP in the address phase.  So what is there
> > inherent to the s390 architecture that prevents that different bit pattern
> > from being used?
> 
> The hardware design for PCI on s390 is very different from any other
> architecture, and more abstract. Rather than implementing MMIO register
> access as pointer dereference, this is a separate CPU instruction that
> takes a device/bar plus offset as arguments rather than a pointer, and
> Linux encodes this back into a fake __iomem token.

Correct, we have since gained new PCI load/store instructions that
actually do take a virtual address that gets address translated to a
"physical address" for the PCI BARs. The "physical address" we get from
the platform (again via special instructions). I put "physical address"
in quotes because while they are conceptually physical addresses and
they are translated to by MMU translation tables, accessing their
virtual mapping with any instruction other then the special PCI
load/store instructions will cause addressing exceptions. So we still
don't have real MMIO but something that looks a lot more like MMIO than
we used to.

> 
> >  If anything, I could imagine the same limitation as with current POWER9
> > implementations, that is whatever glue is used to wire PCI/PCIe to the
> > rest of the system does not implement a way to use said bit pattern (which
> > has nothing to do with the POWER9 processor instruction set).
> > 
> >  But that has nothing to do with the presence or absence of any specific
> > processor instructions.  It's just a limitation of bus glue.  So I guess
> > it's just that all PCI/PCIe glue logic implementations for s390 have such
> > a limitation, right?
> 
> There are separate instructions for PCI memory and config space, but
> no instructions for I/O space, or for non-PCI MMIO that it could be mapped
> into.
> 
>Arnd

The config space is still accessed with the old style PCI load/store
instructions via a special extra BAR.

But yes overall on s390x we can only access PCI(e) devices via special
instructions not via real MMIO and also the OS has no direct access to
the registers of the PHB which are only accessible to firmware. 

Maybe as a bit of further background it's also important to note that
on s390x all Operating Systems run inside a hypervisor. On the lowest
level any OS can run this is a non-paging machine level hypervisor. For
PCI that means that we always have a kind of pass-through access though
without paging and hardware support for the memory partitioning this is
of course relatively simple.



Re: [RFC v2 01/39] Kconfig: introduce HAS_IOPORT option and select it as necessary

2022-05-06 Thread Niklas Schnelle
On Fri, 2022-05-06 at 19:12 +1000, Finn Thain wrote:
> 
> On Thu, 5 May 2022, Bjorn Helgaas wrote:
> 
> > On Thu, May 05, 2022 at 07:39:42PM +0200, Arnd Bergmann wrote:
> > > On Thu, May 5, 2022 at 6:10 PM Bjorn Helgaas  wrote:
> > > > On Wed, May 04, 2022 at 11:31:28PM +0200, Arnd Bergmann wrote:
> > > > > The main goal is to avoid c), which is what happens on s390, but
> > > > > can also happen elsewhere. Catching b) would be nice as well,
> > > > > but is much harder to do from generic code as you'd need an
> > > > > architecture specific inline asm statement to insert a ex_table
> > > > > fixup, or a runtime conditional on each access.
> > > > 
> > > > Or s390 could implement its own inb().
> > > > 
> > > > I'm hearing that generic powerpc kernels have to run both on machines
> > > > that have I/O port space and those that don't.  That makes me think
> > > > s390 could do something similar.
> > > 
> > > No, this is actually the current situation, and it makes absolutely no
> > > sense. s390 has no way of implementing inb()/outb() because there
> > > are no instructions for it and it cannot tunnel them through a virtual
> > > address mapping like on most of the other architectures. (it has special
> > > instructions for accessing memory space, which is not the same as
> > > a pointer dereference here).
> > > 
> > > The existing implementation gets flagged as a NULL pointer dereference
> > > by a compiler warning because it effectively is.
> > 
> > I think s390 currently uses the inb() in asm-generic/io.h, i.e.,
> > "__raw_readb(PCI_IOBASE + addr)".  I understand that's a NULL pointer
> > dereference because the default PCI_IOBASE is 0.
> > 
> > I mooted a s390 inb() implementation like "return ~0" because that's
> > what happens on most arches when there's no device to respond to the
> > inb().
> > 
> > The HAS_IOPORT dependencies are fairly ugly IMHO, and they clutter
> > drivers that use I/O ports in some cases but not others.  But maybe
> > it's the most practical way.
> > 
> 
> Do you mean, "the most practical way to avoid a compiler warning on s390"? 
> What about "#pragma GCC diagnostic ignored"?

This actually happens with clang. Apart from that, I think this would
also fall under the same argument as the original patch Linus unpulled.
We would just paint over someting that we know at compile time won't
work:

https://lore.kernel.org/lkml/CAHk-=wg80je=k7madf4e7wrrnp37e3qh6y10svhdc7o8sz_...@mail.gmail.com/



Re: [RFC v2 01/39] Kconfig: introduce HAS_IOPORT option and select it as necessary

2022-05-06 Thread Niklas Schnelle
On Thu, 2022-05-05 at 14:53 -0500, Bjorn Helgaas wrote:
> On Thu, May 05, 2022 at 07:39:42PM +0200, Arnd Bergmann wrote:
> > On Thu, May 5, 2022 at 6:10 PM Bjorn Helgaas  wrote:
> > > On Wed, May 04, 2022 at 11:31:28PM +0200, Arnd Bergmann wrote:
> > > > The main goal is to avoid c), which is what happens on s390, but
> > > > can also happen elsewhere. Catching b) would be nice as well,
> > > > but is much harder to do from generic code as you'd need an
> > > > architecture specific inline asm statement to insert a ex_table
> > > > fixup, or a runtime conditional on each access.
> > > 
> > > Or s390 could implement its own inb().
> > > 
> > > I'm hearing that generic powerpc kernels have to run both on machines
> > > that have I/O port space and those that don't.  That makes me think
> > > s390 could do something similar.
> > 
> > No, this is actually the current situation, and it makes absolutely no
> > sense. s390 has no way of implementing inb()/outb() because there
> > are no instructions for it and it cannot tunnel them through a virtual
> > address mapping like on most of the other architectures. (it has special
> > instructions for accessing memory space, which is not the same as
> > a pointer dereference here).
> > 
> > The existing implementation gets flagged as a NULL pointer dereference
> > by a compiler warning because it effectively is.
> 
> I think s390 currently uses the inb() in asm-generic/io.h, i.e.,
> "__raw_readb(PCI_IOBASE + addr)".  I understand that's a NULL pointer
> dereference because the default PCI_IOBASE is 0.
> 
> I mooted a s390 inb() implementation like "return ~0" because that's
> what happens on most arches when there's no device to respond to the
> inb().
> 
> The HAS_IOPORT dependencies are fairly ugly IMHO, and they clutter
> drivers that use I/O ports in some cases but not others.  But maybe
> it's the most practical way.
> 
> Bjorn

I fear such stubs are kind of equivalent to my previous patch doing the
same in asm-generic/io.h that was pulled and then unpulled by Linus.
Maybe it would be slightly different if instead of a warning outX()
would just be a NOP and inX() just returned ~0 but we're in essence
pretending that we have these functions when we know they are nonsense.

Another argument I see is that as shown by POWER9 we might start to see
more platforms that just can't do I/O port access. E.g. I would also be
surprised if Apple's M1 has I/O port access. Sooner or later I expect
distributions on some platforms to only support such systems. For
example on ppc a server distribution might only support IBM POWER
without I/O port support before too long. Then having HAS_IOPORT allows
to get rid of drivers that won't work anyway.

There are also reports of probing a driver with I/O ports causing a
system crash on systems without I/O port support. For example in this
answer by John Garry (added so he may supply more information):

https://lore.kernel.org/lkml/db043b76-880d-5fad-69cf-96abcd9cd...@huawei.com/



Re: [RFC v2 01/39] Kconfig: introduce HAS_IOPORT option and select it as necessary

2022-05-05 Thread Niklas Schnelle
On Wed, 2022-05-04 at 23:31 +0200, Arnd Bergmann wrote:
> On Wed, May 4, 2022 at 11:08 PM Bjorn Helgaas  wrote:
> > On Fri, Apr 29, 2022 at 03:49:59PM +0200, Niklas Schnelle wrote:
> > > We introduce a new HAS_IOPORT Kconfig option to indicate support for
> > > I/O Port access. In a future patch HAS_IOPORT=n will disable compilation
> > > of the I/O accessor functions inb()/outb() and friends on architectures
> > > which can not meaningfully support legacy I/O spaces such as s390 or
> > > where such support is optional.
> > 
> > So you plan to drop inb()/outb() on architectures where I/O port space
> > is optional?  So even platforms that have I/O port space may not be
> > able to use it?
> > 
> > This feels like a lot of work where the main benefit is to keep
> > Kconfig from offering drivers that aren't of interest on s390.
> > 
> > Granted, there may be issues where inb()/outb() does the wrong thing
> > such as dereferencing null pointers when I/O port space isn't
> > implemented.  I think that's a defect in inb()/outb() and could be
> > fixed there.
> 
> The current implementation in asm-generic/io.h implements inb()/outb()
> using readb()/writeb() with a fixed architecture specific offset.
> 
> There are three possible things that can happen here:
> 
> a) there is a host bridge driver that maps its I/O ports to this window,
> and everything works
> b) the address range is reserved and accessible but no host bridge
>driver has mapped its registers there, so an access causes a
>page fault
> c) the architecture does not define an offset, and accessing low I/O
> ports ends up as a NULL pointer dereference
> 
> The main goal is to avoid c), which is what happens on s390, but
> can also happen elsewhere. Catching b) would be nice as well,
> but is much harder to do from generic code as you'd need an
> architecture specific inline asm statement to insert a ex_table
> fixup, or a runtime conditional on each access.
> 
>  Arnd

Yes and to add to this, we did try a local solution in inb()/outb()
before. This added a warning when they are used and we know at compile
time that we're dealing with case c). This approach was nacked by Linus
though as we were turning a compile time known broken case into a
runtime one:

https://lore.kernel.org/lkml/CAHk-=wg80je=k7madf4e7wrrnp37e3qh6y10svhdc7o8sz_...@mail.gmail.com/

I do agree with this assesment and think this is the rightâ„¢ approach
but it is more churn as can be seen by the size of this series. I think
longer term it could be valuable though especially if more platforms
phase out I/O port support like POWER9 for which this also allows
filtering what drivers will never work.



[RFC v2 01/39] Kconfig: introduce HAS_IOPORT option and select it as necessary

2022-04-29 Thread Niklas Schnelle
We introduce a new HAS_IOPORT Kconfig option to indicate support for
I/O Port access. In a future patch HAS_IOPORT=n will disable compilation
of the I/O accessor functions inb()/outb() and friends on architectures
which can not meaningfully support legacy I/O spaces such as s390 or
where such support is optional. The "depends on" relations on HAS_IOPORT
in drivers as well as ifdefs for HAS_IOPORT specific sections will be
added in subsequent patches on a per subsystem basis.

Co-developed-by: Arnd Bergmann 
Signed-off-by: Niklas Schnelle 
---
 arch/alpha/Kconfig  | 1 +
 arch/arm/Kconfig| 1 +
 arch/arm64/Kconfig  | 1 +
 arch/ia64/Kconfig   | 1 +
 arch/m68k/Kconfig   | 1 +
 arch/microblaze/Kconfig | 1 +
 arch/mips/Kconfig   | 1 +
 arch/parisc/Kconfig | 1 +
 arch/powerpc/Kconfig| 1 +
 arch/riscv/Kconfig  | 1 +
 arch/sh/Kconfig | 1 +
 arch/sparc/Kconfig  | 1 +
 arch/x86/Kconfig| 1 +
 drivers/bus/Kconfig | 2 +-
 lib/Kconfig | 4 
 lib/Kconfig.kgdb| 1 +
 16 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 7d0d26b5b3f5..2b9cf1b0bdb8 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -27,6 +27,7 @@ config ALPHA
select AUDIT_ARCH
select GENERIC_CPU_VULNERABILITIES
select GENERIC_SMP_IDLE_THREAD
+   select HAS_IOPORT
select HAVE_ARCH_AUDITSYSCALL
select HAVE_MOD_ARCH_SPECIFIC
select MODULES_USE_ELF_RELA
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 2e8091e2d8a8..603ce00033a5 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -69,6 +69,7 @@ config ARM
select GENERIC_SCHED_CLOCK
select GENERIC_SMP_IDLE_THREAD
select HARDIRQS_SW_RESEND
+   select HAS_IOPORT
select HAVE_ARCH_AUDITSYSCALL if AEABI && !OABI_COMPAT
select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 20ea89d9ac2f..234dc89a7654 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -136,6 +136,7 @@ config ARM64
select GENERIC_GETTIMEOFDAY
select GENERIC_VDSO_TIME_NS
select HARDIRQS_SW_RESEND
+   select HAS_IOPORT
select HAVE_MOVE_PMD
select HAVE_MOVE_PUD
select HAVE_PCI
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index cb93769a9f2a..0fffe5130a80 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -25,6 +25,7 @@ config IA64
select PCI_DOMAINS if PCI
select PCI_MSI
select PCI_SYSCALL if PCI
+   select HAS_IOPORT
select HAVE_ASM_MODVERSIONS
select HAVE_UNSTABLE_SCHED_CLOCK
select HAVE_EXIT_THREAD
diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index 936cce42ae9a..54bf0a40c2f0 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -18,6 +18,7 @@ config M68K
select GENERIC_CPU_DEVICES
select GENERIC_IOMAP
select GENERIC_IRQ_SHOW
+   select HAS_IOPORT if PCI || ISA || ATARI_ROM_ISA
select HAVE_ASM_MODVERSIONS
select HAVE_DEBUG_BUGVERBOSE
select HAVE_EFFICIENT_UNALIGNED_ACCESS if !CPU_HAS_NO_UNALIGNED
diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig
index 8cf429ad1c84..966a6682f1fc 100644
--- a/arch/microblaze/Kconfig
+++ b/arch/microblaze/Kconfig
@@ -21,6 +21,7 @@ config MICROBLAZE
select GENERIC_IRQ_SHOW
select GENERIC_PCI_IOMAP
select GENERIC_SCHED_CLOCK
+   select HAS_IOPORT if PCI
select HAVE_ARCH_HASH
select HAVE_ARCH_KGDB
select HAVE_ARCH_SECCOMP
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index de3b32a507d2..4c55df08d6f1 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -47,6 +47,7 @@ config MIPS
select GENERIC_SMP_IDLE_THREAD
select GENERIC_TIME_VSYSCALL
select GUP_GET_PTE_LOW_HIGH if CPU_MIPS32 && PHYS_ADDR_T_64BIT
+   select HAS_IOPORT
select HAVE_ARCH_COMPILER_H
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_KGDB if MIPS_FP_SUPPORT
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 52e550b45692..741c5c64c173 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -46,6 +46,7 @@ config PARISC
select MODULES_USE_ELF_RELA
select CLONE_BACKWARDS
select TTY # Needed for pdc_cons.c
+   select HAS_IOPORT if PCI || EISA
select HAVE_DEBUG_STACKOVERFLOW
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_HASH
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 174edabb74fa..7133cc35b777 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -182,6 +182,7 @@ config PPC
select GENERIC_SMP_IDLE_THREAD
select GENERIC_TIME_VSYSCALL
select GENERIC_VDSO_TIME_NS
+   select HAS_

Re: [RFC 02/32] Kconfig: introduce HAS_IOPORT option and select it as necessary

2021-12-31 Thread Niklas Schnelle
On Thu, 2021-12-30 at 16:44 +1300, Michael Schmitz wrote:
> Hi Arnd,
> 
> Am 30.12.2021 um 14:48 schrieb Arnd Bergmann:
> > On Tue, Dec 28, 2021 at 11:15 PM Michael Schmitz  
> > wrote:
> > > Am 29.12.2021 um 16:41 schrieb Arnd Bergmann:
> > > > On Tue, Dec 28, 2021 at 8:20 PM Michael Schmitz  
> > > > wrote:
---8<---
> 
> > What some other architectures do is to rely on inb()/outb() to have a
> > zero-based offset, and use an io_offset in PCI buses to ensure that a
> > low port number on the bus gets translated into a pointer value for the
> > virtual mapping in the kernel, which is then represented as an unsigned
> > int.
> 
> M54xx does just that for Coldfire:
> 
> arch/m68k/include/asm/io_no.h:
> #define PCI_IO_PA 0xf800  /* Host physical address */
> 
> (used to set PCI BAR mappings, so matches your definition above).
> 
> All other (MMU) m68k users of inb()/outb() apply an io_offset in the 
> platform specific address translation:
> 
> 
---8<---
> So as long as support for any of the m68k PCI or ISA bridges is selected 
> in the kernel config, the appropriate IO space mapping is applied. If no 
> support for PCI or ISA bridges is selected, we already fall back to zero 
> offset mapping (but as far as I can tell, it shouldn't be possible to 
> build a kernel without bridge support but drivers that require it).
> 
> > As this is indistinguishable from architectures that just don't have
> > a base address for I/O ports (we unfortunately picked 0 as the default
> > PCI_IOBASE value), my suggestion was to start marking architectures
> > that may have this problem as using HAS_IOPORT in order to keep
> > the existing behavior unchanged. If m68k does not suffer from this,
> > making HAS_IOPORT conditional on those config options that actually
> > need it would of course be best.
> 
> Following your description, HAS_IOPORT would be required for neither of 
> PCI, ISA or ATARI_ROM_ISA ??
> 

No, HAS_IOPORT being set just means that inb() etc. exist and are
functional be it as special instructions like on x86 or via an I/O
address offset. As I understand it if you do have PCI, ISA or
ATARI_ROM_ISA they are functional. If none of them are set and your
zero offset mapping means these accessors can't actually be used you
could make the declerations ifdeffed on CONFIG_HAS_IOPORT to detect the
cases where somone managed to build drivers that require them and that
would result in a compile time error instead of silently, or with a
NULL pointer warning, compiling code that won't work.



[RFC 02/32] Kconfig: introduce HAS_IOPORT option and select it as necessary

2021-12-27 Thread Niklas Schnelle
We introduce a new HAS_IOPORT Kconfig option to gate support for
I/O port access. In a future patch HAS_IOPORT=n will disable compilation
of the I/O accessor functions inb()/outb() and friends on architectures
which can not meaningfully support legacy I/O spaces. On these platforms
inb()/outb() etc are currently just stubs in asm-generic/io.h which when
called will cause a NULL pointer access which some compilers actually
detect and warn about.

The dependencies on HAS_IOPORT in drivers as well as ifdefs for
HAS_IOPORT specific sections will be added in subsequent patches on
a per subsystem basis. Then a final patch will ifdef the I/O access
functions on HAS_IOPORT thus turning any use not gated by HAS_IOPORT
into a compile-time warning.

Link: 
https://lore.kernel.org/lkml/CAHk-=wg80je=k7madf4e7wrrnp37e3qh6y10svhdc7o8sz_...@mail.gmail.com/
Co-developed-by: Arnd Bergmann 
Signed-off-by: Arnd Bergmann 
Signed-off-by: Niklas Schnelle 
---
 arch/alpha/Kconfig  | 1 +
 arch/arc/Kconfig| 1 +
 arch/arm/Kconfig| 1 +
 arch/arm64/Kconfig  | 1 +
 arch/ia64/Kconfig   | 1 +
 arch/m68k/Kconfig   | 1 +
 arch/microblaze/Kconfig | 1 +
 arch/mips/Kconfig   | 1 +
 arch/parisc/Kconfig | 1 +
 arch/powerpc/Kconfig| 1 +
 arch/riscv/Kconfig  | 1 +
 arch/sh/Kconfig | 1 +
 arch/sparc/Kconfig  | 1 +
 arch/x86/Kconfig| 1 +
 drivers/bus/Kconfig | 2 +-
 lib/Kconfig | 4 
 lib/Kconfig.kgdb| 1 +
 17 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 4e87783c90ad..472a0c5e4c2f 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -28,6 +28,7 @@ config ALPHA
select AUDIT_ARCH
select GENERIC_CPU_VULNERABILITIES
select GENERIC_SMP_IDLE_THREAD
+   select HAS_IOPORT
select HAVE_ARCH_AUDITSYSCALL
select HAVE_MOD_ARCH_SPECIFIC
select MODULES_USE_ELF_RELA
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index b4ae6058902a..b3911ebbd237 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -27,6 +27,7 @@ config ARC
select GENERIC_PENDING_IRQ if SMP
select GENERIC_SCHED_CLOCK
select GENERIC_SMP_IDLE_THREAD
+   select HAS_IOPORT
select HAVE_ARCH_KGDB
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE if ARC_MMU_V4
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index c2724d986fa0..605709b6eecb 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -66,6 +66,7 @@ config ARM
select GENERIC_SCHED_CLOCK
select GENERIC_SMP_IDLE_THREAD
select HARDIRQS_SW_RESEND
+   select HAS_IOPORT
select HAVE_ARCH_AUDITSYSCALL if AEABI && !OABI_COMPAT
select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c4207cf9bb17..a8b199a40c8f 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -135,6 +135,7 @@ config ARM64
select GENERIC_GETTIMEOFDAY
select GENERIC_VDSO_TIME_NS
select HARDIRQS_SW_RESEND
+   select HAS_IOPORT
select HAVE_MOVE_PMD
select HAVE_MOVE_PUD
select HAVE_PCI
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 1e33666fa679..672aa2a88b19 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -24,6 +24,7 @@ config IA64
select PCI_DOMAINS if PCI
select PCI_MSI
select PCI_SYSCALL if PCI
+   select HAS_IOPORT
select HAVE_ASM_MODVERSIONS
select HAVE_UNSTABLE_SCHED_CLOCK
select HAVE_EXIT_THREAD
diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index 0b50da08a9c5..926d97c33828 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -16,6 +16,7 @@ config M68K
select GENERIC_CPU_DEVICES
select GENERIC_IOMAP
select GENERIC_IRQ_SHOW
+   select HAS_IOPORT
select HAVE_AOUT if MMU
select HAVE_ASM_MODVERSIONS
select HAVE_DEBUG_BUGVERBOSE
diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig
index 59798e43cdb0..213ef2940079 100644
--- a/arch/microblaze/Kconfig
+++ b/arch/microblaze/Kconfig
@@ -21,6 +21,7 @@ config MICROBLAZE
select GENERIC_IRQ_SHOW
select GENERIC_PCI_IOMAP
select GENERIC_SCHED_CLOCK
+   select HAS_IOPORT if PCI
select HAVE_ARCH_HASH
select HAVE_ARCH_KGDB
select HAVE_ARCH_SECCOMP
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 0215dc1529e9..87e6e7c29493 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -47,6 +47,7 @@ config MIPS
select GENERIC_SMP_IDLE_THREAD
select GENERIC_TIME_VSYSCALL
select GUP_GET_PTE_LOW_HIGH if CPU_MIPS32 && PHYS_ADDR_T_64BIT
+   select HAS_IOPORT
select HAVE_ARCH_COMPILER_H
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_KGDB if 

Re: [PATCH] pci: Rename pcibios_add_device to match

2021-09-15 Thread Niklas Schnelle
On Tue, 2021-09-14 at 01:27 +1000, Oliver O'Halloran wrote:
> The general convention for pcibios_* hooks is that they're named after
> the corresponding pci_* function they provide a hook for. The exception
> is pcibios_add_device() which provides a hook for pci_device_add(). This
> has been irritating me for years so rename it.
> 
> Also, remove the export of the microblaze version. The only caller
> must be compiled as a built-in so there's no reason for the export.
> 
> Signed-off-by: Oliver O'Halloran 
> ---
>  arch/microblaze/pci/pci-common.c   | 3 +--
>  arch/powerpc/kernel/pci-common.c   | 2 +-
>  arch/powerpc/platforms/powernv/pci-sriov.c | 2 +-
>  arch/s390/pci/pci.c| 2 +-
>  arch/sparc/kernel/pci.c| 2 +-
>  arch/x86/pci/common.c  | 2 +-
>  drivers/pci/pci.c  | 4 ++--
>  drivers/pci/probe.c| 4 ++--
>  include/linux/pci.h| 2 +-
>  9 files changed, 11 insertions(+), 12 deletions(-)
> 
> 
.. snip ..
> diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
> index e7e6788d75a8..ded3321b7208 100644
> --- a/arch/s390/pci/pci.c
> +++ b/arch/s390/pci/pci.c
> @@ -561,7 +561,7 @@ static void zpci_cleanup_bus_resources(struct zpci_dev 
> *zdev)
>   zdev->has_resources = 0;
>  }
>  
> -int pcibios_add_device(struct pci_dev *pdev)
> +int pcibios_device_add(struct pci_dev *pdev)
>  {
>   struct zpci_dev *zdev = to_zpci(pdev);
>   struct resource *res;
> 
.. snip ..
>


I agree with your assesment this is indeed confusing. Interestingly
pcibios_release_device() also doesn't follow the convention exactly as
it is called from pci_release_dev() but at least that isn't very
confusing.

So for the arch/s390/pci bit:

Acked-by: Niklas Schnelle 



[PATCH 1/1] powerpc: Drop superfluous pci_dev_is_added() calls

2021-09-10 Thread Niklas Schnelle
On powerpc, pci_dev_is_added() is called as part of SR-IOV fixups
that are done under pcibios_add_device() which in turn is only called in
pci_device_add() whih is called when a PCI device is scanned.

Now pci_dev_assign_added() is called in pci_bus_add_device() which is
only called after scanning the device. Thus pci_dev_is_added() is always
false and can be dropped.

Signed-off-by: Niklas Schnelle 
---
 arch/powerpc/platforms/powernv/pci-sriov.c | 6 --
 arch/powerpc/platforms/pseries/setup.c | 3 +--
 2 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c 
b/arch/powerpc/platforms/powernv/pci-sriov.c
index 28aac933a439..deddbb233fde 100644
--- a/arch/powerpc/platforms/powernv/pci-sriov.c
+++ b/arch/powerpc/platforms/powernv/pci-sriov.c
@@ -9,9 +9,6 @@
 
 #include "pci.h"
 
-/* for pci_dev_is_added() */
-#include "../../../../drivers/pci/pci.h"
-
 /*
  * The majority of the complexity in supporting SR-IOV on PowerNV comes from
  * the need to put the MMIO space for each VF into a separate PE. Internally
@@ -228,9 +225,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev 
*pdev)
 
 void pnv_pci_ioda_fixup_iov(struct pci_dev *pdev)
 {
-   if (WARN_ON(pci_dev_is_added(pdev)))
-   return;
-
if (pdev->is_virtfn) {
struct pnv_ioda_pe *pe = pnv_ioda_get_pe(pdev);
 
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index f79126f16258..2188054470c1 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -74,7 +74,6 @@
 #include 
 
 #include "pseries.h"
-#include "../../../../drivers/pci/pci.h"
 
 DEFINE_STATIC_KEY_FALSE(shared_processor);
 EXPORT_SYMBOL(shared_processor);
@@ -750,7 +749,7 @@ static void pseries_pci_fixup_iov_resources(struct pci_dev 
*pdev)
const int *indexes;
struct device_node *dn = pci_device_to_OF_node(pdev);
 
-   if (!pdev->is_physfn || pci_dev_is_added(pdev))
+   if (!pdev->is_physfn)
return;
/*Firmware must support open sriov otherwise dont configure*/
indexes = of_get_property(dn, "ibm,open-sriov-vf-bar-info", NULL);
-- 
2.25.1



[PATCH 0/1] Arch use of pci_dev_is_added()

2021-09-10 Thread Niklas Schnelle
Hi Bjorn, Hi Michael,

In my proposal to make pci_dev_is_added() more regularly usable by arch code
you mentioned[0] that you believe the uses in arch/powerpc are not necessary
anymore. From code reading I agree and so does Oliver O'Halloran[1].

So as promised here is a patch removing them. I only compile tested this as
I don't have access to a powerpc system.

I've also looked a bit more into our use in s390 and as dicussed previously
I don't think we can cleanly get rid of the existing one in
arch/s390/pci_sysfs.c:recover_store() because we need to distinguish an already
removed pdev just by looking at the pdev itself.

As for new uses I think in the upcoming automatic recovery code we can rely on
the fact that a removed device has pdev->driver == NULL and don't need
pci_dev_is_added() but it would make things clearer. I also noticed that before
commit 44bda4b7d26e9 ("PCI: Fix is_added/is_busmaster race condition") there
was simply a pdev->is_added flag that was cleanly accessible by arch code. So
I wanted to ask for your advice.

Thanks,
Niklas

[0] https://lore.kernel.org/lkml/20210825190444.GA3593752@bjorn-Precision-5520/
[1] 
https://lore.kernel.org/lkml/caosf1cfyuf9faesnparj+7w0mktpvtcm8vxjhdsfsndc6k_...@mail.gmail.com/

Niklas Schnelle (1):
  powerpc: Drop superfluous pci_dev_is_added() calls

 arch/powerpc/platforms/powernv/pci-sriov.c | 6 --
 arch/powerpc/platforms/pseries/setup.c | 3 +--
 2 files changed, 1 insertion(+), 8 deletions(-)

-- 
2.25.1



Re: [PATCH 0/5] s390/pci: automatic error recovery

2021-09-08 Thread Niklas Schnelle
On Wed, 2021-09-08 at 11:37 +1000, Oliver O'Halloran wrote:
> On Tue, Sep 7, 2021 at 10:21 PM Niklas Schnelle  
> wrote:
> > On Tue, 2021-09-07 at 10:45 +0200, Niklas Schnelle wrote:
> > > On Tue, 2021-09-07 at 12:04 +1000, Oliver O'Halloran wrote:
> > > > On Mon, Sep 6, 2021 at 7:49 PM Niklas Schnelle  
> > > > wrote:
> > > > > Patch 3 I already sent separately resulting in the discussion below 
> > > > > but without
> > > > > a final conclusion.
> > > > > 
> > > > > https://lore.kernel.org/lkml/20210720150145.640727-1-schne...@linux.ibm.com/
> > > > > 
> > > > > I believe even though there were some doubts about the use of
> > > > > pci_dev_is_added() by arch code the existing uses as well as the use 
> > > > > in the
> > > > > final patch of this series warrant this export.
> > > > 
> > > > The use of pci_dev_is_added() in arch/powerpc was because in the past
> > > > pci_bus_add_device() could be called before pci_device_add(). That was
> > > > fixed a while ago so It should be safe to remove those calls now.
> > > 
> > > Hmm, ok that confirms Bjorns suspicion and explains how it came to be.
> > > I can certainly sent a patch for that. This would then leave only the
> > > existing use in s390 which I added because of a dead lock prevention
> > > and explained here:
> > > https://lore.kernel.org/lkml/87d15d5eead35c9eaa667958d057cf4a81a8bf13.ca...@linux.ibm.com/
> > > 
> > > Plus the need to use it in the recovery code of this series. I think in
> > > the EEH code the need for a similar check is alleviated by the checks
> > > in the beginning of
> > > arch/powerpc/kernel/eeh_driver.c:eeh_handle_normal_event() especially
> > > eeh_slot_presence_check() which checks presence via the hotplug slot.
> > > I guess we could use our own state tracking in a similar way but felt
> > > like pci_dev_is_added() is the more logical choice.
> 
> The slot check is mainly there to prevent attempts to "recover"
> devices that have been surprise removed (i.e NVMe hot-unplug). The
> actual recovery process operates off the eeh_pe tree which is frozen
> in place when an error is detected. If a pci_dev is added or removed
> it's not really a problem since those are only ever looked at when
> notifying drivers which is done with the rescan_remove lock held. 

Thanks for the explanation.

> That
> said, I wouldn't really encourage anyone to follow the EEH model since
> it's pretty byzantine.
> 
> > Looking into this again, I think we actually can't easily track this
> > state ourselves outside struct pci_dev. The reason for this is that
> > when e.g. arch/s390/pci/pci_sysfs.c:recover_store() removes the struct
> > pci_dev and scans it again the new struct pci_dev re-uses the same
> > struct zpci_dev because from a platform point of view the PCI device
> > was never removed but only disabled and re-enabled. Thus we can only
> > distinguish the stale struct pci_dev by looking at things stored in
> > struct pci_dev itself.
> 
> IMO the real problem is removing and re-adding the pci_dev. I think
> it's something that's done largely because the PCI core doesn't really
> provide any better mechanism for getting a device back into a
> known-good state so it's abused to implement error recovery. This is
> something that's always annoyed me since it conflates recovery with
> hotplug. After a hot-(un)plug we might have a different device or no
> device. In the recovery case we expect to start and end with the same
> device. Why not apply the same logic to the pci_dev?

For us there are two cases. First The existing
/sys/bus/pci/devices//recover attribute. This does the pci_dev
remove and re-add that you mention and thus we end up with a ne pci_dev
afterwards and I agree that is kind of a dumb way to recover which
(too?) closely resembles unplug/re-plug.

Secondly the automatic error recovery added in this series. Here we
only attempt recovery if we have a driver bound that supports the error
callbacks thus always keeping the same pci_dev. If there is no driver
we give up automatic recovery and are back at the situation without
this series.

> 
> Something I was tinkering with before I left IBM was re-working the
> way EEH handles recovering devices that don't have a driver with error
> handling callbacks to something like:
> 
> 1. unbind the driver
> 2. pci_save_state()
> 3. do the reset
> 4. pci_restore_state()
> 5. re-bind the driver
> 
> That would allow keeping the pci_dev around and let me delete a pile
> of conf

Re: [PATCH 0/5] s390/pci: automatic error recovery

2021-09-07 Thread Niklas Schnelle
On Tue, 2021-09-07 at 10:45 +0200, Niklas Schnelle wrote:
> On Tue, 2021-09-07 at 12:04 +1000, Oliver O'Halloran wrote:
> > On Mon, Sep 6, 2021 at 7:49 PM Niklas Schnelle  
> > wrote:
> > > Patch 3 I already sent separately resulting in the discussion below but 
> > > without
> > > a final conclusion.
> > > 
> > > https://lore.kernel.org/lkml/20210720150145.640727-1-schne...@linux.ibm.com/
> > > 
> > > I believe even though there were some doubts about the use of
> > > pci_dev_is_added() by arch code the existing uses as well as the use in 
> > > the
> > > final patch of this series warrant this export.
> > 
> > The use of pci_dev_is_added() in arch/powerpc was because in the past
> > pci_bus_add_device() could be called before pci_device_add(). That was
> > fixed a while ago so It should be safe to remove those calls now.
> 
> Hmm, ok that confirms Bjorns suspicion and explains how it came to be.
> I can certainly sent a patch for that. This would then leave only the
> existing use in s390 which I added because of a dead lock prevention
> and explained here:
> https://lore.kernel.org/lkml/87d15d5eead35c9eaa667958d057cf4a81a8bf13.ca...@linux.ibm.com/
> 
> Plus the need to use it in the recovery code of this series. I think in
> the EEH code the need for a similar check is alleviated by the checks
> in the beginning of
> arch/powerpc/kernel/eeh_driver.c:eeh_handle_normal_event() especially
> eeh_slot_presence_check() which checks presence via the hotplug slot.
> I guess we could use our own state tracking in a similar way but felt
> like pci_dev_is_added() is the more logical choice.

Looking into this again, I think we actually can't easily track this
state ourselves outside struct pci_dev. The reason for this is that
when e.g. arch/s390/pci/pci_sysfs.c:recover_store() removes the struct
pci_dev and scans it again the new struct pci_dev re-uses the same
struct zpci_dev because from a platform point of view the PCI device
was never removed but only disabled and re-enabled. Thus we can only
distinguish the stale struct pci_dev by looking at things stored in
struct pci_dev itself.

That said, I think for the recovery case we might be able to drop the
pci_dev_is_added() and rely on pdev->driver != NULL which we check
anyway and that should catch any PCI device that was already removed.



Re: [PATCH 0/5] s390/pci: automatic error recovery

2021-09-07 Thread Niklas Schnelle
On Tue, 2021-09-07 at 12:04 +1000, Oliver O'Halloran wrote:
> On Mon, Sep 6, 2021 at 7:49 PM Niklas Schnelle  wrote:
> > Patch 3 I already sent separately resulting in the discussion below but 
> > without
> > a final conclusion.
> > 
> > https://lore.kernel.org/lkml/20210720150145.640727-1-schne...@linux.ibm.com/
> > 
> > I believe even though there were some doubts about the use of
> > pci_dev_is_added() by arch code the existing uses as well as the use in the
> > final patch of this series warrant this export.
> 
> The use of pci_dev_is_added() in arch/powerpc was because in the past
> pci_bus_add_device() could be called before pci_device_add(). That was
> fixed a while ago so It should be safe to remove those calls now.

Hmm, ok that confirms Bjorns suspicion and explains how it came to be.
I can certainly sent a patch for that. This would then leave only the
existing use in s390 which I added because of a dead lock prevention
and explained here:
https://lore.kernel.org/lkml/87d15d5eead35c9eaa667958d057cf4a81a8bf13.ca...@linux.ibm.com/

Plus the need to use it in the recovery code of this series. I think in
the EEH code the need for a similar check is alleviated by the checks
in the beginning of
arch/powerpc/kernel/eeh_driver.c:eeh_handle_normal_event() especially
eeh_slot_presence_check() which checks presence via the hotplug slot.
I guess we could use our own state tracking in a similar way but felt
like pci_dev_is_added() is the more logical choice.

> 
> > Patch 4 "PCI: Export pci_dev_lock()" is basically an extension to commit
> > e3a9b1212b9d ("PCI: Export pci_dev_trylock() and pci_dev_unlock()") which
> > already exported pci_dev_trylock(). In the final patch we make use of
> > pci_dev_lock() to wait for any other exclusive uses of the pdev to be 
> > finished
> > before starting recovery.
> 
> Hmm, I noticed the EEH
> (arch/powerpc/kernel/eeh_driver.c:eeh_pe_report_edev())  and the
> generic PCIe error recovery code (see
> drivers/pci/pcie/err.c:report_error_detected()) only call
> device_lock() before entering the driver's error handling callbacks. I
> wonder if they should be using pci_dev_lock() instead. The only real
> difference is that pci_dev_lock() will also block user space from
> accessing the device's config space while error recovery is in
> progress which seems sensible enough.

I agree that sounds reasonable.



Re: [PATCH 3/5] PCI: Move pci_dev_is/assign_added() to pci.h

2021-09-07 Thread Niklas Schnelle
On Tue, 2021-09-07 at 10:51 +0300, Andy Shevchenko wrote:
> On Tue, Sep 7, 2021 at 3:26 AM kernel test robot  wrote:
> > Hi Niklas,
> > 
> > I love your patch! Yet something to improve:
> > 
> > [auto build test ERROR on s390/features]
> > [also build test ERROR on next-20210906]
> > [cannot apply to pci/next powerpc/next v5.14]
> > [If your patch is applied to the wrong git tree, kindly drop us a note.
> > And when submitting patch, we suggest to use '--base' as documented in
> > https://git-scm.com/docs/git-format-patch]
> > 
> > url:
> > https://github.com/0day-ci/linux/commits/Niklas-Schnelle/s390-pci-automatic-error-recovery/20210906-175309
> > base:   https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git 
> > features
> > config: i386-allyesconfig (attached as .config)
> > compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
> > reproduce (this is a W=1 build):
> > # 
> > https://github.com/0day-ci/linux/commit/404ed8c00a612e7ae31c50557c80c6726c464863
> >     git remote add linux-review https://github.com/0day-ci/linux
> > git fetch --no-tags linux-review 
> > Niklas-Schnelle/s390-pci-automatic-error-recovery/20210906-175309
> > git checkout 404ed8c00a612e7ae31c50557c80c6726c464863
> > # save the attached .config to linux build tree
> > make W=1 ARCH=i386
> > 
> > If you fix the issue, kindly add following tag as appropriate
> > Reported-by: kernel test robot 
> > 
> > All errors (new ones prefixed by >>):
> 
> Obviously drivers/pci/pci.h is not only for the above.
> 
> When play with headers always do two test builds: allyesconfig and 
> allmodconfig.

You're right and additionally have to built on some other architectures
as well because allyesconfig and allmodconfig both run through fine on
s390. 

I'll look into it but at first glance it looks like I was over reaching
removing the include from drivers/pci/hotplug/acpiphp_glue.c in
addition it's not even the same kind of awkward relative include from
drivers into arch code. Sorry about that.

> 



Re: [PATCH 0/5] s390/pci: automatic error recovery

2021-09-07 Thread Niklas Schnelle
On Mon, 2021-09-06 at 21:05 -0500, Linas Vepstas wrote:
> On Mon, Sep 6, 2021 at 4:49 AM Niklas Schnelle 
> wrote:
> 
> >  I believe we might be the first
> > implementation of PCI device recovery in a virtualized setting requiring
> > us to
> > coordinate the device reset with the hypervisor platform by issuing a
> > disable
> > and re-enable to the platform as well as starting the recovery following
> > a platform event.
> > 
> 
> I recall none of the details, but SRIOV is a standardized system for
> sharing a PCI device across multiple virtual machines. It has detailed info
> on what the hypervisor must do, and what the local OS instance must do to
> accomplish this.  

Yes and in fact on s390 we make heavy use of SR-IOV.

> It's part of the PCI standard, and its more than a decade
> old now, maybe two. Being a part of the PCI standard, it was interoperable
> with error recovery, to the best of my recollection. 

Maybe I worded things with a bit too much sensationalism and it might
even be that POWER supports error recovery also with virtualization,
though I'm not sure how far that goes.

I believe you are right in that SR-IOV supports the error recovery,
after all this patch set also has to work together with SRIOV enabled
devices. At least on s390 though until this patch set the error
recovery performed by the hypervisor stopped in the hypervisor.

The missing part added by this patch set is coordinating with device
drivers in Linux to determine where use of a recovered device can pick
up after the PCIe level error recovery is done.

As for virtualization this coordination of course needs to cross the
hypervisor/guest boundary and at least for KVM+QEMU I know for a fact
that reporting a PCI error to the guest is currently just a stub that
actually completely stops the guest, so you definitely don't get smooth
error recovery there yet.

> At the time it was
> introduced, it got pushed very aggressively.  The x86 hypervisor vendors
> were aiming at the heart of zseries, and were militant about it.

And yet we're still here, use SR-IOV ourselves and even support Linux +
KVM as a hypervisor you can use just the same on a mainframe, an x86,
POWER, or ARM system.

> 
> -- Linas
> 



[PATCH 4/5] PCI: Export pci_dev_lock()

2021-09-06 Thread Niklas Schnelle
Commit e3a9b1212b9d ("PCI: Export pci_dev_trylock() and pci_dev_unlock()")
already exported pci_dev_trylock()/pci_dev_unlock() however in some
circumstances such as during error recovery it makes sense to block
waiting to get full access to the device so also export pci_dev_lock().

Signed-off-by: Niklas Schnelle 
---
 drivers/pci/pci.c   | 3 ++-
 include/linux/pci.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index aacf575c15cf..3f416c6d3b0b 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5040,12 +5040,13 @@ static int pci_reset_bus_function(struct pci_dev *dev, 
int probe)
return pci_parent_bus_reset(dev, probe);
 }
 
-static void pci_dev_lock(struct pci_dev *dev)
+void pci_dev_lock(struct pci_dev *dev)
 {
pci_cfg_access_lock(dev);
/* block PM suspend, driver probe, etc. */
device_lock(>dev);
 }
+EXPORT_SYMBOL_GPL(pci_dev_lock);
 
 /* Return 1 on successful lock, 0 on contention */
 int pci_dev_trylock(struct pci_dev *dev)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index ea0e23dbc8ec..efc78b8d4696 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1639,6 +1639,7 @@ void pci_cfg_access_lock(struct pci_dev *dev);
 bool pci_cfg_access_trylock(struct pci_dev *dev);
 void pci_cfg_access_unlock(struct pci_dev *dev);
 
+void pci_dev_lock(struct pci_dev *dev);
 int pci_dev_trylock(struct pci_dev *dev);
 void pci_dev_unlock(struct pci_dev *dev);
 
-- 
2.25.1



[PATCH 5/5] s390/pci: implement minimal PCI error recovery

2021-09-06 Thread Niklas Schnelle
When the platform detects an error on a PCI function or a service action
has been performed it is put in the error state and an error event
notification is provided to the OS.

Currently we treat all error event notifications the same and simply set
pdev->error_state = pci_channel_io_perm_failure requiring user
intervention such as use of the recover attribute to get the device
usable again. Despite requiring a manual step this also has the
disadvantage that the device is completely torn down and recreated
resulting in higher level devices such as a block or network device
being recreated. In case of a block device this also means that it may
need to be removed and added to a software raid even if that could
otherwise survive with a temporary degradation.

This is of course not ideal more so since an error notification with PEC
0x3A indicates that the platform already performed error recovery
successfully or that the error state was caused by a service action that
is now finished.

At least in this case we can assume that the error state can be reset
and the function made usable again. So as not to have the disadvantage
of a full tear down and recreation we need to coordinate this recovery
with the driver. Thankfully there is already a well defined recovery
flow for this described in Documentation/PCI/pci-error-recovery.rst.

The implementation of this is somewhat straight forward and simplified
by the fact that our recovery flow is defined per PCI function. As
a reset we use the newly introduced zpci_hot_reset_device() which also
takes the PCI function out of the error state.

Signed-off-by: Niklas Schnelle 
---
 arch/s390/include/asm/pci.h |   4 +-
 arch/s390/pci/pci.c |  49 ++
 arch/s390/pci/pci_event.c   | 190 +++-
 3 files changed, 241 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 2a2ed165a270..558877aff2e5 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -294,8 +294,10 @@ void zpci_debug_exit(void);
 void zpci_debug_init_device(struct zpci_dev *, const char *);
 void zpci_debug_exit_device(struct zpci_dev *);
 
-/* Error reporting */
+/* Error handling */
 int zpci_report_error(struct pci_dev *, struct zpci_report_error_header *);
+int zpci_clear_error_state(struct zpci_dev *zdev);
+int zpci_reset_load_store_blocked(struct zpci_dev *zdev);
 
 #ifdef CONFIG_NUMA
 
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index a6322f45b5bd..77a3e85d43fb 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -954,6 +954,55 @@ int zpci_report_error(struct pci_dev *pdev,
 }
 EXPORT_SYMBOL(zpci_report_error);
 
+/**
+ * zpci_clear_error_state() - Clears the zPCI error state of the device
+ * @zdev: The zdev for which the zPCI error state should be reset
+ *
+ * Clear the zPCI error state of the device. If clearing the zPCI error state
+ * fails the device is left in the error state. In this case it may make sense
+ * to call zpci_io_perm_failure() on the associated pdev if it exists.
+ *
+ * Returns: 0 on success, -EIO otherwise
+ */
+int zpci_clear_error_state(struct zpci_dev *zdev)
+{
+   u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, ZPCI_MOD_FC_RESET_ERROR);
+   struct zpci_fib fib = {0};
+   u8 status;
+   int rc;
+
+   rc = zpci_mod_fc(req, , );
+   if (rc)
+   return -EIO;
+
+   return 0;
+}
+
+/**
+ * zpci_reset_load_store_blocked() - Re-enables L/S from error state
+ * @zdev: The zdev for which to unblock load/store access
+ *
+ * R-eenables load/store access for a PCI function in the error state while
+ * keeping DMA blocked. In this state drivers can poke MMIO space to determine
+ * if error recovery is possible while catching any rogue DMA access from the
+ * device.
+ *
+ * Returns: 0 on success, -EIO otherwise
+ */
+int zpci_reset_load_store_blocked(struct zpci_dev *zdev)
+{
+   u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, ZPCI_MOD_FC_RESET_BLOCK);
+   struct zpci_fib fib = {0};
+   u8 status;
+   int rc;
+
+   rc = zpci_mod_fc(req, , );
+   if (rc)
+   return -EIO;
+
+   return 0;
+}
+
 static int zpci_mem_init(void)
 {
BUILD_BUG_ON(!is_power_of_2(__alignof__(struct zpci_fmb)) ||
diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c
index e868d996ec5b..ac9ed1572d39 100644
--- a/arch/s390/pci/pci_event.c
+++ b/arch/s390/pci/pci_event.c
@@ -47,15 +47,190 @@ struct zpci_ccdf_avail {
u16 pec;/* PCI event code */
 } __packed;
 
+static inline bool ers_result_indicates_abort(pci_ers_result_t ers_res)
+{
+   switch (ers_res) {
+   case PCI_ERS_RESULT_CAN_RECOVER:
+   case PCI_ERS_RESULT_RECOVERED:
+   case PCI_ERS_RESULT_NEED_RESET:
+   return false;
+   default:
+   return true;
+   }
+}
+
+static pci_ers_result_t zpci_event_notify_error_detected(struct pci_

[PATCH 3/5] PCI: Move pci_dev_is/assign_added() to pci.h

2021-09-06 Thread Niklas Schnelle
The helper function pci_dev_is_added() from drivers/pci/pci.h is used in
PCI arch code of both s390 and powerpc leading to awkward relative
includes. Move it to the global include/linux/pci.h and get rid of these
includes just for that one function.

Signed-off-by: Niklas Schnelle 
---
 arch/powerpc/platforms/powernv/pci-sriov.c |  3 ---
 arch/powerpc/platforms/pseries/setup.c |  1 -
 arch/s390/pci/pci_sysfs.c  |  2 --
 drivers/pci/hotplug/acpiphp_glue.c |  1 -
 drivers/pci/pci.h  | 15 ---
 include/linux/pci.h| 15 +++
 6 files changed, 15 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c 
b/arch/powerpc/platforms/powernv/pci-sriov.c
index 28aac933a439..2e0ca5451e85 100644
--- a/arch/powerpc/platforms/powernv/pci-sriov.c
+++ b/arch/powerpc/platforms/powernv/pci-sriov.c
@@ -9,9 +9,6 @@
 
 #include "pci.h"
 
-/* for pci_dev_is_added() */
-#include "../../../../drivers/pci/pci.h"
-
 /*
  * The majority of the complexity in supporting SR-IOV on PowerNV comes from
  * the need to put the MMIO space for each VF into a separate PE. Internally
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 0dfaa6ab44cc..08e846ae1853 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -74,7 +74,6 @@
 #include 
 
 #include "pseries.h"
-#include "../../../../drivers/pci/pci.h"
 
 DEFINE_STATIC_KEY_FALSE(shared_processor);
 EXPORT_SYMBOL(shared_processor);
diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c
index 335c281811c7..40733b93a086 100644
--- a/arch/s390/pci/pci_sysfs.c
+++ b/arch/s390/pci/pci_sysfs.c
@@ -13,8 +13,6 @@
 #include 
 #include 
 
-#include "../../../drivers/pci/pci.h"
-
 #include 
 
 #define zpci_attr(name, fmt, member)   \
diff --git a/drivers/pci/hotplug/acpiphp_glue.c 
b/drivers/pci/hotplug/acpiphp_glue.c
index f031302ad401..4cb963f88183 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -38,7 +38,6 @@
 #include 
 #include 
 
-#include "../pci.h"
 #include "acpiphp.h"
 
 static LIST_HEAD(bridge_list);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 93dcdd431072..a159cd0f6f05 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -383,21 +383,6 @@ static inline bool pci_dev_is_disconnected(const struct 
pci_dev *dev)
return dev->error_state == pci_channel_io_perm_failure;
 }
 
-/* pci_dev priv_flags */
-#define PCI_DEV_ADDED 0
-#define PCI_DPC_RECOVERED 1
-#define PCI_DPC_RECOVERING 2
-
-static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
-{
-   assign_bit(PCI_DEV_ADDED, >priv_flags, added);
-}
-
-static inline bool pci_dev_is_added(const struct pci_dev *dev)
-{
-   return test_bit(PCI_DEV_ADDED, >priv_flags);
-}
-
 #ifdef CONFIG_PCIEAER
 #include 
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 540b377ca8f6..ea0e23dbc8ec 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -507,6 +507,21 @@ struct pci_dev {
unsigned long   priv_flags; /* Private flags for the PCI driver */
 };
 
+/* pci_dev priv_flags */
+#define PCI_DEV_ADDED 0
+#define PCI_DPC_RECOVERED 1
+#define PCI_DPC_RECOVERING 2
+
+static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
+{
+   assign_bit(PCI_DEV_ADDED, >priv_flags, added);
+}
+
+static inline bool pci_dev_is_added(const struct pci_dev *dev)
+{
+   return test_bit(PCI_DEV_ADDED, >priv_flags);
+}
+
 static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
 {
 #ifdef CONFIG_PCI_IOV
-- 
2.25.1



[PATCH 2/5] s390/pci: implement reset_slot for hotplug slot

2021-09-06 Thread Niklas Schnelle
This is done by adding a zpci_hot_reset_device() call which does a low
level reset of the PCI function without changing its higher level
function state. This way it can be used while the zPCI function is bound
to a driver and with DMA tables being controlled either through the
IOMMU or DMA APIs which is prohibited when using zpci_disable_device()
as that drop existing DMA translations.

As this reset, unlike a normal FLR, also calls zpci_clear_irq() we need
to implement arch_restore_msi_irqs() and make sure we re-enable IRQs for
the PCI function if they were previously disabled.

Signed-off-by: Niklas Schnelle 
---
 arch/s390/include/asm/pci.h|  1 +
 arch/s390/pci/pci.c| 58 ++
 arch/s390/pci/pci_irq.c|  9 +
 drivers/pci/hotplug/s390_pci_hpc.c | 24 +
 4 files changed, 92 insertions(+)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 5e6cba22a801..2a2ed165a270 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -208,6 +208,7 @@ int zpci_disable_device(struct zpci_dev *);
 int zpci_scan_configured_device(struct zpci_dev *zdev, u32 fh);
 int zpci_deconfigure_device(struct zpci_dev *zdev);
 
+int zpci_hot_reset_device(struct zpci_dev *zdev);
 int zpci_register_ioat(struct zpci_dev *, u8, u64, u64, u64);
 int zpci_unregister_ioat(struct zpci_dev *, u8);
 void zpci_remove_reserved_devices(void);
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index af22778551c1..a6322f45b5bd 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -723,6 +723,64 @@ int zpci_disable_device(struct zpci_dev *zdev)
return rc;
 }
 
+/**
+ * zpci_hot_reset_device - perform a reset of the given zPCI function
+ * @zdev: the slot which should be reset
+ *
+ * Performs a low level reset of the zPCI function. The reset is low level in
+ * the sense that the zPCI function can be reset without detaching it from the
+ * common PCI subsystem. The reset may be performed while under control of
+ * either DMA or IOMMU APIs in which case the existing DMA/IOMMU translation
+ * table is reinstated at the end of the reset.
+ *
+ * After the reset the functions internal state is reset to an initial state
+ * equivalent to its state during boot when first probing a driver.
+ * Consequently after reset the PCI function requires re-initialization via the
+ * common PCI code including re-enabling IRQs via pci_alloc_irq_vectors()
+ * and enabling the function via e.g.pci_enablde_device_flags().The caller
+ * must guard against concurrent reset attempts.
+ *
+ * In most cases this function should not be called directly but through
+ * pci_reset_function() or pci_reset_bus() which handle the save/restore and
+ * locking.
+ *
+ * Return: 0 on success and an error value otherwise
+ */
+int zpci_hot_reset_device(struct zpci_dev *zdev)
+{
+   int rc;
+
+   zpci_dbg(3, "reset fid:%x\n", zdev->fid);
+   if (zdev_enabled(zdev)) {
+   /* Disables device access, DMAs and IRQs (reset state) */
+   rc = zpci_disable_device(zdev);
+   /*
+* Due to a z/VM vs LPAR inconsistency in the error state the
+* FH may indicate an enabled device but disable says the
+* device is already disabled don't treat it as an error here.
+*/
+   if (rc == -EINVAL)
+   rc = 0;
+   if (rc)
+   return rc;
+   }
+
+   rc = zpci_enable_device(zdev);
+   if (rc)
+   return rc;
+
+   if (zdev->dma_table) {
+   rc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
+   (u64)zdev->dma_table);
+   if (rc)
+   return rc;
+   } else {
+   zpci_dma_init_device(zdev);
+   }
+
+   return 0;
+}
+
 /**
  * zpci_create_device() - Create a new zpci_dev and add it to the zbus
  * @fid: Function ID of the device to be created
diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
index 9c7de9089939..ab98e7f5b79b 100644
--- a/arch/s390/pci/pci_irq.c
+++ b/arch/s390/pci/pci_irq.c
@@ -391,6 +391,15 @@ void arch_teardown_msi_irqs(struct pci_dev *pdev)
airq_iv_free(zpci_ibv[0], zdev->msi_first_bit, 
zdev->msi_nr_irqs);
 }
 
+void arch_restore_msi_irqs(struct pci_dev *pdev)
+{
+   struct zpci_dev *zdev = to_zpci(pdev);
+
+   if (!zdev->irqs_registered)
+   zpci_set_irq(zdev);
+   default_restore_msi_irqs(pdev);
+}
+
 static struct airq_struct zpci_airq = {
.handler = zpci_floating_irq_handler,
.isc = PCI_ISC,
diff --git a/drivers/pci/hotplug/s390_pci_hpc.c 
b/drivers/pci/hotplug/s390_pci_hpc.c
index 014868752cd4..07f28db0eed5 100644
--- a/drivers/pci/hotplug/s390_pci_hpc.c
+++ b/drivers/pci/hotplug/s390_pci_hpc.c
@@ -57,6 +57,29 @@ static int disable_slot

[PATCH 0/5] s390/pci: automatic error recovery

2021-09-06 Thread Niklas Schnelle
Hello,

This series implements automatic error recovery for PCI devices on s390
following the scheme outlined at Documentation/PCI/pci-error-recovery.rst
it applies on top of currenct master.

The patches have are almost completely s390 specific except for two patches
exporting existing functionality for use by arch/s390/pci/ code. Nevertheless
I would also appreciate any feedback, especially on the last patch, concerning
the implementation of the error recovery flow. I believe we might be the first
implementation of PCI device recovery in a virtualized setting requiring us to
coordinate the device reset with the hypervisor platform by issuing a disable
and re-enable to the platform as well as starting the recovery following
a platform event.

The outline of the patches is as follows:

Patch 1 and 2 add s390 specific code implementing a reset mechanism that
takes the PCI function out of the platform specific error state.

Patches 3 and 4 export existing common code functions for use by the s390
specific recovery code.

Patch 3 I already sent separately resulting in the discussion below but without
a final conclusion.

https://lore.kernel.org/lkml/20210720150145.640727-1-schne...@linux.ibm.com/

I believe even though there were some doubts about the use of
pci_dev_is_added() by arch code the existing uses as well as the use in the
final patch of this series warrant this export.

Patch 4 "PCI: Export pci_dev_lock()" is basically an extension to commit
e3a9b1212b9d ("PCI: Export pci_dev_trylock() and pci_dev_unlock()") which
already exported pci_dev_trylock(). In the final patch we make use of
pci_dev_lock() to wait for any other exclusive uses of the pdev to be finished
before starting recovery.

Finally Patch 5 implements the recovery flow as part of the existing s390
specific PCI availability and error event mechanism. Where previously the error
case only set an error indication requiring manual intervention to make the
device usable again. Now we handle the case where firmware has already reset
a PCI function after an error was encountered informing the OS that it should
be ready to be used again. Note that the same event is also issued by the
hypervisor if the function was manually taken into a service mode for example
for firmware upgrade via the hypervisor and is now ready to be used again.

Thanks,
Niklas Schnelle

Niklas Schnelle (5):
  s390/pci: refresh function handle in iomap
  s390/pci: implement reset_slot for hotplug slot
  PCI: Move pci_dev_is/assign_added() to pci.h
  PCI: Export pci_dev_lock()
  s390/pci: implement minimal PCI error recovery

 arch/powerpc/platforms/powernv/pci-sriov.c |   3 -
 arch/powerpc/platforms/pseries/setup.c |   1 -
 arch/s390/include/asm/pci.h|   6 +-
 arch/s390/pci/pci.c| 143 ++-
 arch/s390/pci/pci_event.c  | 196 -
 arch/s390/pci/pci_insn.c   |   4 +-
 arch/s390/pci/pci_irq.c|   9 +
 arch/s390/pci/pci_sysfs.c  |   2 -
 drivers/pci/hotplug/acpiphp_glue.c |   1 -
 drivers/pci/hotplug/s390_pci_hpc.c |  24 +++
 drivers/pci/pci.c  |   3 +-
 drivers/pci/pci.h  |  15 --
 include/linux/pci.h|  16 ++
 13 files changed, 389 insertions(+), 34 deletions(-)

-- 
2.25.1



[PATCH 1/5] s390/pci: refresh function handle in iomap

2021-09-06 Thread Niklas Schnelle
The function handle of a PCI function is updated when disabling or
enabling it as well as when the function's availability changes or it
enters the error state.

Until now this only occurred either while there is no struct pci_dev
associated with the function yet or the function became unavailable.
This meant that leaving a stale function handle in the iomap either
didn't happen because there was no iomap yet or it lead to errors on PCI
access but so would the correct disabled function handle.

In the future a CLP Set PCI Function Disable/Enable cycle during PCI
device recovery may be done while the device is bound to a driver.  In
this case we must update the iomap associated with the now-stale
function handle to ensure that the resulting zPCI instruction references
an accurate function handle.

Since the function handle is accessed by the PCI accessor helpers
without locking use READ_ONCE()/WRITE_ONCE() to mark this access and
prevent compiler optimizations that would move the load/store.

With that infrastructure in place let's also properly update the
function handle in the existing cases. This makes sure that in the
future debugging of a zPCI function access through the handle will
show an up to date handle reducing the chance of confusion. Also it
makes sure we have one single place where a zPCI function handle is
updated after initialization.

Signed-off-by: Niklas Schnelle 
---
 arch/s390/include/asm/pci.h |  1 +
 arch/s390/pci/pci.c | 36 
 arch/s390/pci/pci_event.c   |  6 +++---
 arch/s390/pci/pci_insn.c|  4 ++--
 4 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index e4803ec51110..5e6cba22a801 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -211,6 +211,7 @@ int zpci_deconfigure_device(struct zpci_dev *zdev);
 int zpci_register_ioat(struct zpci_dev *, u8, u64, u64, u64);
 int zpci_unregister_ioat(struct zpci_dev *, u8);
 void zpci_remove_reserved_devices(void);
+void zpci_update_fh(struct zpci_dev *zdev, u32 fh);
 
 /* CLP */
 int clp_setup_writeback_mio(void);
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index e7e6788d75a8..af22778551c1 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -481,6 +481,34 @@ static void zpci_free_iomap(struct zpci_dev *zdev, int 
entry)
spin_unlock(_iomap_lock);
 }
 
+static void zpci_do_update_iomap_fh(struct zpci_dev *zdev, u32 fh)
+{
+   int bar, idx;
+
+   spin_lock(_iomap_lock);
+   for (bar = 0; bar < PCI_STD_NUM_BARS; bar++) {
+   if (!zdev->bars[bar].size)
+   continue;
+   idx = zdev->bars[bar].map_idx;
+   if (!zpci_iomap_start[idx].count)
+   continue;
+   WRITE_ONCE(zpci_iomap_start[idx].fh, zdev->fh);
+   }
+   spin_unlock(_iomap_lock);
+}
+
+void zpci_update_fh(struct zpci_dev *zdev, u32 fh)
+{
+   if (!fh || zdev->fh == fh)
+   return;
+
+   zdev->fh = fh;
+   if (zpci_use_mio(zdev))
+   return;
+   if (zdev->has_resources && zdev_enabled(zdev))
+   zpci_do_update_iomap_fh(zdev, fh);
+}
+
 static struct resource *__alloc_res(struct zpci_dev *zdev, unsigned long start,
unsigned long size, unsigned long flags)
 {
@@ -668,7 +696,7 @@ int zpci_enable_device(struct zpci_dev *zdev)
if (clp_enable_fh(zdev, , ZPCI_NR_DMA_SPACES))
rc = -EIO;
else
-   zdev->fh = fh;
+   zpci_update_fh(zdev, fh);
return rc;
 }
 
@@ -679,14 +707,14 @@ int zpci_disable_device(struct zpci_dev *zdev)
 
cc = clp_disable_fh(zdev, );
if (!cc) {
-   zdev->fh = fh;
+   zpci_update_fh(zdev, fh);
} else if (cc == CLP_RC_SETPCIFN_ALRDY) {
pr_info("Disabling PCI function %08x had no effect as it was 
already disabled\n",
zdev->fid);
/* Function is already disabled - update handle */
rc = clp_refresh_fh(zdev->fid, );
if (!rc) {
-   zdev->fh = fh;
+   zpci_update_fh(zdev, fh);
rc = -EINVAL;
}
} else {
@@ -768,7 +796,7 @@ int zpci_scan_configured_device(struct zpci_dev *zdev, u32 
fh)
 {
int rc;
 
-   zdev->fh = fh;
+   zpci_update_fh(zdev, fh);
/* the PCI function will be scanned once function 0 appears */
if (!zdev->zbus->bus)
return 0;
diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c
index c856f80cb21b..e868d996ec5b 100644
--- a/arch/s390/pci/pci_event.c
+++ b/arch/s390/pci/pci_event.c
@@ -76,7 +76,7 @@ void zpci_event_error(void *data)
 
 static void zpci_event_hard_deconfigured(struct zpci_dev *zdev, u3

Re: [PATCH v3] PCI: Move pci_dev_is/assign_added() to pci.h

2021-08-26 Thread Niklas Schnelle
On Wed, 2021-08-25 at 14:04 -0500, Bjorn Helgaas wrote:
> On Mon, Aug 23, 2021 at 12:53:39PM +0200, Niklas Schnelle wrote:
> > On Fri, 2021-08-20 at 17:37 -0500, Bjorn Helgaas wrote:
> > > On Tue, Jul 20, 2021 at 05:01:45PM +0200, Niklas Schnelle wrote:
> > > > The helper function pci_dev_is_added() from drivers/pci/pci.h is used in
> > > > PCI arch code of both s390 and powerpc leading to awkward relative
> > > > includes. Move it to the global include/linux/pci.h and get rid of these
> > > > includes just for that one function.
> > > 
> > > I agree the includes are awkward.
> > > 
> > > But the arch code *using* pci_dev_is_added() seems awkward, too.
> > 
> > See below for my interpretation why s390 has some driver like
> > functionality in its arch code which isn't necessarily awkward.
> > 
> > Independent from that I have found pci_dev_is_added() as the only way
> > deal with the case that one might be looking at a struct pci_dev
> > reference that has been removed via pci_stop_and_remove_bus_device() or
> > has never been fully scanned. This is quite useful when handling error
> > events which on s390 are part of the adapter event mechanism shared
> > with channel I/O devices.
> > 
> > > AFAICS, in powerpc, pci_dev_is_added() is only used by
> > > pnv_pci_ioda_fixup_iov() and pseries_pci_fixup_iov_resources().  Those
> > > are only called from pcibios_add_device(), which is only called from
> > > pci_device_add().
> > > 
> > > Is it even possible for pci_dev_is_added() to be true in that path?
> 
> If the pci_dev_is_added() in powerpc is unreachable, we can remove it
> and at least reduce this to an s390-only problem.

Ok. I might be missing something but I agree it does look these are
called from within pcibios_add_device() only so pci_dev_is_added() can
never be true. This looks pretty clear as pci_dev_assign_added() is
called after pcibios_add_device() in pci_bus_add_device().

> 
> > > s390 uses pci_dev_is_added() in recover_store()
> > 
> > I'm actually looking into this as I'm working on an s390 implementation
> > of the PCI recovery flow described in Documentation/PCI/pci-error-
> > recovery.rst that would also call pci_dev_is_added() because when we
> > get a platform notification of a PCI reset done by firmware it may be
> > that the struct pci_dev is going away i.e. we still have a ref count
> > but it is not added to the PCI bus anymore. And pci_dev_is_added() is
> > the only way I've found to check for this state.
> > 
> > > , but I don't know what
> > > that is (looks like a sysfs file, but it's not documented) or why s390
> > > is the only arch that does this.
> > 
> > Good point about this not being documented, I'll look into adding docs.
> > 
> > This is a sysfs attribute that basically removes the pci_dev and re-
> > adds it. This has the complication that since the attribute sits at
> > /sys/bus/pci/devices//recover it deletes its own parent directory
> > which requires extra caution and means concurrent accesses block on
> > pci_lock_rescan_remove() instead of a kernfs lock.
> > Long story short when concurrently triggering the attribute one thread
> > proceeds into the pci_lock_rescan_remove() section and does the
> > removal, while others would block on pci_lock_rescan_remove(). Now when
> > the threads unblock the removal is done. In this case there is a new
> > struct pci_dev found in the rescan but the previously blocked threads
> > still have references to the old struct pci_dev which was removed and
> > as far as I could tell can only be distinguished by checking
> > pci_dev_is_added().
> 
> Is this locking issue different from concurrently writing to
> /sys/.../remove on other architectures?

In principle it is very similar except that we re-scan and thus the
removed pdev may co-exist with a new pdev for the same actual device if
there are other references to the pdev.

There is however also a significant difference in locking that fixes a
possible deadlock that I just confirmed also affects /sys/../remove
where it is hidden by a lockdep ignore, see lockdep splash below. 

For /sys/../recover this was fixed by my commit dd712f0a53ca
("s390/pci: Fix possible deadlock in  recover_store()") which also
introduced the need for pci_dev_is_added().

The change in the above commit is very similar to what is done in
drivers/scsi/scsi_sysfs.c:sdev_store_delete() which also has a self
deletion and in commit 0ee223b2e1f6 ("scsi: core: Avoid that SCSI
device removal through sysfs triggers a deadlock") fixed a very very
similar lock order p

Re: [PATCH v3] PCI: Move pci_dev_is/assign_added() to pci.h

2021-08-23 Thread Niklas Schnelle
On Fri, 2021-08-20 at 17:37 -0500, Bjorn Helgaas wrote:
> On Tue, Jul 20, 2021 at 05:01:45PM +0200, Niklas Schnelle wrote:
> > The helper function pci_dev_is_added() from drivers/pci/pci.h is used in
> > PCI arch code of both s390 and powerpc leading to awkward relative
> > includes. Move it to the global include/linux/pci.h and get rid of these
> > includes just for that one function.
> 
> I agree the includes are awkward.
> 
> But the arch code *using* pci_dev_is_added() seems awkward, too.

See below for my interpretation why s390 has some driver like
functionality in its arch code which isn't necessarily awkward.

Independent from that I have found pci_dev_is_added() as the only way
deal with the case that one might be looking at a struct pci_dev
reference that has been removed via pci_stop_and_remove_bus_device() or
has never been fully scanned. This is quite useful when handling error
events which on s390 are part of the adapter event mechanism shared
with channel I/O devices.

> 
> AFAICS, in powerpc, pci_dev_is_added() is only used by
> pnv_pci_ioda_fixup_iov() and pseries_pci_fixup_iov_resources().  Those
> are only called from pcibios_add_device(), which is only called from
> pci_device_add().
> 
> Is it even possible for pci_dev_is_added() to be true in that path?
> 
> s390 uses pci_dev_is_added() in recover_store()

I'm actually looking into this as I'm working on an s390 implementation
of the PCI recovery flow described in Documentation/PCI/pci-error-
recovery.rst that would also call pci_dev_is_added() because when we
get a platform notification of a PCI reset done by firmware it may be
that the sturct pci_dev is going away i.e. we still have a ref count
but it is not added to the PCI bus anymore. And pci_dev_is_added() is
the only way I've found to check for this state.

> , but I don't know what
> that is (looks like a sysfs file, but it's not documented) or why s390
> is the only arch that does this.

Good point about this not being documented, I'll look into adding docs.

This is a sysfs attribute that basically removes the pci_dev and re-
adds it. This has the complication that since the attribute sits at
/sys/bus/pci/devices//recover it deletes its own parent directory
which requires extra caution and means concurrent accesses block on
pci_lock_rescan_remove() instead of a kernfs lock.
Long story short when concurrently triggering the attribute one thread
proceeds into the pci_lock_rescan_remove() section and does the
removal, while others would block on pci_lock_rescan_remove(). Now when
the threads unblock the removal is done. In this case there is a new
struct pci_dev found in the rescan but the previously blocked threads
still have references to the old struct pci_dev which was removed and
as far as I could tell can only be distinguihsed by checking
pci_dev_is_added().

> 
> Maybe we should make powerpc and s390 less special?

On s390, as I see it, the reason for this is that all of the PCI
functionality is directly defined in the Architecture as special CPU
instructions which are kind of hypercalls but also an ISA extension.

These instructions range from the basic PCI memory accesses (no real
MMIO) to enumeration of the devices and on to reporting of hot-plug and
and resets/recovery events. Importantly we do not have any kind of
direct access to a real or virtual PCI controller and the architecture
has no concept of a comparable entity.

So in my opinion while there is some of the functionality of a PCI
controller in arch/s390/pci the cut off between controller
functionality and arch support isn't clear at all and exposing PCI
support as CPU instructions doesn't map well to the controller concept.

That said, in principle I'm open to moving some of that into
drivers/pci/controller/ if you think that would improve things and we
can find a good argument what should go where. One possible cut off
would be to have arch/s390/pci/ provide wrappers to the PCI
instructions but move all their uses to  e.g.
drivers/pci/controller/s390/. This would of course be a major
refactoring and none of that code would be useful on any other
architecture but it would move a lot the accesses to PCI common code
functionality out of the arch code.

> 
> > Signed-off-by: Niklas Schnelle 
> > ---
> > Since v1 (and bad v2):
> > - Fixed accidental removal of PCI_DPC_RECOVERED, PCI_DPC_RECOVERING
> >   defines and also move these to include/linux/pci.h
> > 
> >  arch/powerpc/platforms/powernv/pci-sriov.c |  3 ---
> >  arch/powerpc/platforms/pseries/setup.c |  1 -
> >  arch/s390/pci/pci_sysfs.c  |  2 --
> >  drivers/pci/hotplug/acpiphp_glue.c |  1 -
> >  drivers/pci/pci.h  | 15 ---
> >  include/linux/pci.h| 15 +++
> >  6 files changed, 15 insert

Re: [PATCH v2 13/21] s390/pci: don't set failed sg dma_address to DMA_MAPPING_ERROR

2021-08-02 Thread Niklas Schnelle
On Fri, 2021-07-23 at 11:50 -0600, Logan Gunthorpe wrote:
> Setting the ->dma_address to DMA_MAPPING_ERROR is not part of
> the ->map_sg calling convention, so remove it.
> 
> Link: https://lore.kernel.org/linux-mips/20210716063241.gc13...@lst.de/
> Suggested-by: Christoph Hellwig 
> Signed-off-by: Logan Gunthorpe 
> Cc: Niklas Schnelle 
> Cc: Gerald Schaefer 
> Cc: Heiko Carstens 
> Cc: Vasily Gorbik 
> Cc: Christian Borntraeger 
> ---
>  arch/s390/pci/pci_dma.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
> index c78b02012764..be48e5b5bfcf 100644
> --- a/arch/s390/pci/pci_dma.c
> +++ b/arch/s390/pci/pci_dma.c
> @@ -492,7 +492,6 @@ static int s390_dma_map_sg(struct device *dev, struct 
> scatterlist *sg,
>   for (i = 1; i < nr_elements; i++) {
>   s = sg_next(s);
>  
> - s->dma_address = DMA_MAPPING_ERROR;
>   s->dma_length = 0;
>  
>   if (s->offset || (size & ~PAGE_MASK) ||

Acked-by: Niklas Schnelle 

Thanks!



[PATCH v3] PCI: Move pci_dev_is/assign_added() to pci.h

2021-07-20 Thread Niklas Schnelle
The helper function pci_dev_is_added() from drivers/pci/pci.h is used in
PCI arch code of both s390 and powerpc leading to awkward relative
includes. Move it to the global include/linux/pci.h and get rid of these
includes just for that one function.

Signed-off-by: Niklas Schnelle 
---
Since v1 (and bad v2):
- Fixed accidental removal of PCI_DPC_RECOVERED, PCI_DPC_RECOVERING
  defines and also move these to include/linux/pci.h

 arch/powerpc/platforms/powernv/pci-sriov.c |  3 ---
 arch/powerpc/platforms/pseries/setup.c |  1 -
 arch/s390/pci/pci_sysfs.c  |  2 --
 drivers/pci/hotplug/acpiphp_glue.c |  1 -
 drivers/pci/pci.h  | 15 ---
 include/linux/pci.h| 15 +++
 6 files changed, 15 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c 
b/arch/powerpc/platforms/powernv/pci-sriov.c
index 28aac933a439..2e0ca5451e85 100644
--- a/arch/powerpc/platforms/powernv/pci-sriov.c
+++ b/arch/powerpc/platforms/powernv/pci-sriov.c
@@ -9,9 +9,6 @@
 
 #include "pci.h"
 
-/* for pci_dev_is_added() */
-#include "../../../../drivers/pci/pci.h"
-
 /*
  * The majority of the complexity in supporting SR-IOV on PowerNV comes from
  * the need to put the MMIO space for each VF into a separate PE. Internally
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 631a0d57b6cd..17585ec9f955 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -74,7 +74,6 @@
 #include 
 
 #include "pseries.h"
-#include "../../../../drivers/pci/pci.h"
 
 DEFINE_STATIC_KEY_FALSE(shared_processor);
 EXPORT_SYMBOL_GPL(shared_processor);
diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c
index 6e2450c2b9c1..8dbe54ef8f8e 100644
--- a/arch/s390/pci/pci_sysfs.c
+++ b/arch/s390/pci/pci_sysfs.c
@@ -13,8 +13,6 @@
 #include 
 #include 
 
-#include "../../../drivers/pci/pci.h"
-
 #include 
 
 #define zpci_attr(name, fmt, member)   \
diff --git a/drivers/pci/hotplug/acpiphp_glue.c 
b/drivers/pci/hotplug/acpiphp_glue.c
index f031302ad401..4cb963f88183 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -38,7 +38,6 @@
 #include 
 #include 
 
-#include "../pci.h"
 #include "acpiphp.h"
 
 static LIST_HEAD(bridge_list);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 93dcdd431072..a159cd0f6f05 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -383,21 +383,6 @@ static inline bool pci_dev_is_disconnected(const struct 
pci_dev *dev)
return dev->error_state == pci_channel_io_perm_failure;
 }
 
-/* pci_dev priv_flags */
-#define PCI_DEV_ADDED 0
-#define PCI_DPC_RECOVERED 1
-#define PCI_DPC_RECOVERING 2
-
-static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
-{
-   assign_bit(PCI_DEV_ADDED, >priv_flags, added);
-}
-
-static inline bool pci_dev_is_added(const struct pci_dev *dev)
-{
-   return test_bit(PCI_DEV_ADDED, >priv_flags);
-}
-
 #ifdef CONFIG_PCIEAER
 #include 
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 540b377ca8f6..ea0e23dbc8ec 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -507,6 +507,21 @@ struct pci_dev {
unsigned long   priv_flags; /* Private flags for the PCI driver */
 };
 
+/* pci_dev priv_flags */
+#define PCI_DEV_ADDED 0
+#define PCI_DPC_RECOVERED 1
+#define PCI_DPC_RECOVERING 2
+
+static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
+{
+   assign_bit(PCI_DEV_ADDED, >priv_flags, added);
+}
+
+static inline bool pci_dev_is_added(const struct pci_dev *dev)
+{
+   return test_bit(PCI_DEV_ADDED, >priv_flags);
+}
+
 static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
 {
 #ifdef CONFIG_PCI_IOV
-- 
2.25.1



Re: [PATCH v2] PCI: Move pci_dev_is/assign_added() to pci.h

2021-07-20 Thread Niklas Schnelle
On Tue, 2021-07-20 at 11:58 +0200, Niklas Schnelle wrote:
> The helper function pci_dev_is_added() from drivers/pci/pci.h is used in
> PCI arch code of both s390 and powerpc leading to awkward relative
> includes. Move it to the global include/linux/pci.h and get rid of these
> includes just for that one function.
> 
> Signed-off-by: Niklas Schnelle 
> ---
> Since v1:
> - Fixed accidental removal of PCI_DPC_RECOVERED, PCI_DPC_RECOVERING
>   defines and also move these to include/linux/pci.h

Please disregard I actually sent the old patch ;-(

> 
>  arch/powerpc/platforms/powernv/pci-sriov.c |  3 ---
>  arch/powerpc/platforms/pseries/setup.c |  1 -
>  arch/s390/pci/pci_sysfs.c  |  2 --
>  drivers/pci/hotplug/acpiphp_glue.c |  1 -
>  drivers/pci/pci.h  | 15 ---
>  include/linux/pci.h| 13 +
>  6 files changed, 13 insertions(+), 22 deletions(-)
> 
> 
... snip ..



[PATCH v2] PCI: Move pci_dev_is/assign_added() to pci.h

2021-07-20 Thread Niklas Schnelle
The helper function pci_dev_is_added() from drivers/pci/pci.h is used in
PCI arch code of both s390 and powerpc leading to awkward relative
includes. Move it to the global include/linux/pci.h and get rid of these
includes just for that one function.

Signed-off-by: Niklas Schnelle 
---
Since v1:
- Fixed accidental removal of PCI_DPC_RECOVERED, PCI_DPC_RECOVERING
  defines and also move these to include/linux/pci.h

 arch/powerpc/platforms/powernv/pci-sriov.c |  3 ---
 arch/powerpc/platforms/pseries/setup.c |  1 -
 arch/s390/pci/pci_sysfs.c  |  2 --
 drivers/pci/hotplug/acpiphp_glue.c |  1 -
 drivers/pci/pci.h  | 15 ---
 include/linux/pci.h| 13 +
 6 files changed, 13 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c 
b/arch/powerpc/platforms/powernv/pci-sriov.c
index 28aac933a439..2e0ca5451e85 100644
--- a/arch/powerpc/platforms/powernv/pci-sriov.c
+++ b/arch/powerpc/platforms/powernv/pci-sriov.c
@@ -9,9 +9,6 @@
 
 #include "pci.h"
 
-/* for pci_dev_is_added() */
-#include "../../../../drivers/pci/pci.h"
-
 /*
  * The majority of the complexity in supporting SR-IOV on PowerNV comes from
  * the need to put the MMIO space for each VF into a separate PE. Internally
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 631a0d57b6cd..17585ec9f955 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -74,7 +74,6 @@
 #include 
 
 #include "pseries.h"
-#include "../../../../drivers/pci/pci.h"
 
 DEFINE_STATIC_KEY_FALSE(shared_processor);
 EXPORT_SYMBOL_GPL(shared_processor);
diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c
index 6e2450c2b9c1..8dbe54ef8f8e 100644
--- a/arch/s390/pci/pci_sysfs.c
+++ b/arch/s390/pci/pci_sysfs.c
@@ -13,8 +13,6 @@
 #include 
 #include 
 
-#include "../../../drivers/pci/pci.h"
-
 #include 
 
 #define zpci_attr(name, fmt, member)   \
diff --git a/drivers/pci/hotplug/acpiphp_glue.c 
b/drivers/pci/hotplug/acpiphp_glue.c
index f031302ad401..4cb963f88183 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -38,7 +38,6 @@
 #include 
 #include 
 
-#include "../pci.h"
 #include "acpiphp.h"
 
 static LIST_HEAD(bridge_list);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 93dcdd431072..a159cd0f6f05 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -383,21 +383,6 @@ static inline bool pci_dev_is_disconnected(const struct 
pci_dev *dev)
return dev->error_state == pci_channel_io_perm_failure;
 }
 
-/* pci_dev priv_flags */
-#define PCI_DEV_ADDED 0
-#define PCI_DPC_RECOVERED 1
-#define PCI_DPC_RECOVERING 2
-
-static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
-{
-   assign_bit(PCI_DEV_ADDED, >priv_flags, added);
-}
-
-static inline bool pci_dev_is_added(const struct pci_dev *dev)
-{
-   return test_bit(PCI_DEV_ADDED, >priv_flags);
-}
-
 #ifdef CONFIG_PCIEAER
 #include 
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 540b377ca8f6..b3b7bafa17e5 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -507,6 +507,19 @@ struct pci_dev {
unsigned long   priv_flags; /* Private flags for the PCI driver */
 };
 
+/* pci_dev priv_flags */
+#define PCI_DEV_ADDED 0
+
+static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
+{
+   assign_bit(PCI_DEV_ADDED, >priv_flags, added);
+}
+
+static inline bool pci_dev_is_added(const struct pci_dev *dev)
+{
+   return test_bit(PCI_DEV_ADDED, >priv_flags);
+}
+
 static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
 {
 #ifdef CONFIG_PCI_IOV
-- 
2.25.1



Re: [PATCH] PCI: Move pci_dev_is/assign_added() to pci.h

2021-07-20 Thread Niklas Schnelle
On Tue, 2021-07-20 at 10:34 +0200, Niklas Schnelle wrote:
> On Mon, 2021-07-19 at 14:11 +0200, Niklas Schnelle wrote:
> > The helper function pci_dev_is_added() from drivers/pci/pci.h is used in
> > PCI arch code of both s390 and powerpc leading to awkward relative
> > includes. Move it to the global include/linux/pci.h and get rid of these
> > includes just for that one function.
> > 
> > Signed-off-by: Niklas Schnelle 
> > ---
> > 
> ... snip ...
> >  
> >  static LIST_HEAD(bridge_list);
> > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > index 93dcdd431072..a159cd0f6f05 100644
> > --- a/drivers/pci/pci.h
> > +++ b/drivers/pci/pci.h
> > @@ -383,21 +383,6 @@ static inline bool pci_dev_is_disconnected(const 
> > struct pci_dev *dev)
> > return dev->error_state == pci_channel_io_perm_failure;
> >  }
> >  
> > -/* pci_dev priv_flags */
> > -#define PCI_DEV_ADDED 0
> > -#define PCI_DPC_RECOVERED 1
> > -#define PCI_DPC_RECOVERING 2
> 
> Sorry, the above two PCI_DPC_* lines should remain in drivers/pci/pci.h
> I messed this up on rebasing to v5.14-rc1 and it didn't lead to
> problems on either s390x defconfig, nor pp64_defconfig but breaks ppc
> allyesconfig. Will resend a fixed version.

Ok looking at this more closely, maybe it would also make sense to move
the PCI_DPC_* flags also to include/linux/pci.h that would also put
them in direct proximity of the priv_flags field decleration itself
which I think also improves maintainability.

> 
> 
> .. snip ..
> 



Re: [PATCH] PCI: Move pci_dev_is/assign_added() to pci.h

2021-07-20 Thread Niklas Schnelle
On Mon, 2021-07-19 at 14:11 +0200, Niklas Schnelle wrote:
> The helper function pci_dev_is_added() from drivers/pci/pci.h is used in
> PCI arch code of both s390 and powerpc leading to awkward relative
> includes. Move it to the global include/linux/pci.h and get rid of these
> includes just for that one function.
> 
> Signed-off-by: Niklas Schnelle 
> ---
> 
... snip ...
>  
>  static LIST_HEAD(bridge_list);
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 93dcdd431072..a159cd0f6f05 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -383,21 +383,6 @@ static inline bool pci_dev_is_disconnected(const struct 
> pci_dev *dev)
>   return dev->error_state == pci_channel_io_perm_failure;
>  }
>  
> -/* pci_dev priv_flags */
> -#define PCI_DEV_ADDED 0
> -#define PCI_DPC_RECOVERED 1
> -#define PCI_DPC_RECOVERING 2

Sorry, the above two PCI_DPC_* lines should remain in drivers/pci/pci.h
I messed this up on rebasing to v5.14-rc1 and it didn't lead to
problems on either s390x defconfig, nor pp64_defconfig but breaks ppc
allyesconfig. Will resend a fixed version.

> 

.. snip ..



[PATCH] PCI: Move pci_dev_is/assign_added() to pci.h

2021-07-19 Thread Niklas Schnelle
The helper function pci_dev_is_added() from drivers/pci/pci.h is used in
PCI arch code of both s390 and powerpc leading to awkward relative
includes. Move it to the global include/linux/pci.h and get rid of these
includes just for that one function.

Signed-off-by: Niklas Schnelle 
---
 arch/powerpc/platforms/powernv/pci-sriov.c |  3 ---
 arch/powerpc/platforms/pseries/setup.c |  1 -
 arch/s390/pci/pci_sysfs.c  |  2 --
 drivers/pci/hotplug/acpiphp_glue.c |  1 -
 drivers/pci/pci.h  | 15 ---
 include/linux/pci.h| 13 +
 6 files changed, 13 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c 
b/arch/powerpc/platforms/powernv/pci-sriov.c
index 28aac933a439..2e0ca5451e85 100644
--- a/arch/powerpc/platforms/powernv/pci-sriov.c
+++ b/arch/powerpc/platforms/powernv/pci-sriov.c
@@ -9,9 +9,6 @@
 
 #include "pci.h"
 
-/* for pci_dev_is_added() */
-#include "../../../../drivers/pci/pci.h"
-
 /*
  * The majority of the complexity in supporting SR-IOV on PowerNV comes from
  * the need to put the MMIO space for each VF into a separate PE. Internally
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 631a0d57b6cd..17585ec9f955 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -74,7 +74,6 @@
 #include 
 
 #include "pseries.h"
-#include "../../../../drivers/pci/pci.h"
 
 DEFINE_STATIC_KEY_FALSE(shared_processor);
 EXPORT_SYMBOL_GPL(shared_processor);
diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c
index 6e2450c2b9c1..8dbe54ef8f8e 100644
--- a/arch/s390/pci/pci_sysfs.c
+++ b/arch/s390/pci/pci_sysfs.c
@@ -13,8 +13,6 @@
 #include 
 #include 
 
-#include "../../../drivers/pci/pci.h"
-
 #include 
 
 #define zpci_attr(name, fmt, member)   \
diff --git a/drivers/pci/hotplug/acpiphp_glue.c 
b/drivers/pci/hotplug/acpiphp_glue.c
index f031302ad401..4cb963f88183 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -38,7 +38,6 @@
 #include 
 #include 
 
-#include "../pci.h"
 #include "acpiphp.h"
 
 static LIST_HEAD(bridge_list);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 93dcdd431072..a159cd0f6f05 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -383,21 +383,6 @@ static inline bool pci_dev_is_disconnected(const struct 
pci_dev *dev)
return dev->error_state == pci_channel_io_perm_failure;
 }
 
-/* pci_dev priv_flags */
-#define PCI_DEV_ADDED 0
-#define PCI_DPC_RECOVERED 1
-#define PCI_DPC_RECOVERING 2
-
-static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
-{
-   assign_bit(PCI_DEV_ADDED, >priv_flags, added);
-}
-
-static inline bool pci_dev_is_added(const struct pci_dev *dev)
-{
-   return test_bit(PCI_DEV_ADDED, >priv_flags);
-}
-
 #ifdef CONFIG_PCIEAER
 #include 
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 540b377ca8f6..b3b7bafa17e5 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -507,6 +507,19 @@ struct pci_dev {
unsigned long   priv_flags; /* Private flags for the PCI driver */
 };
 
+/* pci_dev priv_flags */
+#define PCI_DEV_ADDED 0
+
+static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
+{
+   assign_bit(PCI_DEV_ADDED, >priv_flags, added);
+}
+
+static inline bool pci_dev_is_added(const struct pci_dev *dev)
+{
+   return test_bit(PCI_DEV_ADDED, >priv_flags);
+}
+
 static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
 {
 #ifdef CONFIG_PCI_IOV
-- 
2.25.1



Re: [PATCH v1 10/16] s390/pci: return error code from s390_dma_map_sg()

2021-07-16 Thread Niklas Schnelle
On Thu, 2021-07-15 at 10:45 -0600, Logan Gunthorpe wrote:
> From: Martin Oliveira 
> 
> The .map_sg() op now expects an error code instead of zero on failure.
> 
> So propagate the error from __s390_dma_map_sg() up.
> 
> Signed-off-by: Martin Oliveira 
> Signed-off-by: Logan Gunthorpe 
> Cc: Niklas Schnelle 
> Cc: Gerald Schaefer 
> Cc: Heiko Carstens 
> Cc: Vasily Gorbik 
> Cc: Christian Borntraeger 
> ---
>  arch/s390/pci/pci_dma.c | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
> index ebc9a49523aa..c78b02012764 100644
> --- a/arch/s390/pci/pci_dma.c
> +++ b/arch/s390/pci/pci_dma.c
> @@ -487,7 +487,7 @@ static int s390_dma_map_sg(struct device *dev, struct 
> scatterlist *sg,
>   unsigned int max = dma_get_max_seg_size(dev);
>   unsigned int size = s->offset + s->length;
>   unsigned int offset = s->offset;
> - int count = 0, i;
> + int count = 0, i, ret;
>  
>   for (i = 1; i < nr_elements; i++) {
>   s = sg_next(s);
> @@ -497,8 +497,9 @@ static int s390_dma_map_sg(struct device *dev, struct 
> scatterlist *sg,
>  
>   if (s->offset || (size & ~PAGE_MASK) ||
>   size + s->length > max) {
> - if (__s390_dma_map_sg(dev, start, size,
> -   >dma_address, dir))
> + ret = __s390_dma_map_sg(dev, start, size,
> + >dma_address, dir);
> + if (ret)
>   goto unmap;
>  
>   dma->dma_address += offset;
> @@ -511,7 +512,8 @@ static int s390_dma_map_sg(struct device *dev, struct 
> scatterlist *sg,
>   }
>   size += s->length;
>   }
> - if (__s390_dma_map_sg(dev, start, size, >dma_address, dir))
> + ret = __s390_dma_map_sg(dev, start, size, >dma_address, dir);
> + if (ret)
>   goto unmap;
>  
>   dma->dma_address += offset;
> @@ -523,7 +525,7 @@ static int s390_dma_map_sg(struct device *dev, struct 
> scatterlist *sg,
>   s390_dma_unmap_pages(dev, sg_dma_address(s), sg_dma_len(s),
>dir, attrs);
>  
> - return 0;
> + return ret;
>  }
>  
>  static void s390_dma_unmap_sg(struct device *dev, struct scatterlist *sg,

So the error codes we return are -ENOMEM if allocating a DMA
translation entry fails and -EINVAL if the DMA translation table hasn't
been initialized or the caller tries to map 0 sized memory. Are these
error codes that you would expect? If yes then this change looks good
to me.

Acked-by: Niklas Schnelle 



Re: [PATCH 27/30] scm_blk: use blk_mq_alloc_disk and blk_cleanup_disk

2021-06-02 Thread Niklas Schnelle
On Wed, 2021-06-02 at 09:53 +0300, Christoph Hellwig wrote:
> Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and
> request_queue allocation.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/s390/block/scm_blk.c | 21 ++---
>  1 file changed, 6 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/s390/block/scm_blk.c b/drivers/s390/block/scm_blk.c
> index a4f6f2e62b1d..88cba6212ee2 100644
> --- a/drivers/s390/block/scm_blk.c
> +++ b/drivers/s390/block/scm_blk.c
> @@ -462,12 +462,12 @@ int scm_blk_dev_setup(struct scm_blk_dev *bdev, struct 
> scm_device *scmdev)
>   if (ret)
>   goto out;
>  
> - rq = blk_mq_init_queue(>tag_set);
> - if (IS_ERR(rq)) {
> - ret = PTR_ERR(rq);
> + bdev->gendisk = blk_mq_alloc_disk(>tag_set, scmdev);
> + if (IS_ERR(bdev->gendisk)) {
> + ret = PTR_ERR(bdev->gendisk);
>   goto out_tag;
>   }
> - bdev->rq = rq;
> + rq = bdev->rq = bdev->gendisk->queue;
>   nr_max_blk = min(scmdev->nr_max_block,
>(unsigned int) (PAGE_SIZE / sizeof(struct aidaw)));
>  
> @@ -477,17 +477,11 @@ int scm_blk_dev_setup(struct scm_blk_dev *bdev, struct 
> scm_device *scmdev)
>   blk_queue_flag_set(QUEUE_FLAG_NONROT, rq);
>   blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, rq);
>  
> - bdev->gendisk = alloc_disk(SCM_NR_PARTS);
> - if (!bdev->gendisk) {
> - ret = -ENOMEM;
> - goto out_queue;
> - }
> - rq->queuedata = scmdev;
>   bdev->gendisk->private_data = scmdev;
>   bdev->gendisk->fops = _blk_devops;
> - bdev->gendisk->queue = rq;
>   bdev->gendisk->major = scm_major;
>   bdev->gendisk->first_minor = devindex * SCM_NR_PARTS;
> + bdev->gendisk->minors = SCM_NR_PARTS;
>  
>   len = snprintf(bdev->gendisk->disk_name, DISK_NAME_LEN, "scm");
>   if (devindex > 25) {
> @@ -504,8 +498,6 @@ int scm_blk_dev_setup(struct scm_blk_dev *bdev, struct 
> scm_device *scmdev)
>   device_add_disk(>dev, bdev->gendisk, NULL);
>   return 0;
>  
> -out_queue:
> - blk_cleanup_queue(rq);
>  out_tag:
>   blk_mq_free_tag_set(>tag_set);
>  out:
> @@ -516,9 +508,8 @@ int scm_blk_dev_setup(struct scm_blk_dev *bdev, struct 
> scm_device *scmdev)
>  void scm_blk_dev_cleanup(struct scm_blk_dev *bdev)
>  {
>   del_gendisk(bdev->gendisk);
> - blk_cleanup_queue(bdev->gendisk->queue);
> + blk_cleanup_disk(bdev->gendisk);
>   blk_mq_free_tag_set(>tag_set);
> - put_disk(bdev->gendisk);
>  }
>  
>  void scm_blk_set_available(struct scm_blk_dev *bdev)

Not an expert on SCM or this code but I gave this a quick test and it
seems to work fine.

Tested-by: Niklas Schnelle 





Re: [PATCH 0/2] dma-mapping: update default segment_boundary_mask

2020-09-02 Thread Niklas Schnelle



On 9/2/20 12:16 AM, Nicolin Chen wrote:
> These two patches are to update default segment_boundary_mask.
> 
> PATCH-1 fixes overflow issues in callers of dma_get_seg_boundary.
> Previous version was a series: https://lkml.org/lkml/2020/8/31/1026
> 
> Then PATCH-2 sets default segment_boundary_mask to ULONG_MAX.
> 
> Nicolin Chen (2):
>   dma-mapping: introduce dma_get_seg_boundary_nr_pages()
>   dma-mapping: set default segment_boundary_mask to ULONG_MAX

I gave both of your patches a quick test ride on a couple of dev mainframes,
both NVMe, ConnectX and virtio-pci devices all seems to work fine.
I already commented on Christoph's mail that I like the helper approach,
so as for s390 you can add my

Acked-by: Niklas Schnelle 

> 
>  arch/alpha/kernel/pci_iommu.c|  7 +--
>  arch/ia64/hp/common/sba_iommu.c  |  3 +--
>  arch/powerpc/kernel/iommu.c  |  9 ++---
>  arch/s390/pci/pci_dma.c  |  6 ++
>  arch/sparc/kernel/iommu-common.c | 10 +++---
>  arch/sparc/kernel/iommu.c|  3 +--
>  arch/sparc/kernel/pci_sun4v.c|  3 +--
>  arch/x86/kernel/amd_gart_64.c|  3 +--
>  drivers/parisc/ccio-dma.c|  3 +--
>  drivers/parisc/sba_iommu.c   |  3 +--
>  include/linux/dma-mapping.h  | 21 -
>  11 files changed, 34 insertions(+), 37 deletions(-)
> 


Re: [RFT][PATCH 0/7] Avoid overflow at boundary_size

2020-08-25 Thread Niklas Schnelle



On 8/21/20 1:19 AM, Nicolin Chen wrote:
> We are expending the default DMA segmentation boundary to its
> possible maximum value (ULONG_MAX) to indicate that a device
> doesn't specify a boundary limit. So all dma_get_seg_boundary
> callers should take a precaution with the return values since
> it would easily get overflowed.
> 
> I scanned the entire kernel tree for all the existing callers
> and found that most of callers may get overflowed in two ways:
> either "+ 1" or passing it to ALIGN() that does "+ mask".
> 
> According to kernel defines:
> #define ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))
> #define ALIGN(x, a)   ALIGN_MASK(x, (typeof(x))(a) - 1)
> 
> We can simplify the logic here:
>   ALIGN(boundary + 1, 1 << shift) >> shift
> = ALIGN_MASK(b + 1, (1 << s) - 1) >> s
> = {[b + 1 + (1 << s) - 1] & ~[(1 << s) - 1]} >> s
> = [b + 1 + (1 << s) - 1] >> s
> = [b + (1 << s)] >> s
> = (b >> s) + 1
> 
> So this series of patches fix the potential overflow with this
> overflow-free shortcut.

Hi Nicolin,

haven't seen any other feedback from other maintainers,
so I guess you will resend this?
On first glance it seems to make sense.
I'm a little confused why it is only a "potential overflow"
while this part

"We are expending the default DMA segmentation boundary to its
 possible maximum value (ULONG_MAX) to indicate that a device
 doesn't specify a boundary limit"

sounds to me like ULONG_MAX is actually used, does that
mean there are currently no devices which do not specify a
boundary limit?


> 
> As I don't think that I have these platforms, marking RFT.
> 
> Thanks
> Nic
> 
> Nicolin Chen (7):
>   powerpc/iommu: Avoid overflow at boundary_size
>   alpha: Avoid overflow at boundary_size
>   ia64/sba_iommu: Avoid overflow at boundary_size
>   s390/pci_dma: Avoid overflow at boundary_size
>   sparc: Avoid overflow at boundary_size
>   x86/amd_gart: Avoid overflow at boundary_size
>   parisc: Avoid overflow at boundary_size
> 
>  arch/alpha/kernel/pci_iommu.c| 10 --
>  arch/ia64/hp/common/sba_iommu.c  |  4 ++--
>  arch/powerpc/kernel/iommu.c  | 11 +--
>  arch/s390/pci/pci_dma.c  |  4 ++--
>  arch/sparc/kernel/iommu-common.c |  9 +++--
>  arch/sparc/kernel/iommu.c|  4 ++--
>  arch/sparc/kernel/pci_sun4v.c|  4 ++--
>  arch/x86/kernel/amd_gart_64.c|  4 ++--
>  drivers/parisc/ccio-dma.c|  4 ++--
>  drivers/parisc/sba_iommu.c   |  4 ++--
>  10 files changed, 26 insertions(+), 32 deletions(-)
>