Re: [PATCH v4] Kconfig: introduce HAS_IOPORT option and select it as necessary
On Thu, 2023-03-23 at 17:33 +0100, Niklas Schnelle wrote: > We introduce a new HAS_IOPORT Kconfig option to indicate support for I/O > Port access. In a future patch HAS_IOPORT=n will disable compilation of > the I/O accessor functions inb()/outb() and friends on architectures > which can not meaningfully support legacy I/O spaces such as s390. > > The following architectures do not select HAS_IOPORT: > > * ARC > * C-SKY > * Hexagon > * Nios II > * OpenRISC > * s390 > * User-Mode Linux > * Xtensa > > All other architectures select HAS_IOPORT at least conditionally. > > The "depends on" relations on HAS_IOPORT in drivers as well as ifdefs > for HAS_IOPORT specific sections will be added in subsequent patches on > a per subsystem basis. > > Co-developed-by: Arnd Bergmann > Signed-off-by: Arnd Bergmann > Acked-by: Johannes Berg # for ARCH=um > Acked-by: Geert Uytterhoeven > Signed-off-by: Niklas Schnelle > --- > Note: This patch is the initial patch of a larger series[0]. This patch > introduces the HAS_IOPORT config option while the rest of the series adds > driver dependencies and the final patch removes inb() / outb() and friends on > platforms that don't support them. > > Thus each of the per-subsystem patches is independent from each other but > depends on this patch while the final patch depends on the whole series. Thus > splitting this initial patch off allows the per-subsytem HAS_IOPORT dependency > addition be merged separately via different trees without breaking the build. > > [0] > https://lore.kernel.org/lkml/20230314121216.413434-1-schne...@linux.ibm.com/ > > Changes since v3: > - List archs without HAS_IOPORT in commit message (Arnd) > - Select HAS_IOPORT for LoongArch (Arnd) > - Use "select HAS_IOPORT if (E)ISA || .." instead of a "depends on" for (E)ISA > for m68k and parisc > - Select HAS_IOPORT with config GSC on parisc (Arnd) > - Drop "depends on HAS_IOPORT" for um's config ISA (Johannes) > - Drop "depends on HAS_IOPORT" for config ISA on x86 and parisc where it is > always selected (Arnd) > Gentle ping. As far as I can tell this hasn't been picked to any tree sp far but also hasn't seen complains so I'm wondering if I should send a new version of the combined series of this patch plus the added HAS_IOPORT dependencies per subsystem or wait until this is picked up. Thanks, Niklas
[PATCH v4] Kconfig: introduce HAS_IOPORT option and select it as necessary
We introduce a new HAS_IOPORT Kconfig option to indicate support for I/O Port access. In a future patch HAS_IOPORT=n will disable compilation of the I/O accessor functions inb()/outb() and friends on architectures which can not meaningfully support legacy I/O spaces such as s390. The following architectures do not select HAS_IOPORT: * ARC * C-SKY * Hexagon * Nios II * OpenRISC * s390 * User-Mode Linux * Xtensa All other architectures select HAS_IOPORT at least conditionally. The "depends on" relations on HAS_IOPORT in drivers as well as ifdefs for HAS_IOPORT specific sections will be added in subsequent patches on a per subsystem basis. Co-developed-by: Arnd Bergmann Signed-off-by: Arnd Bergmann Acked-by: Johannes Berg # for ARCH=um Acked-by: Geert Uytterhoeven Signed-off-by: Niklas Schnelle --- Note: This patch is the initial patch of a larger series[0]. This patch introduces the HAS_IOPORT config option while the rest of the series adds driver dependencies and the final patch removes inb() / outb() and friends on platforms that don't support them. Thus each of the per-subsystem patches is independent from each other but depends on this patch while the final patch depends on the whole series. Thus splitting this initial patch off allows the per-subsytem HAS_IOPORT dependency addition be merged separately via different trees without breaking the build. [0] https://lore.kernel.org/lkml/20230314121216.413434-1-schne...@linux.ibm.com/ Changes since v3: - List archs without HAS_IOPORT in commit message (Arnd) - Select HAS_IOPORT for LoongArch (Arnd) - Use "select HAS_IOPORT if (E)ISA || .." instead of a "depends on" for (E)ISA for m68k and parisc - Select HAS_IOPORT with config GSC on parisc (Arnd) - Drop "depends on HAS_IOPORT" for um's config ISA (Johannes) - Drop "depends on HAS_IOPORT" for config ISA on x86 and parisc where it is always selected (Arnd) arch/alpha/Kconfig | 1 + arch/arm/Kconfig| 1 + arch/arm64/Kconfig | 1 + arch/ia64/Kconfig | 1 + arch/loongarch/Kconfig | 1 + arch/m68k/Kconfig | 1 + arch/microblaze/Kconfig | 1 + arch/mips/Kconfig | 1 + arch/parisc/Kconfig | 1 + arch/powerpc/Kconfig| 1 + arch/riscv/Kconfig | 1 + arch/sh/Kconfig | 1 + arch/sparc/Kconfig | 1 + arch/x86/Kconfig| 1 + drivers/bus/Kconfig | 2 +- drivers/parisc/Kconfig | 1 + lib/Kconfig | 4 17 files changed, 20 insertions(+), 1 deletion(-) diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig index 780d4673c3ca..a5c2b1aa46b0 100644 --- a/arch/alpha/Kconfig +++ b/arch/alpha/Kconfig @@ -27,6 +27,7 @@ config ALPHA select AUDIT_ARCH select GENERIC_CPU_VULNERABILITIES select GENERIC_SMP_IDLE_THREAD + select HAS_IOPORT select HAVE_ARCH_AUDITSYSCALL select HAVE_MOD_ARCH_SPECIFIC select MODULES_USE_ELF_RELA diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index e24a9820e12f..4acb5bc4b52a 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -70,6 +70,7 @@ config ARM select GENERIC_SCHED_CLOCK select GENERIC_SMP_IDLE_THREAD select HARDIRQS_SW_RESEND + select HAS_IOPORT select HAVE_ARCH_AUDITSYSCALL if AEABI && !OABI_COMPAT select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6 select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 1023e896d46b..b740019c4aee 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -145,6 +145,7 @@ config ARM64 select GENERIC_GETTIMEOFDAY select GENERIC_VDSO_TIME_NS select HARDIRQS_SW_RESEND + select HAS_IOPORT select HAVE_MOVE_PMD select HAVE_MOVE_PUD select HAVE_PCI diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index d7e4a24e8644..2e13ec8263b9 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -25,6 +25,7 @@ config IA64 select PCI_DOMAINS if PCI select PCI_MSI select PCI_SYSCALL if PCI + select HAS_IOPORT select HAVE_ASM_MODVERSIONS select HAVE_UNSTABLE_SCHED_CLOCK select HAVE_EXIT_THREAD diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index 7fd51257e0ed..e1615dfb5437 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -80,6 +80,7 @@ config LOONGARCH select GENERIC_SMP_IDLE_THREAD select GENERIC_TIME_VSYSCALL select GPIOLIB + select HAS_IOPORT select HAVE_ARCH_AUDITSYSCALL select HAVE_ARCH_MMAP_RND_BITS if MMU select HAVE_ARCH_SECCOMP_FILTER diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig index 82154952e574..40198a1ebe27 100644 --- a/arch/m68k/Kconfig +++ b/arch/m68k/Kconfig @@ -18,6 +18,7 @@ config M68K select GENERIC_CPU_DEVICES select GENERIC_IOMAP select GENERI
Re: [PATCH v3 01/38] Kconfig: introduce HAS_IOPORT option and select it as necessary
On Tue, 2023-03-14 at 13:37 +0100, Johannes Berg wrote: > On Tue, 2023-03-14 at 13:11 +0100, Niklas Schnelle wrote: > > --- a/arch/um/Kconfig > > +++ b/arch/um/Kconfig > > @@ -56,6 +56,7 @@ config NO_IOPORT_MAP > > > > config ISA > > bool > > + depends on HAS_IOPORT > > > > config ISA here is already unselectable, and nothing ever does "select > ISA" (only in some other architectures), so is there much point in this? > > I'm not even sure why this exists at all. You're right there's not much point and I dropped this for v4. I agree that probably the whole "config ISA" could be removed if it's always false anyway but that seems out of scope for this patch. > > But anyway, adding a dependency to a always-false symbol doesn't make it > less always-false :-) > > Acked-by: Johannes Berg # for ARCH=um Thanks > > > Certainly will be nice to get rid of this cruft for architectures that > don't have it. > > johannes Yes, also, for s390 the broken NULL + port number access in the generic inb()/outb() currently causes the only remaining clang warning on defconfig builds.
Re: [PATCH v3 01/38] Kconfig: introduce HAS_IOPORT option and select it as necessary
On Tue, 2023-03-14 at 13:11 +0100, Niklas Schnelle wrote: > We introduce a new HAS_IOPORT Kconfig option to indicate support for I/O > Port access. In a future patch HAS_IOPORT=n will disable compilation of > the I/O accessor functions inb()/outb() and friends on architectures > which can not meaningfully support legacy I/O spaces such as s390. Also > add dependencies on HAS_IOPORT for the ISA and HAVE_EISA config options > as these busses always go along with HAS_IOPORT. > > The "depends on" relations on HAS_IOPORT in drivers as well as ifdefs > for HAS_IOPORT specific sections will be added in subsequent patches on > a per subsystem basis. > > Co-developed-by: Arnd Bergmann > Signed-off-by: Niklas Schnelle > --- > @Arnd, I swear I asked you and then added Signed-off-bys for all these Co-developed-bys as suggested by checkpatch. Sadly that must have been during my failed attempt of converting to b4 prep / b4 send before sending this last Friday and then it got lost. It almost worked and is a very nice work flow except that b4 currently can only use a single list of To/Cc fields and for this treewide series that would probably hit mail server limits. Added it now. Thanks, Niklas
Re: [PATCH v3 01/38] Kconfig: introduce HAS_IOPORT option and select it as necessary
On Tue, 2023-03-14 at 14:29 +0100, Arnd Bergmann wrote: > On Tue, Mar 14, 2023, at 13:11, Niklas Schnelle wrote: > > We introduce a new HAS_IOPORT Kconfig option to indicate support for I/O > > Port access. In a future patch HAS_IOPORT=n will disable compilation of > > the I/O accessor functions inb()/outb() and friends on architectures > > which can not meaningfully support legacy I/O spaces such as s390. Also > > add dependencies on HAS_IOPORT for the ISA and HAVE_EISA config options > > as these busses always go along with HAS_IOPORT. > > > > The "depends on" relations on HAS_IOPORT in drivers as well as ifdefs > > for HAS_IOPORT specific sections will be added in subsequent patches on > > a per subsystem basis. > > I think it would be helpful to enumerate which architectures > do not get HAS_IOPORT added, as they will be affected more. > > > Co-developed-by: Arnd Bergmann > > Signed-off-by: Niklas Schnelle > > If there are no objections, I could send this first patch for the > asm-generic tree as a preparation for 6.3, so we are able to merge > the other patches through subsystem maintainer tree for 6.4. > > arch/loongarch/ will now also need to select HAS_IOPORT > uncontitionally, this architecture was added after you > sent v2. Ah right. Added "select HAS_IOPORT" for LoongArch. > > > diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig > > index a98940e64243..5eeacc72e4da 100644 > > --- a/arch/parisc/Kconfig > > +++ b/arch/parisc/Kconfig > > @@ -47,6 +47,7 @@ config PARISC > > select MODULES_USE_ELF_RELA > > select CLONE_BACKWARDS > > select TTY # Needed for pdc_cons.c > > + select HAS_IOPORT if PCI > > It's also needed for EISA and I think you should select it > from CONFIG_GSC in drivers/parisc/Kconfig for this purpose. > > This could also be 'select HAS_IOPORT if PCI || EISA', but > that would require removing the 'depends on HAS_IOPORT' > under drivers/eisa/. I did use "select HAS_IOPORT if PCI || ISA || ATARI_ROM_ISA" in m68k so I think ideally we would handle both in the same way. I don't have a strong preference but I think the "select HAS_IOPORT if ..." puts it all in a single place which is nice. Also I think this would make it more similar architectures with unconditional HAS_IOPORT that thus don't need "depends on HAS_IOPORT" in their "config ISA" either. As also pointed by your comment below for x86. So will try to go this route. > > > select HAVE_DEBUG_STACKOVERFLOW > > select HAVE_ARCH_AUDITSYSCALL > > select HAVE_ARCH_HASH > > @@ -131,6 +132,7 @@ config STACKTRACE_SUPPORT > > > > config ISA_DMA_API > > bool > > + depends on HAS_IOPORT > > > > This line is not really needed since there is no way to > enable ISA_DMA_API. Removed > > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > > index a6c4407d3ec8..f7de646c074a 100644 > > --- a/arch/powerpc/Kconfig > > +++ b/arch/powerpc/Kconfig > > @@ -188,6 +188,7 @@ config PPC > > select GENERIC_SMP_IDLE_THREAD > > select GENERIC_TIME_VSYSCALL > > select GENERIC_VDSO_TIME_NS > > + select HAS_IOPORT if PCI > > select HAVE_ARCH_AUDITSYSCALL > > select HAVE_ARCH_HUGE_VMALLOC if HAVE_ARCH_HUGE_VMAP > > select HAVE_ARCH_HUGE_VMAP if PPC_RADIX_MMU || PPC_8xx > > @@ -1070,7 +1071,6 @@ menu "Bus options" > > > > config ISA > > bool "Support for ISA-bus hardware" > > - depends on PPC_CHRP > > select PPC_I8259 > > help > > Find out whether you have ISA slots on your motherboard. ISA is the > > This line looks wrong, I think we should keep that dependency. > Did you get a circular dependency if you leave it in? I don't recall why this was removed. I guess it happened when I was experimenting with adding "depends on HAS_IOPORT" for the ISA config options but that ultimately lead to circular dependencies, must have messed up when removing this here. > > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > > index a825bf031f49..634dd42532f3 100644 > > --- a/arch/x86/Kconfig > > +++ b/arch/x86/Kconfig > > @@ -162,6 +162,7 @@ config X86 > > select GUP_GET_PXX_LOW_HIGH if X86_PAE > > select HARDIRQS_SW_RESEND > > select HARDLOCKUP_CHECK_TIMESTAMP if X86_64 > > + select HAS_IOPORT > > select HAVE_ACPI_APEI if ACPI > > select HAVE_ACPI_APEI_NMI if ACPI > > select HAVE_ALIGNED_STRUCT_PAG
[PATCH v3 01/38] Kconfig: introduce HAS_IOPORT option and select it as necessary
We introduce a new HAS_IOPORT Kconfig option to indicate support for I/O Port access. In a future patch HAS_IOPORT=n will disable compilation of the I/O accessor functions inb()/outb() and friends on architectures which can not meaningfully support legacy I/O spaces such as s390. Also add dependencies on HAS_IOPORT for the ISA and HAVE_EISA config options as these busses always go along with HAS_IOPORT. The "depends on" relations on HAS_IOPORT in drivers as well as ifdefs for HAS_IOPORT specific sections will be added in subsequent patches on a per subsystem basis. Co-developed-by: Arnd Bergmann Signed-off-by: Niklas Schnelle --- arch/alpha/Kconfig | 1 + arch/arm/Kconfig| 1 + arch/arm64/Kconfig | 1 + arch/ia64/Kconfig | 1 + arch/m68k/Kconfig | 1 + arch/microblaze/Kconfig | 1 + arch/mips/Kconfig | 2 ++ arch/parisc/Kconfig | 2 ++ arch/powerpc/Kconfig| 2 +- arch/riscv/Kconfig | 1 + arch/sh/Kconfig | 1 + arch/sparc/Kconfig | 1 + arch/um/Kconfig | 1 + arch/x86/Kconfig| 2 ++ drivers/bus/Kconfig | 2 +- drivers/eisa/Kconfig| 1 + lib/Kconfig | 4 lib/Kconfig.kgdb| 3 ++- 18 files changed, 25 insertions(+), 3 deletions(-) diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig index 780d4673c3ca..a5c2b1aa46b0 100644 --- a/arch/alpha/Kconfig +++ b/arch/alpha/Kconfig @@ -27,6 +27,7 @@ config ALPHA select AUDIT_ARCH select GENERIC_CPU_VULNERABILITIES select GENERIC_SMP_IDLE_THREAD + select HAS_IOPORT select HAVE_ARCH_AUDITSYSCALL select HAVE_MOD_ARCH_SPECIFIC select MODULES_USE_ELF_RELA diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index e24a9820e12f..4acb5bc4b52a 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -70,6 +70,7 @@ config ARM select GENERIC_SCHED_CLOCK select GENERIC_SMP_IDLE_THREAD select HARDIRQS_SW_RESEND + select HAS_IOPORT select HAVE_ARCH_AUDITSYSCALL if AEABI && !OABI_COMPAT select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6 select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 1023e896d46b..b740019c4aee 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -145,6 +145,7 @@ config ARM64 select GENERIC_GETTIMEOFDAY select GENERIC_VDSO_TIME_NS select HARDIRQS_SW_RESEND + select HAS_IOPORT select HAVE_MOVE_PMD select HAVE_MOVE_PUD select HAVE_PCI diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index d7e4a24e8644..2e13ec8263b9 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -25,6 +25,7 @@ config IA64 select PCI_DOMAINS if PCI select PCI_MSI select PCI_SYSCALL if PCI + select HAS_IOPORT select HAVE_ASM_MODVERSIONS select HAVE_UNSTABLE_SCHED_CLOCK select HAVE_EXIT_THREAD diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig index 82154952e574..40198a1ebe27 100644 --- a/arch/m68k/Kconfig +++ b/arch/m68k/Kconfig @@ -18,6 +18,7 @@ config M68K select GENERIC_CPU_DEVICES select GENERIC_IOMAP select GENERIC_IRQ_SHOW + select HAS_IOPORT if PCI || ISA || ATARI_ROM_ISA select HAVE_ARCH_SECCOMP select HAVE_ARCH_SECCOMP_FILTER select HAVE_ASM_MODVERSIONS diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig index cc88af6fa7a4..211f338d6235 100644 --- a/arch/microblaze/Kconfig +++ b/arch/microblaze/Kconfig @@ -21,6 +21,7 @@ config MICROBLAZE select GENERIC_IRQ_SHOW select GENERIC_PCI_IOMAP select GENERIC_SCHED_CLOCK + select HAS_IOPORT if PCI select HAVE_ARCH_HASH select HAVE_ARCH_KGDB select HAVE_ARCH_SECCOMP diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index e2f3ca73f40d..64760fcd7b52 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -47,6 +47,7 @@ config MIPS select GENERIC_SMP_IDLE_THREAD select GENERIC_TIME_VSYSCALL select GUP_GET_PXX_LOW_HIGH if CPU_MIPS32 && PHYS_ADDR_T_64BIT + select HAS_IOPORT if !NO_IOPORT_MAP select HAVE_ARCH_COMPILER_H select HAVE_ARCH_JUMP_LABEL select HAVE_ARCH_KGDB if MIPS_FP_SUPPORT @@ -3120,6 +3121,7 @@ config PCI_DRIVERS_LEGACY # users to choose the right thing ... # config ISA + depends on HAS_IOPORT bool config TC diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig index a98940e64243..5eeacc72e4da 100644 --- a/arch/parisc/Kconfig +++ b/arch/parisc/Kconfig @@ -47,6 +47,7 @@ config PARISC select MODULES_USE_ELF_RELA select CLONE_BACKWARDS select TTY # Needed for pdc_cons.c + select HAS_IOPORT if PCI select HAVE_DEBUG_STACKOVERFLOW select HAVE_ARCH_AUDITSYSCALL select HAVE_ARCH_HASH @@ -131,6 +132,7 @@ config STA
Re: [RFC v2 01/39] Kconfig: introduce HAS_IOPORT option and select it as necessary
On Fri, 2022-05-06 at 14:53 +0200, Arnd Bergmann wrote: > On Fri, May 6, 2022 at 2:27 PM Maciej W. Rozycki wrote: > > On Fri, 6 May 2022, Arnd Bergmann wrote: > > > > > > If this is PCI/PCIe indeed, then an I/O access is just a different bit > > > > pattern put on the bus/in the TLP in the address phase. So what is > > > > there > > > > inherent to the s390 architecture that prevents that different bit > > > > pattern > > > > from being used? > > > > > > The hardware design for PCI on s390 is very different from any other > > > architecture, and more abstract. Rather than implementing MMIO register > > > access as pointer dereference, this is a separate CPU instruction that > > > takes a device/bar plus offset as arguments rather than a pointer, and > > > Linux encodes this back into a fake __iomem token. > > > > OK, that seems to me like a reasonable and quite a clean design (on the > > hardware side). > > > > So what happens if the instruction is given an I/O rather than memory BAR > > as the relevant argument? Is the address space indicator bit (bit #0) > > simply ignored or what? > > Not sure. My best guess is that it would actually work as you'd expect, > but is deliberately left out of the architecture specification so they don't > have to to validate the correctness. Note that only a small number of > PCIe cards are actually supported by IBM, and I think the firmware > only passes devices to the OS if they are whitelisted. > > Arnd Yes, though in Linux we do try hard to work with whatever is plugged in. We did benefit from this in the past working with a new NIC from a different vendor with 0 additional changes. Also you can use vfio-pci to pass-through arbitrary PCI devices to a QEMU emulating s390x.
Re: [RFC v2 01/39] Kconfig: introduce HAS_IOPORT option and select it as necessary
On Fri, 2022-05-06 at 13:27 +0100, Maciej W. Rozycki wrote: > On Fri, 6 May 2022, Arnd Bergmann wrote: > > > > If this is PCI/PCIe indeed, then an I/O access is just a different bit > > > pattern put on the bus/in the TLP in the address phase. So what is there > > > inherent to the s390 architecture that prevents that different bit pattern > > > from being used? > > > > The hardware design for PCI on s390 is very different from any other > > architecture, and more abstract. Rather than implementing MMIO register > > access as pointer dereference, this is a separate CPU instruction that > > takes a device/bar plus offset as arguments rather than a pointer, and > > Linux encodes this back into a fake __iomem token. > > OK, that seems to me like a reasonable and quite a clean design (on the > hardware side). > > So what happens if the instruction is given an I/O rather than memory BAR > as the relevant argument? Is the address space indicator bit (bit #0) > simply ignored or what? See my answer to Arnd for some more background but there simply isn't a way to formulate an I/O access. In the old style PCI instructions the BAR number and the function handle are put in a register before the access. BAR number 15 is used to access config space. If there is no BAR for that number the instruction fails with a non-zero CC. > > > > But that has nothing to do with the presence or absence of any specific > > > processor instructions. It's just a limitation of bus glue. So I guess > > > it's just that all PCI/PCIe glue logic implementations for s390 have such > > > a limitation, right? > > > > There are separate instructions for PCI memory and config space, but > > no instructions for I/O space, or for non-PCI MMIO that it could be mapped > > into. > > The PCI configuration space was retrofitted into x86 systems (and is > accessed in an awkward manner with them), but with a new design such a > clean approach is most welcome IMHO. Thank you for your explanation. > > Maciej Well our design is a retrofit too considering s390x is a direct decendent of IBM System/360 which one could argue to have been the first ISA. But yes as PCI support was only added with PCIe and with a machine level hypervisor already in place we do get shielded a lot from the gritty details.
Re: [RFC v2 01/39] Kconfig: introduce HAS_IOPORT option and select it as necessary
On Fri, 2022-05-06 at 13:33 +0200, Arnd Bergmann wrote: > On Fri, May 6, 2022 at 12:20 PM Maciej W. Rozycki wrote: > > On Thu, 5 May 2022, Arnd Bergmann wrote: > > I think I'm missing something here. IIUC we're talking about a PCI/PCIe > > bus used with s390 hardware, right? > > > > (It has to be PCI/PCIe, because other than x86/IA-64 host buses there are > > only PCI/PCIe and EISA/ISA buses out there that define I/O access cycles > > and EISA/ISA have long been obsoleted except perhaps from some niche use.) > > > > If this is PCI/PCIe indeed, then an I/O access is just a different bit > > pattern put on the bus/in the TLP in the address phase. So what is there > > inherent to the s390 architecture that prevents that different bit pattern > > from being used? > > The hardware design for PCI on s390 is very different from any other > architecture, and more abstract. Rather than implementing MMIO register > access as pointer dereference, this is a separate CPU instruction that > takes a device/bar plus offset as arguments rather than a pointer, and > Linux encodes this back into a fake __iomem token. Correct, we have since gained new PCI load/store instructions that actually do take a virtual address that gets address translated to a "physical address" for the PCI BARs. The "physical address" we get from the platform (again via special instructions). I put "physical address" in quotes because while they are conceptually physical addresses and they are translated to by MMU translation tables, accessing their virtual mapping with any instruction other then the special PCI load/store instructions will cause addressing exceptions. So we still don't have real MMIO but something that looks a lot more like MMIO than we used to. > > > If anything, I could imagine the same limitation as with current POWER9 > > implementations, that is whatever glue is used to wire PCI/PCIe to the > > rest of the system does not implement a way to use said bit pattern (which > > has nothing to do with the POWER9 processor instruction set). > > > > But that has nothing to do with the presence or absence of any specific > > processor instructions. It's just a limitation of bus glue. So I guess > > it's just that all PCI/PCIe glue logic implementations for s390 have such > > a limitation, right? > > There are separate instructions for PCI memory and config space, but > no instructions for I/O space, or for non-PCI MMIO that it could be mapped > into. > >Arnd The config space is still accessed with the old style PCI load/store instructions via a special extra BAR. But yes overall on s390x we can only access PCI(e) devices via special instructions not via real MMIO and also the OS has no direct access to the registers of the PHB which are only accessible to firmware. Maybe as a bit of further background it's also important to note that on s390x all Operating Systems run inside a hypervisor. On the lowest level any OS can run this is a non-paging machine level hypervisor. For PCI that means that we always have a kind of pass-through access though without paging and hardware support for the memory partitioning this is of course relatively simple.
Re: [RFC v2 01/39] Kconfig: introduce HAS_IOPORT option and select it as necessary
On Fri, 2022-05-06 at 19:12 +1000, Finn Thain wrote: > > On Thu, 5 May 2022, Bjorn Helgaas wrote: > > > On Thu, May 05, 2022 at 07:39:42PM +0200, Arnd Bergmann wrote: > > > On Thu, May 5, 2022 at 6:10 PM Bjorn Helgaas wrote: > > > > On Wed, May 04, 2022 at 11:31:28PM +0200, Arnd Bergmann wrote: > > > > > The main goal is to avoid c), which is what happens on s390, but > > > > > can also happen elsewhere. Catching b) would be nice as well, > > > > > but is much harder to do from generic code as you'd need an > > > > > architecture specific inline asm statement to insert a ex_table > > > > > fixup, or a runtime conditional on each access. > > > > > > > > Or s390 could implement its own inb(). > > > > > > > > I'm hearing that generic powerpc kernels have to run both on machines > > > > that have I/O port space and those that don't. That makes me think > > > > s390 could do something similar. > > > > > > No, this is actually the current situation, and it makes absolutely no > > > sense. s390 has no way of implementing inb()/outb() because there > > > are no instructions for it and it cannot tunnel them through a virtual > > > address mapping like on most of the other architectures. (it has special > > > instructions for accessing memory space, which is not the same as > > > a pointer dereference here). > > > > > > The existing implementation gets flagged as a NULL pointer dereference > > > by a compiler warning because it effectively is. > > > > I think s390 currently uses the inb() in asm-generic/io.h, i.e., > > "__raw_readb(PCI_IOBASE + addr)". I understand that's a NULL pointer > > dereference because the default PCI_IOBASE is 0. > > > > I mooted a s390 inb() implementation like "return ~0" because that's > > what happens on most arches when there's no device to respond to the > > inb(). > > > > The HAS_IOPORT dependencies are fairly ugly IMHO, and they clutter > > drivers that use I/O ports in some cases but not others. But maybe > > it's the most practical way. > > > > Do you mean, "the most practical way to avoid a compiler warning on s390"? > What about "#pragma GCC diagnostic ignored"? This actually happens with clang. Apart from that, I think this would also fall under the same argument as the original patch Linus unpulled. We would just paint over someting that we know at compile time won't work: https://lore.kernel.org/lkml/CAHk-=wg80je=k7madf4e7wrrnp37e3qh6y10svhdc7o8sz_...@mail.gmail.com/
Re: [RFC v2 01/39] Kconfig: introduce HAS_IOPORT option and select it as necessary
On Thu, 2022-05-05 at 14:53 -0500, Bjorn Helgaas wrote: > On Thu, May 05, 2022 at 07:39:42PM +0200, Arnd Bergmann wrote: > > On Thu, May 5, 2022 at 6:10 PM Bjorn Helgaas wrote: > > > On Wed, May 04, 2022 at 11:31:28PM +0200, Arnd Bergmann wrote: > > > > The main goal is to avoid c), which is what happens on s390, but > > > > can also happen elsewhere. Catching b) would be nice as well, > > > > but is much harder to do from generic code as you'd need an > > > > architecture specific inline asm statement to insert a ex_table > > > > fixup, or a runtime conditional on each access. > > > > > > Or s390 could implement its own inb(). > > > > > > I'm hearing that generic powerpc kernels have to run both on machines > > > that have I/O port space and those that don't. That makes me think > > > s390 could do something similar. > > > > No, this is actually the current situation, and it makes absolutely no > > sense. s390 has no way of implementing inb()/outb() because there > > are no instructions for it and it cannot tunnel them through a virtual > > address mapping like on most of the other architectures. (it has special > > instructions for accessing memory space, which is not the same as > > a pointer dereference here). > > > > The existing implementation gets flagged as a NULL pointer dereference > > by a compiler warning because it effectively is. > > I think s390 currently uses the inb() in asm-generic/io.h, i.e., > "__raw_readb(PCI_IOBASE + addr)". I understand that's a NULL pointer > dereference because the default PCI_IOBASE is 0. > > I mooted a s390 inb() implementation like "return ~0" because that's > what happens on most arches when there's no device to respond to the > inb(). > > The HAS_IOPORT dependencies are fairly ugly IMHO, and they clutter > drivers that use I/O ports in some cases but not others. But maybe > it's the most practical way. > > Bjorn I fear such stubs are kind of equivalent to my previous patch doing the same in asm-generic/io.h that was pulled and then unpulled by Linus. Maybe it would be slightly different if instead of a warning outX() would just be a NOP and inX() just returned ~0 but we're in essence pretending that we have these functions when we know they are nonsense. Another argument I see is that as shown by POWER9 we might start to see more platforms that just can't do I/O port access. E.g. I would also be surprised if Apple's M1 has I/O port access. Sooner or later I expect distributions on some platforms to only support such systems. For example on ppc a server distribution might only support IBM POWER without I/O port support before too long. Then having HAS_IOPORT allows to get rid of drivers that won't work anyway. There are also reports of probing a driver with I/O ports causing a system crash on systems without I/O port support. For example in this answer by John Garry (added so he may supply more information): https://lore.kernel.org/lkml/db043b76-880d-5fad-69cf-96abcd9cd...@huawei.com/
Re: [RFC v2 01/39] Kconfig: introduce HAS_IOPORT option and select it as necessary
On Wed, 2022-05-04 at 23:31 +0200, Arnd Bergmann wrote: > On Wed, May 4, 2022 at 11:08 PM Bjorn Helgaas wrote: > > On Fri, Apr 29, 2022 at 03:49:59PM +0200, Niklas Schnelle wrote: > > > We introduce a new HAS_IOPORT Kconfig option to indicate support for > > > I/O Port access. In a future patch HAS_IOPORT=n will disable compilation > > > of the I/O accessor functions inb()/outb() and friends on architectures > > > which can not meaningfully support legacy I/O spaces such as s390 or > > > where such support is optional. > > > > So you plan to drop inb()/outb() on architectures where I/O port space > > is optional? So even platforms that have I/O port space may not be > > able to use it? > > > > This feels like a lot of work where the main benefit is to keep > > Kconfig from offering drivers that aren't of interest on s390. > > > > Granted, there may be issues where inb()/outb() does the wrong thing > > such as dereferencing null pointers when I/O port space isn't > > implemented. I think that's a defect in inb()/outb() and could be > > fixed there. > > The current implementation in asm-generic/io.h implements inb()/outb() > using readb()/writeb() with a fixed architecture specific offset. > > There are three possible things that can happen here: > > a) there is a host bridge driver that maps its I/O ports to this window, > and everything works > b) the address range is reserved and accessible but no host bridge >driver has mapped its registers there, so an access causes a >page fault > c) the architecture does not define an offset, and accessing low I/O > ports ends up as a NULL pointer dereference > > The main goal is to avoid c), which is what happens on s390, but > can also happen elsewhere. Catching b) would be nice as well, > but is much harder to do from generic code as you'd need an > architecture specific inline asm statement to insert a ex_table > fixup, or a runtime conditional on each access. > > Arnd Yes and to add to this, we did try a local solution in inb()/outb() before. This added a warning when they are used and we know at compile time that we're dealing with case c). This approach was nacked by Linus though as we were turning a compile time known broken case into a runtime one: https://lore.kernel.org/lkml/CAHk-=wg80je=k7madf4e7wrrnp37e3qh6y10svhdc7o8sz_...@mail.gmail.com/ I do agree with this assesment and think this is the rightâ„¢ approach but it is more churn as can be seen by the size of this series. I think longer term it could be valuable though especially if more platforms phase out I/O port support like POWER9 for which this also allows filtering what drivers will never work.
[RFC v2 01/39] Kconfig: introduce HAS_IOPORT option and select it as necessary
We introduce a new HAS_IOPORT Kconfig option to indicate support for I/O Port access. In a future patch HAS_IOPORT=n will disable compilation of the I/O accessor functions inb()/outb() and friends on architectures which can not meaningfully support legacy I/O spaces such as s390 or where such support is optional. The "depends on" relations on HAS_IOPORT in drivers as well as ifdefs for HAS_IOPORT specific sections will be added in subsequent patches on a per subsystem basis. Co-developed-by: Arnd Bergmann Signed-off-by: Niklas Schnelle --- arch/alpha/Kconfig | 1 + arch/arm/Kconfig| 1 + arch/arm64/Kconfig | 1 + arch/ia64/Kconfig | 1 + arch/m68k/Kconfig | 1 + arch/microblaze/Kconfig | 1 + arch/mips/Kconfig | 1 + arch/parisc/Kconfig | 1 + arch/powerpc/Kconfig| 1 + arch/riscv/Kconfig | 1 + arch/sh/Kconfig | 1 + arch/sparc/Kconfig | 1 + arch/x86/Kconfig| 1 + drivers/bus/Kconfig | 2 +- lib/Kconfig | 4 lib/Kconfig.kgdb| 1 + 16 files changed, 19 insertions(+), 1 deletion(-) diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig index 7d0d26b5b3f5..2b9cf1b0bdb8 100644 --- a/arch/alpha/Kconfig +++ b/arch/alpha/Kconfig @@ -27,6 +27,7 @@ config ALPHA select AUDIT_ARCH select GENERIC_CPU_VULNERABILITIES select GENERIC_SMP_IDLE_THREAD + select HAS_IOPORT select HAVE_ARCH_AUDITSYSCALL select HAVE_MOD_ARCH_SPECIFIC select MODULES_USE_ELF_RELA diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 2e8091e2d8a8..603ce00033a5 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -69,6 +69,7 @@ config ARM select GENERIC_SCHED_CLOCK select GENERIC_SMP_IDLE_THREAD select HARDIRQS_SW_RESEND + select HAS_IOPORT select HAVE_ARCH_AUDITSYSCALL if AEABI && !OABI_COMPAT select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6 select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 20ea89d9ac2f..234dc89a7654 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -136,6 +136,7 @@ config ARM64 select GENERIC_GETTIMEOFDAY select GENERIC_VDSO_TIME_NS select HARDIRQS_SW_RESEND + select HAS_IOPORT select HAVE_MOVE_PMD select HAVE_MOVE_PUD select HAVE_PCI diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index cb93769a9f2a..0fffe5130a80 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -25,6 +25,7 @@ config IA64 select PCI_DOMAINS if PCI select PCI_MSI select PCI_SYSCALL if PCI + select HAS_IOPORT select HAVE_ASM_MODVERSIONS select HAVE_UNSTABLE_SCHED_CLOCK select HAVE_EXIT_THREAD diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig index 936cce42ae9a..54bf0a40c2f0 100644 --- a/arch/m68k/Kconfig +++ b/arch/m68k/Kconfig @@ -18,6 +18,7 @@ config M68K select GENERIC_CPU_DEVICES select GENERIC_IOMAP select GENERIC_IRQ_SHOW + select HAS_IOPORT if PCI || ISA || ATARI_ROM_ISA select HAVE_ASM_MODVERSIONS select HAVE_DEBUG_BUGVERBOSE select HAVE_EFFICIENT_UNALIGNED_ACCESS if !CPU_HAS_NO_UNALIGNED diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig index 8cf429ad1c84..966a6682f1fc 100644 --- a/arch/microblaze/Kconfig +++ b/arch/microblaze/Kconfig @@ -21,6 +21,7 @@ config MICROBLAZE select GENERIC_IRQ_SHOW select GENERIC_PCI_IOMAP select GENERIC_SCHED_CLOCK + select HAS_IOPORT if PCI select HAVE_ARCH_HASH select HAVE_ARCH_KGDB select HAVE_ARCH_SECCOMP diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index de3b32a507d2..4c55df08d6f1 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -47,6 +47,7 @@ config MIPS select GENERIC_SMP_IDLE_THREAD select GENERIC_TIME_VSYSCALL select GUP_GET_PTE_LOW_HIGH if CPU_MIPS32 && PHYS_ADDR_T_64BIT + select HAS_IOPORT select HAVE_ARCH_COMPILER_H select HAVE_ARCH_JUMP_LABEL select HAVE_ARCH_KGDB if MIPS_FP_SUPPORT diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig index 52e550b45692..741c5c64c173 100644 --- a/arch/parisc/Kconfig +++ b/arch/parisc/Kconfig @@ -46,6 +46,7 @@ config PARISC select MODULES_USE_ELF_RELA select CLONE_BACKWARDS select TTY # Needed for pdc_cons.c + select HAS_IOPORT if PCI || EISA select HAVE_DEBUG_STACKOVERFLOW select HAVE_ARCH_AUDITSYSCALL select HAVE_ARCH_HASH diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 174edabb74fa..7133cc35b777 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -182,6 +182,7 @@ config PPC select GENERIC_SMP_IDLE_THREAD select GENERIC_TIME_VSYSCALL select GENERIC_VDSO_TIME_NS + select HAS_
Re: [RFC 02/32] Kconfig: introduce HAS_IOPORT option and select it as necessary
On Thu, 2021-12-30 at 16:44 +1300, Michael Schmitz wrote: > Hi Arnd, > > Am 30.12.2021 um 14:48 schrieb Arnd Bergmann: > > On Tue, Dec 28, 2021 at 11:15 PM Michael Schmitz > > wrote: > > > Am 29.12.2021 um 16:41 schrieb Arnd Bergmann: > > > > On Tue, Dec 28, 2021 at 8:20 PM Michael Schmitz > > > > wrote: ---8<--- > > > What some other architectures do is to rely on inb()/outb() to have a > > zero-based offset, and use an io_offset in PCI buses to ensure that a > > low port number on the bus gets translated into a pointer value for the > > virtual mapping in the kernel, which is then represented as an unsigned > > int. > > M54xx does just that for Coldfire: > > arch/m68k/include/asm/io_no.h: > #define PCI_IO_PA 0xf800 /* Host physical address */ > > (used to set PCI BAR mappings, so matches your definition above). > > All other (MMU) m68k users of inb()/outb() apply an io_offset in the > platform specific address translation: > > ---8<--- > So as long as support for any of the m68k PCI or ISA bridges is selected > in the kernel config, the appropriate IO space mapping is applied. If no > support for PCI or ISA bridges is selected, we already fall back to zero > offset mapping (but as far as I can tell, it shouldn't be possible to > build a kernel without bridge support but drivers that require it). > > > As this is indistinguishable from architectures that just don't have > > a base address for I/O ports (we unfortunately picked 0 as the default > > PCI_IOBASE value), my suggestion was to start marking architectures > > that may have this problem as using HAS_IOPORT in order to keep > > the existing behavior unchanged. If m68k does not suffer from this, > > making HAS_IOPORT conditional on those config options that actually > > need it would of course be best. > > Following your description, HAS_IOPORT would be required for neither of > PCI, ISA or ATARI_ROM_ISA ?? > No, HAS_IOPORT being set just means that inb() etc. exist and are functional be it as special instructions like on x86 or via an I/O address offset. As I understand it if you do have PCI, ISA or ATARI_ROM_ISA they are functional. If none of them are set and your zero offset mapping means these accessors can't actually be used you could make the declerations ifdeffed on CONFIG_HAS_IOPORT to detect the cases where somone managed to build drivers that require them and that would result in a compile time error instead of silently, or with a NULL pointer warning, compiling code that won't work.
[RFC 02/32] Kconfig: introduce HAS_IOPORT option and select it as necessary
We introduce a new HAS_IOPORT Kconfig option to gate support for I/O port access. In a future patch HAS_IOPORT=n will disable compilation of the I/O accessor functions inb()/outb() and friends on architectures which can not meaningfully support legacy I/O spaces. On these platforms inb()/outb() etc are currently just stubs in asm-generic/io.h which when called will cause a NULL pointer access which some compilers actually detect and warn about. The dependencies on HAS_IOPORT in drivers as well as ifdefs for HAS_IOPORT specific sections will be added in subsequent patches on a per subsystem basis. Then a final patch will ifdef the I/O access functions on HAS_IOPORT thus turning any use not gated by HAS_IOPORT into a compile-time warning. Link: https://lore.kernel.org/lkml/CAHk-=wg80je=k7madf4e7wrrnp37e3qh6y10svhdc7o8sz_...@mail.gmail.com/ Co-developed-by: Arnd Bergmann Signed-off-by: Arnd Bergmann Signed-off-by: Niklas Schnelle --- arch/alpha/Kconfig | 1 + arch/arc/Kconfig| 1 + arch/arm/Kconfig| 1 + arch/arm64/Kconfig | 1 + arch/ia64/Kconfig | 1 + arch/m68k/Kconfig | 1 + arch/microblaze/Kconfig | 1 + arch/mips/Kconfig | 1 + arch/parisc/Kconfig | 1 + arch/powerpc/Kconfig| 1 + arch/riscv/Kconfig | 1 + arch/sh/Kconfig | 1 + arch/sparc/Kconfig | 1 + arch/x86/Kconfig| 1 + drivers/bus/Kconfig | 2 +- lib/Kconfig | 4 lib/Kconfig.kgdb| 1 + 17 files changed, 20 insertions(+), 1 deletion(-) diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig index 4e87783c90ad..472a0c5e4c2f 100644 --- a/arch/alpha/Kconfig +++ b/arch/alpha/Kconfig @@ -28,6 +28,7 @@ config ALPHA select AUDIT_ARCH select GENERIC_CPU_VULNERABILITIES select GENERIC_SMP_IDLE_THREAD + select HAS_IOPORT select HAVE_ARCH_AUDITSYSCALL select HAVE_MOD_ARCH_SPECIFIC select MODULES_USE_ELF_RELA diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig index b4ae6058902a..b3911ebbd237 100644 --- a/arch/arc/Kconfig +++ b/arch/arc/Kconfig @@ -27,6 +27,7 @@ config ARC select GENERIC_PENDING_IRQ if SMP select GENERIC_SCHED_CLOCK select GENERIC_SMP_IDLE_THREAD + select HAS_IOPORT select HAVE_ARCH_KGDB select HAVE_ARCH_TRACEHOOK select HAVE_ARCH_TRANSPARENT_HUGEPAGE if ARC_MMU_V4 diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index c2724d986fa0..605709b6eecb 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -66,6 +66,7 @@ config ARM select GENERIC_SCHED_CLOCK select GENERIC_SMP_IDLE_THREAD select HARDIRQS_SW_RESEND + select HAS_IOPORT select HAVE_ARCH_AUDITSYSCALL if AEABI && !OABI_COMPAT select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6 select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index c4207cf9bb17..a8b199a40c8f 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -135,6 +135,7 @@ config ARM64 select GENERIC_GETTIMEOFDAY select GENERIC_VDSO_TIME_NS select HARDIRQS_SW_RESEND + select HAS_IOPORT select HAVE_MOVE_PMD select HAVE_MOVE_PUD select HAVE_PCI diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index 1e33666fa679..672aa2a88b19 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -24,6 +24,7 @@ config IA64 select PCI_DOMAINS if PCI select PCI_MSI select PCI_SYSCALL if PCI + select HAS_IOPORT select HAVE_ASM_MODVERSIONS select HAVE_UNSTABLE_SCHED_CLOCK select HAVE_EXIT_THREAD diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig index 0b50da08a9c5..926d97c33828 100644 --- a/arch/m68k/Kconfig +++ b/arch/m68k/Kconfig @@ -16,6 +16,7 @@ config M68K select GENERIC_CPU_DEVICES select GENERIC_IOMAP select GENERIC_IRQ_SHOW + select HAS_IOPORT select HAVE_AOUT if MMU select HAVE_ASM_MODVERSIONS select HAVE_DEBUG_BUGVERBOSE diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig index 59798e43cdb0..213ef2940079 100644 --- a/arch/microblaze/Kconfig +++ b/arch/microblaze/Kconfig @@ -21,6 +21,7 @@ config MICROBLAZE select GENERIC_IRQ_SHOW select GENERIC_PCI_IOMAP select GENERIC_SCHED_CLOCK + select HAS_IOPORT if PCI select HAVE_ARCH_HASH select HAVE_ARCH_KGDB select HAVE_ARCH_SECCOMP diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index 0215dc1529e9..87e6e7c29493 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -47,6 +47,7 @@ config MIPS select GENERIC_SMP_IDLE_THREAD select GENERIC_TIME_VSYSCALL select GUP_GET_PTE_LOW_HIGH if CPU_MIPS32 && PHYS_ADDR_T_64BIT + select HAS_IOPORT select HAVE_ARCH_COMPILER_H select HAVE_ARCH_JUMP_LABEL select HAVE_ARCH_KGDB if
Re: [PATCH] pci: Rename pcibios_add_device to match
On Tue, 2021-09-14 at 01:27 +1000, Oliver O'Halloran wrote: > The general convention for pcibios_* hooks is that they're named after > the corresponding pci_* function they provide a hook for. The exception > is pcibios_add_device() which provides a hook for pci_device_add(). This > has been irritating me for years so rename it. > > Also, remove the export of the microblaze version. The only caller > must be compiled as a built-in so there's no reason for the export. > > Signed-off-by: Oliver O'Halloran > --- > arch/microblaze/pci/pci-common.c | 3 +-- > arch/powerpc/kernel/pci-common.c | 2 +- > arch/powerpc/platforms/powernv/pci-sriov.c | 2 +- > arch/s390/pci/pci.c| 2 +- > arch/sparc/kernel/pci.c| 2 +- > arch/x86/pci/common.c | 2 +- > drivers/pci/pci.c | 4 ++-- > drivers/pci/probe.c| 4 ++-- > include/linux/pci.h| 2 +- > 9 files changed, 11 insertions(+), 12 deletions(-) > > .. snip .. > diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c > index e7e6788d75a8..ded3321b7208 100644 > --- a/arch/s390/pci/pci.c > +++ b/arch/s390/pci/pci.c > @@ -561,7 +561,7 @@ static void zpci_cleanup_bus_resources(struct zpci_dev > *zdev) > zdev->has_resources = 0; > } > > -int pcibios_add_device(struct pci_dev *pdev) > +int pcibios_device_add(struct pci_dev *pdev) > { > struct zpci_dev *zdev = to_zpci(pdev); > struct resource *res; > .. snip .. > I agree with your assesment this is indeed confusing. Interestingly pcibios_release_device() also doesn't follow the convention exactly as it is called from pci_release_dev() but at least that isn't very confusing. So for the arch/s390/pci bit: Acked-by: Niklas Schnelle
[PATCH 1/1] powerpc: Drop superfluous pci_dev_is_added() calls
On powerpc, pci_dev_is_added() is called as part of SR-IOV fixups that are done under pcibios_add_device() which in turn is only called in pci_device_add() whih is called when a PCI device is scanned. Now pci_dev_assign_added() is called in pci_bus_add_device() which is only called after scanning the device. Thus pci_dev_is_added() is always false and can be dropped. Signed-off-by: Niklas Schnelle --- arch/powerpc/platforms/powernv/pci-sriov.c | 6 -- arch/powerpc/platforms/pseries/setup.c | 3 +-- 2 files changed, 1 insertion(+), 8 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c index 28aac933a439..deddbb233fde 100644 --- a/arch/powerpc/platforms/powernv/pci-sriov.c +++ b/arch/powerpc/platforms/powernv/pci-sriov.c @@ -9,9 +9,6 @@ #include "pci.h" -/* for pci_dev_is_added() */ -#include "../../../../drivers/pci/pci.h" - /* * The majority of the complexity in supporting SR-IOV on PowerNV comes from * the need to put the MMIO space for each VF into a separate PE. Internally @@ -228,9 +225,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev) void pnv_pci_ioda_fixup_iov(struct pci_dev *pdev) { - if (WARN_ON(pci_dev_is_added(pdev))) - return; - if (pdev->is_virtfn) { struct pnv_ioda_pe *pe = pnv_ioda_get_pe(pdev); diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index f79126f16258..2188054470c1 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -74,7 +74,6 @@ #include #include "pseries.h" -#include "../../../../drivers/pci/pci.h" DEFINE_STATIC_KEY_FALSE(shared_processor); EXPORT_SYMBOL(shared_processor); @@ -750,7 +749,7 @@ static void pseries_pci_fixup_iov_resources(struct pci_dev *pdev) const int *indexes; struct device_node *dn = pci_device_to_OF_node(pdev); - if (!pdev->is_physfn || pci_dev_is_added(pdev)) + if (!pdev->is_physfn) return; /*Firmware must support open sriov otherwise dont configure*/ indexes = of_get_property(dn, "ibm,open-sriov-vf-bar-info", NULL); -- 2.25.1
[PATCH 0/1] Arch use of pci_dev_is_added()
Hi Bjorn, Hi Michael, In my proposal to make pci_dev_is_added() more regularly usable by arch code you mentioned[0] that you believe the uses in arch/powerpc are not necessary anymore. From code reading I agree and so does Oliver O'Halloran[1]. So as promised here is a patch removing them. I only compile tested this as I don't have access to a powerpc system. I've also looked a bit more into our use in s390 and as dicussed previously I don't think we can cleanly get rid of the existing one in arch/s390/pci_sysfs.c:recover_store() because we need to distinguish an already removed pdev just by looking at the pdev itself. As for new uses I think in the upcoming automatic recovery code we can rely on the fact that a removed device has pdev->driver == NULL and don't need pci_dev_is_added() but it would make things clearer. I also noticed that before commit 44bda4b7d26e9 ("PCI: Fix is_added/is_busmaster race condition") there was simply a pdev->is_added flag that was cleanly accessible by arch code. So I wanted to ask for your advice. Thanks, Niklas [0] https://lore.kernel.org/lkml/20210825190444.GA3593752@bjorn-Precision-5520/ [1] https://lore.kernel.org/lkml/caosf1cfyuf9faesnparj+7w0mktpvtcm8vxjhdsfsndc6k_...@mail.gmail.com/ Niklas Schnelle (1): powerpc: Drop superfluous pci_dev_is_added() calls arch/powerpc/platforms/powernv/pci-sriov.c | 6 -- arch/powerpc/platforms/pseries/setup.c | 3 +-- 2 files changed, 1 insertion(+), 8 deletions(-) -- 2.25.1
Re: [PATCH 0/5] s390/pci: automatic error recovery
On Wed, 2021-09-08 at 11:37 +1000, Oliver O'Halloran wrote: > On Tue, Sep 7, 2021 at 10:21 PM Niklas Schnelle > wrote: > > On Tue, 2021-09-07 at 10:45 +0200, Niklas Schnelle wrote: > > > On Tue, 2021-09-07 at 12:04 +1000, Oliver O'Halloran wrote: > > > > On Mon, Sep 6, 2021 at 7:49 PM Niklas Schnelle > > > > wrote: > > > > > Patch 3 I already sent separately resulting in the discussion below > > > > > but without > > > > > a final conclusion. > > > > > > > > > > https://lore.kernel.org/lkml/20210720150145.640727-1-schne...@linux.ibm.com/ > > > > > > > > > > I believe even though there were some doubts about the use of > > > > > pci_dev_is_added() by arch code the existing uses as well as the use > > > > > in the > > > > > final patch of this series warrant this export. > > > > > > > > The use of pci_dev_is_added() in arch/powerpc was because in the past > > > > pci_bus_add_device() could be called before pci_device_add(). That was > > > > fixed a while ago so It should be safe to remove those calls now. > > > > > > Hmm, ok that confirms Bjorns suspicion and explains how it came to be. > > > I can certainly sent a patch for that. This would then leave only the > > > existing use in s390 which I added because of a dead lock prevention > > > and explained here: > > > https://lore.kernel.org/lkml/87d15d5eead35c9eaa667958d057cf4a81a8bf13.ca...@linux.ibm.com/ > > > > > > Plus the need to use it in the recovery code of this series. I think in > > > the EEH code the need for a similar check is alleviated by the checks > > > in the beginning of > > > arch/powerpc/kernel/eeh_driver.c:eeh_handle_normal_event() especially > > > eeh_slot_presence_check() which checks presence via the hotplug slot. > > > I guess we could use our own state tracking in a similar way but felt > > > like pci_dev_is_added() is the more logical choice. > > The slot check is mainly there to prevent attempts to "recover" > devices that have been surprise removed (i.e NVMe hot-unplug). The > actual recovery process operates off the eeh_pe tree which is frozen > in place when an error is detected. If a pci_dev is added or removed > it's not really a problem since those are only ever looked at when > notifying drivers which is done with the rescan_remove lock held. Thanks for the explanation. > That > said, I wouldn't really encourage anyone to follow the EEH model since > it's pretty byzantine. > > > Looking into this again, I think we actually can't easily track this > > state ourselves outside struct pci_dev. The reason for this is that > > when e.g. arch/s390/pci/pci_sysfs.c:recover_store() removes the struct > > pci_dev and scans it again the new struct pci_dev re-uses the same > > struct zpci_dev because from a platform point of view the PCI device > > was never removed but only disabled and re-enabled. Thus we can only > > distinguish the stale struct pci_dev by looking at things stored in > > struct pci_dev itself. > > IMO the real problem is removing and re-adding the pci_dev. I think > it's something that's done largely because the PCI core doesn't really > provide any better mechanism for getting a device back into a > known-good state so it's abused to implement error recovery. This is > something that's always annoyed me since it conflates recovery with > hotplug. After a hot-(un)plug we might have a different device or no > device. In the recovery case we expect to start and end with the same > device. Why not apply the same logic to the pci_dev? For us there are two cases. First The existing /sys/bus/pci/devices//recover attribute. This does the pci_dev remove and re-add that you mention and thus we end up with a ne pci_dev afterwards and I agree that is kind of a dumb way to recover which (too?) closely resembles unplug/re-plug. Secondly the automatic error recovery added in this series. Here we only attempt recovery if we have a driver bound that supports the error callbacks thus always keeping the same pci_dev. If there is no driver we give up automatic recovery and are back at the situation without this series. > > Something I was tinkering with before I left IBM was re-working the > way EEH handles recovering devices that don't have a driver with error > handling callbacks to something like: > > 1. unbind the driver > 2. pci_save_state() > 3. do the reset > 4. pci_restore_state() > 5. re-bind the driver > > That would allow keeping the pci_dev around and let me delete a pile > of conf
Re: [PATCH 0/5] s390/pci: automatic error recovery
On Tue, 2021-09-07 at 10:45 +0200, Niklas Schnelle wrote: > On Tue, 2021-09-07 at 12:04 +1000, Oliver O'Halloran wrote: > > On Mon, Sep 6, 2021 at 7:49 PM Niklas Schnelle > > wrote: > > > Patch 3 I already sent separately resulting in the discussion below but > > > without > > > a final conclusion. > > > > > > https://lore.kernel.org/lkml/20210720150145.640727-1-schne...@linux.ibm.com/ > > > > > > I believe even though there were some doubts about the use of > > > pci_dev_is_added() by arch code the existing uses as well as the use in > > > the > > > final patch of this series warrant this export. > > > > The use of pci_dev_is_added() in arch/powerpc was because in the past > > pci_bus_add_device() could be called before pci_device_add(). That was > > fixed a while ago so It should be safe to remove those calls now. > > Hmm, ok that confirms Bjorns suspicion and explains how it came to be. > I can certainly sent a patch for that. This would then leave only the > existing use in s390 which I added because of a dead lock prevention > and explained here: > https://lore.kernel.org/lkml/87d15d5eead35c9eaa667958d057cf4a81a8bf13.ca...@linux.ibm.com/ > > Plus the need to use it in the recovery code of this series. I think in > the EEH code the need for a similar check is alleviated by the checks > in the beginning of > arch/powerpc/kernel/eeh_driver.c:eeh_handle_normal_event() especially > eeh_slot_presence_check() which checks presence via the hotplug slot. > I guess we could use our own state tracking in a similar way but felt > like pci_dev_is_added() is the more logical choice. Looking into this again, I think we actually can't easily track this state ourselves outside struct pci_dev. The reason for this is that when e.g. arch/s390/pci/pci_sysfs.c:recover_store() removes the struct pci_dev and scans it again the new struct pci_dev re-uses the same struct zpci_dev because from a platform point of view the PCI device was never removed but only disabled and re-enabled. Thus we can only distinguish the stale struct pci_dev by looking at things stored in struct pci_dev itself. That said, I think for the recovery case we might be able to drop the pci_dev_is_added() and rely on pdev->driver != NULL which we check anyway and that should catch any PCI device that was already removed.
Re: [PATCH 0/5] s390/pci: automatic error recovery
On Tue, 2021-09-07 at 12:04 +1000, Oliver O'Halloran wrote: > On Mon, Sep 6, 2021 at 7:49 PM Niklas Schnelle wrote: > > Patch 3 I already sent separately resulting in the discussion below but > > without > > a final conclusion. > > > > https://lore.kernel.org/lkml/20210720150145.640727-1-schne...@linux.ibm.com/ > > > > I believe even though there were some doubts about the use of > > pci_dev_is_added() by arch code the existing uses as well as the use in the > > final patch of this series warrant this export. > > The use of pci_dev_is_added() in arch/powerpc was because in the past > pci_bus_add_device() could be called before pci_device_add(). That was > fixed a while ago so It should be safe to remove those calls now. Hmm, ok that confirms Bjorns suspicion and explains how it came to be. I can certainly sent a patch for that. This would then leave only the existing use in s390 which I added because of a dead lock prevention and explained here: https://lore.kernel.org/lkml/87d15d5eead35c9eaa667958d057cf4a81a8bf13.ca...@linux.ibm.com/ Plus the need to use it in the recovery code of this series. I think in the EEH code the need for a similar check is alleviated by the checks in the beginning of arch/powerpc/kernel/eeh_driver.c:eeh_handle_normal_event() especially eeh_slot_presence_check() which checks presence via the hotplug slot. I guess we could use our own state tracking in a similar way but felt like pci_dev_is_added() is the more logical choice. > > > Patch 4 "PCI: Export pci_dev_lock()" is basically an extension to commit > > e3a9b1212b9d ("PCI: Export pci_dev_trylock() and pci_dev_unlock()") which > > already exported pci_dev_trylock(). In the final patch we make use of > > pci_dev_lock() to wait for any other exclusive uses of the pdev to be > > finished > > before starting recovery. > > Hmm, I noticed the EEH > (arch/powerpc/kernel/eeh_driver.c:eeh_pe_report_edev()) and the > generic PCIe error recovery code (see > drivers/pci/pcie/err.c:report_error_detected()) only call > device_lock() before entering the driver's error handling callbacks. I > wonder if they should be using pci_dev_lock() instead. The only real > difference is that pci_dev_lock() will also block user space from > accessing the device's config space while error recovery is in > progress which seems sensible enough. I agree that sounds reasonable.
Re: [PATCH 3/5] PCI: Move pci_dev_is/assign_added() to pci.h
On Tue, 2021-09-07 at 10:51 +0300, Andy Shevchenko wrote: > On Tue, Sep 7, 2021 at 3:26 AM kernel test robot wrote: > > Hi Niklas, > > > > I love your patch! Yet something to improve: > > > > [auto build test ERROR on s390/features] > > [also build test ERROR on next-20210906] > > [cannot apply to pci/next powerpc/next v5.14] > > [If your patch is applied to the wrong git tree, kindly drop us a note. > > And when submitting patch, we suggest to use '--base' as documented in > > https://git-scm.com/docs/git-format-patch] > > > > url: > > https://github.com/0day-ci/linux/commits/Niklas-Schnelle/s390-pci-automatic-error-recovery/20210906-175309 > > base: https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git > > features > > config: i386-allyesconfig (attached as .config) > > compiler: gcc-9 (Debian 9.3.0-22) 9.3.0 > > reproduce (this is a W=1 build): > > # > > https://github.com/0day-ci/linux/commit/404ed8c00a612e7ae31c50557c80c6726c464863 > > git remote add linux-review https://github.com/0day-ci/linux > > git fetch --no-tags linux-review > > Niklas-Schnelle/s390-pci-automatic-error-recovery/20210906-175309 > > git checkout 404ed8c00a612e7ae31c50557c80c6726c464863 > > # save the attached .config to linux build tree > > make W=1 ARCH=i386 > > > > If you fix the issue, kindly add following tag as appropriate > > Reported-by: kernel test robot > > > > All errors (new ones prefixed by >>): > > Obviously drivers/pci/pci.h is not only for the above. > > When play with headers always do two test builds: allyesconfig and > allmodconfig. You're right and additionally have to built on some other architectures as well because allyesconfig and allmodconfig both run through fine on s390. I'll look into it but at first glance it looks like I was over reaching removing the include from drivers/pci/hotplug/acpiphp_glue.c in addition it's not even the same kind of awkward relative include from drivers into arch code. Sorry about that. >
Re: [PATCH 0/5] s390/pci: automatic error recovery
On Mon, 2021-09-06 at 21:05 -0500, Linas Vepstas wrote: > On Mon, Sep 6, 2021 at 4:49 AM Niklas Schnelle > wrote: > > > I believe we might be the first > > implementation of PCI device recovery in a virtualized setting requiring > > us to > > coordinate the device reset with the hypervisor platform by issuing a > > disable > > and re-enable to the platform as well as starting the recovery following > > a platform event. > > > > I recall none of the details, but SRIOV is a standardized system for > sharing a PCI device across multiple virtual machines. It has detailed info > on what the hypervisor must do, and what the local OS instance must do to > accomplish this. Yes and in fact on s390 we make heavy use of SR-IOV. > It's part of the PCI standard, and its more than a decade > old now, maybe two. Being a part of the PCI standard, it was interoperable > with error recovery, to the best of my recollection. Maybe I worded things with a bit too much sensationalism and it might even be that POWER supports error recovery also with virtualization, though I'm not sure how far that goes. I believe you are right in that SR-IOV supports the error recovery, after all this patch set also has to work together with SRIOV enabled devices. At least on s390 though until this patch set the error recovery performed by the hypervisor stopped in the hypervisor. The missing part added by this patch set is coordinating with device drivers in Linux to determine where use of a recovered device can pick up after the PCIe level error recovery is done. As for virtualization this coordination of course needs to cross the hypervisor/guest boundary and at least for KVM+QEMU I know for a fact that reporting a PCI error to the guest is currently just a stub that actually completely stops the guest, so you definitely don't get smooth error recovery there yet. > At the time it was > introduced, it got pushed very aggressively. The x86 hypervisor vendors > were aiming at the heart of zseries, and were militant about it. And yet we're still here, use SR-IOV ourselves and even support Linux + KVM as a hypervisor you can use just the same on a mainframe, an x86, POWER, or ARM system. > > -- Linas >
[PATCH 4/5] PCI: Export pci_dev_lock()
Commit e3a9b1212b9d ("PCI: Export pci_dev_trylock() and pci_dev_unlock()") already exported pci_dev_trylock()/pci_dev_unlock() however in some circumstances such as during error recovery it makes sense to block waiting to get full access to the device so also export pci_dev_lock(). Signed-off-by: Niklas Schnelle --- drivers/pci/pci.c | 3 ++- include/linux/pci.h | 1 + 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index aacf575c15cf..3f416c6d3b0b 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5040,12 +5040,13 @@ static int pci_reset_bus_function(struct pci_dev *dev, int probe) return pci_parent_bus_reset(dev, probe); } -static void pci_dev_lock(struct pci_dev *dev) +void pci_dev_lock(struct pci_dev *dev) { pci_cfg_access_lock(dev); /* block PM suspend, driver probe, etc. */ device_lock(>dev); } +EXPORT_SYMBOL_GPL(pci_dev_lock); /* Return 1 on successful lock, 0 on contention */ int pci_dev_trylock(struct pci_dev *dev) diff --git a/include/linux/pci.h b/include/linux/pci.h index ea0e23dbc8ec..efc78b8d4696 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1639,6 +1639,7 @@ void pci_cfg_access_lock(struct pci_dev *dev); bool pci_cfg_access_trylock(struct pci_dev *dev); void pci_cfg_access_unlock(struct pci_dev *dev); +void pci_dev_lock(struct pci_dev *dev); int pci_dev_trylock(struct pci_dev *dev); void pci_dev_unlock(struct pci_dev *dev); -- 2.25.1
[PATCH 5/5] s390/pci: implement minimal PCI error recovery
When the platform detects an error on a PCI function or a service action has been performed it is put in the error state and an error event notification is provided to the OS. Currently we treat all error event notifications the same and simply set pdev->error_state = pci_channel_io_perm_failure requiring user intervention such as use of the recover attribute to get the device usable again. Despite requiring a manual step this also has the disadvantage that the device is completely torn down and recreated resulting in higher level devices such as a block or network device being recreated. In case of a block device this also means that it may need to be removed and added to a software raid even if that could otherwise survive with a temporary degradation. This is of course not ideal more so since an error notification with PEC 0x3A indicates that the platform already performed error recovery successfully or that the error state was caused by a service action that is now finished. At least in this case we can assume that the error state can be reset and the function made usable again. So as not to have the disadvantage of a full tear down and recreation we need to coordinate this recovery with the driver. Thankfully there is already a well defined recovery flow for this described in Documentation/PCI/pci-error-recovery.rst. The implementation of this is somewhat straight forward and simplified by the fact that our recovery flow is defined per PCI function. As a reset we use the newly introduced zpci_hot_reset_device() which also takes the PCI function out of the error state. Signed-off-by: Niklas Schnelle --- arch/s390/include/asm/pci.h | 4 +- arch/s390/pci/pci.c | 49 ++ arch/s390/pci/pci_event.c | 190 +++- 3 files changed, 241 insertions(+), 2 deletions(-) diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h index 2a2ed165a270..558877aff2e5 100644 --- a/arch/s390/include/asm/pci.h +++ b/arch/s390/include/asm/pci.h @@ -294,8 +294,10 @@ void zpci_debug_exit(void); void zpci_debug_init_device(struct zpci_dev *, const char *); void zpci_debug_exit_device(struct zpci_dev *); -/* Error reporting */ +/* Error handling */ int zpci_report_error(struct pci_dev *, struct zpci_report_error_header *); +int zpci_clear_error_state(struct zpci_dev *zdev); +int zpci_reset_load_store_blocked(struct zpci_dev *zdev); #ifdef CONFIG_NUMA diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c index a6322f45b5bd..77a3e85d43fb 100644 --- a/arch/s390/pci/pci.c +++ b/arch/s390/pci/pci.c @@ -954,6 +954,55 @@ int zpci_report_error(struct pci_dev *pdev, } EXPORT_SYMBOL(zpci_report_error); +/** + * zpci_clear_error_state() - Clears the zPCI error state of the device + * @zdev: The zdev for which the zPCI error state should be reset + * + * Clear the zPCI error state of the device. If clearing the zPCI error state + * fails the device is left in the error state. In this case it may make sense + * to call zpci_io_perm_failure() on the associated pdev if it exists. + * + * Returns: 0 on success, -EIO otherwise + */ +int zpci_clear_error_state(struct zpci_dev *zdev) +{ + u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, ZPCI_MOD_FC_RESET_ERROR); + struct zpci_fib fib = {0}; + u8 status; + int rc; + + rc = zpci_mod_fc(req, , ); + if (rc) + return -EIO; + + return 0; +} + +/** + * zpci_reset_load_store_blocked() - Re-enables L/S from error state + * @zdev: The zdev for which to unblock load/store access + * + * R-eenables load/store access for a PCI function in the error state while + * keeping DMA blocked. In this state drivers can poke MMIO space to determine + * if error recovery is possible while catching any rogue DMA access from the + * device. + * + * Returns: 0 on success, -EIO otherwise + */ +int zpci_reset_load_store_blocked(struct zpci_dev *zdev) +{ + u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, ZPCI_MOD_FC_RESET_BLOCK); + struct zpci_fib fib = {0}; + u8 status; + int rc; + + rc = zpci_mod_fc(req, , ); + if (rc) + return -EIO; + + return 0; +} + static int zpci_mem_init(void) { BUILD_BUG_ON(!is_power_of_2(__alignof__(struct zpci_fmb)) || diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c index e868d996ec5b..ac9ed1572d39 100644 --- a/arch/s390/pci/pci_event.c +++ b/arch/s390/pci/pci_event.c @@ -47,15 +47,190 @@ struct zpci_ccdf_avail { u16 pec;/* PCI event code */ } __packed; +static inline bool ers_result_indicates_abort(pci_ers_result_t ers_res) +{ + switch (ers_res) { + case PCI_ERS_RESULT_CAN_RECOVER: + case PCI_ERS_RESULT_RECOVERED: + case PCI_ERS_RESULT_NEED_RESET: + return false; + default: + return true; + } +} + +static pci_ers_result_t zpci_event_notify_error_detected(struct pci_
[PATCH 3/5] PCI: Move pci_dev_is/assign_added() to pci.h
The helper function pci_dev_is_added() from drivers/pci/pci.h is used in PCI arch code of both s390 and powerpc leading to awkward relative includes. Move it to the global include/linux/pci.h and get rid of these includes just for that one function. Signed-off-by: Niklas Schnelle --- arch/powerpc/platforms/powernv/pci-sriov.c | 3 --- arch/powerpc/platforms/pseries/setup.c | 1 - arch/s390/pci/pci_sysfs.c | 2 -- drivers/pci/hotplug/acpiphp_glue.c | 1 - drivers/pci/pci.h | 15 --- include/linux/pci.h| 15 +++ 6 files changed, 15 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c index 28aac933a439..2e0ca5451e85 100644 --- a/arch/powerpc/platforms/powernv/pci-sriov.c +++ b/arch/powerpc/platforms/powernv/pci-sriov.c @@ -9,9 +9,6 @@ #include "pci.h" -/* for pci_dev_is_added() */ -#include "../../../../drivers/pci/pci.h" - /* * The majority of the complexity in supporting SR-IOV on PowerNV comes from * the need to put the MMIO space for each VF into a separate PE. Internally diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 0dfaa6ab44cc..08e846ae1853 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -74,7 +74,6 @@ #include #include "pseries.h" -#include "../../../../drivers/pci/pci.h" DEFINE_STATIC_KEY_FALSE(shared_processor); EXPORT_SYMBOL(shared_processor); diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c index 335c281811c7..40733b93a086 100644 --- a/arch/s390/pci/pci_sysfs.c +++ b/arch/s390/pci/pci_sysfs.c @@ -13,8 +13,6 @@ #include #include -#include "../../../drivers/pci/pci.h" - #include #define zpci_attr(name, fmt, member) \ diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c index f031302ad401..4cb963f88183 100644 --- a/drivers/pci/hotplug/acpiphp_glue.c +++ b/drivers/pci/hotplug/acpiphp_glue.c @@ -38,7 +38,6 @@ #include #include -#include "../pci.h" #include "acpiphp.h" static LIST_HEAD(bridge_list); diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 93dcdd431072..a159cd0f6f05 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -383,21 +383,6 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev) return dev->error_state == pci_channel_io_perm_failure; } -/* pci_dev priv_flags */ -#define PCI_DEV_ADDED 0 -#define PCI_DPC_RECOVERED 1 -#define PCI_DPC_RECOVERING 2 - -static inline void pci_dev_assign_added(struct pci_dev *dev, bool added) -{ - assign_bit(PCI_DEV_ADDED, >priv_flags, added); -} - -static inline bool pci_dev_is_added(const struct pci_dev *dev) -{ - return test_bit(PCI_DEV_ADDED, >priv_flags); -} - #ifdef CONFIG_PCIEAER #include diff --git a/include/linux/pci.h b/include/linux/pci.h index 540b377ca8f6..ea0e23dbc8ec 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -507,6 +507,21 @@ struct pci_dev { unsigned long priv_flags; /* Private flags for the PCI driver */ }; +/* pci_dev priv_flags */ +#define PCI_DEV_ADDED 0 +#define PCI_DPC_RECOVERED 1 +#define PCI_DPC_RECOVERING 2 + +static inline void pci_dev_assign_added(struct pci_dev *dev, bool added) +{ + assign_bit(PCI_DEV_ADDED, >priv_flags, added); +} + +static inline bool pci_dev_is_added(const struct pci_dev *dev) +{ + return test_bit(PCI_DEV_ADDED, >priv_flags); +} + static inline struct pci_dev *pci_physfn(struct pci_dev *dev) { #ifdef CONFIG_PCI_IOV -- 2.25.1
[PATCH 2/5] s390/pci: implement reset_slot for hotplug slot
This is done by adding a zpci_hot_reset_device() call which does a low level reset of the PCI function without changing its higher level function state. This way it can be used while the zPCI function is bound to a driver and with DMA tables being controlled either through the IOMMU or DMA APIs which is prohibited when using zpci_disable_device() as that drop existing DMA translations. As this reset, unlike a normal FLR, also calls zpci_clear_irq() we need to implement arch_restore_msi_irqs() and make sure we re-enable IRQs for the PCI function if they were previously disabled. Signed-off-by: Niklas Schnelle --- arch/s390/include/asm/pci.h| 1 + arch/s390/pci/pci.c| 58 ++ arch/s390/pci/pci_irq.c| 9 + drivers/pci/hotplug/s390_pci_hpc.c | 24 + 4 files changed, 92 insertions(+) diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h index 5e6cba22a801..2a2ed165a270 100644 --- a/arch/s390/include/asm/pci.h +++ b/arch/s390/include/asm/pci.h @@ -208,6 +208,7 @@ int zpci_disable_device(struct zpci_dev *); int zpci_scan_configured_device(struct zpci_dev *zdev, u32 fh); int zpci_deconfigure_device(struct zpci_dev *zdev); +int zpci_hot_reset_device(struct zpci_dev *zdev); int zpci_register_ioat(struct zpci_dev *, u8, u64, u64, u64); int zpci_unregister_ioat(struct zpci_dev *, u8); void zpci_remove_reserved_devices(void); diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c index af22778551c1..a6322f45b5bd 100644 --- a/arch/s390/pci/pci.c +++ b/arch/s390/pci/pci.c @@ -723,6 +723,64 @@ int zpci_disable_device(struct zpci_dev *zdev) return rc; } +/** + * zpci_hot_reset_device - perform a reset of the given zPCI function + * @zdev: the slot which should be reset + * + * Performs a low level reset of the zPCI function. The reset is low level in + * the sense that the zPCI function can be reset without detaching it from the + * common PCI subsystem. The reset may be performed while under control of + * either DMA or IOMMU APIs in which case the existing DMA/IOMMU translation + * table is reinstated at the end of the reset. + * + * After the reset the functions internal state is reset to an initial state + * equivalent to its state during boot when first probing a driver. + * Consequently after reset the PCI function requires re-initialization via the + * common PCI code including re-enabling IRQs via pci_alloc_irq_vectors() + * and enabling the function via e.g.pci_enablde_device_flags().The caller + * must guard against concurrent reset attempts. + * + * In most cases this function should not be called directly but through + * pci_reset_function() or pci_reset_bus() which handle the save/restore and + * locking. + * + * Return: 0 on success and an error value otherwise + */ +int zpci_hot_reset_device(struct zpci_dev *zdev) +{ + int rc; + + zpci_dbg(3, "reset fid:%x\n", zdev->fid); + if (zdev_enabled(zdev)) { + /* Disables device access, DMAs and IRQs (reset state) */ + rc = zpci_disable_device(zdev); + /* +* Due to a z/VM vs LPAR inconsistency in the error state the +* FH may indicate an enabled device but disable says the +* device is already disabled don't treat it as an error here. +*/ + if (rc == -EINVAL) + rc = 0; + if (rc) + return rc; + } + + rc = zpci_enable_device(zdev); + if (rc) + return rc; + + if (zdev->dma_table) { + rc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma, + (u64)zdev->dma_table); + if (rc) + return rc; + } else { + zpci_dma_init_device(zdev); + } + + return 0; +} + /** * zpci_create_device() - Create a new zpci_dev and add it to the zbus * @fid: Function ID of the device to be created diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c index 9c7de9089939..ab98e7f5b79b 100644 --- a/arch/s390/pci/pci_irq.c +++ b/arch/s390/pci/pci_irq.c @@ -391,6 +391,15 @@ void arch_teardown_msi_irqs(struct pci_dev *pdev) airq_iv_free(zpci_ibv[0], zdev->msi_first_bit, zdev->msi_nr_irqs); } +void arch_restore_msi_irqs(struct pci_dev *pdev) +{ + struct zpci_dev *zdev = to_zpci(pdev); + + if (!zdev->irqs_registered) + zpci_set_irq(zdev); + default_restore_msi_irqs(pdev); +} + static struct airq_struct zpci_airq = { .handler = zpci_floating_irq_handler, .isc = PCI_ISC, diff --git a/drivers/pci/hotplug/s390_pci_hpc.c b/drivers/pci/hotplug/s390_pci_hpc.c index 014868752cd4..07f28db0eed5 100644 --- a/drivers/pci/hotplug/s390_pci_hpc.c +++ b/drivers/pci/hotplug/s390_pci_hpc.c @@ -57,6 +57,29 @@ static int disable_slot
[PATCH 0/5] s390/pci: automatic error recovery
Hello, This series implements automatic error recovery for PCI devices on s390 following the scheme outlined at Documentation/PCI/pci-error-recovery.rst it applies on top of currenct master. The patches have are almost completely s390 specific except for two patches exporting existing functionality for use by arch/s390/pci/ code. Nevertheless I would also appreciate any feedback, especially on the last patch, concerning the implementation of the error recovery flow. I believe we might be the first implementation of PCI device recovery in a virtualized setting requiring us to coordinate the device reset with the hypervisor platform by issuing a disable and re-enable to the platform as well as starting the recovery following a platform event. The outline of the patches is as follows: Patch 1 and 2 add s390 specific code implementing a reset mechanism that takes the PCI function out of the platform specific error state. Patches 3 and 4 export existing common code functions for use by the s390 specific recovery code. Patch 3 I already sent separately resulting in the discussion below but without a final conclusion. https://lore.kernel.org/lkml/20210720150145.640727-1-schne...@linux.ibm.com/ I believe even though there were some doubts about the use of pci_dev_is_added() by arch code the existing uses as well as the use in the final patch of this series warrant this export. Patch 4 "PCI: Export pci_dev_lock()" is basically an extension to commit e3a9b1212b9d ("PCI: Export pci_dev_trylock() and pci_dev_unlock()") which already exported pci_dev_trylock(). In the final patch we make use of pci_dev_lock() to wait for any other exclusive uses of the pdev to be finished before starting recovery. Finally Patch 5 implements the recovery flow as part of the existing s390 specific PCI availability and error event mechanism. Where previously the error case only set an error indication requiring manual intervention to make the device usable again. Now we handle the case where firmware has already reset a PCI function after an error was encountered informing the OS that it should be ready to be used again. Note that the same event is also issued by the hypervisor if the function was manually taken into a service mode for example for firmware upgrade via the hypervisor and is now ready to be used again. Thanks, Niklas Schnelle Niklas Schnelle (5): s390/pci: refresh function handle in iomap s390/pci: implement reset_slot for hotplug slot PCI: Move pci_dev_is/assign_added() to pci.h PCI: Export pci_dev_lock() s390/pci: implement minimal PCI error recovery arch/powerpc/platforms/powernv/pci-sriov.c | 3 - arch/powerpc/platforms/pseries/setup.c | 1 - arch/s390/include/asm/pci.h| 6 +- arch/s390/pci/pci.c| 143 ++- arch/s390/pci/pci_event.c | 196 - arch/s390/pci/pci_insn.c | 4 +- arch/s390/pci/pci_irq.c| 9 + arch/s390/pci/pci_sysfs.c | 2 - drivers/pci/hotplug/acpiphp_glue.c | 1 - drivers/pci/hotplug/s390_pci_hpc.c | 24 +++ drivers/pci/pci.c | 3 +- drivers/pci/pci.h | 15 -- include/linux/pci.h| 16 ++ 13 files changed, 389 insertions(+), 34 deletions(-) -- 2.25.1
[PATCH 1/5] s390/pci: refresh function handle in iomap
The function handle of a PCI function is updated when disabling or enabling it as well as when the function's availability changes or it enters the error state. Until now this only occurred either while there is no struct pci_dev associated with the function yet or the function became unavailable. This meant that leaving a stale function handle in the iomap either didn't happen because there was no iomap yet or it lead to errors on PCI access but so would the correct disabled function handle. In the future a CLP Set PCI Function Disable/Enable cycle during PCI device recovery may be done while the device is bound to a driver. In this case we must update the iomap associated with the now-stale function handle to ensure that the resulting zPCI instruction references an accurate function handle. Since the function handle is accessed by the PCI accessor helpers without locking use READ_ONCE()/WRITE_ONCE() to mark this access and prevent compiler optimizations that would move the load/store. With that infrastructure in place let's also properly update the function handle in the existing cases. This makes sure that in the future debugging of a zPCI function access through the handle will show an up to date handle reducing the chance of confusion. Also it makes sure we have one single place where a zPCI function handle is updated after initialization. Signed-off-by: Niklas Schnelle --- arch/s390/include/asm/pci.h | 1 + arch/s390/pci/pci.c | 36 arch/s390/pci/pci_event.c | 6 +++--- arch/s390/pci/pci_insn.c| 4 ++-- 4 files changed, 38 insertions(+), 9 deletions(-) diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h index e4803ec51110..5e6cba22a801 100644 --- a/arch/s390/include/asm/pci.h +++ b/arch/s390/include/asm/pci.h @@ -211,6 +211,7 @@ int zpci_deconfigure_device(struct zpci_dev *zdev); int zpci_register_ioat(struct zpci_dev *, u8, u64, u64, u64); int zpci_unregister_ioat(struct zpci_dev *, u8); void zpci_remove_reserved_devices(void); +void zpci_update_fh(struct zpci_dev *zdev, u32 fh); /* CLP */ int clp_setup_writeback_mio(void); diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c index e7e6788d75a8..af22778551c1 100644 --- a/arch/s390/pci/pci.c +++ b/arch/s390/pci/pci.c @@ -481,6 +481,34 @@ static void zpci_free_iomap(struct zpci_dev *zdev, int entry) spin_unlock(_iomap_lock); } +static void zpci_do_update_iomap_fh(struct zpci_dev *zdev, u32 fh) +{ + int bar, idx; + + spin_lock(_iomap_lock); + for (bar = 0; bar < PCI_STD_NUM_BARS; bar++) { + if (!zdev->bars[bar].size) + continue; + idx = zdev->bars[bar].map_idx; + if (!zpci_iomap_start[idx].count) + continue; + WRITE_ONCE(zpci_iomap_start[idx].fh, zdev->fh); + } + spin_unlock(_iomap_lock); +} + +void zpci_update_fh(struct zpci_dev *zdev, u32 fh) +{ + if (!fh || zdev->fh == fh) + return; + + zdev->fh = fh; + if (zpci_use_mio(zdev)) + return; + if (zdev->has_resources && zdev_enabled(zdev)) + zpci_do_update_iomap_fh(zdev, fh); +} + static struct resource *__alloc_res(struct zpci_dev *zdev, unsigned long start, unsigned long size, unsigned long flags) { @@ -668,7 +696,7 @@ int zpci_enable_device(struct zpci_dev *zdev) if (clp_enable_fh(zdev, , ZPCI_NR_DMA_SPACES)) rc = -EIO; else - zdev->fh = fh; + zpci_update_fh(zdev, fh); return rc; } @@ -679,14 +707,14 @@ int zpci_disable_device(struct zpci_dev *zdev) cc = clp_disable_fh(zdev, ); if (!cc) { - zdev->fh = fh; + zpci_update_fh(zdev, fh); } else if (cc == CLP_RC_SETPCIFN_ALRDY) { pr_info("Disabling PCI function %08x had no effect as it was already disabled\n", zdev->fid); /* Function is already disabled - update handle */ rc = clp_refresh_fh(zdev->fid, ); if (!rc) { - zdev->fh = fh; + zpci_update_fh(zdev, fh); rc = -EINVAL; } } else { @@ -768,7 +796,7 @@ int zpci_scan_configured_device(struct zpci_dev *zdev, u32 fh) { int rc; - zdev->fh = fh; + zpci_update_fh(zdev, fh); /* the PCI function will be scanned once function 0 appears */ if (!zdev->zbus->bus) return 0; diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c index c856f80cb21b..e868d996ec5b 100644 --- a/arch/s390/pci/pci_event.c +++ b/arch/s390/pci/pci_event.c @@ -76,7 +76,7 @@ void zpci_event_error(void *data) static void zpci_event_hard_deconfigured(struct zpci_dev *zdev, u3
Re: [PATCH v3] PCI: Move pci_dev_is/assign_added() to pci.h
On Wed, 2021-08-25 at 14:04 -0500, Bjorn Helgaas wrote: > On Mon, Aug 23, 2021 at 12:53:39PM +0200, Niklas Schnelle wrote: > > On Fri, 2021-08-20 at 17:37 -0500, Bjorn Helgaas wrote: > > > On Tue, Jul 20, 2021 at 05:01:45PM +0200, Niklas Schnelle wrote: > > > > The helper function pci_dev_is_added() from drivers/pci/pci.h is used in > > > > PCI arch code of both s390 and powerpc leading to awkward relative > > > > includes. Move it to the global include/linux/pci.h and get rid of these > > > > includes just for that one function. > > > > > > I agree the includes are awkward. > > > > > > But the arch code *using* pci_dev_is_added() seems awkward, too. > > > > See below for my interpretation why s390 has some driver like > > functionality in its arch code which isn't necessarily awkward. > > > > Independent from that I have found pci_dev_is_added() as the only way > > deal with the case that one might be looking at a struct pci_dev > > reference that has been removed via pci_stop_and_remove_bus_device() or > > has never been fully scanned. This is quite useful when handling error > > events which on s390 are part of the adapter event mechanism shared > > with channel I/O devices. > > > > > AFAICS, in powerpc, pci_dev_is_added() is only used by > > > pnv_pci_ioda_fixup_iov() and pseries_pci_fixup_iov_resources(). Those > > > are only called from pcibios_add_device(), which is only called from > > > pci_device_add(). > > > > > > Is it even possible for pci_dev_is_added() to be true in that path? > > If the pci_dev_is_added() in powerpc is unreachable, we can remove it > and at least reduce this to an s390-only problem. Ok. I might be missing something but I agree it does look these are called from within pcibios_add_device() only so pci_dev_is_added() can never be true. This looks pretty clear as pci_dev_assign_added() is called after pcibios_add_device() in pci_bus_add_device(). > > > > s390 uses pci_dev_is_added() in recover_store() > > > > I'm actually looking into this as I'm working on an s390 implementation > > of the PCI recovery flow described in Documentation/PCI/pci-error- > > recovery.rst that would also call pci_dev_is_added() because when we > > get a platform notification of a PCI reset done by firmware it may be > > that the struct pci_dev is going away i.e. we still have a ref count > > but it is not added to the PCI bus anymore. And pci_dev_is_added() is > > the only way I've found to check for this state. > > > > > , but I don't know what > > > that is (looks like a sysfs file, but it's not documented) or why s390 > > > is the only arch that does this. > > > > Good point about this not being documented, I'll look into adding docs. > > > > This is a sysfs attribute that basically removes the pci_dev and re- > > adds it. This has the complication that since the attribute sits at > > /sys/bus/pci/devices//recover it deletes its own parent directory > > which requires extra caution and means concurrent accesses block on > > pci_lock_rescan_remove() instead of a kernfs lock. > > Long story short when concurrently triggering the attribute one thread > > proceeds into the pci_lock_rescan_remove() section and does the > > removal, while others would block on pci_lock_rescan_remove(). Now when > > the threads unblock the removal is done. In this case there is a new > > struct pci_dev found in the rescan but the previously blocked threads > > still have references to the old struct pci_dev which was removed and > > as far as I could tell can only be distinguished by checking > > pci_dev_is_added(). > > Is this locking issue different from concurrently writing to > /sys/.../remove on other architectures? In principle it is very similar except that we re-scan and thus the removed pdev may co-exist with a new pdev for the same actual device if there are other references to the pdev. There is however also a significant difference in locking that fixes a possible deadlock that I just confirmed also affects /sys/../remove where it is hidden by a lockdep ignore, see lockdep splash below. For /sys/../recover this was fixed by my commit dd712f0a53ca ("s390/pci: Fix possible deadlock in recover_store()") which also introduced the need for pci_dev_is_added(). The change in the above commit is very similar to what is done in drivers/scsi/scsi_sysfs.c:sdev_store_delete() which also has a self deletion and in commit 0ee223b2e1f6 ("scsi: core: Avoid that SCSI device removal through sysfs triggers a deadlock") fixed a very very similar lock order p
Re: [PATCH v3] PCI: Move pci_dev_is/assign_added() to pci.h
On Fri, 2021-08-20 at 17:37 -0500, Bjorn Helgaas wrote: > On Tue, Jul 20, 2021 at 05:01:45PM +0200, Niklas Schnelle wrote: > > The helper function pci_dev_is_added() from drivers/pci/pci.h is used in > > PCI arch code of both s390 and powerpc leading to awkward relative > > includes. Move it to the global include/linux/pci.h and get rid of these > > includes just for that one function. > > I agree the includes are awkward. > > But the arch code *using* pci_dev_is_added() seems awkward, too. See below for my interpretation why s390 has some driver like functionality in its arch code which isn't necessarily awkward. Independent from that I have found pci_dev_is_added() as the only way deal with the case that one might be looking at a struct pci_dev reference that has been removed via pci_stop_and_remove_bus_device() or has never been fully scanned. This is quite useful when handling error events which on s390 are part of the adapter event mechanism shared with channel I/O devices. > > AFAICS, in powerpc, pci_dev_is_added() is only used by > pnv_pci_ioda_fixup_iov() and pseries_pci_fixup_iov_resources(). Those > are only called from pcibios_add_device(), which is only called from > pci_device_add(). > > Is it even possible for pci_dev_is_added() to be true in that path? > > s390 uses pci_dev_is_added() in recover_store() I'm actually looking into this as I'm working on an s390 implementation of the PCI recovery flow described in Documentation/PCI/pci-error- recovery.rst that would also call pci_dev_is_added() because when we get a platform notification of a PCI reset done by firmware it may be that the sturct pci_dev is going away i.e. we still have a ref count but it is not added to the PCI bus anymore. And pci_dev_is_added() is the only way I've found to check for this state. > , but I don't know what > that is (looks like a sysfs file, but it's not documented) or why s390 > is the only arch that does this. Good point about this not being documented, I'll look into adding docs. This is a sysfs attribute that basically removes the pci_dev and re- adds it. This has the complication that since the attribute sits at /sys/bus/pci/devices//recover it deletes its own parent directory which requires extra caution and means concurrent accesses block on pci_lock_rescan_remove() instead of a kernfs lock. Long story short when concurrently triggering the attribute one thread proceeds into the pci_lock_rescan_remove() section and does the removal, while others would block on pci_lock_rescan_remove(). Now when the threads unblock the removal is done. In this case there is a new struct pci_dev found in the rescan but the previously blocked threads still have references to the old struct pci_dev which was removed and as far as I could tell can only be distinguihsed by checking pci_dev_is_added(). > > Maybe we should make powerpc and s390 less special? On s390, as I see it, the reason for this is that all of the PCI functionality is directly defined in the Architecture as special CPU instructions which are kind of hypercalls but also an ISA extension. These instructions range from the basic PCI memory accesses (no real MMIO) to enumeration of the devices and on to reporting of hot-plug and and resets/recovery events. Importantly we do not have any kind of direct access to a real or virtual PCI controller and the architecture has no concept of a comparable entity. So in my opinion while there is some of the functionality of a PCI controller in arch/s390/pci the cut off between controller functionality and arch support isn't clear at all and exposing PCI support as CPU instructions doesn't map well to the controller concept. That said, in principle I'm open to moving some of that into drivers/pci/controller/ if you think that would improve things and we can find a good argument what should go where. One possible cut off would be to have arch/s390/pci/ provide wrappers to the PCI instructions but move all their uses to e.g. drivers/pci/controller/s390/. This would of course be a major refactoring and none of that code would be useful on any other architecture but it would move a lot the accesses to PCI common code functionality out of the arch code. > > > Signed-off-by: Niklas Schnelle > > --- > > Since v1 (and bad v2): > > - Fixed accidental removal of PCI_DPC_RECOVERED, PCI_DPC_RECOVERING > > defines and also move these to include/linux/pci.h > > > > arch/powerpc/platforms/powernv/pci-sriov.c | 3 --- > > arch/powerpc/platforms/pseries/setup.c | 1 - > > arch/s390/pci/pci_sysfs.c | 2 -- > > drivers/pci/hotplug/acpiphp_glue.c | 1 - > > drivers/pci/pci.h | 15 --- > > include/linux/pci.h| 15 +++ > > 6 files changed, 15 insert
Re: [PATCH v2 13/21] s390/pci: don't set failed sg dma_address to DMA_MAPPING_ERROR
On Fri, 2021-07-23 at 11:50 -0600, Logan Gunthorpe wrote: > Setting the ->dma_address to DMA_MAPPING_ERROR is not part of > the ->map_sg calling convention, so remove it. > > Link: https://lore.kernel.org/linux-mips/20210716063241.gc13...@lst.de/ > Suggested-by: Christoph Hellwig > Signed-off-by: Logan Gunthorpe > Cc: Niklas Schnelle > Cc: Gerald Schaefer > Cc: Heiko Carstens > Cc: Vasily Gorbik > Cc: Christian Borntraeger > --- > arch/s390/pci/pci_dma.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c > index c78b02012764..be48e5b5bfcf 100644 > --- a/arch/s390/pci/pci_dma.c > +++ b/arch/s390/pci/pci_dma.c > @@ -492,7 +492,6 @@ static int s390_dma_map_sg(struct device *dev, struct > scatterlist *sg, > for (i = 1; i < nr_elements; i++) { > s = sg_next(s); > > - s->dma_address = DMA_MAPPING_ERROR; > s->dma_length = 0; > > if (s->offset || (size & ~PAGE_MASK) || Acked-by: Niklas Schnelle Thanks!
[PATCH v3] PCI: Move pci_dev_is/assign_added() to pci.h
The helper function pci_dev_is_added() from drivers/pci/pci.h is used in PCI arch code of both s390 and powerpc leading to awkward relative includes. Move it to the global include/linux/pci.h and get rid of these includes just for that one function. Signed-off-by: Niklas Schnelle --- Since v1 (and bad v2): - Fixed accidental removal of PCI_DPC_RECOVERED, PCI_DPC_RECOVERING defines and also move these to include/linux/pci.h arch/powerpc/platforms/powernv/pci-sriov.c | 3 --- arch/powerpc/platforms/pseries/setup.c | 1 - arch/s390/pci/pci_sysfs.c | 2 -- drivers/pci/hotplug/acpiphp_glue.c | 1 - drivers/pci/pci.h | 15 --- include/linux/pci.h| 15 +++ 6 files changed, 15 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c index 28aac933a439..2e0ca5451e85 100644 --- a/arch/powerpc/platforms/powernv/pci-sriov.c +++ b/arch/powerpc/platforms/powernv/pci-sriov.c @@ -9,9 +9,6 @@ #include "pci.h" -/* for pci_dev_is_added() */ -#include "../../../../drivers/pci/pci.h" - /* * The majority of the complexity in supporting SR-IOV on PowerNV comes from * the need to put the MMIO space for each VF into a separate PE. Internally diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 631a0d57b6cd..17585ec9f955 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -74,7 +74,6 @@ #include #include "pseries.h" -#include "../../../../drivers/pci/pci.h" DEFINE_STATIC_KEY_FALSE(shared_processor); EXPORT_SYMBOL_GPL(shared_processor); diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c index 6e2450c2b9c1..8dbe54ef8f8e 100644 --- a/arch/s390/pci/pci_sysfs.c +++ b/arch/s390/pci/pci_sysfs.c @@ -13,8 +13,6 @@ #include #include -#include "../../../drivers/pci/pci.h" - #include #define zpci_attr(name, fmt, member) \ diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c index f031302ad401..4cb963f88183 100644 --- a/drivers/pci/hotplug/acpiphp_glue.c +++ b/drivers/pci/hotplug/acpiphp_glue.c @@ -38,7 +38,6 @@ #include #include -#include "../pci.h" #include "acpiphp.h" static LIST_HEAD(bridge_list); diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 93dcdd431072..a159cd0f6f05 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -383,21 +383,6 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev) return dev->error_state == pci_channel_io_perm_failure; } -/* pci_dev priv_flags */ -#define PCI_DEV_ADDED 0 -#define PCI_DPC_RECOVERED 1 -#define PCI_DPC_RECOVERING 2 - -static inline void pci_dev_assign_added(struct pci_dev *dev, bool added) -{ - assign_bit(PCI_DEV_ADDED, >priv_flags, added); -} - -static inline bool pci_dev_is_added(const struct pci_dev *dev) -{ - return test_bit(PCI_DEV_ADDED, >priv_flags); -} - #ifdef CONFIG_PCIEAER #include diff --git a/include/linux/pci.h b/include/linux/pci.h index 540b377ca8f6..ea0e23dbc8ec 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -507,6 +507,21 @@ struct pci_dev { unsigned long priv_flags; /* Private flags for the PCI driver */ }; +/* pci_dev priv_flags */ +#define PCI_DEV_ADDED 0 +#define PCI_DPC_RECOVERED 1 +#define PCI_DPC_RECOVERING 2 + +static inline void pci_dev_assign_added(struct pci_dev *dev, bool added) +{ + assign_bit(PCI_DEV_ADDED, >priv_flags, added); +} + +static inline bool pci_dev_is_added(const struct pci_dev *dev) +{ + return test_bit(PCI_DEV_ADDED, >priv_flags); +} + static inline struct pci_dev *pci_physfn(struct pci_dev *dev) { #ifdef CONFIG_PCI_IOV -- 2.25.1
Re: [PATCH v2] PCI: Move pci_dev_is/assign_added() to pci.h
On Tue, 2021-07-20 at 11:58 +0200, Niklas Schnelle wrote: > The helper function pci_dev_is_added() from drivers/pci/pci.h is used in > PCI arch code of both s390 and powerpc leading to awkward relative > includes. Move it to the global include/linux/pci.h and get rid of these > includes just for that one function. > > Signed-off-by: Niklas Schnelle > --- > Since v1: > - Fixed accidental removal of PCI_DPC_RECOVERED, PCI_DPC_RECOVERING > defines and also move these to include/linux/pci.h Please disregard I actually sent the old patch ;-( > > arch/powerpc/platforms/powernv/pci-sriov.c | 3 --- > arch/powerpc/platforms/pseries/setup.c | 1 - > arch/s390/pci/pci_sysfs.c | 2 -- > drivers/pci/hotplug/acpiphp_glue.c | 1 - > drivers/pci/pci.h | 15 --- > include/linux/pci.h| 13 + > 6 files changed, 13 insertions(+), 22 deletions(-) > > ... snip ..
[PATCH v2] PCI: Move pci_dev_is/assign_added() to pci.h
The helper function pci_dev_is_added() from drivers/pci/pci.h is used in PCI arch code of both s390 and powerpc leading to awkward relative includes. Move it to the global include/linux/pci.h and get rid of these includes just for that one function. Signed-off-by: Niklas Schnelle --- Since v1: - Fixed accidental removal of PCI_DPC_RECOVERED, PCI_DPC_RECOVERING defines and also move these to include/linux/pci.h arch/powerpc/platforms/powernv/pci-sriov.c | 3 --- arch/powerpc/platforms/pseries/setup.c | 1 - arch/s390/pci/pci_sysfs.c | 2 -- drivers/pci/hotplug/acpiphp_glue.c | 1 - drivers/pci/pci.h | 15 --- include/linux/pci.h| 13 + 6 files changed, 13 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c index 28aac933a439..2e0ca5451e85 100644 --- a/arch/powerpc/platforms/powernv/pci-sriov.c +++ b/arch/powerpc/platforms/powernv/pci-sriov.c @@ -9,9 +9,6 @@ #include "pci.h" -/* for pci_dev_is_added() */ -#include "../../../../drivers/pci/pci.h" - /* * The majority of the complexity in supporting SR-IOV on PowerNV comes from * the need to put the MMIO space for each VF into a separate PE. Internally diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 631a0d57b6cd..17585ec9f955 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -74,7 +74,6 @@ #include #include "pseries.h" -#include "../../../../drivers/pci/pci.h" DEFINE_STATIC_KEY_FALSE(shared_processor); EXPORT_SYMBOL_GPL(shared_processor); diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c index 6e2450c2b9c1..8dbe54ef8f8e 100644 --- a/arch/s390/pci/pci_sysfs.c +++ b/arch/s390/pci/pci_sysfs.c @@ -13,8 +13,6 @@ #include #include -#include "../../../drivers/pci/pci.h" - #include #define zpci_attr(name, fmt, member) \ diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c index f031302ad401..4cb963f88183 100644 --- a/drivers/pci/hotplug/acpiphp_glue.c +++ b/drivers/pci/hotplug/acpiphp_glue.c @@ -38,7 +38,6 @@ #include #include -#include "../pci.h" #include "acpiphp.h" static LIST_HEAD(bridge_list); diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 93dcdd431072..a159cd0f6f05 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -383,21 +383,6 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev) return dev->error_state == pci_channel_io_perm_failure; } -/* pci_dev priv_flags */ -#define PCI_DEV_ADDED 0 -#define PCI_DPC_RECOVERED 1 -#define PCI_DPC_RECOVERING 2 - -static inline void pci_dev_assign_added(struct pci_dev *dev, bool added) -{ - assign_bit(PCI_DEV_ADDED, >priv_flags, added); -} - -static inline bool pci_dev_is_added(const struct pci_dev *dev) -{ - return test_bit(PCI_DEV_ADDED, >priv_flags); -} - #ifdef CONFIG_PCIEAER #include diff --git a/include/linux/pci.h b/include/linux/pci.h index 540b377ca8f6..b3b7bafa17e5 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -507,6 +507,19 @@ struct pci_dev { unsigned long priv_flags; /* Private flags for the PCI driver */ }; +/* pci_dev priv_flags */ +#define PCI_DEV_ADDED 0 + +static inline void pci_dev_assign_added(struct pci_dev *dev, bool added) +{ + assign_bit(PCI_DEV_ADDED, >priv_flags, added); +} + +static inline bool pci_dev_is_added(const struct pci_dev *dev) +{ + return test_bit(PCI_DEV_ADDED, >priv_flags); +} + static inline struct pci_dev *pci_physfn(struct pci_dev *dev) { #ifdef CONFIG_PCI_IOV -- 2.25.1
Re: [PATCH] PCI: Move pci_dev_is/assign_added() to pci.h
On Tue, 2021-07-20 at 10:34 +0200, Niklas Schnelle wrote: > On Mon, 2021-07-19 at 14:11 +0200, Niklas Schnelle wrote: > > The helper function pci_dev_is_added() from drivers/pci/pci.h is used in > > PCI arch code of both s390 and powerpc leading to awkward relative > > includes. Move it to the global include/linux/pci.h and get rid of these > > includes just for that one function. > > > > Signed-off-by: Niklas Schnelle > > --- > > > ... snip ... > > > > static LIST_HEAD(bridge_list); > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > > index 93dcdd431072..a159cd0f6f05 100644 > > --- a/drivers/pci/pci.h > > +++ b/drivers/pci/pci.h > > @@ -383,21 +383,6 @@ static inline bool pci_dev_is_disconnected(const > > struct pci_dev *dev) > > return dev->error_state == pci_channel_io_perm_failure; > > } > > > > -/* pci_dev priv_flags */ > > -#define PCI_DEV_ADDED 0 > > -#define PCI_DPC_RECOVERED 1 > > -#define PCI_DPC_RECOVERING 2 > > Sorry, the above two PCI_DPC_* lines should remain in drivers/pci/pci.h > I messed this up on rebasing to v5.14-rc1 and it didn't lead to > problems on either s390x defconfig, nor pp64_defconfig but breaks ppc > allyesconfig. Will resend a fixed version. Ok looking at this more closely, maybe it would also make sense to move the PCI_DPC_* flags also to include/linux/pci.h that would also put them in direct proximity of the priv_flags field decleration itself which I think also improves maintainability. > > > .. snip .. >
Re: [PATCH] PCI: Move pci_dev_is/assign_added() to pci.h
On Mon, 2021-07-19 at 14:11 +0200, Niklas Schnelle wrote: > The helper function pci_dev_is_added() from drivers/pci/pci.h is used in > PCI arch code of both s390 and powerpc leading to awkward relative > includes. Move it to the global include/linux/pci.h and get rid of these > includes just for that one function. > > Signed-off-by: Niklas Schnelle > --- > ... snip ... > > static LIST_HEAD(bridge_list); > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index 93dcdd431072..a159cd0f6f05 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -383,21 +383,6 @@ static inline bool pci_dev_is_disconnected(const struct > pci_dev *dev) > return dev->error_state == pci_channel_io_perm_failure; > } > > -/* pci_dev priv_flags */ > -#define PCI_DEV_ADDED 0 > -#define PCI_DPC_RECOVERED 1 > -#define PCI_DPC_RECOVERING 2 Sorry, the above two PCI_DPC_* lines should remain in drivers/pci/pci.h I messed this up on rebasing to v5.14-rc1 and it didn't lead to problems on either s390x defconfig, nor pp64_defconfig but breaks ppc allyesconfig. Will resend a fixed version. > .. snip ..
[PATCH] PCI: Move pci_dev_is/assign_added() to pci.h
The helper function pci_dev_is_added() from drivers/pci/pci.h is used in PCI arch code of both s390 and powerpc leading to awkward relative includes. Move it to the global include/linux/pci.h and get rid of these includes just for that one function. Signed-off-by: Niklas Schnelle --- arch/powerpc/platforms/powernv/pci-sriov.c | 3 --- arch/powerpc/platforms/pseries/setup.c | 1 - arch/s390/pci/pci_sysfs.c | 2 -- drivers/pci/hotplug/acpiphp_glue.c | 1 - drivers/pci/pci.h | 15 --- include/linux/pci.h| 13 + 6 files changed, 13 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c index 28aac933a439..2e0ca5451e85 100644 --- a/arch/powerpc/platforms/powernv/pci-sriov.c +++ b/arch/powerpc/platforms/powernv/pci-sriov.c @@ -9,9 +9,6 @@ #include "pci.h" -/* for pci_dev_is_added() */ -#include "../../../../drivers/pci/pci.h" - /* * The majority of the complexity in supporting SR-IOV on PowerNV comes from * the need to put the MMIO space for each VF into a separate PE. Internally diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 631a0d57b6cd..17585ec9f955 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -74,7 +74,6 @@ #include #include "pseries.h" -#include "../../../../drivers/pci/pci.h" DEFINE_STATIC_KEY_FALSE(shared_processor); EXPORT_SYMBOL_GPL(shared_processor); diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c index 6e2450c2b9c1..8dbe54ef8f8e 100644 --- a/arch/s390/pci/pci_sysfs.c +++ b/arch/s390/pci/pci_sysfs.c @@ -13,8 +13,6 @@ #include #include -#include "../../../drivers/pci/pci.h" - #include #define zpci_attr(name, fmt, member) \ diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c index f031302ad401..4cb963f88183 100644 --- a/drivers/pci/hotplug/acpiphp_glue.c +++ b/drivers/pci/hotplug/acpiphp_glue.c @@ -38,7 +38,6 @@ #include #include -#include "../pci.h" #include "acpiphp.h" static LIST_HEAD(bridge_list); diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 93dcdd431072..a159cd0f6f05 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -383,21 +383,6 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev) return dev->error_state == pci_channel_io_perm_failure; } -/* pci_dev priv_flags */ -#define PCI_DEV_ADDED 0 -#define PCI_DPC_RECOVERED 1 -#define PCI_DPC_RECOVERING 2 - -static inline void pci_dev_assign_added(struct pci_dev *dev, bool added) -{ - assign_bit(PCI_DEV_ADDED, >priv_flags, added); -} - -static inline bool pci_dev_is_added(const struct pci_dev *dev) -{ - return test_bit(PCI_DEV_ADDED, >priv_flags); -} - #ifdef CONFIG_PCIEAER #include diff --git a/include/linux/pci.h b/include/linux/pci.h index 540b377ca8f6..b3b7bafa17e5 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -507,6 +507,19 @@ struct pci_dev { unsigned long priv_flags; /* Private flags for the PCI driver */ }; +/* pci_dev priv_flags */ +#define PCI_DEV_ADDED 0 + +static inline void pci_dev_assign_added(struct pci_dev *dev, bool added) +{ + assign_bit(PCI_DEV_ADDED, >priv_flags, added); +} + +static inline bool pci_dev_is_added(const struct pci_dev *dev) +{ + return test_bit(PCI_DEV_ADDED, >priv_flags); +} + static inline struct pci_dev *pci_physfn(struct pci_dev *dev) { #ifdef CONFIG_PCI_IOV -- 2.25.1
Re: [PATCH v1 10/16] s390/pci: return error code from s390_dma_map_sg()
On Thu, 2021-07-15 at 10:45 -0600, Logan Gunthorpe wrote: > From: Martin Oliveira > > The .map_sg() op now expects an error code instead of zero on failure. > > So propagate the error from __s390_dma_map_sg() up. > > Signed-off-by: Martin Oliveira > Signed-off-by: Logan Gunthorpe > Cc: Niklas Schnelle > Cc: Gerald Schaefer > Cc: Heiko Carstens > Cc: Vasily Gorbik > Cc: Christian Borntraeger > --- > arch/s390/pci/pci_dma.c | 12 +++- > 1 file changed, 7 insertions(+), 5 deletions(-) > > diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c > index ebc9a49523aa..c78b02012764 100644 > --- a/arch/s390/pci/pci_dma.c > +++ b/arch/s390/pci/pci_dma.c > @@ -487,7 +487,7 @@ static int s390_dma_map_sg(struct device *dev, struct > scatterlist *sg, > unsigned int max = dma_get_max_seg_size(dev); > unsigned int size = s->offset + s->length; > unsigned int offset = s->offset; > - int count = 0, i; > + int count = 0, i, ret; > > for (i = 1; i < nr_elements; i++) { > s = sg_next(s); > @@ -497,8 +497,9 @@ static int s390_dma_map_sg(struct device *dev, struct > scatterlist *sg, > > if (s->offset || (size & ~PAGE_MASK) || > size + s->length > max) { > - if (__s390_dma_map_sg(dev, start, size, > - >dma_address, dir)) > + ret = __s390_dma_map_sg(dev, start, size, > + >dma_address, dir); > + if (ret) > goto unmap; > > dma->dma_address += offset; > @@ -511,7 +512,8 @@ static int s390_dma_map_sg(struct device *dev, struct > scatterlist *sg, > } > size += s->length; > } > - if (__s390_dma_map_sg(dev, start, size, >dma_address, dir)) > + ret = __s390_dma_map_sg(dev, start, size, >dma_address, dir); > + if (ret) > goto unmap; > > dma->dma_address += offset; > @@ -523,7 +525,7 @@ static int s390_dma_map_sg(struct device *dev, struct > scatterlist *sg, > s390_dma_unmap_pages(dev, sg_dma_address(s), sg_dma_len(s), >dir, attrs); > > - return 0; > + return ret; > } > > static void s390_dma_unmap_sg(struct device *dev, struct scatterlist *sg, So the error codes we return are -ENOMEM if allocating a DMA translation entry fails and -EINVAL if the DMA translation table hasn't been initialized or the caller tries to map 0 sized memory. Are these error codes that you would expect? If yes then this change looks good to me. Acked-by: Niklas Schnelle
Re: [PATCH 27/30] scm_blk: use blk_mq_alloc_disk and blk_cleanup_disk
On Wed, 2021-06-02 at 09:53 +0300, Christoph Hellwig wrote: > Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and > request_queue allocation. > > Signed-off-by: Christoph Hellwig > --- > drivers/s390/block/scm_blk.c | 21 ++--- > 1 file changed, 6 insertions(+), 15 deletions(-) > > diff --git a/drivers/s390/block/scm_blk.c b/drivers/s390/block/scm_blk.c > index a4f6f2e62b1d..88cba6212ee2 100644 > --- a/drivers/s390/block/scm_blk.c > +++ b/drivers/s390/block/scm_blk.c > @@ -462,12 +462,12 @@ int scm_blk_dev_setup(struct scm_blk_dev *bdev, struct > scm_device *scmdev) > if (ret) > goto out; > > - rq = blk_mq_init_queue(>tag_set); > - if (IS_ERR(rq)) { > - ret = PTR_ERR(rq); > + bdev->gendisk = blk_mq_alloc_disk(>tag_set, scmdev); > + if (IS_ERR(bdev->gendisk)) { > + ret = PTR_ERR(bdev->gendisk); > goto out_tag; > } > - bdev->rq = rq; > + rq = bdev->rq = bdev->gendisk->queue; > nr_max_blk = min(scmdev->nr_max_block, >(unsigned int) (PAGE_SIZE / sizeof(struct aidaw))); > > @@ -477,17 +477,11 @@ int scm_blk_dev_setup(struct scm_blk_dev *bdev, struct > scm_device *scmdev) > blk_queue_flag_set(QUEUE_FLAG_NONROT, rq); > blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, rq); > > - bdev->gendisk = alloc_disk(SCM_NR_PARTS); > - if (!bdev->gendisk) { > - ret = -ENOMEM; > - goto out_queue; > - } > - rq->queuedata = scmdev; > bdev->gendisk->private_data = scmdev; > bdev->gendisk->fops = _blk_devops; > - bdev->gendisk->queue = rq; > bdev->gendisk->major = scm_major; > bdev->gendisk->first_minor = devindex * SCM_NR_PARTS; > + bdev->gendisk->minors = SCM_NR_PARTS; > > len = snprintf(bdev->gendisk->disk_name, DISK_NAME_LEN, "scm"); > if (devindex > 25) { > @@ -504,8 +498,6 @@ int scm_blk_dev_setup(struct scm_blk_dev *bdev, struct > scm_device *scmdev) > device_add_disk(>dev, bdev->gendisk, NULL); > return 0; > > -out_queue: > - blk_cleanup_queue(rq); > out_tag: > blk_mq_free_tag_set(>tag_set); > out: > @@ -516,9 +508,8 @@ int scm_blk_dev_setup(struct scm_blk_dev *bdev, struct > scm_device *scmdev) > void scm_blk_dev_cleanup(struct scm_blk_dev *bdev) > { > del_gendisk(bdev->gendisk); > - blk_cleanup_queue(bdev->gendisk->queue); > + blk_cleanup_disk(bdev->gendisk); > blk_mq_free_tag_set(>tag_set); > - put_disk(bdev->gendisk); > } > > void scm_blk_set_available(struct scm_blk_dev *bdev) Not an expert on SCM or this code but I gave this a quick test and it seems to work fine. Tested-by: Niklas Schnelle
Re: [PATCH 0/2] dma-mapping: update default segment_boundary_mask
On 9/2/20 12:16 AM, Nicolin Chen wrote: > These two patches are to update default segment_boundary_mask. > > PATCH-1 fixes overflow issues in callers of dma_get_seg_boundary. > Previous version was a series: https://lkml.org/lkml/2020/8/31/1026 > > Then PATCH-2 sets default segment_boundary_mask to ULONG_MAX. > > Nicolin Chen (2): > dma-mapping: introduce dma_get_seg_boundary_nr_pages() > dma-mapping: set default segment_boundary_mask to ULONG_MAX I gave both of your patches a quick test ride on a couple of dev mainframes, both NVMe, ConnectX and virtio-pci devices all seems to work fine. I already commented on Christoph's mail that I like the helper approach, so as for s390 you can add my Acked-by: Niklas Schnelle > > arch/alpha/kernel/pci_iommu.c| 7 +-- > arch/ia64/hp/common/sba_iommu.c | 3 +-- > arch/powerpc/kernel/iommu.c | 9 ++--- > arch/s390/pci/pci_dma.c | 6 ++ > arch/sparc/kernel/iommu-common.c | 10 +++--- > arch/sparc/kernel/iommu.c| 3 +-- > arch/sparc/kernel/pci_sun4v.c| 3 +-- > arch/x86/kernel/amd_gart_64.c| 3 +-- > drivers/parisc/ccio-dma.c| 3 +-- > drivers/parisc/sba_iommu.c | 3 +-- > include/linux/dma-mapping.h | 21 - > 11 files changed, 34 insertions(+), 37 deletions(-) >
Re: [RFT][PATCH 0/7] Avoid overflow at boundary_size
On 8/21/20 1:19 AM, Nicolin Chen wrote: > We are expending the default DMA segmentation boundary to its > possible maximum value (ULONG_MAX) to indicate that a device > doesn't specify a boundary limit. So all dma_get_seg_boundary > callers should take a precaution with the return values since > it would easily get overflowed. > > I scanned the entire kernel tree for all the existing callers > and found that most of callers may get overflowed in two ways: > either "+ 1" or passing it to ALIGN() that does "+ mask". > > According to kernel defines: > #define ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask)) > #define ALIGN(x, a) ALIGN_MASK(x, (typeof(x))(a) - 1) > > We can simplify the logic here: > ALIGN(boundary + 1, 1 << shift) >> shift > = ALIGN_MASK(b + 1, (1 << s) - 1) >> s > = {[b + 1 + (1 << s) - 1] & ~[(1 << s) - 1]} >> s > = [b + 1 + (1 << s) - 1] >> s > = [b + (1 << s)] >> s > = (b >> s) + 1 > > So this series of patches fix the potential overflow with this > overflow-free shortcut. Hi Nicolin, haven't seen any other feedback from other maintainers, so I guess you will resend this? On first glance it seems to make sense. I'm a little confused why it is only a "potential overflow" while this part "We are expending the default DMA segmentation boundary to its possible maximum value (ULONG_MAX) to indicate that a device doesn't specify a boundary limit" sounds to me like ULONG_MAX is actually used, does that mean there are currently no devices which do not specify a boundary limit? > > As I don't think that I have these platforms, marking RFT. > > Thanks > Nic > > Nicolin Chen (7): > powerpc/iommu: Avoid overflow at boundary_size > alpha: Avoid overflow at boundary_size > ia64/sba_iommu: Avoid overflow at boundary_size > s390/pci_dma: Avoid overflow at boundary_size > sparc: Avoid overflow at boundary_size > x86/amd_gart: Avoid overflow at boundary_size > parisc: Avoid overflow at boundary_size > > arch/alpha/kernel/pci_iommu.c| 10 -- > arch/ia64/hp/common/sba_iommu.c | 4 ++-- > arch/powerpc/kernel/iommu.c | 11 +-- > arch/s390/pci/pci_dma.c | 4 ++-- > arch/sparc/kernel/iommu-common.c | 9 +++-- > arch/sparc/kernel/iommu.c| 4 ++-- > arch/sparc/kernel/pci_sun4v.c| 4 ++-- > arch/x86/kernel/amd_gart_64.c| 4 ++-- > drivers/parisc/ccio-dma.c| 4 ++-- > drivers/parisc/sba_iommu.c | 4 ++-- > 10 files changed, 26 insertions(+), 32 deletions(-) >