[PATCH] rpadlpar_io:Add MODULE_DESCRIPTION entries to kernel modules
This patch adds a brief MODULE_DESCRIPTION to rpadlpar_io kernel modules (descriptions taken from Kconfig file) Signed-off-by: Mamatha Inamdar --- drivers/pci/hotplug/rpadlpar_core.c |1 + 1 file changed, 1 insertion(+) diff --git a/drivers/pci/hotplug/rpadlpar_core.c b/drivers/pci/hotplug/rpadlpar_core.c index f979b70..bac65ed 100644 --- a/drivers/pci/hotplug/rpadlpar_core.c +++ b/drivers/pci/hotplug/rpadlpar_core.c @@ -478,3 +478,4 @@ static void __exit rpadlpar_io_exit(void) module_init(rpadlpar_io_init); module_exit(rpadlpar_io_exit); MODULE_LICENSE("GPL"); +MODULE_DESCRIPTION("RPA Dynamic Logical Partitioning driver for I/O slots");
Re: [PATCH] cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_reboot_notifier
* Daniel Axtens [2020-09-24 12:35:05]: > Hi Srikar, > > > This looks a lot like commit d95fe371ecd2 ("cpufreq: powernv: Fix > frame-size-overflow in powernv_cpufreq_work_fn"). > Yes, very very similar. > As with that patch, I have checked for matching puts/gets and that all > uses of '&' check out. > > I tried to look at the snowpatch tests: they apparently reported a > checkpatch warning but the file has since disappeared so I can't see > what it was. Running checkpatch locally: > > $ scripts/checkpatch.pl -g HEAD -strict > WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per > line) > #15: > make[3]: *** [./scripts/Makefile.build:316: > drivers/cpufreq/powernv-cpufreq.o] Error 1 > > This is benign and you shouldn't wrap that line anyway. > > On that basis: > > Reviewed-by: Daniel Axtens > Thanks Daniel. > Kind regards, > Daniel > -- Thanks and Regards Srikar Dronamraju
Re: [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed
On 23/09/2020 17:06, Cédric Le Goater wrote: > On 9/23/20 2:33 AM, Qian Cai wrote: >> On Fri, 2020-08-07 at 12:18 +0200, Cédric Le Goater wrote: >>> When a passthrough IO adapter is removed from a pseries machine using >>> hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the >>> guest OS to clear all page table entries related to the adapter. If >>> some are still present, the RTAS call which isolates the PCI slot >>> returns error 9001 "valid outstanding translations" and the removal of >>> the IO adapter fails. This is because when the PHBs are scanned, Linux >>> maps automatically the INTx interrupts in the Linux interrupt number >>> space but these are never removed. >>> >>> To solve this problem, we introduce a PPC platform specific >>> pcibios_remove_bus() routine which clears all interrupt mappings when >>> the bus is removed. This also clears the associated page table entries >>> of the ESB pages when using XIVE. >>> >>> For this purpose, we record the logical interrupt numbers of the >>> mapped interrupt under the PHB structure and let pcibios_remove_bus() >>> do the clean up. >>> >>> Since some PCI adapters, like GPUs, use the "interrupt-map" property >>> to describe interrupt mappings other than the legacy INTx interrupts, >>> we can not restrict the size of the mapping array to PCI_NUM_INTX. The >>> number of interrupt mappings is computed from the "interrupt-map" >>> property and the mapping array is allocated accordingly. >>> >>> Cc: "Oliver O'Halloran" >>> Cc: Alexey Kardashevskiy >>> Signed-off-by: Cédric Le Goater >> >> Some syscall fuzzing will trigger this on POWER9 NV where the traces pointed >> to >> this patch. >> >> .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config > > OK. The patch is missing a NULL assignement after kfree() and that > might be the issue. > > I did try PHB removal under PowerNV, so I would like to understand > how we managed to remove twice the PCI bus and possibly reproduce. > Any chance we could grab what the syscall fuzzer (syzkaller) did ? My guess would be it is doing this in parallel to provoke races. -- Alexey
[PATCH] powerpc: switch 85xx defconfigs from legacy ide to libata
Switch the 85xx defconfigs from the soon to be removed legacy ide driver to libata. Signed-off-by: Christoph Hellwig --- arch/powerpc/configs/85xx/mpc85xx_cds_defconfig | 6 +++--- arch/powerpc/configs/85xx/tqm8540_defconfig | 6 +++--- arch/powerpc/configs/85xx/tqm8541_defconfig | 6 +++--- arch/powerpc/configs/85xx/tqm8555_defconfig | 6 +++--- arch/powerpc/configs/85xx/tqm8560_defconfig | 6 +++--- 5 files changed, 15 insertions(+), 15 deletions(-) diff --git a/arch/powerpc/configs/85xx/mpc85xx_cds_defconfig b/arch/powerpc/configs/85xx/mpc85xx_cds_defconfig index 0683d8c292a89b..cea72e85ed2614 100644 --- a/arch/powerpc/configs/85xx/mpc85xx_cds_defconfig +++ b/arch/powerpc/configs/85xx/mpc85xx_cds_defconfig @@ -29,9 +29,9 @@ CONFIG_SYN_COOKIES=y CONFIG_BLK_DEV_LOOP=y CONFIG_BLK_DEV_RAM=y CONFIG_BLK_DEV_RAM_SIZE=32768 -CONFIG_IDE=y -CONFIG_BLK_DEV_GENERIC=y -CONFIG_BLK_DEV_VIA82CXXX=y +CONFIG_ATA=y +CONFIG_ATA_GENERIC=y +CONFIG_PATA_VIA=y CONFIG_NETDEVICES=y CONFIG_GIANFAR=y CONFIG_E1000=y diff --git a/arch/powerpc/configs/85xx/tqm8540_defconfig b/arch/powerpc/configs/85xx/tqm8540_defconfig index 98982a0e82d804..bbf040aa1f9aa7 100644 --- a/arch/powerpc/configs/85xx/tqm8540_defconfig +++ b/arch/powerpc/configs/85xx/tqm8540_defconfig @@ -30,9 +30,9 @@ CONFIG_MTD_CFI_AMDSTD=y CONFIG_BLK_DEV_LOOP=y CONFIG_BLK_DEV_RAM=y CONFIG_BLK_DEV_RAM_SIZE=32768 -CONFIG_IDE=y -CONFIG_BLK_DEV_GENERIC=y -CONFIG_BLK_DEV_VIA82CXXX=y +CONFIG_ATA=y +CONFIG_ATA_GENERIC=y +CONFIG_PATA_VIA=y CONFIG_NETDEVICES=y CONFIG_GIANFAR=y CONFIG_E100=y diff --git a/arch/powerpc/configs/85xx/tqm8541_defconfig b/arch/powerpc/configs/85xx/tqm8541_defconfig index a6e21db1dafe3a..523ad8dcfd9d08 100644 --- a/arch/powerpc/configs/85xx/tqm8541_defconfig +++ b/arch/powerpc/configs/85xx/tqm8541_defconfig @@ -30,9 +30,9 @@ CONFIG_MTD_CFI_AMDSTD=y CONFIG_BLK_DEV_LOOP=y CONFIG_BLK_DEV_RAM=y CONFIG_BLK_DEV_RAM_SIZE=32768 -CONFIG_IDE=y -CONFIG_BLK_DEV_GENERIC=y -CONFIG_BLK_DEV_VIA82CXXX=y +CONFIG_ATA=y +CONFIG_ATA_GENERIC=y +CONFIG_PATA_VIA=y CONFIG_NETDEVICES=y CONFIG_GIANFAR=y CONFIG_E100=y diff --git a/arch/powerpc/configs/85xx/tqm8555_defconfig b/arch/powerpc/configs/85xx/tqm8555_defconfig index ca1de3979474ba..0032ce1e8c9cbf 100644 --- a/arch/powerpc/configs/85xx/tqm8555_defconfig +++ b/arch/powerpc/configs/85xx/tqm8555_defconfig @@ -30,9 +30,9 @@ CONFIG_MTD_CFI_AMDSTD=y CONFIG_BLK_DEV_LOOP=y CONFIG_BLK_DEV_RAM=y CONFIG_BLK_DEV_RAM_SIZE=32768 -CONFIG_IDE=y -CONFIG_BLK_DEV_GENERIC=y -CONFIG_BLK_DEV_VIA82CXXX=y +CONFIG_ATA=y +CONFIG_ATA_GENERIC=y +CONFIG_PATA_VIA=y CONFIG_NETDEVICES=y CONFIG_GIANFAR=y CONFIG_E100=y diff --git a/arch/powerpc/configs/85xx/tqm8560_defconfig b/arch/powerpc/configs/85xx/tqm8560_defconfig index ca3b8c8ef30f9c..a80b971f7d6e0b 100644 --- a/arch/powerpc/configs/85xx/tqm8560_defconfig +++ b/arch/powerpc/configs/85xx/tqm8560_defconfig @@ -30,9 +30,9 @@ CONFIG_MTD_CFI_AMDSTD=y CONFIG_BLK_DEV_LOOP=y CONFIG_BLK_DEV_RAM=y CONFIG_BLK_DEV_RAM_SIZE=32768 -CONFIG_IDE=y -CONFIG_BLK_DEV_GENERIC=y -CONFIG_BLK_DEV_VIA82CXXX=y +CONFIG_ATA=y +CONFIG_ATA_GENERIC=y +CONFIG_PATA_VIA=y CONFIG_NETDEVICES=y CONFIG_GIANFAR=y CONFIG_E100=y -- 2.28.0
Re: [PATCH -next v2] powerpc/perf: Fix symbol undeclared warning
> On 23-Sep-2020, at 12:44 PM, Wang Wensheng wrote: > > Build kernel with `C=2`: > arch/powerpc/perf/isa207-common.c:24:18: warning: symbol > 'isa207_pmu_format_attr' was not declared. Should it be static? > arch/powerpc/perf/power9-pmu.c:101:5: warning: symbol 'p9_dd21_bl_ev' > was not declared. Should it be static? > arch/powerpc/perf/power9-pmu.c:115:5: warning: symbol 'p9_dd22_bl_ev' > was not declared. Should it be static? > > Those symbols are used only in the files that define them so we declare > them as static to fix the warnings. Hi, Looks fine to me. Reviewed-by: Athira Rajeev Thanks Athira > > Signed-off-by: Wang Wensheng > --- > arch/powerpc/perf/isa207-common.c | 2 +- > arch/powerpc/perf/power9-pmu.c| 4 ++-- > 2 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/perf/isa207-common.c > b/arch/powerpc/perf/isa207-common.c > index 964437adec18..85dc860b265b 100644 > --- a/arch/powerpc/perf/isa207-common.c > +++ b/arch/powerpc/perf/isa207-common.c > @@ -21,7 +21,7 @@ PMU_FORMAT_ATTR(thresh_stop,"config:32-35"); > PMU_FORMAT_ATTR(thresh_start, "config:36-39"); > PMU_FORMAT_ATTR(thresh_cmp, "config:40-49"); > > -struct attribute *isa207_pmu_format_attr[] = { > +static struct attribute *isa207_pmu_format_attr[] = { > _attr_event.attr, > _attr_pmcxsel.attr, > _attr_mark.attr, > diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c > index 2a57e93a79dc..4a315fad1f99 100644 > --- a/arch/powerpc/perf/power9-pmu.c > +++ b/arch/powerpc/perf/power9-pmu.c > @@ -98,7 +98,7 @@ extern u64 PERF_REG_EXTENDED_MASK; > /* PowerISA v2.07 format attribute structure*/ > extern struct attribute_group isa207_pmu_format_group; > > -int p9_dd21_bl_ev[] = { > +static int p9_dd21_bl_ev[] = { > PM_MRK_ST_DONE_L2, > PM_RADIX_PWC_L1_HIT, > PM_FLOP_CMPL, > @@ -112,7 +112,7 @@ int p9_dd21_bl_ev[] = { > PM_DISP_HELD_SYNC_HOLD, > }; > > -int p9_dd22_bl_ev[] = { > +static int p9_dd22_bl_ev[] = { > PM_DTLB_MISS_16G, > PM_DERAT_MISS_2M, > PM_DTLB_MISS_2M, > -- > 2.25.0 >
Re: [PATCH V2] powerpc/perf: Exclude pmc5/6 from the irrelevant PMU group constraints
> On 22-Sep-2020, at 4:16 PM, Paul A. Clarke wrote: > > Just one nit in a comment below... > (and this is not worthy of tags like "reviewed-by" ;-) > > On Mon, Sep 21, 2020 at 03:10:04AM -0400, Athira Rajeev wrote: >> PMU counter support functions enforces event constraints for group of >> events to check if all events in a group can be monitored. Incase of >> event codes using PMC5 and PMC6 ( 500fa and 600f4 respectively ), >> not all constraints are applicable, say the threshold or sample bits. >> But current code includes pmc5 and pmc6 in some group constraints (like >> IC_DC Qualifier bits) which is actually not applicable and hence results >> in those events not getting counted when scheduled along with group of >> other events. Patch fixes this by excluding PMC5/6 from constraints >> which are not relevant for it. >> >> Fixes: 7ffd948 ("powerpc/perf: factor out power8 pmu functions") >> Signed-off-by: Athira Rajeev >> --- > >> diff --git a/arch/powerpc/perf/isa207-common.c >> b/arch/powerpc/perf/isa207-common.c >> index 964437a..12153da 100644 >> --- a/arch/powerpc/perf/isa207-common.c >> +++ b/arch/powerpc/perf/isa207-common.c >> @@ -288,6 +288,15 @@ int isa207_get_constraint(u64 event, unsigned long >> *maskp, unsigned long *valp) >> >> mask |= CNST_PMC_MASK(pmc); >> value |= CNST_PMC_VAL(pmc); >> + >> +/* >> + * PMC5 and PMC6 are used to count cycles and instructions >> + * and these doesnot support most of the constraint bits. > > s/doesnot/do not/ Hi Paul, Thanks for checking the patch and sharing the comment. Athira > >> + * Add a check to exclude PMC5/6 from most of the constraints >> + * except for ebb/bhrb. >> + */ >> +if (pmc >= 5) >> +goto ebb_bhrb; > > PC
Re: [PATCH] cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_reboot_notifier
Hi Srikar, > The patch avoids allocating cpufreq_policy on stack hence fixing frame > size overflow in 'powernv_cpufreq_reboot_notifier' > > ./drivers/cpufreq/powernv-cpufreq.c: In function > _powernv_cpufreq_reboot_notifier_: > ./drivers/cpufreq/powernv-cpufreq.c:906:1: error: the frame size of 2064 > bytes is larger than 2048 bytes [-Werror=frame-larger-than=] > } > ^ > cc1: all warnings being treated as errors > make[3]: *** [./scripts/Makefile.build:316: > drivers/cpufreq/powernv-cpufreq.o] Error 1 > make[2]: *** [./scripts/Makefile.build:556: drivers/cpufreq] Error 2 > make[1]: *** [./Makefile:1072: drivers] Error 2 > make[1]: *** Waiting for unfinished jobs > make: *** [Makefile:157: sub-make] Error 2 > This looks a lot like commit d95fe371ecd2 ("cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_work_fn"). As with that patch, I have checked for matching puts/gets and that all uses of '&' check out. I tried to look at the snowpatch tests: they apparently reported a checkpatch warning but the file has since disappeared so I can't see what it was. Running checkpatch locally: $ scripts/checkpatch.pl -g HEAD -strict WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line) #15: make[3]: *** [./scripts/Makefile.build:316: drivers/cpufreq/powernv-cpufreq.o] Error 1 This is benign and you shouldn't wrap that line anyway. On that basis: Reviewed-by: Daniel Axtens Kind regards, Daniel > Fixes: cf30af76 ("cpufreq: powernv: Set the cpus to nominal frequency during > reboot/kexec") > Cc: Pratik Rajesh Sampat > Cc: Daniel Axtens > Cc: Michael Ellerman > Cc: linuxppc-dev > Signed-off-by: Srikar Dronamraju > --- > drivers/cpufreq/powernv-cpufreq.c | 9 ++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/drivers/cpufreq/powernv-cpufreq.c > b/drivers/cpufreq/powernv-cpufreq.c > index a9af15e..e439b43 100644 > --- a/drivers/cpufreq/powernv-cpufreq.c > +++ b/drivers/cpufreq/powernv-cpufreq.c > @@ -885,12 +885,15 @@ static int powernv_cpufreq_reboot_notifier(struct > notifier_block *nb, > unsigned long action, void *unused) > { > int cpu; > - struct cpufreq_policy cpu_policy; > + struct cpufreq_policy *cpu_policy; > > rebooting = true; > for_each_online_cpu(cpu) { > - cpufreq_get_policy(_policy, cpu); > - powernv_cpufreq_target_index(_policy, get_nominal_index()); > + cpu_policy = cpufreq_cpu_get(cpu); > + if (!cpu_policy) > + continue; > + powernv_cpufreq_target_index(cpu_policy, get_nominal_index()); > + cpufreq_cpu_put(cpu_policy); > } > > return NOTIFY_DONE; > -- > 1.8.3.1
[PATCH] powerpc: PPC_SECURE_BOOT should not require PowerNV
In commit 61f879d97ce4 ("powerpc/pseries: Detect secure and trusted boot state of the system.") we taught the kernel how to understand the secure-boot parameters used by a pseries guest. However, CONFIG_PPC_SECURE_BOOT still requires PowerNV. I didn't catch this because pseries_le_defconfig includes support for PowerNV and so everything still worked. Indeed, most configs will. Nonetheless, technically PPC_SECURE_BOOT doesn't require PowerNV any more. The secure variables support (PPC_SECVAR_SYSFS) doesn't do anything on pSeries yet, but I don't think it's worth adding a new condition - at some stage we'll want to add a backend for pSeries anyway. Fixes: 61f879d97ce4 ("powerpc/pseries: Detect secure and trusted boot state of the system.") Cc: Nayna Jain Signed-off-by: Daniel Axtens --- arch/powerpc/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 4b33477dafb8..f645fa934853 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -983,7 +983,7 @@ config PPC_MEM_KEYS config PPC_SECURE_BOOT prompt "Enable secure boot support" bool - depends on PPC_POWERNV + depends on PPC_POWERNV || PPC_PSERIES depends on IMA_ARCH_POLICY imply IMA_SECURE_AND_OR_TRUSTED_BOOT help -- 2.25.1
RE: [PATCH 5/9] fs: remove various compat readv/writev helpers
From: Arnd Bergmann > Sent: 23 September 2020 19:46 ... > Regardless of that, another advantage of having the SYSCALL_DECLAREx() > would be the ability to include that header file from elsewhere with a > different > macro definition to create a machine-readable version of the interface when > combined with the syscall.tbl files. This could be used to create a user > space stub for calling into the low-level syscall regardless of the > libc interfaces, > or for synchronizing the interfaces with strace, qemu-user, or anything that > needs to deal with the low-level interface. A similar 'trick' (that probably won't work here) is to pass the name of a #define function as a parameter to another define. Useful for defining constants and error strings together. eg: #define TRAFFIC_LIGHTS(x) \ x(RED, 0, "Red") \ x(YELLOW, 1, "Yellow) \ x(GREEN, 2, "GREEN) You can then do thing like: #define x(token, value, string) token = value, enum {TRAFFIC_LIGHTS(x) NUM_LIGHTS}; #undef x #define x(token, value, string) [value] = string, const char *colours[] = {TRAFFIC_LIGHTS(x)}; #undef x to initialise constants and a name table that are always in sync. It is also a good way to generate source lines that are over 1MB. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
On Wed, 23 Sep 2020 22:55:54 +0200 Thomas Gleixner wrote: > > Perhaps make migrate_disable() an anonymous local_lock()? > > > > This should lower the SHC in theory, if you can't have stacked migrate > > disables on the same CPU. > > I'm pretty sure this ends up in locking hell pretty fast and aside of > that it's not working for scenarios like: > > kmap_local(); >migrate_disable(); >... > > copy_from_user() > -> #PF >-> schedule() > > which brought us into that discussion in the first place. You would stop > any other migrate disable user from running until the page fault is > resolved... Then scratch the idea of having anonymous local_lock() and just bring local_lock in directly? Then have a kmap local lock, which would only block those that need to do a kmap. Now as for migration disabled nesting, at least now we would have groupings of this, and perhaps the theorists can handle that. I mean, how is this much different that having a bunch of tasks blocked on a mutex with the owner is pinned on a CPU? migrate_disable() is a BKL of pinning affinity. If we only have local_lock() available (even on !RT), then it makes the blocking in groups. At least this way you could grep for all the different local_locks in the system and plug that into the algorithm for WCS, just like one would with a bunch of mutexes. -- Steve
Re: [PATCH] spi: fsl-espi: Only process interrupts for expected events
On 24/09/20 8:27 am, Heiner Kallweit wrote: > On 04.09.2020 02:28, Chris Packham wrote: >> The SPIE register contains counts for the TX FIFO so any time the irq >> handler was invoked we would attempt to process the RX/TX fifos. Use the >> SPIM value to mask the events so that we only process interrupts that >> were expected. >> >> This was a latent issue exposed by commit 3282a3da25bd ("powerpc/64: >> Implement soft interrupt replay in C"). >> >> Signed-off-by: Chris Packham >> Cc: sta...@vger.kernel.org >> --- >> >> Notes: >> I've tested this on a T2080RDB and a custom board using the T2081 SoC. >> With >> this change I don't see any spurious instances of the "Transfer done but >> SPIE_DON isn't set!" or "Transfer done but rx/tx fifo's aren't empty!" >> messages >> and the updates to spi flash are successful. >> >> I think this should go into the stable trees that contain 3282a3da25bd >> but I >> haven't added a Fixes: tag because I think 3282a3da25bd exposed the >> issue as >> opposed to causing it. >> >> drivers/spi/spi-fsl-espi.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/spi/spi-fsl-espi.c b/drivers/spi/spi-fsl-espi.c >> index 7e7c92cafdbb..cb120b68c0e2 100644 >> --- a/drivers/spi/spi-fsl-espi.c >> +++ b/drivers/spi/spi-fsl-espi.c >> @@ -574,13 +574,14 @@ static void fsl_espi_cpu_irq(struct fsl_espi *espi, >> u32 events) >> static irqreturn_t fsl_espi_irq(s32 irq, void *context_data) >> { >> struct fsl_espi *espi = context_data; >> -u32 events; >> +u32 events, mask; >> >> spin_lock(>lock); >> >> /* Get interrupt events(tx/rx) */ >> events = fsl_espi_read_reg(espi, ESPI_SPIE); >> -if (!events) { >> +mask = fsl_espi_read_reg(espi, ESPI_SPIM); >> +if (!(events & mask)) { >> spin_unlock(>lock); >> return IRQ_NONE; > Sorry, I was on vacation and therefore couldn't comment earlier. > I'm fine with the change, just one thing could be improved IMO. > If we skip an unneeded interrupt now, then returning IRQ_NONE > causes reporting this interrupt as spurious. This isn't too nice > as spurious interrupts typically are seen as a problem indicator. > Therefore returning IRQ_HANDLED should be more appropriate. > This would just require a comment in the code explaining why we > do this, and why it can happen that we receive interrupts > we're not interested in. I'd be happy to send a follow-up to change IRQ_NONE to IRQ_HANDLED. I don't think the old code could have ever hit the IRQ_NONE (because event will always be non-zero) so it won't really be a change in behaviour. With the patch (that is now in spi/for-next) so far I do see a low number of spurious interrupts on the test setup where previously I would have seen failure to talk to the spi-flash.
Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
On Wed, Sep 23 2020 at 11:52, Steven Rostedt wrote: > On Wed, 23 Sep 2020 10:40:32 +0200 > pet...@infradead.org wrote: > >> However, with migrate_disable() we can have each task preempted in a >> migrate_disable() region, worse we can stack them all on the _same_ CPU >> (super ridiculous odds, sure). And then we end up only able to run one >> task, with the rest of the CPUs picking their nose. > > What if we just made migrate_disable() a local_lock() available for !RT? > > I mean make it a priority inheritance PER CPU lock. > > That is, no two tasks could do a migrate_disable() on the same CPU? If > one task does a migrate_disable() and then gets preempted and the > preempting task does a migrate_disable() on the same CPU, it will block > and wait for the first task to do a migrate_enable(). > > No two tasks on the same CPU could enter the migrate_disable() section > simultaneously, just like no two tasks could enter a preempt_disable() > section. > > In essence, we just allow local_lock() to be used for both RT and !RT. > > Perhaps make migrate_disable() an anonymous local_lock()? > > This should lower the SHC in theory, if you can't have stacked migrate > disables on the same CPU. I'm pretty sure this ends up in locking hell pretty fast and aside of that it's not working for scenarios like: kmap_local(); migrate_disable(); ... copy_from_user() -> #PF -> schedule() which brought us into that discussion in the first place. You would stop any other migrate disable user from running until the page fault is resolved... Thanks, tglx
Re: [PATCH] spi: fsl-espi: Only process interrupts for expected events
On 04.09.2020 02:28, Chris Packham wrote: > The SPIE register contains counts for the TX FIFO so any time the irq > handler was invoked we would attempt to process the RX/TX fifos. Use the > SPIM value to mask the events so that we only process interrupts that > were expected. > > This was a latent issue exposed by commit 3282a3da25bd ("powerpc/64: > Implement soft interrupt replay in C"). > > Signed-off-by: Chris Packham > Cc: sta...@vger.kernel.org > --- > > Notes: > I've tested this on a T2080RDB and a custom board using the T2081 SoC. > With > this change I don't see any spurious instances of the "Transfer done but > SPIE_DON isn't set!" or "Transfer done but rx/tx fifo's aren't empty!" > messages > and the updates to spi flash are successful. > > I think this should go into the stable trees that contain 3282a3da25bd > but I > haven't added a Fixes: tag because I think 3282a3da25bd exposed the issue > as > opposed to causing it. > > drivers/spi/spi-fsl-espi.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/drivers/spi/spi-fsl-espi.c b/drivers/spi/spi-fsl-espi.c > index 7e7c92cafdbb..cb120b68c0e2 100644 > --- a/drivers/spi/spi-fsl-espi.c > +++ b/drivers/spi/spi-fsl-espi.c > @@ -574,13 +574,14 @@ static void fsl_espi_cpu_irq(struct fsl_espi *espi, u32 > events) > static irqreturn_t fsl_espi_irq(s32 irq, void *context_data) > { > struct fsl_espi *espi = context_data; > - u32 events; > + u32 events, mask; > > spin_lock(>lock); > > /* Get interrupt events(tx/rx) */ > events = fsl_espi_read_reg(espi, ESPI_SPIE); > - if (!events) { > + mask = fsl_espi_read_reg(espi, ESPI_SPIM); > + if (!(events & mask)) { > spin_unlock(>lock); > return IRQ_NONE; Sorry, I was on vacation and therefore couldn't comment earlier. I'm fine with the change, just one thing could be improved IMO. If we skip an unneeded interrupt now, then returning IRQ_NONE causes reporting this interrupt as spurious. This isn't too nice as spurious interrupts typically are seen as a problem indicator. Therefore returning IRQ_HANDLED should be more appropriate. This would just require a comment in the code explaining why we do this, and why it can happen that we receive interrupts we're not interested in. > } >
Re: [PATCH v6 04/11] PCI: designware-ep: Modify MSI and MSIX CAP way of finding
s/MSIX/MSI-X/ (subject and below) On Sat, Mar 14, 2020 at 11:30:31AM +0800, Xiaowei Bao wrote: > Each PF of EP device should have it's own MSI or MSIX capabitily > struct, so create a dw_pcie_ep_func struct and remove the msi_cap > and msix_cap to this struct from dw_pcie_ep, and manage the PFs > with a list. s/capabitily/capability/ I know Lorenzo has already applied this, but for the future, or in case there are other reasons to update this patch. There are a bunch of unnecessary initializations below for future cleanup. > Signed-off-by: Xiaowei Bao > --- > v3: > - This is a new patch, to fix the issue of MSI and MSIX CAP way of >finding. > v4: > - Correct some word of commit message. > v5: > - No change. > v6: > - Fix up the compile error. > > drivers/pci/controller/dwc/pcie-designware-ep.c | 135 > +--- > drivers/pci/controller/dwc/pcie-designware.h| 18 +++- > 2 files changed, 134 insertions(+), 19 deletions(-) > > diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c > b/drivers/pci/controller/dwc/pcie-designware-ep.c > index 933bb89..fb915f2 100644 > --- a/drivers/pci/controller/dwc/pcie-designware-ep.c > +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c > @@ -19,6 +19,19 @@ void dw_pcie_ep_linkup(struct dw_pcie_ep *ep) > pci_epc_linkup(epc); > } > > +struct dw_pcie_ep_func * > +dw_pcie_ep_get_func_from_ep(struct dw_pcie_ep *ep, u8 func_no) > +{ > + struct dw_pcie_ep_func *ep_func; > + > + list_for_each_entry(ep_func, >func_list, list) { > + if (ep_func->func_no == func_no) > + return ep_func; > + } > + > + return NULL; > +} > + > static unsigned int dw_pcie_ep_func_select(struct dw_pcie_ep *ep, u8 func_no) > { > unsigned int func_offset = 0; > @@ -59,6 +72,47 @@ void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum > pci_barno bar) > __dw_pcie_ep_reset_bar(pci, func_no, bar, 0); > } > > +static u8 __dw_pcie_ep_find_next_cap(struct dw_pcie_ep *ep, u8 func_no, > + u8 cap_ptr, u8 cap) > +{ > + struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > + unsigned int func_offset = 0; Unnecessary initialization. > + u8 cap_id, next_cap_ptr; > + u16 reg; > + > + if (!cap_ptr) > + return 0; > + > + func_offset = dw_pcie_ep_func_select(ep, func_no); > + > + reg = dw_pcie_readw_dbi(pci, func_offset + cap_ptr); > + cap_id = (reg & 0x00ff); > + > + if (cap_id > PCI_CAP_ID_MAX) > + return 0; > + > + if (cap_id == cap) > + return cap_ptr; > + > + next_cap_ptr = (reg & 0xff00) >> 8; > + return __dw_pcie_ep_find_next_cap(ep, func_no, next_cap_ptr, cap); > +} > + > +static u8 dw_pcie_ep_find_capability(struct dw_pcie_ep *ep, u8 func_no, u8 > cap) > +{ > + struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > + unsigned int func_offset = 0; Unnecessary initialization. > + u8 next_cap_ptr; > + u16 reg; > + > + func_offset = dw_pcie_ep_func_select(ep, func_no); > + > + reg = dw_pcie_readw_dbi(pci, func_offset + PCI_CAPABILITY_LIST); > + next_cap_ptr = (reg & 0x00ff); > + > + return __dw_pcie_ep_find_next_cap(ep, func_no, next_cap_ptr, cap); > +} > + > static int dw_pcie_ep_write_header(struct pci_epc *epc, u8 func_no, > struct pci_epf_header *hdr) > { > @@ -246,13 +300,18 @@ static int dw_pcie_ep_get_msi(struct pci_epc *epc, u8 > func_no) > struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > u32 val, reg; > unsigned int func_offset = 0; Unnecessary initialization (not from your patch). > + struct dw_pcie_ep_func *ep_func; > > - if (!ep->msi_cap) > + ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no); > + if (!ep_func) > + return -EINVAL; > + > + if (!ep_func->msi_cap) > return -EINVAL; > > func_offset = dw_pcie_ep_func_select(ep, func_no); > > - reg = ep->msi_cap + func_offset + PCI_MSI_FLAGS; > + reg = ep_func->msi_cap + func_offset + PCI_MSI_FLAGS; > val = dw_pcie_readw_dbi(pci, reg); > if (!(val & PCI_MSI_FLAGS_ENABLE)) > return -EINVAL; > @@ -268,13 +327,18 @@ static int dw_pcie_ep_set_msi(struct pci_epc *epc, u8 > func_no, u8 interrupts) > struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > u32 val, reg; > unsigned int func_offset = 0; Unnecessary initialization (not from your patch). > + struct dw_pcie_ep_func *ep_func; > + > + ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no); > + if (!ep_func) > + return -EINVAL; > > - if (!ep->msi_cap) > + if (!ep_func->msi_cap) > return -EINVAL; > > func_offset = dw_pcie_ep_func_select(ep, func_no); > > - reg = ep->msi_cap + func_offset + PCI_MSI_FLAGS; > + reg = ep_func->msi_cap + func_offset + PCI_MSI_FLAGS; > val = dw_pcie_readw_dbi(pci, reg); > val &= ~PCI_MSI_FLAGS_QMASK; > val
Re: [PATCH v6 0/3] Carry forward IMA measurement log on kexec on ARM64
[Cc'ing Nayna Jain, linuxppc-dev@lists.ozlabs.org] Hi Lakshmi, On Tue, 2020-09-08 at 16:08 -0700, Lakshmi Ramasubramanian wrote: > On kexec file load Integrity Measurement Architecture(IMA) subsystem > may verify the IMA signature of the kernel and initramfs, and measure > it. The command line parameters passed to the kernel in the kexec call > may also be measured by IMA. A remote attestation service can verify > the measurement through the IMA log and the TPM PCR data. This can be > achieved only if the IMA measurement log is carried over from > the current kernel to the next kernel across the kexec call. > However in the current implementation the IMA measurement logs are not > carried over on ARM64 platforms. Therefore a remote attestation service > cannot verify the authenticity of the running kernel on ARM64 platforms > when the kernel is updated through the kexec system call. > > This patch series adds support for carrying forward the IMA measurement > log on kexec on ARM64. powerpc already supports carrying forward > the IMA measurement log on kexec. > > This series refactors the platform independent code defined for powerpc > such that it can be reused for ARM64 as well. A chosen node namely > "linux,ima-kexec-buffer" is added to the DTB for ARM64 to hold > the address and the size of the memory reserved to carry > the IMA measurement log. > > This patch series has been tested for ARM64 platform using QEMU. > I would like help from the community for testing this change on powerpc. > Thanks. Getting access to and upgrading our group's Power system firmware took time due to limited employee site access. Confirmed, with these patches applied, ima-evm-utils still properly verifies carrying the IMA measurement list across kexec. I plan on reviewing this patch set shortly. thanks, Mimi > > This series is based on commit f4d51dffc6c0 ("Linux 5.9-rc4") in > https://github.com/torvalds/linux "master" branch. > > Changelog: > > v6: > - Remove any existing FDT_PROP_IMA_KEXEC_BUFFER property in the device > tree and also its corresponding memory reservation in the currently > running kernel. > - Moved the function remove_ima_buffer() defined for powerpc to IMA > and renamed the function to ima_remove_kexec_buffer(). Also, moved > delete_fdt_mem_rsv() from powerpc to IMA. > > v5: > - Merged get_addr_size_cells() and do_get_kexec_buffer() into a single > function when moving the arch independent code from powerpc to IMA > - Reverted the change to use FDT functions in powerpc code and added > back the original code in get_addr_size_cells() and > do_get_kexec_buffer() for powerpc. > - Added fdt_add_mem_rsv() for ARM64 to reserve the memory for > the IMA log buffer during kexec. > - Fixed the warning reported by kernel test bot for ARM64 > arch_ima_add_kexec_buffer() - moved this function to a new file > namely arch/arm64/kernel/ima_kexec.c > > v4: > - Submitting the patch series on behalf of the original author > Prakhar Srivastava > - Moved FDT_PROP_IMA_KEXEC_BUFFER ("linux,ima-kexec-buffer") to > libfdt.h so that it can be shared by multiple platforms. > > v3: > Breakup patches further into separate patches. > - Refactoring non architecture specific code out of powerpc > - Update powerpc related code to use fdt functions > - Update IMA buffer read related code to use of functions > - Add support to store the memory information of the IMA > measurement logs to be carried forward. > - Update the property strings to align with documented nodes > https://github.com/devicetree-org/dt-schema/pull/46 > > v2: > Break patches into separate patches. > - Powerpc related Refactoring > - Updating the docuemntation for chosen node > - Updating arm64 to support IMA buffer pass > > v1: > Refactoring carrying over IMA measuremnet logs over Kexec. This patch > moves the non-architecture specific code out of powerpc and adds to > security/ima.(Suggested by Thiago) > Add Documentation regarding the ima-kexec-buffer node in the chosen > node documentation > > v0: > Add a layer of abstraction to use the memory reserved by device tree > for ima buffer pass. > Add support for ima buffer pass using reserved memory for arm64 kexec. > Update the arch sepcific code path in kexec file load to store the > ima buffer in the reserved memory. The same reserved memory is read > on kexec or cold boot. > > Lakshmi Ramasubramanian (3): > powerpc: Refactor kexec functions to move arch independent code to IMA > arm64: Store IMA log information in kimage used for kexec > arm64: Add IMA kexec buffer to DTB > > arch/arm64/Kconfig | 1 + > arch/arm64/include/asm/ima.h| 18 > arch/arm64/include/asm/kexec.h | 3 + > arch/arm64/kernel/Makefile | 1 + > arch/arm64/kernel/ima_kexec.c | 34 >
Re: [PATCH 5/9] fs: remove various compat readv/writev helpers
On Wed, Sep 23, 2020 at 9:48 PM Al Viro wrote: > FWIW, after playing with that for a while... Do we really want the > compat_sys_...() declarations to live in linux/compat.h? Most of > the users of that file don't want those; why not move them to > linux/syscalls.h? Sure, let's do that. The trend overall is to integrate the compat stuff more closely into where the native implementation lives, so this would just follow that trend. I think with Christoph's latest patches, about half of them are going away as well. > Reason: there's a lot more users of linux/compat.h than those of > linux/syscalls.h - it's pulled by everything in the networking stack, > for starters... Right, the network headers pull in almost everything else through multiple indirect inclusions, anything we can do to reduce that helps. Arnd
Re: [PATCH 5/9] fs: remove various compat readv/writev helpers
On Wed, Sep 23, 2020 at 08:45:51PM +0200, Arnd Bergmann wrote: > On Wed, Sep 23, 2020 at 6:38 PM Al Viro wrote: > > > > I wonder if we should do something like > > > > SYSCALL_DECLARE3(readv, unsigned long, fd, const struct iovec __user *, vec, > > unsigned long, vlen); > > in syscalls.h instead, and not under that ifdef. > > > > Let it expand to declaration of sys_...() in generic case and, on x86, into > > __do_sys_...() and __ia32_sys_...()/__x64_sys_...(), with types matching > > what SYSCALL_DEFINE ends up using. > > > > Similar macro would cover compat_sys_...() declarations. That would > > restore mismatch checking for x86 and friends. AFAICS, the cost wouldn't > > be terribly high - cpp would have more to chew through in syscalls.h, > > but it shouldn't be all that costly. Famous last words, of course... > > > > Does anybody see fundamental problems with that? > > I've had some ideas along those lines in the past and I think it should work. > > As a variation of this, the SYSCALL_DEFINEx() macros could go away > entirely, leaving only the macro instantiations from the header to > require that syntax. It would require first changing the remaining > architectures to build the syscall table from C code instead of > assembler though. > > Regardless of that, another advantage of having the SYSCALL_DECLAREx() > would be the ability to include that header file from elsewhere with a > different > macro definition to create a machine-readable version of the interface when > combined with the syscall.tbl files. This could be used to create a user > space stub for calling into the low-level syscall regardless of the > libc interfaces, > or for synchronizing the interfaces with strace, qemu-user, or anything that > needs to deal with the low-level interface. FWIW, after playing with that for a while... Do we really want the compat_sys_...() declarations to live in linux/compat.h? Most of the users of that file don't want those; why not move them to linux/syscalls.h? Reason: there's a lot more users of linux/compat.h than those of linux/syscalls.h - it's pulled by everything in the networking stack, for starters...
Re: [PATCH 5/9] fs: remove various compat readv/writev helpers
On Wed, Sep 23, 2020 at 6:38 PM Al Viro wrote: > > I wonder if we should do something like > > SYSCALL_DECLARE3(readv, unsigned long, fd, const struct iovec __user *, vec, > unsigned long, vlen); > in syscalls.h instead, and not under that ifdef. > > Let it expand to declaration of sys_...() in generic case and, on x86, into > __do_sys_...() and __ia32_sys_...()/__x64_sys_...(), with types matching > what SYSCALL_DEFINE ends up using. > > Similar macro would cover compat_sys_...() declarations. That would > restore mismatch checking for x86 and friends. AFAICS, the cost wouldn't > be terribly high - cpp would have more to chew through in syscalls.h, > but it shouldn't be all that costly. Famous last words, of course... > > Does anybody see fundamental problems with that? I've had some ideas along those lines in the past and I think it should work. As a variation of this, the SYSCALL_DEFINEx() macros could go away entirely, leaving only the macro instantiations from the header to require that syntax. It would require first changing the remaining architectures to build the syscall table from C code instead of assembler though. Regardless of that, another advantage of having the SYSCALL_DECLAREx() would be the ability to include that header file from elsewhere with a different macro definition to create a machine-readable version of the interface when combined with the syscall.tbl files. This could be used to create a user space stub for calling into the low-level syscall regardless of the libc interfaces, or for synchronizing the interfaces with strace, qemu-user, or anything that needs to deal with the low-level interface. Arnd
Re: [PATCH v2] i2c: cpm: Fix i2c_ram structure
Le 23/09/2020 à 18:08, Wolfram Sang a écrit : On Wed, Sep 23, 2020 at 04:08:40PM +0200, nico.vi...@gmail.com wrote: From: Nicolas VINCENT the i2c_ram structure is missing the sdmatmp field mentionned in datasheet for MPC8272 at paragraph 36.5. With this field missing, the hardware would write past the allocated memory done through cpm_muram_alloc for the i2c_ram structure and land in memory allocated for the buffers descriptors corrupting the cbd_bufaddr field. Since this field is only set during setup(), the first i2c transaction would work and the following would send data read from an arbitrary memory location. Signed-off-by: Nicolas VINCENT Thanks! Is someone able to identify a Fixes: tag I could add? I'd suggest Fixes: 61045dbe9d8d ("i2c: Add support for I2C bus on Freescale CPM1/CPM2 controllers") Christophe --- drivers/i2c/busses/i2c-cpm.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/i2c/busses/i2c-cpm.c b/drivers/i2c/busses/i2c-cpm.c index 1213e1932ccb..24d584a1c9a7 100644 --- a/drivers/i2c/busses/i2c-cpm.c +++ b/drivers/i2c/busses/i2c-cpm.c @@ -65,6 +65,9 @@ struct i2c_ram { charres1[4];/* Reserved */ ushort rpbase; /* Relocation pointer */ charres2[2];/* Reserved */ + /* The following elements are only for CPM2 */ + charres3[4];/* Reserved */ + uintsdmatmp;/* Internal */ }; #define I2COM_START 0x80 -- 2.17.1
Re: [PATCH 5/9] fs: remove various compat readv/writev helpers
On Wed, Sep 23, 2020 at 06:05:27PM +0100, Al Viro wrote: > On Wed, Sep 23, 2020 at 05:38:31PM +0100, Al Viro wrote: > > On Wed, Sep 23, 2020 at 03:59:01PM +0100, Al Viro wrote: > > > > > > That's a very good question. But it does not just compile but actually > > > > works. Probably because all the syscall wrappers mean that we don't > > > > actually generate the normal names. I just tried this: > > > > > > > > --- a/include/linux/syscalls.h > > > > +++ b/include/linux/syscalls.h > > > > @@ -468,7 +468,7 @@ asmlinkage long sys_lseek(unsigned int fd, off_t > > > > offset, > > > > asmlinkage long sys_read(unsigned int fd, char __user *buf, size_t > > > > count); > > > > asmlinkage long sys_write(unsigned int fd, const char __user *buf, > > > > size_t count); > > > > -asmlinkage long sys_readv(unsigned long fd, > > > > +asmlinkage long sys_readv(void *fd, > > > > > > > > for fun, and the compiler doesn't care either.. > > > > > > Try to build it for sparc or ppc... > > > > FWIW, declarations in syscalls.h used to serve 4 purposes: > > 1) syscall table initializers needed symbols declared > > 2) direct calls needed the same > > 3) catching mismatches between the declarations and definitions > > 4) centralized list of all syscalls > > > > (2) has been (thankfully) reduced for some time; in any case, ksys_... is > > used for the remaining ones. > > > > (1) and (3) are served by syscalls.h in architectures other than x86, arm64 > > and s390. On those 3 (1) is done otherwise (near the syscall table > > initializer) > > and (3) is not done at all. > > > > I wonder if we should do something like > > > > SYSCALL_DECLARE3(readv, unsigned long, fd, const struct iovec __user *, vec, > > unsigned long, vlen); > > in syscalls.h instead, and not under that ifdef. > > > > Let it expand to declaration of sys_...() in generic case and, on x86, into > > __do_sys_...() and __ia32_sys_...()/__x64_sys_...(), with types matching > > what SYSCALL_DEFINE ends up using. > > > > Similar macro would cover compat_sys_...() declarations. That would > > restore mismatch checking for x86 and friends. AFAICS, the cost wouldn't > > be terribly high - cpp would have more to chew through in syscalls.h, > > but it shouldn't be all that costly. Famous last words, of course... > > > > Does anybody see fundamental problems with that? > > Just to make it clear - I do not propose to fold that into this series; > there we just need to keep those declarations in sync with fs/read_write.c Agreed. The above idea generally sounds sane to me.
Re: [PATCH 5/9] fs: remove various compat readv/writev helpers
On Wed, Sep 23, 2020 at 12:39 PM Al Viro wrote: > > On Wed, Sep 23, 2020 at 03:59:01PM +0100, Al Viro wrote: > > > > That's a very good question. But it does not just compile but actually > > > works. Probably because all the syscall wrappers mean that we don't > > > actually generate the normal names. I just tried this: > > > > > > --- a/include/linux/syscalls.h > > > +++ b/include/linux/syscalls.h > > > @@ -468,7 +468,7 @@ asmlinkage long sys_lseek(unsigned int fd, off_t > > > offset, > > > asmlinkage long sys_read(unsigned int fd, char __user *buf, size_t > > > count); > > > asmlinkage long sys_write(unsigned int fd, const char __user *buf, > > > size_t count); > > > -asmlinkage long sys_readv(unsigned long fd, > > > +asmlinkage long sys_readv(void *fd, > > > > > > for fun, and the compiler doesn't care either.. > > > > Try to build it for sparc or ppc... > > FWIW, declarations in syscalls.h used to serve 4 purposes: > 1) syscall table initializers needed symbols declared > 2) direct calls needed the same > 3) catching mismatches between the declarations and definitions > 4) centralized list of all syscalls > > (2) has been (thankfully) reduced for some time; in any case, ksys_... is > used for the remaining ones. > > (1) and (3) are served by syscalls.h in architectures other than x86, arm64 > and s390. On those 3 (1) is done otherwise (near the syscall table > initializer) > and (3) is not done at all. > > I wonder if we should do something like > > SYSCALL_DECLARE3(readv, unsigned long, fd, const struct iovec __user *, vec, > unsigned long, vlen); > in syscalls.h instead, and not under that ifdef. > > Let it expand to declaration of sys_...() in generic case and, on x86, into > __do_sys_...() and __ia32_sys_...()/__x64_sys_...(), with types matching > what SYSCALL_DEFINE ends up using. > > Similar macro would cover compat_sys_...() declarations. That would > restore mismatch checking for x86 and friends. AFAICS, the cost wouldn't > be terribly high - cpp would have more to chew through in syscalls.h, > but it shouldn't be all that costly. Famous last words, of course... > > Does anybody see fundamental problems with that? I think this would be a good idea. I have been working on a patchset to clean up the conditional syscall handling (sys_ni.c), and conflicts with the prototypes in syscalls.h have been getting in the way. Having the prototypes use SYSCALL_DECLAREx(...) would solve that issue. -- Brian Gerst
Re: [PATCH 5/9] fs: remove various compat readv/writev helpers
On Wed, Sep 23, 2020 at 05:38:31PM +0100, Al Viro wrote: > On Wed, Sep 23, 2020 at 03:59:01PM +0100, Al Viro wrote: > > > > That's a very good question. But it does not just compile but actually > > > works. Probably because all the syscall wrappers mean that we don't > > > actually generate the normal names. I just tried this: > > > > > > --- a/include/linux/syscalls.h > > > +++ b/include/linux/syscalls.h > > > @@ -468,7 +468,7 @@ asmlinkage long sys_lseek(unsigned int fd, off_t > > > offset, > > > asmlinkage long sys_read(unsigned int fd, char __user *buf, size_t > > > count); > > > asmlinkage long sys_write(unsigned int fd, const char __user *buf, > > > size_t count); > > > -asmlinkage long sys_readv(unsigned long fd, > > > +asmlinkage long sys_readv(void *fd, > > > > > > for fun, and the compiler doesn't care either.. > > > > Try to build it for sparc or ppc... > > FWIW, declarations in syscalls.h used to serve 4 purposes: > 1) syscall table initializers needed symbols declared > 2) direct calls needed the same > 3) catching mismatches between the declarations and definitions > 4) centralized list of all syscalls > > (2) has been (thankfully) reduced for some time; in any case, ksys_... is > used for the remaining ones. > > (1) and (3) are served by syscalls.h in architectures other than x86, arm64 > and s390. On those 3 (1) is done otherwise (near the syscall table > initializer) > and (3) is not done at all. > > I wonder if we should do something like > > SYSCALL_DECLARE3(readv, unsigned long, fd, const struct iovec __user *, vec, >unsigned long, vlen); > in syscalls.h instead, and not under that ifdef. > > Let it expand to declaration of sys_...() in generic case and, on x86, into > __do_sys_...() and __ia32_sys_...()/__x64_sys_...(), with types matching > what SYSCALL_DEFINE ends up using. > > Similar macro would cover compat_sys_...() declarations. That would > restore mismatch checking for x86 and friends. AFAICS, the cost wouldn't > be terribly high - cpp would have more to chew through in syscalls.h, > but it shouldn't be all that costly. Famous last words, of course... > > Does anybody see fundamental problems with that? Just to make it clear - I do not propose to fold that into this series; there we just need to keep those declarations in sync with fs/read_write.c
Re: [PATCH 5/9] fs: remove various compat readv/writev helpers
On Wed, Sep 23, 2020 at 03:59:01PM +0100, Al Viro wrote: > > That's a very good question. But it does not just compile but actually > > works. Probably because all the syscall wrappers mean that we don't > > actually generate the normal names. I just tried this: > > > > --- a/include/linux/syscalls.h > > +++ b/include/linux/syscalls.h > > @@ -468,7 +468,7 @@ asmlinkage long sys_lseek(unsigned int fd, off_t offset, > > asmlinkage long sys_read(unsigned int fd, char __user *buf, size_t count); > > asmlinkage long sys_write(unsigned int fd, const char __user *buf, > > size_t count); > > -asmlinkage long sys_readv(unsigned long fd, > > +asmlinkage long sys_readv(void *fd, > > > > for fun, and the compiler doesn't care either.. > > Try to build it for sparc or ppc... FWIW, declarations in syscalls.h used to serve 4 purposes: 1) syscall table initializers needed symbols declared 2) direct calls needed the same 3) catching mismatches between the declarations and definitions 4) centralized list of all syscalls (2) has been (thankfully) reduced for some time; in any case, ksys_... is used for the remaining ones. (1) and (3) are served by syscalls.h in architectures other than x86, arm64 and s390. On those 3 (1) is done otherwise (near the syscall table initializer) and (3) is not done at all. I wonder if we should do something like SYSCALL_DECLARE3(readv, unsigned long, fd, const struct iovec __user *, vec, unsigned long, vlen); in syscalls.h instead, and not under that ifdef. Let it expand to declaration of sys_...() in generic case and, on x86, into __do_sys_...() and __ia32_sys_...()/__x64_sys_...(), with types matching what SYSCALL_DEFINE ends up using. Similar macro would cover compat_sys_...() declarations. That would restore mismatch checking for x86 and friends. AFAICS, the cost wouldn't be terribly high - cpp would have more to chew through in syscalls.h, but it shouldn't be all that costly. Famous last words, of course... Does anybody see fundamental problems with that?
Re: [PATCH v2] i2c: cpm: Fix i2c_ram structure
On Wed, Sep 23, 2020 at 04:08:40PM +0200, nico.vi...@gmail.com wrote: > From: Nicolas VINCENT > > the i2c_ram structure is missing the sdmatmp field mentionned in > datasheet for MPC8272 at paragraph 36.5. With this field missing, the > hardware would write past the allocated memory done through > cpm_muram_alloc for the i2c_ram structure and land in memory allocated > for the buffers descriptors corrupting the cbd_bufaddr field. Since this > field is only set during setup(), the first i2c transaction would work > and the following would send data read from an arbitrary memory > location. > > Signed-off-by: Nicolas VINCENT Thanks! Is someone able to identify a Fixes: tag I could add? > --- > drivers/i2c/busses/i2c-cpm.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/i2c/busses/i2c-cpm.c b/drivers/i2c/busses/i2c-cpm.c > index 1213e1932ccb..24d584a1c9a7 100644 > --- a/drivers/i2c/busses/i2c-cpm.c > +++ b/drivers/i2c/busses/i2c-cpm.c > @@ -65,6 +65,9 @@ struct i2c_ram { > charres1[4];/* Reserved */ > ushort rpbase; /* Relocation pointer */ > charres2[2];/* Reserved */ > + /* The following elements are only for CPM2 */ > + charres3[4];/* Reserved */ > + uintsdmatmp;/* Internal */ > }; > > #define I2COM_START 0x80 > -- > 2.17.1 > signature.asc Description: PGP signature
Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
On Wed, 23 Sep 2020 10:40:32 +0200 pet...@infradead.org wrote: > However, with migrate_disable() we can have each task preempted in a > migrate_disable() region, worse we can stack them all on the _same_ CPU > (super ridiculous odds, sure). And then we end up only able to run one > task, with the rest of the CPUs picking their nose. What if we just made migrate_disable() a local_lock() available for !RT? I mean make it a priority inheritance PER CPU lock. That is, no two tasks could do a migrate_disable() on the same CPU? If one task does a migrate_disable() and then gets preempted and the preempting task does a migrate_disable() on the same CPU, it will block and wait for the first task to do a migrate_enable(). No two tasks on the same CPU could enter the migrate_disable() section simultaneously, just like no two tasks could enter a preempt_disable() section. In essence, we just allow local_lock() to be used for both RT and !RT. Perhaps make migrate_disable() an anonymous local_lock()? This should lower the SHC in theory, if you can't have stacked migrate disables on the same CPU. -- Steve
Re: [PATCH 5/9] fs: remove various compat readv/writev helpers
On Wed, Sep 23, 2020 at 04:32:51PM +0200, Christoph Hellwig wrote: > On Wed, Sep 23, 2020 at 03:25:49PM +0100, Al Viro wrote: > > On Wed, Sep 23, 2020 at 08:05:43AM +0200, Christoph Hellwig wrote: > > > COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd, > > > - const struct compat_iovec __user *,vec, > > > + const struct iovec __user *, vec, > > > > Um... Will it even compile? > > > > > #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64 > > > COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd, > > > - const struct compat_iovec __user *,vec, > > > + const struct iovec __user *, vec, > > > > Ditto. Look into include/linux/compat.h and you'll see > > > > asmlinkage long compat_sys_preadv64(unsigned long fd, > > const struct compat_iovec __user *vec, > > unsigned long vlen, loff_t pos); > > > > How does that manage to avoid the compiler screaming bloody > > murder? > > That's a very good question. But it does not just compile but actually > works. Probably because all the syscall wrappers mean that we don't > actually generate the normal names. I just tried this: > > --- a/include/linux/syscalls.h > +++ b/include/linux/syscalls.h > @@ -468,7 +468,7 @@ asmlinkage long sys_lseek(unsigned int fd, off_t offset, > asmlinkage long sys_read(unsigned int fd, char __user *buf, size_t count); > asmlinkage long sys_write(unsigned int fd, const char __user *buf, > size_t count); > -asmlinkage long sys_readv(unsigned long fd, > +asmlinkage long sys_readv(void *fd, > > for fun, and the compiler doesn't care either.. Try to build it for sparc or ppc...
Re: [PATCH 3/9] iov_iter: refactor rw_copy_check_uvector and import_iovec
On Wed, Sep 23, 2020 at 02:38:24PM +, David Laight wrote: > From: Al Viro > > Sent: 23 September 2020 15:17 > > > > On Wed, Sep 23, 2020 at 08:05:41AM +0200, Christoph Hellwig wrote: > > > > > +struct iovec *iovec_from_user(const struct iovec __user *uvec, > > > + unsigned long nr_segs, unsigned long fast_segs, > > > > Hmm... For fast_segs unsigned long had always been ridiculous > > (4G struct iovec on caller stack frame?), but that got me wondering about > > nr_segs and I wish I'd thought of that when introducing import_iovec(). > > > > The thing is, import_iovec() takes unsigned int there. Which is fine > > (hell, the maximal value that can be accepted in 1024), except that > > we do pass unsigned long syscall argument to it in some places. > > It will make diddly-squit difference. > The parameters end up in registers on most calling conventions. > Plausibly you get an extra 'REX' byte on x86 for the 64bit value. > What you want to avoid is explicit sign/zero extension and value > masking after arithmetic. Don't tell me what I want; your telepathic abilities are consistently sucky. I am *NOT* talking about microoptimization here. I have described the behaviour change of syscall caused by commit 5 years ago. Which is generally considered a problem. Then I asked whether that behaviour change would fall under the "if nobody noticed, it's not a userland ABI breakage" exception. Could you show me the point where I have expressed concerns about the quality of amd64 code generated for that thing, before or after the change in question?
Re: [PATCH 3/9] iov_iter: refactor rw_copy_check_uvector and import_iovec
On Wed, Sep 23, 2020 at 03:16:54PM +0100, Al Viro wrote: > On Wed, Sep 23, 2020 at 08:05:41AM +0200, Christoph Hellwig wrote: > > > +struct iovec *iovec_from_user(const struct iovec __user *uvec, > > + unsigned long nr_segs, unsigned long fast_segs, > > Hmm... For fast_segs unsigned long had always been ridiculous > (4G struct iovec on caller stack frame?), but that got me wondering about > nr_segs and I wish I'd thought of that when introducing import_iovec(). > > The thing is, import_iovec() takes unsigned int there. Which is fine > (hell, the maximal value that can be accepted in 1024), except that > we do pass unsigned long syscall argument to it in some places. > > E.g. vfs_readv() quietly truncates vlen to 32 bits, and vlen can > come unchanged through sys_readv() -> do_readv() -> vfs_readv(). > With unsigned long passed by syscall glue. > > AFAICS, passing 4G+1 as the third argument to readv(2) on 64bit box > will be quietly treated as 1 these days. Which would be fine, except > that before "switch {compat_,}do_readv_writev() to {compat_,}import_iovec()" > it used to fail with -EINVAL. > > Userland, BTW, describes readv(2) iovcnt as int; process_vm_readv(), > OTOH, has these counts unsigned long from the userland POV... > > I suppose we ought to switch import_iovec() to unsigned long for nr_segs ;-/ > Strictly speaking that had been a userland ABI change, even though nothing > except regression tests checking for expected errors would've been likely > to notice. And it looks like no regression tests covered that one... > > Linus, does that qualify for your "if no userland has noticed the change, > it's not a breakage"? Egads... We have sys_readv() with unsigned long for file descriptor, since 1.3.31 when it had been introduced. And originally it did comparison with NR_OPEN right in sys_readv(). Then in 2.1.60 it had been switched to fget(), which used to take unsigned long at that point. And in 2.1.90pre1 it went unsigned int, so non-zero upper 32 bits in readv(2) first argument ceased to cause EBADF... Of course, libc had it as int fd all along.
RE: [PATCH 3/9] iov_iter: refactor rw_copy_check_uvector and import_iovec
From: Al Viro > Sent: 23 September 2020 15:17 > > On Wed, Sep 23, 2020 at 08:05:41AM +0200, Christoph Hellwig wrote: > > > +struct iovec *iovec_from_user(const struct iovec __user *uvec, > > + unsigned long nr_segs, unsigned long fast_segs, > > Hmm... For fast_segs unsigned long had always been ridiculous > (4G struct iovec on caller stack frame?), but that got me wondering about > nr_segs and I wish I'd thought of that when introducing import_iovec(). > > The thing is, import_iovec() takes unsigned int there. Which is fine > (hell, the maximal value that can be accepted in 1024), except that > we do pass unsigned long syscall argument to it in some places. It will make diddly-squit difference. The parameters end up in registers on most calling conventions. Plausibly you get an extra 'REX' byte on x86 for the 64bit value. What you want to avoid is explicit sign/zero extension and value masking after arithmetic. On x86-64 the 'horrid' type is actually 'signed int'. It often needs sign extending to 64bits (eg when being used as an array subscript). David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
On Mon, Sep 21 2020 at 21:27, Thomas Gleixner wrote: > On Mon, Sep 21 2020 at 09:24, Linus Torvalds wrote: >> On Mon, Sep 21, 2020 at 12:39 AM Thomas Gleixner wrote: >> Maybe we really *could* call this new kmap functionality something >> like "kmap_percpu()" (or maybe "local" is good enough), and make it >> act like your RT code does for spinlocks - not disable preemption, but >> only disabling CPU migration. > > I"m all for it, but the scheduler people have opinions :) I just took the latest version of migrate disable patches https://lore.kernel.org/r/20200921163557.234036...@infradead.org removed the RT dependency on top of them and adopted the kmap stuff (addressing the various comments while at it and unbreaking ARM). I'm not going to post that until there is consensus about the general availability of migrate disable, but for those who want to play with it I've pushed the resulting combo out to: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git highmem For testing I've tweaked a few places to use the new _local() variants and it survived testing so far and I've verified that there is actual preemption which means zap/restore of the thread local kmaps. Thanks, tglx
Re: [PATCH 5/9] fs: remove various compat readv/writev helpers
On Wed, Sep 23, 2020 at 03:25:49PM +0100, Al Viro wrote: > On Wed, Sep 23, 2020 at 08:05:43AM +0200, Christoph Hellwig wrote: > > COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd, > > - const struct compat_iovec __user *,vec, > > + const struct iovec __user *, vec, > > Um... Will it even compile? > > > #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64 > > COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd, > > - const struct compat_iovec __user *,vec, > > + const struct iovec __user *, vec, > > Ditto. Look into include/linux/compat.h and you'll see > > asmlinkage long compat_sys_preadv64(unsigned long fd, > const struct compat_iovec __user *vec, > unsigned long vlen, loff_t pos); > > How does that manage to avoid the compiler screaming bloody > murder? That's a very good question. But it does not just compile but actually works. Probably because all the syscall wrappers mean that we don't actually generate the normal names. I just tried this: --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -468,7 +468,7 @@ asmlinkage long sys_lseek(unsigned int fd, off_t offset, asmlinkage long sys_read(unsigned int fd, char __user *buf, size_t count); asmlinkage long sys_write(unsigned int fd, const char __user *buf, size_t count); -asmlinkage long sys_readv(unsigned long fd, +asmlinkage long sys_readv(void *fd, for fun, and the compiler doesn't care either..
Re: [PATCH 5/9] fs: remove various compat readv/writev helpers
On Wed, Sep 23, 2020 at 08:05:43AM +0200, Christoph Hellwig wrote: > COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd, > - const struct compat_iovec __user *,vec, > + const struct iovec __user *, vec, Um... Will it even compile? > #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64 > COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd, > - const struct compat_iovec __user *,vec, > + const struct iovec __user *, vec, Ditto. Look into include/linux/compat.h and you'll see asmlinkage long compat_sys_preadv64(unsigned long fd, const struct compat_iovec __user *vec, unsigned long vlen, loff_t pos); How does that manage to avoid the compiler screaming bloody murder?
Re: [PATCH 3/9] iov_iter: refactor rw_copy_check_uvector and import_iovec
On Wed, Sep 23, 2020 at 08:05:41AM +0200, Christoph Hellwig wrote: > +struct iovec *iovec_from_user(const struct iovec __user *uvec, > + unsigned long nr_segs, unsigned long fast_segs, Hmm... For fast_segs unsigned long had always been ridiculous (4G struct iovec on caller stack frame?), but that got me wondering about nr_segs and I wish I'd thought of that when introducing import_iovec(). The thing is, import_iovec() takes unsigned int there. Which is fine (hell, the maximal value that can be accepted in 1024), except that we do pass unsigned long syscall argument to it in some places. E.g. vfs_readv() quietly truncates vlen to 32 bits, and vlen can come unchanged through sys_readv() -> do_readv() -> vfs_readv(). With unsigned long passed by syscall glue. AFAICS, passing 4G+1 as the third argument to readv(2) on 64bit box will be quietly treated as 1 these days. Which would be fine, except that before "switch {compat_,}do_readv_writev() to {compat_,}import_iovec()" it used to fail with -EINVAL. Userland, BTW, describes readv(2) iovcnt as int; process_vm_readv(), OTOH, has these counts unsigned long from the userland POV... I suppose we ought to switch import_iovec() to unsigned long for nr_segs ;-/ Strictly speaking that had been a userland ABI change, even though nothing except regression tests checking for expected errors would've been likely to notice. And it looks like no regression tests covered that one... Linus, does that qualify for your "if no userland has noticed the change, it's not a breakage"?
Re: [PATCH v2] i2c: cpm: Fix i2c_ram structure
Acked-by: Jochen Friedrich Am 23.09.2020 um 16:08 schrieb nico.vi...@gmail.com: From: Nicolas VINCENT the i2c_ram structure is missing the sdmatmp field mentionned in datasheet for MPC8272 at paragraph 36.5. With this field missing, the hardware would write past the allocated memory done through cpm_muram_alloc for the i2c_ram structure and land in memory allocated for the buffers descriptors corrupting the cbd_bufaddr field. Since this field is only set during setup(), the first i2c transaction would work and the following would send data read from an arbitrary memory location. Signed-off-by: Nicolas VINCENT --- drivers/i2c/busses/i2c-cpm.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/i2c/busses/i2c-cpm.c b/drivers/i2c/busses/i2c-cpm.c index 1213e1932ccb..24d584a1c9a7 100644 --- a/drivers/i2c/busses/i2c-cpm.c +++ b/drivers/i2c/busses/i2c-cpm.c @@ -65,6 +65,9 @@ struct i2c_ram { charres1[4];/* Reserved */ ushort rpbase; /* Relocation pointer */ charres2[2];/* Reserved */ + /* The following elements are only for CPM2 */ + charres3[4];/* Reserved */ + uintsdmatmp;/* Internal */ }; #define I2COM_START 0x80
Re: [PATCH kernel] powerpc/dma: Fix dma_map_ops::get_required_mask
On Tue, Sep 22, 2020 at 12:26:18PM +1000, Alexey Kardashevskiy wrote: > > Well, the original intent of dma_get_required_mask is to return the > > mask that the driver then uses to figure out what to set, so what aacraid > > does fits that use case. > > What was the original intent exactly? The driver asks for the minimum or > maximum DMA mask the platform supports? > > As for now, we (ppc64/powernv) can do: > 1. bypass (==64bit) > 2. a DMA window which used to be limited by 2GB but not anymore. > > I can understand if the driver asked for required mask in expectation to > receive "less or equal than 32bit" and "more than 32 bit" and choose. > And this probably was the intent as at the time when the bug was > introduced, the window was always smaller than 4GB. > > But today the window is bigger than than (44 bits now, or a similar > value, depends on max page order) so the returned mask is >32. Which > still enables that DAC in aacraid but I suspect this is accidental. I think for powernv returning 64-bit always would make a lot of sense. AFAIK all of powernv is PCIe and not legacy PCI, so returning anything less isn't going to help to optimize anything. > > Of course that idea is pretty bogus for > > PCIe devices. > > Why? From the PHB side, there are windows. From the device side, there > are many crippled devices, like, no GPU I saw in last years supported > more than 48bit. Yes, but dma_get_required_mask is misnamed - the mask is not required, it is the optimal mask. Even if the window is smaller we handle it some way, usually by using swiotlb, or by iommu tricks in your case. > > I suspect the right fix is to just not query dma_get_required_mask for > > PCIe devices in aacraid (and other drivers that do something similar). > > May be, if you write nice and big comment next to > dma_get_required_mask() explaining exactly what it does, then I will > realize I am getting this all wrong and we will move to fixing the > drivers :) Yes, it needs a comment or two, and probaby be renamed to dma_get_optimal_dma_mask, and a cleanup of most users. I've added it to my ever growing TODO list, but I would not be unhappy if someone else gives it a spin.
[PATCH v2] i2c: cpm: Fix i2c_ram structure
From: Nicolas VINCENT the i2c_ram structure is missing the sdmatmp field mentionned in datasheet for MPC8272 at paragraph 36.5. With this field missing, the hardware would write past the allocated memory done through cpm_muram_alloc for the i2c_ram structure and land in memory allocated for the buffers descriptors corrupting the cbd_bufaddr field. Since this field is only set during setup(), the first i2c transaction would work and the following would send data read from an arbitrary memory location. Signed-off-by: Nicolas VINCENT --- drivers/i2c/busses/i2c-cpm.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/i2c/busses/i2c-cpm.c b/drivers/i2c/busses/i2c-cpm.c index 1213e1932ccb..24d584a1c9a7 100644 --- a/drivers/i2c/busses/i2c-cpm.c +++ b/drivers/i2c/busses/i2c-cpm.c @@ -65,6 +65,9 @@ struct i2c_ram { charres1[4];/* Reserved */ ushort rpbase; /* Relocation pointer */ charres2[2];/* Reserved */ + /* The following elements are only for CPM2 */ + charres3[4];/* Reserved */ + uintsdmatmp;/* Internal */ }; #define I2COM_START0x80 -- 2.17.1
Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
On Wed, Sep 23 2020 at 10:40, peterz wrote: > Right, so I'm concerned. migrate_disable() wrecks pretty much all > Real-Time scheduler theory we have, and PREEMPRT_RT bringing it in is > somewhat ironic. It's even more ironic that the approach of PREEMPT_RT has been 'pragmatic ignorance of theory' from the very beginning and despite violating all theories it still works. :) > Yes, it allows breaking up non-preemptible regions of non-deterministic > duration, and thereby both reduce and bound the scheduling latency, the > cost for doing that is that the theory on CPU utilization/bandwidth go > out the window. I agree, that the theory goes out of the window, but does it matter in practice? I've yet to see a report of migrate disable stacking being the culprit of a missed deadline and I surely have stared at lots of reports in the past 10+ years. > To easily see this consider an SMP system with a number of tasks equal > to the number of CPUs. On a regular (preempt_disable) kernel we can > always run each task, by virtue of always having an idle CPU to take the > task. > > However, with migrate_disable() we can have each task preempted in a > migrate_disable() region, worse we can stack them all on the _same_ CPU > (super ridiculous odds, sure). And then we end up only able to run one > task, with the rest of the CPUs picking their nose. > > The end result is that, like with unbounded latency, we will still miss > our deadline, simply because we got starved for CPU. I'm well aware of that. > Now, while we could (with a _lot_ of work) rework the kernel to not rely > on the implicit per-cpu ness of things like spinlock_t, the moment we > bring in basic primitives that rely on migrate_disable() we're stuck > with it. Right, but we are stuck with per CPUness and distangling that is just infeasible IMO. > The problem is; afaict there's been no research into this problem. There is no research on a lot of things the kernel does :) > There might be scheduling (read: balancing) schemes that can > mitigate/solve this problem, or it might prove to be a 'hard' problem, > I just don't know. In practive balancing surely can take the number of preempted tasks which are in a migrate disable section into account which would be just another measure to work around the fact that the kernel is not adhering to the theories. It never did that even w/o migrate disable. > But once we start down this road, it's going to be hell to get rid of > it. Like most of the other things the kernel came up with to deal with the oddities of modern hardware :) > That's why I've been arguing (for many years) to strictly limit this to > PREEMPT_RT and only as a gap-stop, not a fundamental primitive to build > on. I know, but short of rewriting the world, I'm not seing the faintest plan to remove the stop gap. :) As we discussed not long ago we have too many inconsistent preemption models already. RT is adding yet another one. And that's worse than introducing migrate disable as a general available facility. IMO, reaching a point of consistency where our different preemption models (NONE, VOLUNTARY, PREEMPT. RT) build on each other is far more important. For !RT migrate disable is far less of an danger than for RT kernels because the amount of code which will use it is rather limited compared to code which still will disable preemption implicit through spin and rw locks. On RT converting these locks to 'sleepable spinlocks' is just possible because RT forces almost everything into task context and migrate disable is just the obvious decomposition of preempt disable which implicitely disables migration. But that means that RT is by orders of magnitude more prone to run into the scheduling trainwreck you are worried about. It just refuses to do so at least with real world work loads. I'm surely in favour of having solid theories behind implementation, but at some point you just have to bite the bullet and chose pragmatism in order to make progress. Proliferating inconsistency is not real progress, as it is violating the most fundamental engineering principles. That's by far more dangerous than violating scheduling theories which are built on perfect models and therefore enforce violation by practical implementations anyway. Thanks, tglx
Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
On Wed, Sep 23, 2020 at 11:01:34AM +0300, Pavel Begunkov wrote: > > I'm not following why that would be considered a valid option, > > as that clearly breaks existing users that update from a 32-bit > > kernel to a 64-bit one. > > Do you mean users who move 32-bit binaries (without recompiling) to a > new x64 kernel? Does the kernel guarantees that to work? Yes. No further (printable) comments for now...
Re: [PATCH] i2c: cpm: Fix i2c_ram structure
Le 23/09/2020 à 14:55, Vincent Nicolas a écrit : From: Christophe Leroy Sent: Wednesday, 23 September 2020 10:01 To: Vincent Nicolas; joc...@scram.de Cc: linuxppc-dev@lists.ozlabs.org; linux-...@vger.kernel.org Subject: Re: [PATCH] i2c: cpm: Fix i2c_ram structure Le 23/09/2020 à 09:18, Vincent Nicolas a écrit : From: Christophe Leroy Sent: Tuesday, 22 September 2020 14:38 To: Vincent Nicolas ; joc...@scram.de Cc: linuxppc-dev@lists.ozlabs.org ; linux-...@vger.kernel.org Subject: Re: [PATCH] i2c: cpm: Fix i2c_ram structure Le 22/09/2020 à 11:04, nico.vi...@gmail.com a écrit : From: Nicolas VINCENT the i2c_ram structure is missing the sdmatmp field mentionned in datasheet for MPC8272 at paragraph 36.5. With this field missing, the hardware would write past the allocated memory done through cpm_muram_alloc for the i2c_ram structure and land in memory allocated for the buffers descriptors corrupting the cbd_bufaddr field. Since this field is only set during setup(), the first i2c transaction would work and the following would send data read from an arbitrary memory location. Signed-off-by: Nicolas VINCENT --- drivers/i2c/busses/i2c-cpm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/i2c/busses/i2c-cpm.c b/drivers/i2c/busses/i2c-cpm.c index 1213e1932ccb..c5700addbf65 100644 --- a/drivers/i2c/busses/i2c-cpm.c +++ b/drivers/i2c/busses/i2c-cpm.c @@ -64,7 +64,8 @@ struct i2c_ram { uinttxtmp; /* Internal */ charres1[4];/* Reserved */ ushort rpbase; /* Relocation pointer */ - charres2[2];/* Reserved */ + charres2[6];/* Reserved */ + uintsdmatmp;/* Internal */ On CPM1, I2C param RAM has size 0x30 (offset 0x1c80-0x1caf) Your change overlaps the miscellaneous area that contains CP Microcode Revision Number, ref MPC885 Reference Manual §18.7.3 As far as I understand the mpc885 contains in the dts the compatible=fsl,cpm1-i2c which is used in cpm-i2c.c to either determine the address of the i2c_ram structure (cpm1), or dynamically allocate it with cpm_muram_alloc (cpm2). In the first case the structure will indeed overlaps with the miscellaneous section but since the sdmatmp is only used by cpm2 hardware it shall not be an issue. Please, let me know if I am mistaken. If the patch cannot be accepted as is, I would gladly accept pointers on how to address this kind of issue. Please use a mail client that properly sets the > in front of original/answered text. Here your mailer has mixed you text and mine, that's unusable on the long term. I changed my configuration, please let me know if there are still issues Yes it is OK now. Christophe
Re: [PATCH] i2c: cpm: Fix i2c_ram structure
> > From: Christophe Leroy > Sent: Wednesday, 23 September 2020 10:01 > To: Vincent Nicolas; joc...@scram.de > Cc: linuxppc-dev@lists.ozlabs.org; linux-...@vger.kernel.org > Subject: Re: [PATCH] i2c: cpm: Fix i2c_ram structure > > > > Le 23/09/2020 à 09:18, Vincent Nicolas a écrit : >> >> >> >> From: Christophe Leroy >> Sent: Tuesday, 22 September 2020 14:38 >> To: Vincent Nicolas ; joc...@scram.de >> >> Cc: linuxppc-dev@lists.ozlabs.org ; >> linux-...@vger.kernel.org >> Subject: Re: [PATCH] i2c: cpm: Fix i2c_ram structure >> >> >> >> Le 22/09/2020 à 11:04, nico.vi...@gmail.com a écrit : >>> From: Nicolas VINCENT >>> >>> the i2c_ram structure is missing the sdmatmp field mentionned in >>> datasheet for MPC8272 at paragraph 36.5. With this field missing, the >>> hardware would write past the allocated memory done through >>> cpm_muram_alloc for the i2c_ram structure and land in memory allocated >>> for the buffers descriptors corrupting the cbd_bufaddr field. Since this >>> field is only set during setup(), the first i2c transaction would work >>> and the following would send data read from an arbitrary memory >>> location. >>> >>> Signed-off-by: Nicolas VINCENT >>> --- >>> drivers/i2c/busses/i2c-cpm.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/i2c/busses/i2c-cpm.c b/drivers/i2c/busses/i2c-cpm.c >>> index 1213e1932ccb..c5700addbf65 100644 >>> --- a/drivers/i2c/busses/i2c-cpm.c >>> +++ b/drivers/i2c/busses/i2c-cpm.c >>> @@ -64,7 +64,8 @@ struct i2c_ram { >>> uinttxtmp; /* Internal */ >>> charres1[4];/* Reserved */ >>> ushort rpbase; /* Relocation pointer */ >>> - charres2[2];/* Reserved */ >>> + charres2[6];/* Reserved */ >>> + uintsdmatmp;/* Internal */ >> >> On CPM1, I2C param RAM has size 0x30 (offset 0x1c80-0x1caf) >> >> Your change overlaps the miscellaneous area that contains CP Microcode >> Revision Number, ref MPC885 Reference Manual §18.7.3 >> >> As far as I understand the mpc885 contains in the dts the >> compatible=fsl,cpm1-i2c which is used in cpm-i2c.c to either determine the >> address of the i2c_ram structure (cpm1), or dynamically allocate it with >> cpm_muram_alloc (cpm2). >> In the first case the structure will indeed overlaps with the miscellaneous >> section but since the sdmatmp is only used by cpm2 hardware it shall not be >> an issue. >> >> Please, let me know if I am mistaken. If the patch cannot be accepted as is, >> I would gladly accept pointers on how to address this kind of issue. > > > Please use a mail client that properly sets the > in front of > original/answered text. Here your mailer has mixed you text and mine, > that's unusable on the long term. I changed my configuration, please let me know if there are still issues > > > I think you are right on the fact that it doesn't seem to be an issue. > Nevertheless, that's confusing. > > What I would suggest is to leave res2[2] as is, and add something like: > > /* The following elements are only for CPM2 */ > char res3[4]; /* Reserved */ > uint sdmatmp; /* Internal */ I'll repost the patch like this then. > > > Other solution (not sure that's the best solution thought) would be to > do as in spi-fsl-cpm: use iic_t structure from asm/cpm1.h when > CONFIG_CPM1 is selected and use iic_t from asm/cpm2.h when CONFIG_CPM2 > is selected, taking into account that CONFIG_CPM1 and CONFIG_CPM2 are > mutually exclusive at the time being. Unless someone argues for this solution I will go with the first one. Thank again for your time and quick responses. Nicolas.
Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
On Wed, Sep 23 2020 at 12:19, peterz wrote: > On Mon, Sep 21, 2020 at 09:27:57PM +0200, Thomas Gleixner wrote: >> Alternatively this could of course be solved with per CPU page tables >> which will come around some day anyway I fear. > > Previously (with PTI) we looked at making the entire kernel map per-CPU, > and that takes a 2K copy on switch_mm() (or more general, the user part > of whatever the top level directory is for architectures that have a > shared kernel/user page-table setup in the first place). > > The idea was having a fixed per-cpu kernel page-table, share a bunch of > (kernel) page-tables between all CPUs and then copy in the user part on > switch. > > I've forgotten what the plan was for ASID/PCID in that scheme. > > For x86_64 we've been fearing the performance of that 2k copy, but I > don't think we've ever actually bit the bullet and implemented it to see > how bad it really is. I actually did at some point and depending on the workload the overhead was clearly measurable. And yes, it fell apart with PCID and I could not come up with a scheme for it which did not suck horribly. So I burried the patches in the poison cabinet. Aside of that, we'd need to implement that for a eight other architectures as well... Thanks, tglx
Re: [PATCH] i2c: cpm: Fix i2c_ram structure
From: Christophe Leroy Sent: Tuesday, 22 September 2020 14:38 To: Vincent Nicolas ; joc...@scram.de Cc: linuxppc-dev@lists.ozlabs.org ; linux-...@vger.kernel.org Subject: Re: [PATCH] i2c: cpm: Fix i2c_ram structure Le 22/09/2020 à 11:04, nico.vi...@gmail.com a écrit : > From: Nicolas VINCENT > > the i2c_ram structure is missing the sdmatmp field mentionned in > datasheet for MPC8272 at paragraph 36.5. With this field missing, the > hardware would write past the allocated memory done through > cpm_muram_alloc for the i2c_ram structure and land in memory allocated > for the buffers descriptors corrupting the cbd_bufaddr field. Since this > field is only set during setup(), the first i2c transaction would work > and the following would send data read from an arbitrary memory > location. > > Signed-off-by: Nicolas VINCENT > --- > drivers/i2c/busses/i2c-cpm.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/i2c/busses/i2c-cpm.c b/drivers/i2c/busses/i2c-cpm.c > index 1213e1932ccb..c5700addbf65 100644 > --- a/drivers/i2c/busses/i2c-cpm.c > +++ b/drivers/i2c/busses/i2c-cpm.c > @@ -64,7 +64,8 @@ struct i2c_ram { > uint txtmp; /* Internal */ > char res1[4]; /* Reserved */ > ushort rpbase; /* Relocation pointer */ > - char res2[2]; /* Reserved */ > + char res2[6]; /* Reserved */ > + uint sdmatmp; /* Internal */ On CPM1, I2C param RAM has size 0x30 (offset 0x1c80-0x1caf) Your change overlaps the miscellaneous area that contains CP Microcode Revision Number, ref MPC885 Reference Manual §18.7.3 As far as I understand the mpc885 contains in the dts the compatible=fsl,cpm1-i2c which is used in cpm-i2c.c to either determine the address of the i2c_ram structure (cpm1), or dynamically allocate it with cpm_muram_alloc (cpm2). In the first case the structure will indeed overlaps with the miscellaneous section but since the sdmatmp is only used by cpm2 hardware it shall not be an issue. Please, let me know if I am mistaken. If the patch cannot be accepted as is, I would gladly accept pointers on how to address this kind of issue. Thanks, Nicolas. > }; > > #define I2COM_START 0x80 > Christophe
Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
On Mon, Sep 21, 2020 at 09:27:57PM +0200, Thomas Gleixner wrote: > Alternatively this could of course be solved with per CPU page tables > which will come around some day anyway I fear. Previously (with PTI) we looked at making the entire kernel map per-CPU, and that takes a 2K copy on switch_mm() (or more general, the user part of whatever the top level directory is for architectures that have a shared kernel/user page-table setup in the first place). The idea was having a fixed per-cpu kernel page-table, share a bunch of (kernel) page-tables between all CPUs and then copy in the user part on switch. I've forgotten what the plan was for ASID/PCID in that scheme. For x86_64 we've been fearing the performance of that 2k copy, but I don't think we've ever actually bit the bullet and implemented it to see how bad it really is.
Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
On Mon, Sep 21, 2020 at 09:27:57PM +0200, Thomas Gleixner wrote: > On Mon, Sep 21 2020 at 09:24, Linus Torvalds wrote: > > On Mon, Sep 21, 2020 at 12:39 AM Thomas Gleixner wrote: > >> > >> If a task is migrated to a different CPU then the mapping address will > >> change which will explode in colourful ways. > > > > Right you are. > > > > Maybe we really *could* call this new kmap functionality something > > like "kmap_percpu()" (or maybe "local" is good enough), and make it > > act like your RT code does for spinlocks - not disable preemption, but > > only disabling CPU migration. > > I"m all for it, but the scheduler people have opinions :) Right, so I'm concerned. migrate_disable() wrecks pretty much all Real-Time scheduler theory we have, and PREEMPRT_RT bringing it in is somewhat ironic. Yes, it allows breaking up non-preemptible regions of non-deterministic duration, and thereby both reduce and bound the scheduling latency, the cost for doing that is that the theory on CPU utilization/bandwidth go out the window. To easily see this consider an SMP system with a number of tasks equal to the number of CPUs. On a regular (preempt_disable) kernel we can always run each task, by virtue of always having an idle CPU to take the task. However, with migrate_disable() we can have each task preempted in a migrate_disable() region, worse we can stack them all on the _same_ CPU (super ridiculous odds, sure). And then we end up only able to run one task, with the rest of the CPUs picking their nose. The end result is that, like with unbounded latency, we will still miss our deadline, simply because we got starved for CPU. Now, while we could (with a _lot_ of work) rework the kernel to not rely on the implicit per-cpu ness of things like spinlock_t, the moment we bring in basic primitives that rely on migrate_disable() we're stuck with it. The problem is; afaict there's been no research into this problem. There might be scheduling (read: balancing) schemes that can mitigate/solve this problem, or it might prove to be a 'hard' problem, I just don't know. But once we start down this road, it's going to be hell to get rid of it. That's why I've been arguing (for many years) to strictly limit this to PREEMPT_RT and only as a gap-stop, not a fundamental primitive to build on.
Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
On 22/09/2020 12:01, Arnd Bergmann wrote: > On Tue, Sep 22, 2020 at 9:59 AM Pavel Begunkov wrote: >> On 22/09/2020 10:23, Arnd Bergmann wrote: >>> On Tue, Sep 22, 2020 at 8:32 AM Pavel Begunkov >>> wrote: On 22/09/2020 03:58, Andy Lutomirski wrote: > On Mon, Sep 21, 2020 at 5:24 PM Pavel Begunkov > wrote: > I may be looking at a different kernel than you, but aren't you > preventing creating an io_uring regardless of whether SQPOLL is > requested? I diffed a not-saved file on a sleepy head, thanks for noticing. As you said, there should be an SQPOLL check. ... if (ctx->compat && (p->flags & IORING_SETUP_SQPOLL)) goto err; >>> >>> Wouldn't that mean that now 32-bit containers behave differently >>> between compat and native execution? >>> >>> I think if you want to prevent 32-bit applications from using SQPOLL, >>> it needs to be done the same way on both to be consistent: >> >> The intention was to disable only compat not native 32-bit. > > I'm not following why that would be considered a valid option, > as that clearly breaks existing users that update from a 32-bit > kernel to a 64-bit one. Do you mean users who move 32-bit binaries (without recompiling) to a new x64 kernel? Does the kernel guarantees that to work? I'd personally care more native-bitness apps. > > Taking away the features from users that are still on 32-bit kernels > already seems questionable to me, but being inconsistent > about it seems much worse, in particular when the regression > is on the upgrade path. TBH, this won't fix that entirely (e.g. passing non-compat io_uring to a compat process should yield the same problem). So, let's put it aside for now until this bikeshedding would be relevant. > >>> Can we expect all existing and future user space to have a sane >>> fallback when IORING_SETUP_SQPOLL fails? >> >> SQPOLL has a few differences with non-SQPOLL modes, but it's easy >> to convert between them. Anyway, SQPOLL is a privileged special >> case that's here for performance/latency reasons, I don't think >> there will be any non-accidental users of it. > > Ok, so the behavior of 32-bit tasks would be the same as running > the same application as unprivileged 64-bit tasks, with applications Yes, something like that, but that's not automatic and in some (hopefully rare) cases there may be pitfalls. That's in short, I can expand the idea a bit if anyone would be interested. > already having to implement that fallback, right? Well, not everyone _have_ to implement such a fallback, e.g. applications working only whilst privileged may use SQPOLL only. -- Pavel Begunkov
Re: [PATCH] i2c: cpm: Fix i2c_ram structure
Le 23/09/2020 à 09:18, Vincent Nicolas a écrit : From: Christophe Leroy Sent: Tuesday, 22 September 2020 14:38 To: Vincent Nicolas ; joc...@scram.de Cc: linuxppc-dev@lists.ozlabs.org ; linux-...@vger.kernel.org Subject: Re: [PATCH] i2c: cpm: Fix i2c_ram structure Le 22/09/2020 à 11:04, nico.vi...@gmail.com a écrit : From: Nicolas VINCENT the i2c_ram structure is missing the sdmatmp field mentionned in datasheet for MPC8272 at paragraph 36.5. With this field missing, the hardware would write past the allocated memory done through cpm_muram_alloc for the i2c_ram structure and land in memory allocated for the buffers descriptors corrupting the cbd_bufaddr field. Since this field is only set during setup(), the first i2c transaction would work and the following would send data read from an arbitrary memory location. Signed-off-by: Nicolas VINCENT --- drivers/i2c/busses/i2c-cpm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/i2c/busses/i2c-cpm.c b/drivers/i2c/busses/i2c-cpm.c index 1213e1932ccb..c5700addbf65 100644 --- a/drivers/i2c/busses/i2c-cpm.c +++ b/drivers/i2c/busses/i2c-cpm.c @@ -64,7 +64,8 @@ struct i2c_ram { uint txtmp; /* Internal */ char res1[4]; /* Reserved */ ushort rpbase; /* Relocation pointer */ - char res2[2]; /* Reserved */ + char res2[6]; /* Reserved */ + uint sdmatmp; /* Internal */ On CPM1, I2C param RAM has size 0x30 (offset 0x1c80-0x1caf) Your change overlaps the miscellaneous area that contains CP Microcode Revision Number, ref MPC885 Reference Manual §18.7.3 As far as I understand the mpc885 contains in the dts the compatible=fsl,cpm1-i2c which is used in cpm-i2c.c to either determine the address of the i2c_ram structure (cpm1), or dynamically allocate it with cpm_muram_alloc (cpm2). In the first case the structure will indeed overlaps with the miscellaneous section but since the sdmatmp is only used by cpm2 hardware it shall not be an issue. Please, let me know if I am mistaken. If the patch cannot be accepted as is, I would gladly accept pointers on how to address this kind of issue. Please use a mail client that properly sets the > in front of original/answered text. Here your mailer has mixed you text and mine, that's unusable on the long term. I think you are right on the fact that it doesn't seem to be an issue. Nevertheless, that's confusing. What I would suggest is to leave res2[2] as is, and add something like: /* The following elements are only for CPM2 */ char res3[4]; /* Reserved */ uint sdmatmp; /* Internal */ Other solution (not sure that's the best solution thought) would be to do as in spi-fsl-cpm: use iic_t structure from asm/cpm1.h when CONFIG_CPM1 is selected and use iic_t from asm/cpm2.h when CONFIG_CPM2 is selected, taking into account that CONFIG_CPM1 and CONFIG_CPM2 are mutually exclusive at the time being. Christophe
[PATCH v3] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed
When a passthrough IO adapter is removed from a pseries machine using hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the guest OS to clear all page table entries related to the adapter. If some are still present, the RTAS call which isolates the PCI slot returns error 9001 "valid outstanding translations" and the removal of the IO adapter fails. This is because when the PHBs are scanned, Linux maps automatically the INTx interrupts in the Linux interrupt number space but these are never removed. To solve this problem, we introduce a PPC platform specific pcibios_remove_bus() routine which clears all interrupt mappings when the bus is removed. This also clears the associated page table entries of the ESB pages when using XIVE. For this purpose, we record the logical interrupt numbers of the mapped interrupt under the PHB structure and let pcibios_remove_bus() do the clean up. Since some PCI adapters, like GPUs, use the "interrupt-map" property to describe interrupt mappings other than the legacy INTx interrupts, we can not restrict the size of the mapping array to PCI_NUM_INTX. The number of interrupt mappings is computed from the "interrupt-map" property and the mapping array is allocated accordingly. Cc: "Oliver O'Halloran" Cc: Alexey Kardashevskiy Reviewed-by: Alexey Kardashevskiy Signed-off-by: Cédric Le Goater --- Changes in v3 : - NULLified 'irq_map' in pci_irq_map_dispose() arch/powerpc/include/asm/pci-bridge.h | 6 ++ arch/powerpc/kernel/pci-common.c | 115 ++ 2 files changed, 121 insertions(+) diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index d2a2a14e56f9..d21e070352dc 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -48,6 +48,9 @@ struct pci_controller_ops { /* * Structure of a PCI controller (host bridge) + * + * @irq_count: number of interrupt mappings + * @irq_map: interrupt mappings */ struct pci_controller { struct pci_bus *bus; @@ -127,6 +130,9 @@ struct pci_controller { void *private_data; struct npu *npu; + + unsigned int irq_count; + unsigned int *irq_map; }; /* These are used for config access before all the PCI probing diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index be108616a721..fb492de6902e 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -353,6 +353,116 @@ struct pci_controller *pci_find_controller_for_domain(int domain_nr) return NULL; } +/* + * Assumption is made on the interrupt parent. All interrupt-map + * entries are considered to have the same parent. + */ +static int pcibios_irq_map_count(struct pci_controller *phb) +{ + const __be32 *imap; + int imaplen; + struct device_node *parent; + u32 intsize, addrsize, parintsize, paraddrsize; + + if (of_property_read_u32(phb->dn, "#interrupt-cells", )) + return 0; + if (of_property_read_u32(phb->dn, "#address-cells", )) + return 0; + + imap = of_get_property(phb->dn, "interrupt-map", ); + if (!imap) { + pr_debug("%pOF : no interrupt-map\n", phb->dn); + return 0; + } + imaplen /= sizeof(u32); + pr_debug("%pOF : imaplen=%d\n", phb->dn, imaplen); + + if (imaplen < (addrsize + intsize + 1)) + return 0; + + imap += intsize + addrsize; + parent = of_find_node_by_phandle(be32_to_cpup(imap)); + if (!parent) { + pr_debug("%pOF : no imap parent found !\n", phb->dn); + return 0; + } + + if (of_property_read_u32(parent, "#interrupt-cells", )) { + pr_debug("%pOF : parent lacks #interrupt-cells!\n", phb->dn); + return 0; + } + + if (of_property_read_u32(parent, "#address-cells", )) + paraddrsize = 0; + + return imaplen / (addrsize + intsize + 1 + paraddrsize + parintsize); +} + +static void pcibios_irq_map_init(struct pci_controller *phb) +{ + phb->irq_count = pcibios_irq_map_count(phb); + if (phb->irq_count < PCI_NUM_INTX) + phb->irq_count = PCI_NUM_INTX; + + pr_debug("%pOF : interrupt map #%d\n", phb->dn, phb->irq_count); + + phb->irq_map = kcalloc(phb->irq_count, sizeof(unsigned int), + GFP_KERNEL); +} + +static void pci_irq_map_register(struct pci_dev *pdev, unsigned int virq) +{ + struct pci_controller *phb = pci_bus_to_host(pdev->bus); + int i; + + if (!phb->irq_map) + return; + + for (i = 0; i < phb->irq_count; i++) { + /* +* Look for an empty or an equivalent slot, as INTx +* interrupts can be shared between adapters. +*/ + if (phb->irq_map[i] == virq || !phb->irq_map[i]) { +
[PATCH -next v2] powerpc/perf: Fix symbol undeclared warning
Build kernel with `C=2`: arch/powerpc/perf/isa207-common.c:24:18: warning: symbol 'isa207_pmu_format_attr' was not declared. Should it be static? arch/powerpc/perf/power9-pmu.c:101:5: warning: symbol 'p9_dd21_bl_ev' was not declared. Should it be static? arch/powerpc/perf/power9-pmu.c:115:5: warning: symbol 'p9_dd22_bl_ev' was not declared. Should it be static? Those symbols are used only in the files that define them so we declare them as static to fix the warnings. Signed-off-by: Wang Wensheng --- arch/powerpc/perf/isa207-common.c | 2 +- arch/powerpc/perf/power9-pmu.c| 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c index 964437adec18..85dc860b265b 100644 --- a/arch/powerpc/perf/isa207-common.c +++ b/arch/powerpc/perf/isa207-common.c @@ -21,7 +21,7 @@ PMU_FORMAT_ATTR(thresh_stop, "config:32-35"); PMU_FORMAT_ATTR(thresh_start, "config:36-39"); PMU_FORMAT_ATTR(thresh_cmp,"config:40-49"); -struct attribute *isa207_pmu_format_attr[] = { +static struct attribute *isa207_pmu_format_attr[] = { _attr_event.attr, _attr_pmcxsel.attr, _attr_mark.attr, diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c index 2a57e93a79dc..4a315fad1f99 100644 --- a/arch/powerpc/perf/power9-pmu.c +++ b/arch/powerpc/perf/power9-pmu.c @@ -98,7 +98,7 @@ extern u64 PERF_REG_EXTENDED_MASK; /* PowerISA v2.07 format attribute structure*/ extern struct attribute_group isa207_pmu_format_group; -int p9_dd21_bl_ev[] = { +static int p9_dd21_bl_ev[] = { PM_MRK_ST_DONE_L2, PM_RADIX_PWC_L1_HIT, PM_FLOP_CMPL, @@ -112,7 +112,7 @@ int p9_dd21_bl_ev[] = { PM_DISP_HELD_SYNC_HOLD, }; -int p9_dd22_bl_ev[] = { +static int p9_dd22_bl_ev[] = { PM_DTLB_MISS_16G, PM_DERAT_MISS_2M, PM_DTLB_MISS_2M, -- 2.25.0
Re: [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed
On 9/23/20 2:33 AM, Qian Cai wrote: > On Fri, 2020-08-07 at 12:18 +0200, Cédric Le Goater wrote: >> When a passthrough IO adapter is removed from a pseries machine using >> hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the >> guest OS to clear all page table entries related to the adapter. If >> some are still present, the RTAS call which isolates the PCI slot >> returns error 9001 "valid outstanding translations" and the removal of >> the IO adapter fails. This is because when the PHBs are scanned, Linux >> maps automatically the INTx interrupts in the Linux interrupt number >> space but these are never removed. >> >> To solve this problem, we introduce a PPC platform specific >> pcibios_remove_bus() routine which clears all interrupt mappings when >> the bus is removed. This also clears the associated page table entries >> of the ESB pages when using XIVE. >> >> For this purpose, we record the logical interrupt numbers of the >> mapped interrupt under the PHB structure and let pcibios_remove_bus() >> do the clean up. >> >> Since some PCI adapters, like GPUs, use the "interrupt-map" property >> to describe interrupt mappings other than the legacy INTx interrupts, >> we can not restrict the size of the mapping array to PCI_NUM_INTX. The >> number of interrupt mappings is computed from the "interrupt-map" >> property and the mapping array is allocated accordingly. >> >> Cc: "Oliver O'Halloran" >> Cc: Alexey Kardashevskiy >> Signed-off-by: Cédric Le Goater > > Some syscall fuzzing will trigger this on POWER9 NV where the traces pointed > to > this patch. > > .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config OK. The patch is missing a NULL assignement after kfree() and that might be the issue. I did try PHB removal under PowerNV, so I would like to understand how we managed to remove twice the PCI bus and possibly reproduce. Any chance we could grab what the syscall fuzzer (syzkaller) did ? Thanks, C. > > [ 3574.564109][ T965] ata1.00: disabled > [ 3574.580373][T151472] sd 0:0:0:0: [sdb] Synchronizing SCSI cache > [ 3574.581180][T151472] sd 0:0:0:0: [sdb] Synchronize Cache(10) failed: > Result: hostbyte=0x04 driverbyte=0x00 > [ 3574.581226][T151472] sd 0:0:0:0: [sdb] Stopping disk > [ 3574.581289][T151472] sd 0:0:0:0: [sdb] Start/Stop Unit failed: Result: > hostbyte=0x04 driverbyte=0x00 > [ 3574.611424][ T3019] Read-error on swap-device (254:1:849792) > [ 3574.611685][ T3019] Read-error on swap-device (254:1:914944) > [ 3574.611769][ T3019] Read-error on swap-device (254:1:915072) > [ 3574.611838][ T3019] Read-error on swap-device (254:1:915200) > [ 3574.611926][ T3019] Read-error on swap-device (254:1:915328) > [ 3574.612268][ T3084] Read-error on swap-device (254:1:792576) > [ 3574.612342][ T3084] Read-error on swap-device (254:1:792704) > [ 3574.612757][ T2362] Read-error on swap-device (254:1:957440) > [ 3574.612773][ T2905] Read-error on swap-device (254:1:784128) > [ 3574.613015][ T2362] Read-error on swap-device (254:1:957568) > [ 3574.613160][ T2905] Read-error on swap-device (254:1:784256) > [ 3574.613241][ T2362] Read-error on swap-device (254:1:957696) > [ 3574.613342][ T2362] Read-error on swap-device (254:1:957824) > [ 3574.614448][ T3019] Core dump to |/usr/lib/systemd/systemd-coredump pipe > failed > [ 3574.614663][ T3019] Read-error on swap-device (254:1:961536) > [ 3574.675330][T151844] Read-error on swap-device (254:1:128) > [ 3574.675515][T151844] Read-error on swap-device (254:1:256) > [ 3574.675700][T151844] Read-error on swap-device (254:1:384) > [ 3574.703570][ T971] ata2.00: disabled > [ 3574.710393][T151472] sd 1:0:0:0: [sda] Synchronizing SCSI cache > [ 3574.710864][T151472] sd 1:0:0:0: [sda] Synchronize Cache(10) failed: > Result: hostbyte=0x04 driverbyte=0x00 > [ 3574.710922][T151472] sd 1:0:0:0: [sda] Stopping disk > [ 3574.711010][T151472] sd 1:0:0:0: [sda] Start/Stop Unit failed: Result: > hostbyte=0x04 driverbyte=0x00 > [ 3574.826569][ T674] dm-0: writeback error on inode 68507862, offset 65536, > sector 54281504 > [ 3575.117547][ T3366] dm-0: writeback error on inode 68507851, offset 0, > sector 54378880 > [ 3575.140104][T151472] pci 0004:03:00.0: Removing from iommu group 3 > [ 3575.141778][T151472] pci 0004:03 : [PE# fb] Releasing PE > [ 3575.141965][T151472] pci 0004:03 : [PE# fb] Removing DMA window #0 > [ 3575.142452][T151472] pci 0004:03 : [PE# fb] Disabling 64-bit DMA bypass > [ 3575.149369][T151472] pci_bus 0004:03: busn_res: [bus 03] is released > [ 3575.150574][T152037] Read-error on swap-device (254:1:35584) > [ 3575.150713][T152037] Read-error on swap-device (254:1:35712) > [ 3575.152632][T152037] Read-error on swap-device (254:1:915584) > [ 3575.152706][T151472] pci_bus 0004:04: busn_res: [bus 04-08] is released > [ 3575.152983][T151472] > = > [ 3575.153937][T151472] BUG kmalloc-16 (Not tainted): Object
RE: [PATCH 1/5] Documentation: dt: binding: fsl: Add 'fsl,ippdexpcr1-alt-addr' property
Hi Rob, On Wednesday, September 23, 2020 10:33 AM, Rob Herring wrote: > > On Wed, Sep 16, 2020 at 04:18:27PM +0800, Ran Wang wrote: > > From: Biwen Li > > > > The 'fsl,ippdexpcr1-alt-addr' property is used to handle an errata > > A-008646 on LS1021A > > > > Signed-off-by: Biwen Li > > Signed-off-by: Ran Wang > > --- > > Documentation/devicetree/bindings/soc/fsl/rcpm.txt | 19 > > +++ > > 1 file changed, 19 insertions(+) > > > > diff --git a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt > > b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt > > index 5a33619..1be58a3 100644 > > --- a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt > > +++ b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt > > @@ -34,6 +34,11 @@ Chassis Version Example Chips > > Optional properties: > > - little-endian : RCPM register block is Little Endian. Without it RCPM > > will be Big Endian (default case). > > + - fsl,ippdexpcr1-alt-addr : The property is related to a hardware issue > > + on SoC LS1021A and only needed on SoC LS1021A. > > + Must include 2 entries: > > + The first entry must be a link to the SCFG device node. > > + The 2nd entry must be offset of register IPPDEXPCR1 in SCFG. > > You don't need a DT change for this. You can find SCFG node by its compatible > string and then the offset should be known given this issue is only on 1 SoC. Did you mean that RCPM driver just to access IPPDEXPCR1 shadowed register in SCFG directly without fetching it's offset info. from DT? Regards, Ran
[PATCH v2 5/5] arm: dts: ls1021a: fix rcpm failed to claim resource
The range of dcfg reg is wrong, which overlap with other device, such as rcpm. This issue causing rcpm driver failed to claim reg resource when calling devm_ioremap_resource(). Signed-off-by: Ran Wang Acked-by: Li Yang --- Change in v2: - None arch/arm/boot/dts/ls1021a.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/ls1021a.dtsi b/arch/arm/boot/dts/ls1021a.dtsi index e372630f..286c547 100644 --- a/arch/arm/boot/dts/ls1021a.dtsi +++ b/arch/arm/boot/dts/ls1021a.dtsi @@ -173,7 +173,7 @@ dcfg: dcfg@1ee { compatible = "fsl,ls1021a-dcfg", "syscon"; - reg = <0x0 0x1ee 0x0 0x1>; + reg = <0x0 0x1ee 0x0 0x1000>; big-endian; }; -- 2.7.4
[PATCH v2 4/5] arm: dts: ls1021a: fix flextimer failed to wake system
The data of property 'fsl,rcpm-wakeup' is not corrcet, which causing RCPM driver incorrectly program register IPPDEXPCR1, then flextimer is wrongly clock gated during system suspend, can't send interrupt to wake. Signed-off-by: Ran Wang Acked-by: Li Yang --- Change in v2: - None arch/arm/boot/dts/ls1021a.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/ls1021a.dtsi b/arch/arm/boot/dts/ls1021a.dtsi index 089fe86..e372630f 100644 --- a/arch/arm/boot/dts/ls1021a.dtsi +++ b/arch/arm/boot/dts/ls1021a.dtsi @@ -1014,7 +1014,7 @@ compatible = "fsl,ls1021a-ftm-alarm"; reg = <0x0 0x29d 0x0 0x1>; reg-names = "ftm"; - fsl,rcpm-wakeup = < 0x2 0x0>; + fsl,rcpm-wakeup = < 0x0 0x2000>; interrupts = ; big-endian; }; -- 2.7.4
[PATCH v2 3/5] arm: dts: ls1021a: enable RCPM workaround for erratum A-008646
From: Biwen Li The patch fixes a bug that FlexTimer cannot wakeup system in deep sleep. Signed-off-by: Biwen Li Signed-off-by: Ran Wang --- Change in v2: - Change subject of commit message to be consistent with other related patches. arch/arm/boot/dts/ls1021a.dtsi | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm/boot/dts/ls1021a.dtsi b/arch/arm/boot/dts/ls1021a.dtsi index 827373e..089fe86 100644 --- a/arch/arm/boot/dts/ls1021a.dtsi +++ b/arch/arm/boot/dts/ls1021a.dtsi @@ -1007,6 +1007,7 @@ compatible = "fsl,ls1021a-rcpm", "fsl,qoriq-rcpm-2.1+"; reg = <0x0 0x1ee2140 0x0 0x8>; #fsl,rcpm-wakeup-cells = <2>; + fsl,ippdexpcr1-alt-addr = < 0x51c>; }; ftm_alarm0: timer0@29d { -- 2.7.4
[PATCH v2 1/5] Documentation: dt: binding: fsl: Add 'fsl, ippdexpcr1-alt-addr' property
From: Biwen Li The 'fsl,ippdexpcr1-alt-addr' property is used to handle an errata A-008646 on LS1021A Signed-off-by: Biwen Li Signed-off-by: Ran Wang --- Change in v2: - None Documentation/devicetree/bindings/soc/fsl/rcpm.txt | 19 +++ 1 file changed, 19 insertions(+) diff --git a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt index 5a33619..1be58a3 100644 --- a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt +++ b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt @@ -34,6 +34,11 @@ Chassis Version Example Chips Optional properties: - little-endian : RCPM register block is Little Endian. Without it RCPM will be Big Endian (default case). + - fsl,ippdexpcr1-alt-addr : The property is related to a hardware issue + on SoC LS1021A and only needed on SoC LS1021A. + Must include 2 entries: + The first entry must be a link to the SCFG device node. + The 2nd entry must be offset of register IPPDEXPCR1 in SCFG. Example: The RCPM node for T4240: @@ -43,6 +48,20 @@ The RCPM node for T4240: #fsl,rcpm-wakeup-cells = <2>; }; +The RCPM node for LS1021A: + rcpm: rcpm@1ee2140 { + compatible = "fsl,ls1021a-rcpm", "fsl,qoriq-rcpm-2.1+"; + reg = <0x0 0x1ee2140 0x0 0x8>; + #fsl,rcpm-wakeup-cells = <2>; + + /* +* The second and third entry compose an alt offset +* address for IPPDEXPCR1(SCFG_SPARECR8) +*/ + fsl,ippdexpcr1-alt-addr = < 0x51c>; + }; + + * Freescale RCPM Wakeup Source Device Tree Bindings --- Required fsl,rcpm-wakeup property should be added to a device node if the device -- 2.7.4
[PATCH v2 2/5] soc: fsl: handle RCPM errata A-008646 on SoC LS1021A
From: Biwen Li Hardware issue: - Reading register RCPM_IPPDEXPCR1 always return zero, this causes system firmware could not get correct information and wrongly do clock gating for all wakeup source IP during system suspend. Then those IPs will never get chance to wake system. Workaround: - Duplicate register RCPM_IPPDEXPCR1's setting to register SCFG_SPARECR8 to allow system firmware's psci method read it and do things accordingly. Signed-off-by: Biwen Li Signed-off-by: Ran Wang --- Change in v2: - Update commit message to be more clear. - Replace device_property_read_u32_array() with syscon_regmap_lookup_by_phandle_args() to make code simpler. drivers/soc/fsl/rcpm.c | 31 ++- 1 file changed, 30 insertions(+), 1 deletion(-) diff --git a/drivers/soc/fsl/rcpm.c b/drivers/soc/fsl/rcpm.c index a093dbe..6e3 100644 --- a/drivers/soc/fsl/rcpm.c +++ b/drivers/soc/fsl/rcpm.c @@ -2,7 +2,7 @@ // // rcpm.c - Freescale QorIQ RCPM driver // -// Copyright 2019 NXP +// Copyright 2019-2020 NXP // // Author: Ran Wang @@ -13,6 +13,9 @@ #include #include #include +#include +#include +#include #define RCPM_WAKEUP_CELL_MAX_SIZE 7 @@ -37,6 +40,9 @@ static int rcpm_pm_prepare(struct device *dev) struct device_node *np = dev->of_node; u32 value[RCPM_WAKEUP_CELL_MAX_SIZE + 1]; u32 setting[RCPM_WAKEUP_CELL_MAX_SIZE] = {0}; + struct regmap *scfg_addr_regmap = NULL; + u32 reg_offset = 0; + u32 reg_value = 0; rcpm = dev_get_drvdata(dev); if (!rcpm) @@ -90,6 +96,29 @@ static int rcpm_pm_prepare(struct device *dev) tmp |= ioread32be(address); iowrite32be(tmp, address); } + /* +* Workaround of errata A-008646 on SoC LS1021A: +* There is a bug of register ippdexpcr1. +* Reading configuration register RCPM_IPPDEXPCR1 +* always return zero. So save ippdexpcr1's value +* to register SCFG_SPARECR8.And the value of +* ippdexpcr1 will be read from SCFG_SPARECR8. +*/ + if (dev_of_node(dev) && (i == 1)) { + scfg_addr_regmap = syscon_regmap_lookup_by_phandle_args(np, + "fsl,ippdexpcr1-alt-addr", + 1, + _offset); + if (!IS_ERR_OR_NULL(scfg_addr_regmap)) { + /* Update value on register SCFG_SPARECR8 */ + regmap_read(scfg_addr_regmap, + reg_offset, + _value); + regmap_write(scfg_addr_regmap, + reg_offset, + tmp | reg_value); + } + } } return 0; -- 2.7.4
[PATCH 4/9] iov_iter: transparently handle compat iovecs in import_iovec
Use in compat_syscall to import either native or the compat iovecs, and remove the now superflous compat_import_iovec. This removes the need for special compat logic in most callers, and the remaining ones can still be simplified by using __import_iovec with a bool compat parameter. Signed-off-by: Christoph Hellwig --- block/scsi_ioctl.c | 12 ++-- drivers/scsi/sg.c | 9 + fs/aio.c | 8 ++-- fs/io_uring.c | 20 fs/read_write.c| 6 -- fs/splice.c| 2 +- include/linux/uio.h| 8 lib/iov_iter.c | 14 ++ mm/process_vm_access.c | 3 ++- net/compat.c | 4 ++-- security/keys/compat.c | 5 ++--- 11 files changed, 26 insertions(+), 65 deletions(-) diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c index ef722f04f88a93..e08df86866ee5d 100644 --- a/block/scsi_ioctl.c +++ b/block/scsi_ioctl.c @@ -333,16 +333,8 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk, struct iov_iter i; struct iovec *iov = NULL; -#ifdef CONFIG_COMPAT - if (in_compat_syscall()) - ret = compat_import_iovec(rq_data_dir(rq), - hdr->dxferp, hdr->iovec_count, - 0, , ); - else -#endif - ret = import_iovec(rq_data_dir(rq), - hdr->dxferp, hdr->iovec_count, - 0, , ); + ret = import_iovec(rq_data_dir(rq), hdr->dxferp, + hdr->iovec_count, 0, , ); if (ret < 0) goto out_free_cdb; diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 20472aaaf630a4..bfa8d77322d732 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -1820,14 +1820,7 @@ sg_start_req(Sg_request *srp, unsigned char *cmd) struct iovec *iov = NULL; struct iov_iter i; -#ifdef CONFIG_COMPAT - if (in_compat_syscall()) - res = compat_import_iovec(rw, hp->dxferp, iov_count, - 0, , ); - else -#endif - res = import_iovec(rw, hp->dxferp, iov_count, - 0, , ); + res = import_iovec(rw, hp->dxferp, iov_count, 0, , ); if (res < 0) return res; diff --git a/fs/aio.c b/fs/aio.c index d5ec303855669d..c45c20d875388c 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1489,12 +1489,8 @@ static ssize_t aio_setup_rw(int rw, const struct iocb *iocb, *iovec = NULL; return ret; } -#ifdef CONFIG_COMPAT - if (compat) - return compat_import_iovec(rw, buf, len, UIO_FASTIOV, iovec, - iter); -#endif - return import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter); + + return __import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter, compat); } static inline void aio_rw_done(struct kiocb *req, ssize_t ret) diff --git a/fs/io_uring.c b/fs/io_uring.c index 3790c7fe9fee22..ba84ecea7cb1a4 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -2837,13 +2837,8 @@ static ssize_t __io_import_iovec(int rw, struct io_kiocb *req, return ret; } -#ifdef CONFIG_COMPAT - if (req->ctx->compat) - return compat_import_iovec(rw, buf, sqe_len, UIO_FASTIOV, - iovec, iter); -#endif - - return import_iovec(rw, buf, sqe_len, UIO_FASTIOV, iovec, iter); + return __import_iovec(rw, buf, sqe_len, UIO_FASTIOV, iovec, iter, + req->ctx->compat); } static ssize_t io_import_iovec(int rw, struct io_kiocb *req, @@ -4179,8 +4174,9 @@ static int __io_recvmsg_copy_hdr(struct io_kiocb *req, sr->len); iomsg->iov = NULL; } else { - ret = import_iovec(READ, uiov, iov_len, UIO_FASTIOV, - >iov, >msg.msg_iter); + ret = __import_iovec(READ, uiov, iov_len, UIO_FASTIOV, +>iov, >msg.msg_iter, +false); if (ret > 0) ret = 0; } @@ -4220,9 +4216,9 @@ static int __io_compat_recvmsg_copy_hdr(struct io_kiocb *req, sr->len = iomsg->iov[0].iov_len; iomsg->iov = NULL; } else { - ret = compat_import_iovec(READ, uiov, len, UIO_FASTIOV, - >iov, - >msg.msg_iter); + ret = __import_iovec(READ, (struct iovec __user *)uiov, len, + UIO_FASTIOV, >iov, + >msg.msg_iter,
[PATCH 1/9] compat.h: fix a spelling error in
There is no compat_sys_readv64v2 syscall, only a compat_sys_preadv64v2 one. Signed-off-by: Christoph Hellwig --- include/linux/compat.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/compat.h b/include/linux/compat.h index b354ce58966e2d..654c1ec36671a4 100644 --- a/include/linux/compat.h +++ b/include/linux/compat.h @@ -812,7 +812,7 @@ asmlinkage ssize_t compat_sys_pwritev2(compat_ulong_t fd, const struct compat_iovec __user *vec, compat_ulong_t vlen, u32 pos_low, u32 pos_high, rwf_t flags); #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64V2 -asmlinkage long compat_sys_readv64v2(unsigned long fd, +asmlinkage long compat_sys_preadv64v2(unsigned long fd, const struct compat_iovec __user *vec, unsigned long vlen, loff_t pos, rwf_t flags); #endif -- 2.28.0
[PATCH 5/9] fs: remove various compat readv/writev helpers
Now that import_iovec handles compat iovecs as well, all the duplicated code in the compat readv/writev helpers is not needed. Remove them and switch the compat syscall handlers to use the native helpers. Signed-off-by: Christoph Hellwig --- fs/read_write.c | 179 1 file changed, 30 insertions(+), 149 deletions(-) diff --git a/fs/read_write.c b/fs/read_write.c index 0a68037580b455..eab427b7cc0a3f 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -1068,226 +1068,107 @@ SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const struct iovec __user *, vec, return do_pwritev(fd, vec, vlen, pos, flags); } +/* + * Various compat syscalls. Note that they all pretend to take a native + * iovec - import_iovec will properly treat those as compat_iovecs based on + * in_compat_syscall(). + */ #ifdef CONFIG_COMPAT -static size_t compat_readv(struct file *file, - const struct compat_iovec __user *vec, - unsigned long vlen, loff_t *pos, rwf_t flags) -{ - struct iovec iovstack[UIO_FASTIOV]; - struct iovec *iov = iovstack; - struct iov_iter iter; - ssize_t ret; - - ret = import_iovec(READ, (const struct iovec __user *)vec, vlen, - UIO_FASTIOV, , ); - if (ret >= 0) { - ret = do_iter_read(file, , pos, flags); - kfree(iov); - } - if (ret > 0) - add_rchar(current, ret); - inc_syscr(current); - return ret; -} - -static size_t do_compat_readv(compat_ulong_t fd, -const struct compat_iovec __user *vec, -compat_ulong_t vlen, rwf_t flags) -{ - struct fd f = fdget_pos(fd); - ssize_t ret; - loff_t pos; - - if (!f.file) - return -EBADF; - pos = f.file->f_pos; - ret = compat_readv(f.file, vec, vlen, , flags); - if (ret >= 0) - f.file->f_pos = pos; - fdput_pos(f); - return ret; - -} - COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd, - const struct compat_iovec __user *,vec, + const struct iovec __user *, vec, compat_ulong_t, vlen) { - return do_compat_readv(fd, vec, vlen, 0); -} - -static long do_compat_preadv64(unsigned long fd, - const struct compat_iovec __user *vec, - unsigned long vlen, loff_t pos, rwf_t flags) -{ - struct fd f; - ssize_t ret; - - if (pos < 0) - return -EINVAL; - f = fdget(fd); - if (!f.file) - return -EBADF; - ret = -ESPIPE; - if (f.file->f_mode & FMODE_PREAD) - ret = compat_readv(f.file, vec, vlen, , flags); - fdput(f); - return ret; + return do_readv(fd, vec, vlen, 0); } #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64 COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd, - const struct compat_iovec __user *,vec, + const struct iovec __user *, vec, unsigned long, vlen, loff_t, pos) { - return do_compat_preadv64(fd, vec, vlen, pos, 0); + return do_preadv(fd, vec, vlen, pos, 0); } #endif COMPAT_SYSCALL_DEFINE5(preadv, compat_ulong_t, fd, - const struct compat_iovec __user *,vec, + const struct iovec __user *, vec, compat_ulong_t, vlen, u32, pos_low, u32, pos_high) { loff_t pos = ((loff_t)pos_high << 32) | pos_low; - return do_compat_preadv64(fd, vec, vlen, pos, 0); + return do_preadv(fd, vec, vlen, pos, 0); } #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64V2 COMPAT_SYSCALL_DEFINE5(preadv64v2, unsigned long, fd, - const struct compat_iovec __user *,vec, + const struct iovec __user *, vec, unsigned long, vlen, loff_t, pos, rwf_t, flags) { if (pos == -1) - return do_compat_readv(fd, vec, vlen, flags); - - return do_compat_preadv64(fd, vec, vlen, pos, flags); + return do_readv(fd, vec, vlen, flags); + return do_preadv(fd, vec, vlen, pos, flags); } #endif COMPAT_SYSCALL_DEFINE6(preadv2, compat_ulong_t, fd, - const struct compat_iovec __user *,vec, + const struct iovec __user *, vec, compat_ulong_t, vlen, u32, pos_low, u32, pos_high, rwf_t, flags) { loff_t pos = ((loff_t)pos_high << 32) | pos_low; if (pos == -1) - return do_compat_readv(fd, vec, vlen, flags); - - return do_compat_preadv64(fd, vec, vlen, pos, flags); -} - -static size_t compat_writev(struct file *file, - const struct compat_iovec __user *vec, - unsigned long vlen, loff_t *pos, rwf_t flags) -{ - struct iovec iovstack[UIO_FASTIOV]; - struct iovec *iov = iovstack; - struct iov_iter
[PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c
From: David Laight This lets the compiler inline it into import_iovec() generating much better code. Signed-off-by: David Laight Signed-off-by: Christoph Hellwig --- fs/read_write.c | 179 lib/iov_iter.c | 176 +++ 2 files changed, 176 insertions(+), 179 deletions(-) diff --git a/fs/read_write.c b/fs/read_write.c index 5db58b8c78d0dd..e5e891a88442ef 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -752,185 +752,6 @@ static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter, return ret; } -/** - * rw_copy_check_uvector() - Copy an array of iovec from userspace - * into the kernel and check that it is valid. - * - * @type: One of %CHECK_IOVEC_ONLY, %READ, or %WRITE. - * @uvector: Pointer to the userspace array. - * @nr_segs: Number of elements in userspace array. - * @fast_segs: Number of elements in @fast_pointer. - * @fast_pointer: Pointer to (usually small on-stack) kernel array. - * @ret_pointer: (output parameter) Pointer to a variable that will point to - * either @fast_pointer, a newly allocated kernel array, or NULL, - * depending on which array was used. - * - * This function copies an array of iovec of @nr_segs from - * userspace into the kernel and checks that each element is valid (e.g. - * it does not point to a kernel address or cause overflow by being too - * large, etc.). - * - * As an optimization, the caller may provide a pointer to a small - * on-stack array in @fast_pointer, typically %UIO_FASTIOV elements long - * (the size of this array, or 0 if unused, should be given in @fast_segs). - * - * @ret_pointer will always point to the array that was used, so the - * caller must take care not to call kfree() on it e.g. in case the - * @fast_pointer array was used and it was allocated on the stack. - * - * Return: The total number of bytes covered by the iovec array on success - * or a negative error code on error. - */ -ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector, - unsigned long nr_segs, unsigned long fast_segs, - struct iovec *fast_pointer, - struct iovec **ret_pointer) -{ - unsigned long seg; - ssize_t ret; - struct iovec *iov = fast_pointer; - - /* -* SuS says "The readv() function *may* fail if the iovcnt argument -* was less than or equal to 0, or greater than {IOV_MAX}. Linux has -* traditionally returned zero for zero segments, so... -*/ - if (nr_segs == 0) { - ret = 0; - goto out; - } - - /* -* First get the "struct iovec" from user memory and -* verify all the pointers -*/ - if (nr_segs > UIO_MAXIOV) { - ret = -EINVAL; - goto out; - } - if (nr_segs > fast_segs) { - iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL); - if (iov == NULL) { - ret = -ENOMEM; - goto out; - } - } - if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector))) { - ret = -EFAULT; - goto out; - } - - /* -* According to the Single Unix Specification we should return EINVAL -* if an element length is < 0 when cast to ssize_t or if the -* total length would overflow the ssize_t return value of the -* system call. -* -* Linux caps all read/write calls to MAX_RW_COUNT, and avoids the -* overflow case. -*/ - ret = 0; - for (seg = 0; seg < nr_segs; seg++) { - void __user *buf = iov[seg].iov_base; - ssize_t len = (ssize_t)iov[seg].iov_len; - - /* see if we we're about to use an invalid len or if -* it's about to overflow ssize_t */ - if (len < 0) { - ret = -EINVAL; - goto out; - } - if (type >= 0 - && unlikely(!access_ok(buf, len))) { - ret = -EFAULT; - goto out; - } - if (len > MAX_RW_COUNT - ret) { - len = MAX_RW_COUNT - ret; - iov[seg].iov_len = len; - } - ret += len; - } -out: - *ret_pointer = iov; - return ret; -} - -#ifdef CONFIG_COMPAT -ssize_t compat_rw_copy_check_uvector(int type, - const struct compat_iovec __user *uvector, unsigned long nr_segs, - unsigned long fast_segs, struct iovec *fast_pointer, - struct iovec **ret_pointer) -{ - compat_ssize_t tot_len; - struct iovec *iov = *ret_pointer = fast_pointer; - ssize_t ret = 0; - int seg; - - /* -* SuS says
[PATCH 8/9] mm: remove compat_process_vm_{readv,writev}
Now that import_iovec handles compat iovecs, the native syscalls can be used for the compat case as well. Signed-off-by: Christoph Hellwig --- arch/arm64/include/asm/unistd32.h | 4 +- arch/mips/kernel/syscalls/syscall_n32.tbl | 4 +- arch/mips/kernel/syscalls/syscall_o32.tbl | 4 +- arch/parisc/kernel/syscalls/syscall.tbl | 4 +- arch/powerpc/kernel/syscalls/syscall.tbl | 4 +- arch/s390/kernel/syscalls/syscall.tbl | 4 +- arch/sparc/kernel/syscalls/syscall.tbl| 4 +- arch/x86/entry/syscall_x32.c | 2 + arch/x86/entry/syscalls/syscall_32.tbl| 4 +- arch/x86/entry/syscalls/syscall_64.tbl| 4 +- include/linux/compat.h| 8 --- include/uapi/asm-generic/unistd.h | 6 +- mm/process_vm_access.c| 69 --- tools/include/uapi/asm-generic/unistd.h | 6 +- .../arch/powerpc/entry/syscalls/syscall.tbl | 4 +- .../perf/arch/s390/entry/syscalls/syscall.tbl | 4 +- .../arch/x86/entry/syscalls/syscall_64.tbl| 4 +- 17 files changed, 30 insertions(+), 109 deletions(-) diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h index 11dfae3a8563bd..0c280a05f699bf 100644 --- a/arch/arm64/include/asm/unistd32.h +++ b/arch/arm64/include/asm/unistd32.h @@ -763,9 +763,9 @@ __SYSCALL(__NR_sendmmsg, compat_sys_sendmmsg) #define __NR_setns 375 __SYSCALL(__NR_setns, sys_setns) #define __NR_process_vm_readv 376 -__SYSCALL(__NR_process_vm_readv, compat_sys_process_vm_readv) +__SYSCALL(__NR_process_vm_readv, sys_process_vm_readv) #define __NR_process_vm_writev 377 -__SYSCALL(__NR_process_vm_writev, compat_sys_process_vm_writev) +__SYSCALL(__NR_process_vm_writev, sys_process_vm_writev) #define __NR_kcmp 378 __SYSCALL(__NR_kcmp, sys_kcmp) #define __NR_finit_module 379 diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl index 5a39d4de0ac85b..0bc2e0fcf1ee56 100644 --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -317,8 +317,8 @@ 306n32 syncfs sys_syncfs 307n32 sendmmsgcompat_sys_sendmmsg 308n32 setns sys_setns -309n32 process_vm_readvcompat_sys_process_vm_readv -310n32 process_vm_writev compat_sys_process_vm_writev +309n32 process_vm_readvsys_process_vm_readv +310n32 process_vm_writev sys_process_vm_writev 311n32 kcmpsys_kcmp 312n32 finit_modulesys_finit_module 313n32 sched_setattr sys_sched_setattr diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl index 136efc6b8c5444..b408c13b934296 100644 --- a/arch/mips/kernel/syscalls/syscall_o32.tbl +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl @@ -356,8 +356,8 @@ 342o32 syncfs sys_syncfs 343o32 sendmmsgsys_sendmmsg compat_sys_sendmmsg 344o32 setns sys_setns -345o32 process_vm_readvsys_process_vm_readv compat_sys_process_vm_readv -346o32 process_vm_writev sys_process_vm_writev compat_sys_process_vm_writev +345o32 process_vm_readvsys_process_vm_readv +346o32 process_vm_writev sys_process_vm_writev 347o32 kcmpsys_kcmp 348o32 finit_modulesys_finit_module 349o32 sched_setattr sys_sched_setattr diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl index a9e184192caedd..2015a5124b78ad 100644 --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -372,8 +372,8 @@ 327common syncfs sys_syncfs 328common setns sys_setns 329common sendmmsgsys_sendmmsg compat_sys_sendmmsg -330common process_vm_readvsys_process_vm_readv compat_sys_process_vm_readv -331common process_vm_writev sys_process_vm_writev compat_sys_process_vm_writev +330common process_vm_readvsys_process_vm_readv +331common process_vm_writev sys_process_vm_writev 332common kcmpsys_kcmp 333common finit_modulesys_finit_module 334common sched_setattr sys_sched_setattr diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index 0d4985919ca34d..66a472aa635d3f 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++
let import_iovec deal with compat_iovecs as well v3
Hi Al, this series changes import_iovec to transparently deal with comat iovec structures, and then cleanups up a lot of code dupliation. Changes since v2: - revert the switch of the access process vm sysclls to iov_iter - refactor the import_iovec internals differently - switch aio to use __import_iovec Changes since v1: - improve a commit message - drop a pointless unlikely - drop the PF_FORCE_COMPAT flag - add a few more cleanups (including two from David Laight) Diffstat: arch/arm64/include/asm/unistd32.h | 10 arch/mips/kernel/syscalls/syscall_n32.tbl | 10 arch/mips/kernel/syscalls/syscall_o32.tbl | 10 arch/parisc/kernel/syscalls/syscall.tbl| 10 arch/powerpc/kernel/syscalls/syscall.tbl | 10 arch/s390/kernel/syscalls/syscall.tbl | 10 arch/sparc/kernel/syscalls/syscall.tbl | 10 arch/x86/entry/syscall_x32.c |5 arch/x86/entry/syscalls/syscall_32.tbl | 10 arch/x86/entry/syscalls/syscall_64.tbl | 10 block/scsi_ioctl.c | 12 drivers/scsi/sg.c |9 fs/aio.c | 38 -- fs/io_uring.c | 20 - fs/read_write.c| 362 + fs/splice.c| 57 --- include/linux/compat.h | 24 - include/linux/fs.h | 11 include/linux/uio.h| 10 include/uapi/asm-generic/unistd.h | 12 lib/iov_iter.c | 161 +++-- mm/process_vm_access.c | 85 net/compat.c |4 security/keys/compat.c | 37 -- security/keys/internal.h |5 security/keys/keyctl.c |2 tools/include/uapi/asm-generic/unistd.h| 12 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl | 10 tools/perf/arch/s390/entry/syscalls/syscall.tbl| 10 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl | 10 30 files changed, 280 insertions(+), 706 deletions(-)
[PATCH 9/9] security/keys: remove compat_keyctl_instantiate_key_iov
Now that import_iovec handles compat iovecs, the native version of keyctl_instantiate_key_iov can be used for the compat case as well. Signed-off-by: Christoph Hellwig --- security/keys/compat.c | 36 ++-- security/keys/internal.h | 5 - security/keys/keyctl.c | 2 +- 3 files changed, 3 insertions(+), 40 deletions(-) diff --git a/security/keys/compat.c b/security/keys/compat.c index 7ae531db031cf8..1545efdca56227 100644 --- a/security/keys/compat.c +++ b/security/keys/compat.c @@ -11,38 +11,6 @@ #include #include "internal.h" -/* - * Instantiate a key with the specified compatibility multipart payload and - * link the key into the destination keyring if one is given. - * - * The caller must have the appropriate instantiation permit set for this to - * work (see keyctl_assume_authority). No other permissions are required. - * - * If successful, 0 will be returned. - */ -static long compat_keyctl_instantiate_key_iov( - key_serial_t id, - const struct compat_iovec __user *_payload_iov, - unsigned ioc, - key_serial_t ringid) -{ - struct iovec iovstack[UIO_FASTIOV], *iov = iovstack; - struct iov_iter from; - long ret; - - if (!_payload_iov) - ioc = 0; - - ret = import_iovec(WRITE, (const struct iovec __user *)_payload_iov, - ioc, ARRAY_SIZE(iovstack), , ); - if (ret < 0) - return ret; - - ret = keyctl_instantiate_key_common(id, , ringid); - kfree(iov); - return ret; -} - /* * The key control system call, 32-bit compatibility version for 64-bit archs */ @@ -113,8 +81,8 @@ COMPAT_SYSCALL_DEFINE5(keyctl, u32, option, return keyctl_reject_key(arg2, arg3, arg4, arg5); case KEYCTL_INSTANTIATE_IOV: - return compat_keyctl_instantiate_key_iov( - arg2, compat_ptr(arg3), arg4, arg5); + return keyctl_instantiate_key_iov(arg2, compat_ptr(arg3), arg4, + arg5); case KEYCTL_INVALIDATE: return keyctl_invalidate_key(arg2); diff --git a/security/keys/internal.h b/security/keys/internal.h index 338a526cbfa516..9b9cf3b6fcbb4d 100644 --- a/security/keys/internal.h +++ b/security/keys/internal.h @@ -262,11 +262,6 @@ extern long keyctl_instantiate_key_iov(key_serial_t, const struct iovec __user *, unsigned, key_serial_t); extern long keyctl_invalidate_key(key_serial_t); - -struct iov_iter; -extern long keyctl_instantiate_key_common(key_serial_t, - struct iov_iter *, - key_serial_t); extern long keyctl_restrict_keyring(key_serial_t id, const char __user *_type, const char __user *_restriction); diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c index 9febd37a168fd0..e26bbccda7ccee 100644 --- a/security/keys/keyctl.c +++ b/security/keys/keyctl.c @@ -1164,7 +1164,7 @@ static int keyctl_change_reqkey_auth(struct key *key) * * If successful, 0 will be returned. */ -long keyctl_instantiate_key_common(key_serial_t id, +static long keyctl_instantiate_key_common(key_serial_t id, struct iov_iter *from, key_serial_t ringid) { -- 2.28.0
[PATCH 7/9] fs: remove compat_sys_vmsplice
Now that import_iovec handles compat iovecs, the native vmsplice syscall can be used for the compat case as well. Signed-off-by: Christoph Hellwig --- arch/arm64/include/asm/unistd32.h | 2 +- arch/mips/kernel/syscalls/syscall_n32.tbl | 2 +- arch/mips/kernel/syscalls/syscall_o32.tbl | 2 +- arch/parisc/kernel/syscalls/syscall.tbl | 2 +- arch/powerpc/kernel/syscalls/syscall.tbl | 2 +- arch/s390/kernel/syscalls/syscall.tbl | 2 +- arch/sparc/kernel/syscalls/syscall.tbl| 2 +- arch/x86/entry/syscall_x32.c | 1 + arch/x86/entry/syscalls/syscall_32.tbl| 2 +- arch/x86/entry/syscalls/syscall_64.tbl| 2 +- fs/splice.c | 57 +-- include/linux/compat.h| 4 -- include/uapi/asm-generic/unistd.h | 2 +- tools/include/uapi/asm-generic/unistd.h | 2 +- .../arch/powerpc/entry/syscalls/syscall.tbl | 2 +- .../perf/arch/s390/entry/syscalls/syscall.tbl | 2 +- .../arch/x86/entry/syscalls/syscall_64.tbl| 2 +- 17 files changed, 28 insertions(+), 62 deletions(-) diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h index 4a236493dca5b9..11dfae3a8563bd 100644 --- a/arch/arm64/include/asm/unistd32.h +++ b/arch/arm64/include/asm/unistd32.h @@ -697,7 +697,7 @@ __SYSCALL(__NR_sync_file_range2, compat_sys_aarch32_sync_file_range2) #define __NR_tee 342 __SYSCALL(__NR_tee, sys_tee) #define __NR_vmsplice 343 -__SYSCALL(__NR_vmsplice, compat_sys_vmsplice) +__SYSCALL(__NR_vmsplice, sys_vmsplice) #define __NR_move_pages 344 __SYSCALL(__NR_move_pages, compat_sys_move_pages) #define __NR_getcpu 345 diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl index c99a92646f8ee9..5a39d4de0ac85b 100644 --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -278,7 +278,7 @@ 267n32 splice sys_splice 268n32 sync_file_range sys_sync_file_range 269n32 tee sys_tee -270n32 vmsplicecompat_sys_vmsplice +270n32 vmsplicesys_vmsplice 271n32 move_pages compat_sys_move_pages 272n32 set_robust_list compat_sys_set_robust_list 273n32 get_robust_list compat_sys_get_robust_list diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl index 075064d10661bf..136efc6b8c5444 100644 --- a/arch/mips/kernel/syscalls/syscall_o32.tbl +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl @@ -318,7 +318,7 @@ 304o32 splice sys_splice 305o32 sync_file_range sys_sync_file_range sys32_sync_file_range 306o32 tee sys_tee -307o32 vmsplicesys_vmsplice compat_sys_vmsplice +307o32 vmsplicesys_vmsplice 308o32 move_pages sys_move_pages compat_sys_move_pages 309o32 set_robust_list sys_set_robust_list compat_sys_set_robust_list 310o32 get_robust_list sys_get_robust_list compat_sys_get_robust_list diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl index 192abde0001d9d..a9e184192caedd 100644 --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -330,7 +330,7 @@ 29232 sync_file_range parisc_sync_file_range 29264 sync_file_range sys_sync_file_range 293common tee sys_tee -294common vmsplicesys_vmsplice compat_sys_vmsplice +294common vmsplicesys_vmsplice 295common move_pages sys_move_pages compat_sys_move_pages 296common getcpu sys_getcpu 297common epoll_pwait sys_epoll_pwait compat_sys_epoll_pwait diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index 6f1e2ecf0edad9..0d4985919ca34d 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -369,7 +369,7 @@ 282common unshare sys_unshare 283common splice sys_splice 284common tee sys_tee -285common vmsplicesys_vmsplice compat_sys_vmsplice +285common vmsplicesys_vmsplice 286common openat sys_openat compat_sys_openat 287common
[PATCH 6/9] fs: remove the compat readv/writev syscalls
Now that import_iovec handles compat iovecs, the native readv and writev syscalls can be used for the compat case as well. Signed-off-by: Christoph Hellwig --- arch/arm64/include/asm/unistd32.h | 4 ++-- arch/mips/kernel/syscalls/syscall_n32.tbl | 4 ++-- arch/mips/kernel/syscalls/syscall_o32.tbl | 4 ++-- arch/parisc/kernel/syscalls/syscall.tbl| 4 ++-- arch/powerpc/kernel/syscalls/syscall.tbl | 4 ++-- arch/s390/kernel/syscalls/syscall.tbl | 4 ++-- arch/sparc/kernel/syscalls/syscall.tbl | 4 ++-- arch/x86/entry/syscall_x32.c | 2 ++ arch/x86/entry/syscalls/syscall_32.tbl | 4 ++-- arch/x86/entry/syscalls/syscall_64.tbl | 4 ++-- fs/read_write.c| 14 -- include/linux/compat.h | 4 include/uapi/asm-generic/unistd.h | 4 ++-- tools/include/uapi/asm-generic/unistd.h| 4 ++-- tools/perf/arch/powerpc/entry/syscalls/syscall.tbl | 4 ++-- tools/perf/arch/s390/entry/syscalls/syscall.tbl| 4 ++-- tools/perf/arch/x86/entry/syscalls/syscall_64.tbl | 4 ++-- 17 files changed, 30 insertions(+), 46 deletions(-) diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h index 734860ac7cf9d5..4a236493dca5b9 100644 --- a/arch/arm64/include/asm/unistd32.h +++ b/arch/arm64/include/asm/unistd32.h @@ -301,9 +301,9 @@ __SYSCALL(__NR_flock, sys_flock) #define __NR_msync 144 __SYSCALL(__NR_msync, sys_msync) #define __NR_readv 145 -__SYSCALL(__NR_readv, compat_sys_readv) +__SYSCALL(__NR_readv, sys_readv) #define __NR_writev 146 -__SYSCALL(__NR_writev, compat_sys_writev) +__SYSCALL(__NR_writev, sys_writev) #define __NR_getsid 147 __SYSCALL(__NR_getsid, sys_getsid) #define __NR_fdatasync 148 diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl index f9df9edb67a407..c99a92646f8ee9 100644 --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -25,8 +25,8 @@ 15 n32 ioctl compat_sys_ioctl 16 n32 pread64 sys_pread64 17 n32 pwrite64sys_pwrite64 -18 n32 readv compat_sys_readv -19 n32 writev compat_sys_writev +18 n32 readv sys_readv +19 n32 writev sys_writev 20 n32 access sys_access 21 n32 pipesysm_pipe 22 n32 _newselect compat_sys_select diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl index 195b43cf27c848..075064d10661bf 100644 --- a/arch/mips/kernel/syscalls/syscall_o32.tbl +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl @@ -156,8 +156,8 @@ 142o32 _newselect sys_select compat_sys_select 143o32 flock sys_flock 144o32 msync sys_msync -145o32 readv sys_readv compat_sys_readv -146o32 writev sys_writev compat_sys_writev +145o32 readv sys_readv +146o32 writev sys_writev 147o32 cacheflush sys_cacheflush 148o32 cachectlsys_cachectl 149o32 sysmips __sys_sysmips diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl index def64d221cd4fb..192abde0001d9d 100644 --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -159,8 +159,8 @@ 142common _newselect sys_select compat_sys_select 143common flock sys_flock 144common msync sys_msync -145common readv sys_readv compat_sys_readv -146common writev sys_writev compat_sys_writev +145common readv sys_readv +146common writev sys_writev 147common getsid sys_getsid 148common fdatasync sys_fdatasync 149common _sysctl sys_ni_syscall diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index c2d737ff2e7bec..6f1e2ecf0edad9 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -193,8 +193,8 @@ 142common _newselect sys_select compat_sys_select 143common
[PATCH 3/9] iov_iter: refactor rw_copy_check_uvector and import_iovec
Split rw_copy_check_uvector into two new helpers with more sensible calling conventions: - iovec_from_user copies a iovec from userspace either into the provided stack buffer if it fits, or allocates a new buffer for it. Returns the actually used iovec. It also verifies that iov_len does fit a signed type, and handles compat iovecs if the compat flag is set. - __import_iovec consolidates the native and compat versions of import_iovec. It calls iovec_from_user, then validates each iovec actually points to user addresses, and ensures the total length doesn't overflow. This has two major implications: - the access_process_vm case loses the total lenght checking, which wasn't required anyway, given that each call receives two iovecs for the local and remote side of the operation, and it verifies the total length on the local side already. - instead of a single loop there now are two loops over the iovecs. Given that the iovecs are cache hot this doesn't make a major difference Signed-off-by: Christoph Hellwig --- include/linux/compat.h | 6 - include/linux/fs.h | 13 -- include/linux/uio.h| 12 +- lib/iov_iter.c | 300 - mm/process_vm_access.c | 34 +++-- 5 files changed, 138 insertions(+), 227 deletions(-) diff --git a/include/linux/compat.h b/include/linux/compat.h index 654c1ec36671a4..b930de791ff16b 100644 --- a/include/linux/compat.h +++ b/include/linux/compat.h @@ -451,12 +451,6 @@ extern long compat_arch_ptrace(struct task_struct *child, compat_long_t request, struct epoll_event;/* fortunately, this one is fixed-layout */ -extern ssize_t compat_rw_copy_check_uvector(int type, - const struct compat_iovec __user *uvector, - unsigned long nr_segs, - unsigned long fast_segs, struct iovec *fast_pointer, - struct iovec **ret_pointer); - extern void __user *compat_alloc_user_space(unsigned long len); int compat_restore_altstack(const compat_stack_t __user *uss); diff --git a/include/linux/fs.h b/include/linux/fs.h index 7519ae003a082c..e69b45b6cc7b5f 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -178,14 +178,6 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset, /* File supports async buffered reads */ #define FMODE_BUF_RASYNC ((__force fmode_t)0x4000) -/* - * Flag for rw_copy_check_uvector and compat_rw_copy_check_uvector - * that indicates that they should check the contents of the iovec are - * valid, but not check the memory that the iovec elements - * points too. - */ -#define CHECK_IOVEC_ONLY -1 - /* * Attribute flags. These should be or-ed together to figure out what * has been changed! @@ -1887,11 +1879,6 @@ static inline int call_mmap(struct file *file, struct vm_area_struct *vma) return file->f_op->mmap(file, vma); } -ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector, - unsigned long nr_segs, unsigned long fast_segs, - struct iovec *fast_pointer, - struct iovec **ret_pointer); - extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *); extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *); extern ssize_t vfs_readv(struct file *, const struct iovec __user *, diff --git a/include/linux/uio.h b/include/linux/uio.h index 3835a8a8e9eae0..92c11fe41c6228 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -266,9 +266,15 @@ bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum, struct size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp, struct iov_iter *i); -ssize_t import_iovec(int type, const struct iovec __user * uvector, -unsigned nr_segs, unsigned fast_segs, -struct iovec **iov, struct iov_iter *i); +struct iovec *iovec_from_user(const struct iovec __user *uvector, + unsigned long nr_segs, unsigned long fast_segs, + struct iovec *fast_iov, bool compat); +ssize_t import_iovec(int type, const struct iovec __user *uvec, +unsigned nr_segs, unsigned fast_segs, struct iovec **iovp, +struct iov_iter *i); +ssize_t __import_iovec(int type, const struct iovec __user *uvec, +unsigned nr_segs, unsigned fast_segs, struct iovec **iovp, +struct iov_iter *i, bool compat); #ifdef CONFIG_COMPAT struct compat_iovec; diff --git a/lib/iov_iter.c b/lib/iov_iter.c index ccea9db3f72be8..d5d8afe31fca16 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -7,6 +7,7 @@ #include #include #include +#include #include #include #include @@ -1650,107 +1651,133 @@ const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags) } EXPORT_SYMBOL(dup_iter); -/** - * rw_copy_check_uvector() - Copy an array of iovec from userspace - *