Re: [PATCH RFC v1 0/5] Add SCSI per device tagsets
On Fri, Feb 18, 2022 at 06:41:52PM +, Melanie Plageman (Microsoft) wrote: > Currently a single blk_mq_tag_set is associated with a Scsi_Host. When SCSI > controllers are limited, attaching multiple devices to the same controller is > required. In cloud environments with relatively high latency persistent > storage, > requiring all devices on a controller to share a single blk_mq_tag_set > negatively impacts performance. So add more controllers instead of completely breaking the architecture.
[Bug 215622] WARNING: possible irq lock inversion dependency detected
https://bugzilla.kernel.org/show_bug.cgi?id=215622 --- Comment #1 from Erhard F. (erhar...@mailbox.org) --- Created attachment 300485 --> https://bugzilla.kernel.org/attachment.cgi?id=300485=edit kernel .config (5.17-rc4, PowerMac G5 11,2) -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.
[Bug 215622] New: WARNING: possible irq lock inversion dependency detected
https://bugzilla.kernel.org/show_bug.cgi?id=215622 Bug ID: 215622 Summary: WARNING: possible irq lock inversion dependency detected Product: Platform Specific/Hardware Version: 2.5 Kernel Version: 5.17-rc4 Hardware: PPC-64 OS: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: PPC-64 Assignee: platform_ppc...@kernel-bugs.osdl.org Reporter: erhar...@mailbox.org Regression: No Created attachment 300484 --> https://bugzilla.kernel.org/attachment.cgi?id=300484=edit dmesg (5.17-rc4, PowerMac G5 11,2) Don't know whether this has something to do or in common with a netconsole issue I unearthed on other hardware (bug #212509)... But as the trace looks different I'll post this as a seperate bug: [...] WARNING: possible irq lock inversion dependency detected 5.17.0-rc4-PowerMacG5+ #2 Not tainted swapper/0/1 just changed the state of lock: c13e4490 (native_tlbie_lock){+.-.}-{2:2}, at: .tlbie+0x70/0x10c but this lock was taken by another, HARDIRQ-safe lock in the past: (console_owner){-...}-{0:0} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Chain exists of: console_owner --> target_list_lock --> native_tlbie_lock Possible interrupt unsafe locking scenario: CPU0CPU1 lock(native_tlbie_lock); local_irq_disable(); lock(console_owner); lock(target_list_lock); lock(console_owner); *** DEADLOCK *** no locks held by swapper/0/1. the shortest dependencies between 2nd lock and 1st lock: -> (console_owner){-...}-{0:0} ops: 994 { IN-HARDIRQ-W at: .lock_acquire+0x290/0x2e8 .console_unlock+0x2fc/0x628 .vprintk_emit+0x270/0x280 .vprintk+0x8c/0x94 ._printk+0x30/0x44 .crng_fast_load+0x128/0x17c .add_interrupt_randomness+0x330/0x488 .handle_irq_event_percpu+0x28/0x54 .handle_irq_event+0x44/0x70 .handle_fasteoi_irq+0xac/0x158 .handle_irq_desc+0x34/0x54 .__do_irq+0x174/0x250 .__do_IRQ+0xac/0xb4 init_stack+0x3820/0x4000 .do_IRQ+0xd0/0x124 hardware_interrupt_common_virt+0x208/0x210 .power4_idle+0x3c/0x70 .arch_cpu_idle+0x8c/0x114 .default_idle_call+0x7c/0xd4 .do_idle+0x118/0x12c .cpu_startup_entry+0x28/0x2c .rest_init+0x1bc/0x1c8 .start_kernel+0xba8/0xca0 start_here_common+0x1c/0x44 INITIAL USE at: .lock_acquire+0x290/0x2e8 .console_unlock+0x2fc/0x628 .vprintk_emit+0x270/0x280 .vprintk+0x8c/0x94 ._printk+0x30/0x44 .start_kernel+0xc4/0xca0 start_here_common+0x1c/0x44 } ... key at: [] console_owner_dep_map+0x0/0x28 ... acquired at: ._raw_spin_lock_irqsave+0x6c/0x98 .write_msg+0x64/0x10c .console_unlock+0x53c/0x628 .register_console+0x250/0x330 .init_netconsole+0x538/0x610 .do_one_initcall+0x100/0x2dc .kernel_init_freeable+0x644/0x748 .kernel_init+0x20/0x178 .ret_from_kernel_thread+0x58/0x60 -> (target_list_lock){}-{2:2} ops: 461 { INITIAL USE at: .lock_acquire+0x290/0x2e8 ._raw_spin_lock_irqsave+0x6c/0x98 .init_netconsole+0x40c/0x610 .do_one_initcall+0x100/0x2dc .kernel_init_freeable+0x644/0x748 .kernel_init+0x20/0x178 .ret_from_kernel_thread+0x58/0x60 } ... key at: [] target_list_lock+0x18/0x40 ... acquired at: ._raw_spin_lock+0x44/0x68 .tlbie+0x70/0x10c .native_hpte_invalidate+0xcc/0x114 .hash__kernel_map_pages+0x270/0x280 .debug_pagealloc_unmap_pages+0x34/0x40 .free_unref_page_prepare+0x2c8/0x314 .free_unref_page+0x38/0xdc .__free_slab+0xc4/0x158 .kfree_skbmem+0x5c/0x7c .zap_completion_queue+0x128/0x130 .netpoll_send_skb+0x2e0/0x348 .write_msg+0xfc/0x10c .console_unlock+0x53c/0x628 .vprintk_emit+0x270/0x280 .vprintk+0x8c/0x94 ._printk+0x30/0x44 .register_console+0x288/0x330
[Bug 215621] Warning: Unable to mark rodata read only on this CPU. (PPC970MP)
https://bugzilla.kernel.org/show_bug.cgi?id=215621 --- Comment #1 from Erhard F. (erhar...@mailbox.org) --- Created attachment 300483 --> https://bugzilla.kernel.org/attachment.cgi?id=300483=edit kernel .config (5.17-rc4, PowerMac G5 11,2) -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.
[Bug 215621] New: Warning: Unable to mark rodata read only on this CPU. (PPC970MP)
https://bugzilla.kernel.org/show_bug.cgi?id=215621 Bug ID: 215621 Summary: Warning: Unable to mark rodata read only on this CPU. (PPC970MP) Product: Platform Specific/Hardware Version: 2.5 Kernel Version: 5.17-rc4 Hardware: PPC-64 OS: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: PPC-64 Assignee: platform_ppc...@kernel-bugs.osdl.org Reporter: erhar...@mailbox.org Regression: No Created attachment 300482 --> https://bugzilla.kernel.org/attachment.cgi?id=300482=edit dmesg (5.17-rc4, PowerMac G5 11,2) STRICT_MODULE_RWX is enabled in kernel but it does not seem to work on my PowerMac G5 11,2. Kernel dmesg says: [...] Freeing unused kernel image (initmem) memory: 3968K Warning: Unable to mark rodata read only on this CPU. rodata_test: test data was not read only [...] # lscpu Architecture: ppc64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Big Endian CPU(s):2 On-line CPU(s) list: 0,1 Model name:PPC970MP, altivec supported Model: 1.1 (pvr 0044 0101) Thread(s) per core: 1 Core(s) per socket: 2 Socket(s): 1 CPU max MHz: 2300. CPU min MHz: 1150. Caches (sum of all): L1d: 64 KiB (2 instances) L1i: 128 KiB (2 instances) L2: 2 MiB (2 instances) NUMA: NUMA node(s):1 NUMA node0 CPU(s): 0,1 Vulnerabilities: Itlb multihit: Not affected L1tf:Vulnerable Mds: Not affected Meltdown:Vulnerable Spec store bypass: Vulnerable Spectre v1: Mitigation; __user pointer sanitization Spectre v2: Vulnerable Srbds: Not affected Tsx async abort: Not affected -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.
False positive kmemleak report for dtb properties names on powerpc
Hello! I was running a powerpc 32bit kernel (built using qemu_ppc_mpc8544ds_defconfig buildroot config, with enabling DEBUGFS+KMEMLEAK+HIGHMEM in the kernel config) on qemu and invoked the kmemleak scan (twice. for some reason the first time wasn't enough). (Actually the problem will probably reproduce on every ppc kernel with HIGHMEM enabled, but I only checked this config) I got 97 leak reports, all similar to the following: ``` unreferenced object 0xc1803840 (size 16): comm "swapper", pid 1, jiffies 4294892303 (age 39.320s) hex dump (first 16 bytes): 64 65 76 69 63 65 5f 74 79 70 65 00 00 00 00 00 device_type. backtrace: [<(ptrval)>] kstrdup+0x40/0x98 [<(ptrval)>] __of_add_property_sysfs+0xa4/0x10c [<(ptrval)>] __of_attach_node_sysfs+0xc0/0x110 [<(ptrval)>] of_core_init+0xa8/0x15c [<(ptrval)>] driver_init+0x24/0x3c [<(ptrval)>] kernel_init_freeable+0xb8/0x23c [<(ptrval)>] kernel_init+0x24/0x14c [<(ptrval)>] ret_from_kernel_thread+0x5c/0x64 ``` The objects in the reports are the names of the sysfs files created for the dtb nodes and properties. These are definitely not leaked, as they are even visible to the user as the sysfs file names. These strings (for dtb properties, in the case of the shown report, but the case with dtb nodes is very similar) are created in __of_add_property_sysfs() and the pointer to them is stored in pp->attr.attr.name (so, actually stored in the memory pointed by pp) pp is one of the dtb property objects which are allocated in early_init_dt_alloc_memory_arch() in of/fdt.c using memblock_alloc. This happens very early, in setup_arch()->unflatten_device_tree(). memblock_alloc lets kmemleak know about the allocated memory using kmemleak_alloc_phys (in mm/memblock.c:memblock_alloc_range_nid()). The problem is with the following code (mm/kmemleak.c): ```c void __ref kmemleak_alloc_phys(phys_addr_t phys, size_t size, int min_count, gfp_t gfp) { if (!IS_ENABLED(CONFIG_HIGHMEM) || PHYS_PFN(phys) < max_low_pfn) kmemleak_alloc(__va(phys), size, min_count, gfp); } ``` When CONFIG_HIGHMEM is enabled, the pfn of the allocated memory is checked against max_low_pfn, to make sure it is not in the HIGHMEM zone. However, when called through unflatten_device_tree(), max_low_pfn is not yet initialized in powerpc. max_low_pfn is initialized (when NUMA is disabled) in arch/powerpc/mm/mem.c:mem_topology_setup() which is called only after unflatten_device_tree() is called in the same function (setup_arch()). Because max_low_pfn is global it is 0 before initialization, so as far as kmemleak_alloc_phys() is concerned, every memory is HIGHMEM (: and the allocated memory is not tracked by kmemleak, causing references to objects allocated later with kmalloc() to be ignored and these objects are marked as leaked. I actually tried to find out whether this happen on other arches as well, and it seems like arm64 also have this problem when dtb is used instead of acpi, although I haven't had the chance to confirm this. I don't suppose I can just shuffle the calls in setup_arch() around, so I wanted to hear your opinions first Thanks!
Re: [PATCH v2 13/18] uaccess: generalize access_ok()
On Fri, Feb 18, 2022 at 1:30 AM David Laight wrote: > > From: Andy Lutomirski > > Sent: 17 February 2022 19:15 > ... > > This isn't actually optimal. On x86, TASK_SIZE_MAX is a bizarre > > constant that has a very specific value to work around a bug^Wdesign > > error^Wfeature of Intel CPUs. TASK_SIZE_MAX is the maximum address at > > which userspace is permitted to allocate memory, but there is a huge > > gap between user and kernel addresses, and any value in the gap would > > be adequate for the comparison. If we wanted to optimize this, simply > > checking the high bit (which x86 can do without any immediate > > constants at all) would be sufficient and, for an access known to fit > > in 32 bits, one could get even fancier and completely ignore the size > > of the access. (For accesses not known to fit in 32 bits, I suspect > > some creativity could still come up with a construction that's > > substantially faster than the one in your patch.) > > > > So there's plenty of room for optimization here. > > > > (This is not in any respect a NAK -- it's just an observation that > > this could be even better.) > > For 64bit arch that use the top bit to separate user/kernel > you can test '(addr | size) >> 62)'. > The compiler optimises out constant sizes. > > This has all been mentioned a lot of times. > You do get different fault types. > > OTOH an explicit check for constant size (less than something big) > can use the cheaper test of the sign bit. > Big constant sizes could be compile time errors. The different fault type issue may well be a real problem. Right now the core x86 fault code reserves the right to grouch if we get #GP instead of #PF. We could change that.
Re: [PATCH v6 0/4] Add perf interface to expose nvdimm
On Thu, Feb 17, 2022 at 8:34 AM Kajol Jain wrote: > > Patchset adds performance stats reporting support for nvdimm. > Added interface includes support for pmu register/unregister > functions. A structure is added called nvdimm_pmu to be used for > adding arch/platform specific data such as cpumask, nvdimm device > pointer and pmu event functions like event_init/add/read/del. > User could use the standard perf tool to access perf events > exposed via pmu. > > Interface also defines supported event list, config fields for the > event attributes and their corresponding bit values which are exported > via sysfs. Patch 3 exposes IBM pseries platform nmem* device > performance stats using this interface. > > Result from power9 pseries lpar with 2 nvdimm device: > > Ex: List all event by perf list > > command:# perf list nmem > > nmem0/cache_rh_cnt/[Kernel PMU event] > nmem0/cache_wh_cnt/[Kernel PMU event] > nmem0/cri_res_util/[Kernel PMU event] > nmem0/ctl_res_cnt/ [Kernel PMU event] > nmem0/ctl_res_tm/ [Kernel PMU event] > nmem0/fast_w_cnt/ [Kernel PMU event] > nmem0/host_l_cnt/ [Kernel PMU event] > nmem0/host_l_dur/ [Kernel PMU event] > nmem0/host_s_cnt/ [Kernel PMU event] > nmem0/host_s_dur/ [Kernel PMU event] > nmem0/med_r_cnt/ [Kernel PMU event] > nmem0/med_r_dur/ [Kernel PMU event] > nmem0/med_w_cnt/ [Kernel PMU event] > nmem0/med_w_dur/ [Kernel PMU event] > nmem0/mem_life/[Kernel PMU event] > nmem0/poweron_secs/[Kernel PMU event] > ... > nmem1/mem_life/[Kernel PMU event] > nmem1/poweron_secs/[Kernel PMU event] > > Patch1: > Introduces the nvdimm_pmu structure > Patch2: > Adds common interface to add arch/platform specific data > includes nvdimm device pointer, pmu data along with > pmu event functions. It also defines supported event list > and adds attribute groups for format, events and cpumask. > It also adds code for cpu hotplug support. > Patch3: > Add code in arch/powerpc/platform/pseries/papr_scm.c to expose > nmem* pmu. It fills in the nvdimm_pmu structure with pmu name, > capabilities, cpumask and event functions and then registers > the pmu by adding callbacks to register_nvdimm_pmu. > Patch4: > Sysfs documentation patch > > Changelog > --- > Resend v5 -> v6 > - No logic change, just a rebase to latest upstream and > tested the patchset. > > - Link to the patchset Resend v5: https://lkml.org/lkml/2021/11/15/3979 > > v5 -> Resend v5 > - Resend the patchset > > - Link to the patchset v5: https://lkml.org/lkml/2021/9/28/643 > > v4 -> v5: > - Remove multiple variables defined in nvdimm_pmu structure include > name and pmu functions(event_int/add/del/read) as they are just > used to copy them again in pmu variable. Now we are directly doing > this step in arch specific code as suggested by Dan Williams. > > - Remove attribute group field from nvdimm pmu structure and > defined these attribute groups in common interface which > includes format, event list along with cpumask as suggested by > Dan Williams. > Since we added static defination for attrbute groups needed in > common interface, removes corresponding code from papr. > > - Add nvdimm pmu event list with event codes in the common interface. > > - Remove Acked-by/Reviewed-by/Tested-by tags as code is refactored > to handle review comments from Dan. I don't think review comments should invalidate the Acked-by tags in this case. Nothing fundamentally changed in the approach, and I would like to have the perf ack before taking this through the nvdimm tree. Otherwise this looks good to me. Peter, might you have a chance to re-Ack this series, or any concerns about me retrieving those Acks from the previous postings?
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.17-4 tag
The pull request you sent on Fri, 18 Feb 2022 13:54:05 +1100: > https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git > tags/powerpc-5.17-4 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/ea4b3d299fe6b6c9afa4a91dc2cf5479d0089eeb Thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/prtracker.html
Re: [PATCH net v3] net: Force inlining of checksum functions in net/checksum.h
On Fri, Feb 18, 2022 at 08:29:20AM -0800, Stephen Hemminger wrote: > On Fri, 18 Feb 2022 06:12:37 -0600 > Segher Boessenkool wrote: > > On Fri, Feb 18, 2022 at 10:35:48AM +0900, Masahiro Yamada wrote: > > > On Fri, Feb 18, 2022 at 3:10 AM Segher Boessenkool > > > wrote: > > > > On Fri, Feb 18, 2022 at 02:27:16AM +0900, Masahiro Yamada wrote: > > > > > On Fri, Feb 18, 2022 at 1:49 AM David Laight > > > > > wrote: > > > > > > That description is largely fine. > > > > > > > > > > > > Inappropriate 'inline' ought to be removed. > > > > > > Then 'inline' means - 'really do inline this'. > > > > > > > > > > You cannot change "static inline" to "static" > > > > > in header files. > > > > > > > > Why not? Those two have identical semantics! > > > > > > e.g.) > > > > > > > > > [1] Open include/linux/device.h with your favorite editor, > > > then edit > > > > > > static inline void *devm_kcalloc(struct device *dev, > > > > > > to > > > > > > static void *devm_kcalloc(struct device *dev, > > > > > > > > > [2] Build the kernel > > > > You get some "defined but not used" warnings that are shushed for > > inlines. Do you see something else? > > > > The semantics are the same. Warnings are just warnings. It builds > > fine. > > Kernel code should build with zero warnings, the compiler is telling you > something. The second part is of course true. The first part less so, and is in fact not true at all from some points of view: $ ./build --kernel x86_64 Building x86_64... (target x86_64-linux) kernel: configure [00:06] build [02:12] 1949 warnings OK (This is with a development version of GCC.) There are simple ways to shut up specific warnings for specific code. That is useful, certainly. And so is having a warning-free build. It is obvious that we do survive without either of that though! And none of this detracts from the point that the semantics of "static" and "static inline" are identical. Segher
Re: [PATCH net v3] net: Force inlining of checksum functions in net/checksum.h
On Fri, 18 Feb 2022 06:12:37 -0600 Segher Boessenkool wrote: > On Fri, Feb 18, 2022 at 10:35:48AM +0900, Masahiro Yamada wrote: > > On Fri, Feb 18, 2022 at 3:10 AM Segher Boessenkool > > wrote: > > > On Fri, Feb 18, 2022 at 02:27:16AM +0900, Masahiro Yamada wrote: > > > > On Fri, Feb 18, 2022 at 1:49 AM David Laight > > > > wrote: > > > > > That description is largely fine. > > > > > > > > > > Inappropriate 'inline' ought to be removed. > > > > > Then 'inline' means - 'really do inline this'. > > > > > > > > You cannot change "static inline" to "static" > > > > in header files. > > > > > > Why not? Those two have identical semantics! > > > > e.g.) > > > > > > [1] Open include/linux/device.h with your favorite editor, > > then edit > > > > static inline void *devm_kcalloc(struct device *dev, > > > > to > > > > static void *devm_kcalloc(struct device *dev, > > > > > > [2] Build the kernel > > You get some "defined but not used" warnings that are shushed for > inlines. Do you see something else? > > The semantics are the same. Warnings are just warnings. It builds > fine. Kernel code should build with zero warnings, the compiler is telling you something.
RE: [PATCH v2 05/18] x86: remove __range_not_ok()
From: Christoph Hellwig > Sent: 18 February 2022 06:29 ... > > > diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c > > index 15b058eefc4e..ee117fcf46ed 100644 > > --- a/arch/x86/kernel/stacktrace.c > > +++ b/arch/x86/kernel/stacktrace.c > > @@ -90,7 +90,7 @@ copy_stack_frame(const struct stack_frame_user __user *fp, > > { > > int ret; > > > > - if (__range_not_ok(fp, sizeof(*frame), TASK_SIZE)) > > + if (!__access_ok(fp, sizeof(*frame))) > > return 0; > > Just switch the __get_user calls below to get_user instead. Is this worth doing at all? How much userspace code is actually compiled with stack frames? Won't work well for a 32bit process on a 64bit kernel either. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [PATCH net v3] net: Force inlining of checksum functions in net/checksum.h
On Fri, Feb 18, 2022 at 10:35:48AM +0900, Masahiro Yamada wrote: > On Fri, Feb 18, 2022 at 3:10 AM Segher Boessenkool > wrote: > > On Fri, Feb 18, 2022 at 02:27:16AM +0900, Masahiro Yamada wrote: > > > On Fri, Feb 18, 2022 at 1:49 AM David Laight > > > wrote: > > > > That description is largely fine. > > > > > > > > Inappropriate 'inline' ought to be removed. > > > > Then 'inline' means - 'really do inline this'. > > > > > > You cannot change "static inline" to "static" > > > in header files. > > > > Why not? Those two have identical semantics! > > e.g.) > > > [1] Open include/linux/device.h with your favorite editor, > then edit > > static inline void *devm_kcalloc(struct device *dev, > > to > > static void *devm_kcalloc(struct device *dev, > > > [2] Build the kernel You get some "defined but not used" warnings that are shushed for inlines. Do you see something else? The semantics are the same. Warnings are just warnings. It builds fine. Segher
Re: [PATCH] net/ibmvnic: Cleanup workaround doing an EOI after partition migration
Hello: This patch was applied to netdev/net-next.git (master) by David S. Miller : On Fri, 18 Feb 2022 09:07:08 +0100 you wrote: > There were a fair amount of changes to workaround a firmware bug leaving > a pending interrupt after migration of the ibmvnic device : > > commit 2df5c60e198c ("net/ibmvnic: Ignore H_FUNCTION return from H_EOI > to tolerate XIVE mode") > commit 284f87d2f387 ("Revert "net/ibmvnic: Fix EOI when running in > XIVE mode"") > commit 11d49ce9f794 ("net/ibmvnic: Fix EOI when running in XIVE mode.") > commit f23e0643cd0b ("ibmvnic: Clear pending interrupt after device reset") > > [...] Here is the summary with links: - net/ibmvnic: Cleanup workaround doing an EOI after partition migration https://git.kernel.org/netdev/net-next/c/7ea0c16a74a4 You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html
Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
On Fri, Feb 18, 2022 at 11:17:59AM +, Joakim Tjernlund wrote: > I was happy with commit msgs and I don't know what the criticism was. I have no context anymore, sorry. Can someone resubmit the change again and we can take it from there? thanks, greg k-h
Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
I was happy with commit msgs and I don't know what the criticism was. From: gre...@linuxfoundation.org Sent: 18 February 2022 11:39 To: Joakim Tjernlund Cc: Thorsten Leemhuis; Leo Li; eugene_bordenkirc...@selinc.com; linux-...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; ba...@kernel.org Subject: Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop. On Fri, Feb 18, 2022 at 10:21:12AM +, Joakim Tjernlund wrote: > I think you could apply them as is, only criticism was the commit msgs. That is always a good reason to reject a change. Please resubmit them with the commit message cleaned up and I will be glad to review it again. thanks, greg k-h
Re: [PATCH v7 1/3] riscv: Introduce CONFIG_RELOCATABLE
Hi Palmer, Do you intend to pull that in for-next or not yet? Can I do something to help? Thanks, Alex On Mon, Jan 10, 2022 at 9:05 AM Alexandre ghiti wrote: > > Hi Palmer, > > Do you think this could go in for-next? > > Thanks, > > Alex > > On 12/6/21 10:44, Alexandre ghiti wrote: > > @Palmer, can I do anything for that to be pulled in 5.17? > > > > Thanks, > > > > Alex > > > > On 10/27/21 07:04, Alexandre ghiti wrote: > >> Hi Palmer, > >> > >> On 10/26/21 11:29 PM, Palmer Dabbelt wrote: > >>> On Sat, 09 Oct 2021 10:20:20 PDT (-0700), a...@ghiti.fr wrote: > Arf, I have sent this patchset with the wrong email address. @Palmer > tell me if you want me to resend it correctly. > >>> Sorry for being kind of slow here. It's fine: there's a "From:" in > >>> the patch, and git picks those up so it'll match the signed-off-by > >>> line. I send pretty much all my patches that way, as I never managed > >>> to get my Google address working correctly. > >>> > Thanks, > > Alex > > On 10/9/21 7:12 PM, Alexandre Ghiti wrote: > > From: Alexandre Ghiti > > > > This config allows to compile 64b kernel as PIE and to relocate it at > > any virtual address at runtime: this paves the way to KASLR. > > Runtime relocation is possible since relocation metadata are > > embedded into > > the kernel. > >>> IMO this should really be user selectable, at a bare minimum so it's > >>> testable. > >>> I just sent along a patch to do that (my power's off at home, so email > >>> is a bit > >>> wacky right now). > >>> > >>> I haven't put this on for-next yet as I'm not sure if you had a fix > >>> for the > >>> kasan issue (which IIUC would conflict with this). > >> > >> The kasan issue only revealed that I need to move the kasan shadow > >> memory around with sv48 support, that's not related to the relocatable > >> kernel. > >> > >> Thanks, > >> > >> Alex > >> > >> > > Note that relocating at runtime introduces an overhead even if the > > kernel is loaded at the same address it was linked at and that the > > compiler > > options are those used in arm64 which uses the same RELA relocation > > format. > > > > Signed-off-by: Alexandre Ghiti > > --- > > arch/riscv/Kconfig | 12 > > arch/riscv/Makefile | 7 +++-- > > arch/riscv/kernel/vmlinux.lds.S | 6 > > arch/riscv/mm/Makefile | 4 +++ > > arch/riscv/mm/init.c| 54 > > - > > 5 files changed, 80 insertions(+), 3 deletions(-) > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig > > index ea16fa2dd768..043ba92559fa 100644 > > --- a/arch/riscv/Kconfig > > +++ b/arch/riscv/Kconfig > > @@ -213,6 +213,18 @@ config PGTABLE_LEVELS > > config LOCKDEP_SUPPORT > > def_bool y > > > > +config RELOCATABLE > > +bool > > +depends on MMU && 64BIT && !XIP_KERNEL > > +help > > + This builds a kernel as a Position Independent Executable > > (PIE), > > + which retains all relocation metadata required to > > relocate the > > + kernel binary at runtime to a different virtual address > > than the > > + address it was linked at. > > + Since RISCV uses the RELA relocation format, this > > requires a > > + relocation pass at runtime even if the kernel is loaded > > at the > > + same address it was linked at. > > + > > source "arch/riscv/Kconfig.socs" > > source "arch/riscv/Kconfig.erratas" > > > > diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile > > index 0eb4568fbd29..2f509915f246 100644 > > --- a/arch/riscv/Makefile > > +++ b/arch/riscv/Makefile > > @@ -9,9 +9,12 @@ > > # > > > > OBJCOPYFLAGS:= -O binary > > -LDFLAGS_vmlinux := > > +ifeq ($(CONFIG_RELOCATABLE),y) > > +LDFLAGS_vmlinux += -shared -Bsymbolic -z notext -z norelro > > +KBUILD_CFLAGS += -fPIE > > +endif > > ifeq ($(CONFIG_DYNAMIC_FTRACE),y) > > -LDFLAGS_vmlinux := --no-relax > > +LDFLAGS_vmlinux += --no-relax > > KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY > > CC_FLAGS_FTRACE := -fpatchable-function-entry=8 > > endif > > diff --git a/arch/riscv/kernel/vmlinux.lds.S > > b/arch/riscv/kernel/vmlinux.lds.S > > index 5104f3a871e3..862a8c09723c 100644 > > --- a/arch/riscv/kernel/vmlinux.lds.S > > +++ b/arch/riscv/kernel/vmlinux.lds.S > > @@ -133,6 +133,12 @@ SECTIONS > > > > BSS_SECTION(PAGE_SIZE, PAGE_SIZE, 0) > > > > +.rela.dyn : ALIGN(8) { > > +__rela_dyn_start = .; > > +*(.rela .rela*) > > +__rela_dyn_end = .; > > +} > > + > > #ifdef CONFIG_EFI > > . =
Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
On Fri, Feb 18, 2022 at 10:21:12AM +, Joakim Tjernlund wrote: > I think you could apply them as is, only criticism was the commit msgs. That is always a good reason to reject a change. Please resubmit them with the commit message cleaned up and I will be glad to review it again. thanks, greg k-h
Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
I think you could apply them as is, only criticism was the commit msgs. Jocke From: Thorsten Leemhuis Sent: 18 February 2022 08:11 To: Leo Li; Joakim Tjernlund; eugene_bordenkirc...@selinc.com; linux-...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org Cc: gre...@linuxfoundation.org; ba...@kernel.org Subject: Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop. Hi, this is your Linux kernel regression tracker speaking. Top-posting for once, to make this easy accessible to everyone. Sadly it looks to me like nobody is going to address this (quite old) regression (that afaic only very few people will hit), despite the rough patch to fix it that was already posted and tested in this thread. Well, guess that's how it is sometimes. Marking it as "on back burner" in regzbot to reduce the noise there: #regzbot backburner: Tested patch available, but things nevertheless got stuck Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) P.S.: As the Linux kernel's regression tracker I'm getting a lot of reports on my table. I can only look briefly into most of them and lack knowledge about most of the areas they concern. I thus unfortunately will sometimes get things wrong or miss something important. I hope that's not the case here; if you think it is, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight. #regzbot poke On 20.01.22 13:54, Thorsten Leemhuis wrote: > Hi, this is your Linux kernel regression tracker speaking. > > On 04.12.21 01:40, Leo Li wrote: >>> -Original Message- >>> From: Joakim Tjernlund >>> Sent: Thursday, December 2, 2021 4:45 PM >>> To: regressi...@leemhuis.info; Leo Li ; >>> eugene_bordenkirc...@selinc.com; linux-...@vger.kernel.org; linuxppc- >>> d...@lists.ozlabs.org >>> Cc: gre...@linuxfoundation.org; ba...@kernel.org >>> Subject: Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to >>> unrecoverable loop. >>> >>> On Thu, 2021-12-02 at 20:35 +, Leo Li wrote: > -Original Message- > From: Joakim Tjernlund > Sent: Wednesday, December 1, 2021 8:19 AM > To: regressi...@leemhuis.info; Leo Li ; > eugene_bordenkirc...@selinc.com; linux-...@vger.kernel.org; > linuxppc- d...@lists.ozlabs.org > Cc: gre...@linuxfoundation.org; ba...@kernel.org > Subject: Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list > leads to unrecoverable loop. > > On Tue, 2021-11-30 at 12:56 +0100, Joakim Tjernlund wrote: >> On Mon, 2021-11-29 at 23:48 +, Eugene Bordenkircher wrote: >>> Agreed, >>> >>> We are happy pick up the torch on this, but I'd like to try and >>> hear from > Joakim first before we do. The patch set is his, so I'd like to > give him the opportunity. I think he's the only one that can add a > truly proper description as well because he mentioned that this > includes a "few more fixes" than just the one we ran into. I'd > rather hear from him than try to reverse engineer what was being >>> addressed. >>> >>> Joakim, if you are still watching the thread, would you like to >>> take a stab > at it? If I don't hear from you in a couple days, we'll pick up the > torch and do what we can. > > Did anything happen? Sure, it's a old regression from the v3.4-rc4 days, > but there iirc was already a tested proto-patch in that thread that > fixes the issue. Or was progress made and I just missed it? > > Ciao, Thorsten > > P.S.: As a Linux kernel regression tracker I'm getting a lot of reports > on my table. I can only look briefly into most of them. Unfortunately > therefore I sometimes will get things wrong or miss something important. > I hope that's not the case here; if you think it is, don't hesitate to > tell me about it in a public reply, that's in everyone's interest. > > BTW, I have no personal interest in this issue, which is tracked using > regzbot, my Linux kernel regression tracking bot > (https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flinux-regtracking.leemhuis.info%2Fregzbot%2Fdata=04%7C01%7Cjoakim.tjernlund%40infinera.com%7C8784242cb55d4627e61608d9f2adec23%7C285643de5f5b4b03a1530ae2dc8aaf77%7C1%7C0%7C637807651100768999%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=dQS2xwqJjHY4DqSawLZKoe0XZaBAqPLul5YgPdQWFio%3Dreserved=0). > I'm only posting > this mail to get things rolling again and hence don't need to be CC on > all further activities wrt to this regression. > > #regzbot ignore-activity > >> I am far away from this now and still on 4.19. I don't mind if you >> tweak > tweak the patches for better "upstreamability" > > Even better would be to migrate to the chipidea driver, I am told > just a few tweaks are needed but this is probably something NXP > should do as they have access to
Re: [PATCH v2 00/18] clean up asm/uaccess.h, kill set_fs for good
Le 18/02/2022 à 02:50, Al Viro a écrit : > On Thu, Feb 17, 2022 at 07:20:11AM +, Christophe Leroy wrote: > >> And we have also >> user_access_begin()/user_read_access_begin()/user_write_access_begin() >> which call access_ok() then do the real work. Could be made generic with >> call to some arch specific __user_access_begin() and friends after the >> access_ok() and eventually the might_fault(). > > Not a good idea, considering the fact that we do not want to invite > uses of "faster" variants... I'm not sure I understand your concern. Today in powerpc we have: static __must_check inline bool user_read_access_begin(const void __user *ptr, size_t len) { if (unlikely(!access_ok(ptr, len))) return false; might_fault(); allow_read_from_user(ptr, len); return true; } We could instead have a generic static __must_check inline bool user_read_access_begin(const void __user *ptr, size_t len) { if (unlikely(!access_ok(ptr, len))) return false; might_fault(); return arch_user_read_access_begin(ptr, len); } And then a powerpc specific static __must_check __always_inline bool arch_user_read_access_begin(const void __user *ptr, size_t len) { allow_read_from_user(ptr, len); return true; } #define arch_user_read_access_begin arch_user_read_access_begin And a generic fallback for arch_user_read_access_begin() that does nothing at all. Do you mean that in that case people might be tempted to use arch_user_read_access_begin() instead of using user_read_access_begin() ? If that's the case isn't it something we could verify via checkpatch.pl ? Today it seems to be problematic that functions in asm/uaccess.h use access_ok(). Such an approach would allow to get rid of access_ok() use in architecture's uaccess.h Christophe
RE: [PATCH v2 13/18] uaccess: generalize access_ok()
From: Andy Lutomirski > Sent: 17 February 2022 19:15 ... > This isn't actually optimal. On x86, TASK_SIZE_MAX is a bizarre > constant that has a very specific value to work around a bug^Wdesign > error^Wfeature of Intel CPUs. TASK_SIZE_MAX is the maximum address at > which userspace is permitted to allocate memory, but there is a huge > gap between user and kernel addresses, and any value in the gap would > be adequate for the comparison. If we wanted to optimize this, simply > checking the high bit (which x86 can do without any immediate > constants at all) would be sufficient and, for an access known to fit > in 32 bits, one could get even fancier and completely ignore the size > of the access. (For accesses not known to fit in 32 bits, I suspect > some creativity could still come up with a construction that's > substantially faster than the one in your patch.) > > So there's plenty of room for optimization here. > > (This is not in any respect a NAK -- it's just an observation that > this could be even better.) For 64bit arch that use the top bit to separate user/kernel you can test '(addr | size) >> 62)'. The compiler optimises out constant sizes. This has all been mentioned a lot of times. You do get different fault types. OTOH an explicit check for constant size (less than something big) can use the cheaper test of the sign bit. Big constant sizes could be compile time errors. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [PATCH v2 10/18] m68k: fix access_ok for coldfire
On Fri, Feb 18, 2022 at 10:00 AM Geert Uytterhoeven wrote: > > /* We let the MMU do all checking */ > > -static inline int access_ok(const void __user *addr, > > +static inline int access_ok(const void __user *ptr, > > unsigned long size) > > { > > + unsigned long limit = TASK_SIZE; > > + unsigned long addr = (unsigned long)ptr; > > + > > /* > > * XXX: for !CONFIG_CPU_HAS_ADDRESS_SPACES this really needs to > > check > > * for TASK_SIZE! > > +* Removing this helper is probably sufficient. > > */ > > Shouldn't the above comment block be removed completely, > as this is now implemented below? Yes, obviously. Fixed now. > > - return 1; > > + if (IS_ENABLED(CONFIG_CPU_HAS_ADDRESS_SPACES)) > > + return 1; I just noticed this should have the same change that I made for the generic version, changed it now to + if (IS_ENABLED(CONFIG_CPU_HAS_ADDRESS_SPACES) || + !IS_ENABLED(CONFIG_MMU)) This is gone again after the cleanup patch, when the generic version is used instead. > > + return (size <= limit) && (addr <= (limit - size)); > > } > > Any pesky compilers that warn (or worse with -Werror) about > "condition always true" for TASK_SIZE = 0xUL? No, using a local variable avoids this warning. Arnd
Re: [PATCH v2 00/18] clean up asm/uaccess.h, kill set_fs for good
n Fri, Feb 18, 2022 at 3:21 AM Al Viro wrote: > > On Thu, Feb 17, 2022 at 08:49:59AM +0100, Arnd Bergmann wrote: > > > Same here: architectures can already provide a __put_user_fn() > > and __get_user_fn(), to get the generic versions of the interface, > > but few architectures use that. You can actually get all the interfaces > > by just providing raw_copy_from_user() and raw_copy_to_user(), > > but the get_user/put_user versions you get from that are fairly > > inefficient. > > FWIW, __{get,put}_user_{8,16,32,64} would probably make it easier to > unify. That's where the really variable part tends to be, anyway. > IMO __get_user_fn() had been a mistake. I've prototyped this now, to see what this might look like, see https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/commit/?h=generic-get_user-prototype This adds generic inline version of {__get,get,__put,put}_user() and converts x86 to (optionally) use it. This builds with gcc-5 through gcc-11 on 32-bit and 64-bit x86, using asm-goto with outputs where possible, and requiring a minimum set of macro definitions from the architecture. Compiling with clang produces no warnings but does cause a linker issue at the moment, so there is probably at least one bug in it. Aside from compile-testing, I have not tried to verify if this is correct or efficient, but let me know if you think this is headed in the right direction. > One thing I somewhat dislike about the series is the boilerplate in > asm/uaccess.h instances - #include in > a lot of them might make sense as a transitory state, but getting > stuck with those indefinitely... Christoph also complained about it, the problem for now is that asm-generic/access_ok.h must first see the macro definitions for architectures that override any of the contents, but access_ok() itself is used at least in some of the asm/uaccess.h files as well, so it must be included in the middle of it, until more of the uaccess.h implementation is moved to linux/uaccess.h in an architecture independent way. Would you prefer having an asm/access_ok.h that falls back to the asm-generic version but can have an architecture specific override when needed (ia64, arm64, x86, um)? > BTW, do we need user_addr_max() anymore? The definition in > asm-generic/access-ok.h is the only one, so ifndef around it is pointless. Right, the v2 changes got rid of the last override, so it could get hardcoded to TASK_SIZE_MAX, or we can convert the five references to just use that instead and remove it altogether: arch/arm64/kernel/traps.c: if (address >= user_addr_max()) { \ arch/parisc/kernel/signal.c:if (start >= user_addr_max() - sigframe_size) arch/parisc/kernel/signal.c:if (A([0]) >= user_addr_max() - 5 * sizeof(int)) lib/strncpy_from_user.c:max_addr = user_addr_max(); lib/strnlen_user.c: max_addr = user_addr_max(); user_addr_max() first showed up in architecture-independent code in c5389831cda3 ("sparc: Fix user_addr_max() definition."), and from that I think the original intent is no longer useful. Arnd
Re: [PATCH v2 13/18] uaccess: generalize access_ok()
On Wed, Feb 16, 2022 at 2:17 PM Arnd Bergmann wrote: > From: Arnd Bergmann > > There are many different ways that access_ok() is defined across > architectures, but in the end, they all just compare against the > user_addr_max() value or they accept anything. > > Provide one definition that works for most architectures, checking > against TASK_SIZE_MAX for user processes or skipping the check inside > of uaccess_kernel() sections. > > For architectures without CONFIG_SET_FS(), this should be the fastest > check, as it comes down to a single comparison of a pointer against a > compile-time constant, while the architecture specific versions tend to > do something more complex for historic reasons or get something wrong. > > Type checking for __user annotations is handled inconsistently across > architectures, but this is easily simplified as well by using an inline > function that takes a 'const void __user *' argument. A handful of > callers need an extra __user annotation for this. > > Some architectures had trick to use 33-bit or 65-bit arithmetic on the > addresses to calculate the overflow, however this simpler version uses > fewer registers, which means it can produce better object code in the > end despite needing a second (statically predicted) branch. > > Reviewed-by: Christoph Hellwig > Acked-by: Mark Rutland [arm64, asm-generic] > Signed-off-by: Arnd Bergmann > arch/m68k/Kconfig.cpu | 1 + > arch/m68k/include/asm/uaccess.h | 19 + Acked-by: Geert Uytterhoeven Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [PATCH v2 10/18] m68k: fix access_ok for coldfire
Hi Arnd, On Wed, Feb 16, 2022 at 2:17 PM Arnd Bergmann wrote: > From: Arnd Bergmann > > While most m68k platforms use separate address spaces for user > and kernel space, at least coldfire does not, and the other > ones have a TASK_SIZE that is less than the entire 4GB address > range. > > Using the default implementation of __access_ok() stops coldfire > user space from trivially accessing kernel memory. > > Signed-off-by: Arnd Bergmann Thanks for your patch! > --- a/arch/m68k/include/asm/uaccess.h > +++ b/arch/m68k/include/asm/uaccess.h > @@ -12,14 +12,21 @@ > #include > > /* We let the MMU do all checking */ > -static inline int access_ok(const void __user *addr, > +static inline int access_ok(const void __user *ptr, > unsigned long size) > { > + unsigned long limit = TASK_SIZE; > + unsigned long addr = (unsigned long)ptr; > + > /* > * XXX: for !CONFIG_CPU_HAS_ADDRESS_SPACES this really needs to check > * for TASK_SIZE! > +* Removing this helper is probably sufficient. > */ Shouldn't the above comment block be removed completely, as this is now implemented below? > - return 1; > + if (IS_ENABLED(CONFIG_CPU_HAS_ADDRESS_SPACES)) > + return 1; > + > + return (size <= limit) && (addr <= (limit - size)); > } Any pesky compilers that warn (or worse with -Werror) about "condition always true" for TASK_SIZE = 0xUL? Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [PATCH v2 08/18] uaccess: add generic __{get,put}_kernel_nofault
On Wed, Feb 16, 2022 at 2:17 PM Arnd Bergmann wrote: > From: Arnd Bergmann > > Nine architectures are still missing __{get,put}_kernel_nofault: > alpha, ia64, microblaze, nds32, nios2, openrisc, sh, sparc32, xtensa. > > Add a generic version that lets everything use the normal > copy_{from,to}_kernel_nofault() code based on these, removing the last > use of get_fs()/set_fs() from architecture-independent code. > > Reviewed-by: Christoph Hellwig > Signed-off-by: Arnd Bergmann > arch/m68k/include/asm/uaccess.h | 2 - Acked-by: Geert Uytterhoeven Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
RE: [PATCH net v3] net: Force inlining of checksum functions in net/checksum.h
From: Masahiro Yamada > Sent: 17 February 2022 17:27 > > On Fri, Feb 18, 2022 at 1:49 AM David Laight wrote: > > > > From: Masahiro Yamada > > > Sent: 17 February 2022 16:17 > > ... > > > No. Not that one. > > > > > > The commit you presumably want to revert is: > > > > > > a771f2b82aa2 ("[PATCH] Add a section about inlining to > > > Documentation/CodingStyle") > > > > > > This is now referred to as "__always_inline disease", though. > > > > That description is largely fine. > > > > Inappropriate 'inline' ought to be removed. > > Then 'inline' means - 'really do inline this'. > > > You cannot change "static inline" to "static" > in header files. You'd need some 'magicary' to get an extern except for a special include that generated the visible function. It has been done. > If "static inline" meant __always_inline, > there would be no way to negate it. > That's why we need both inline and __always_inline. I'd go the other way, 'inline' and 'inline_for_code_bloat' (or maybe inline_for_speed). Much the same as the noinline's to stop stack bloat. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
[PATCH] net/ibmvnic: Cleanup workaround doing an EOI after partition migration
There were a fair amount of changes to workaround a firmware bug leaving a pending interrupt after migration of the ibmvnic device : commit 2df5c60e198c ("net/ibmvnic: Ignore H_FUNCTION return from H_EOI to tolerate XIVE mode") commit 284f87d2f387 ("Revert "net/ibmvnic: Fix EOI when running in XIVE mode"") commit 11d49ce9f794 ("net/ibmvnic: Fix EOI when running in XIVE mode.") commit f23e0643cd0b ("ibmvnic: Clear pending interrupt after device reset") Here is the final one taking into account the XIVE interrupt mode. Cc: Sukadev Bhattiprolu Cc: Dany Madden Signed-off-by: Cédric Le Goater --- drivers/net/ethernet/ibm/ibmvnic.c | 35 ++ 1 file changed, 26 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 29617a86b299..61975cb9c1a4 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -60,6 +60,7 @@ #include #include #include +#include #include #include #include @@ -3562,6 +3563,30 @@ static int disable_scrq_irq(struct ibmvnic_adapter *adapter, return rc; } +/* We can not use the IRQ chip EOI handler because that has the + * unintended effect of changing the interrupt priority. + */ +static void ibmvnic_xics_eoi(struct device *dev, struct ibmvnic_sub_crq_queue *scrq) +{ + u64 val = 0xff00 | scrq->hw_irq; + unsigned long rc; + + rc = plpar_hcall_norets(H_EOI, val); + if (rc) + dev_err(dev, "H_EOI FAILED irq 0x%llx. rc=%ld\n", val, rc); +} + +/* Due to a firmware bug, the hypervisor can send an interrupt to a + * transmit or receive queue just prior to a partition migration. + * Force an EOI after migration. + */ +static void ibmvnic_clear_pending_interrupt(struct device *dev, + struct ibmvnic_sub_crq_queue *scrq) +{ + if (!xive_enabled()) + ibmvnic_xics_eoi(dev, scrq); +} + static int enable_scrq_irq(struct ibmvnic_adapter *adapter, struct ibmvnic_sub_crq_queue *scrq) { @@ -3575,15 +3600,7 @@ static int enable_scrq_irq(struct ibmvnic_adapter *adapter, if (test_bit(0, >resetting) && adapter->reset_reason == VNIC_RESET_MOBILITY) { - u64 val = (0xff00) | scrq->hw_irq; - - rc = plpar_hcall_norets(H_EOI, val); - /* H_EOI would fail with rc = H_FUNCTION when running -* in XIVE mode which is expected, but not an error. -*/ - if (rc && (rc != H_FUNCTION)) - dev_err(dev, "H_EOI FAILED irq 0x%llx. rc=%ld\n", - val, rc); + ibmvnic_clear_pending_interrupt(dev, scrq); } rc = plpar_hcall_norets(H_VIOCTL, adapter->vdev->unit_address, -- 2.34.1