Re: [PATCH v4] lockdown,selinux: fix wrong subject in some SELinux lockdown checks
On Mon, Sep 13, 2021 at 5:05 PM Paul Moore wrote:
>
> On Mon, Sep 13, 2021 at 10:02 AM Ondrej Mosnacek wrote:
> >
> > Commit 59438b46471a ("security,lockdown,selinux: implement SELinux
> > lockdown") added an implementation of the locked_down LSM hook to
> > SELinux, with the aim to restrict which domains are allowed to perform
> > operations that would breach lockdown.
> >
> > However, in several places the security_locked_down() hook is called in
> > situations where the current task isn't doing any action that would
> > directly breach lockdown, leading to SELinux checks that are basically
> > bogus.
> >
> > To fix this, add an explicit struct cred pointer argument to
> > security_locked_down() and define NULL as a special value to pass
> > instead of current_cred() in such situations. LSMs that take the
> > subject credentials into account can then fall back to some default
> > or ignore such calls altogether. In the SELinux lockdown hook
> > implementation, use SECINITSID_KERNEL in case the cred argument is
> > NULL.
> >
> > Most of the callers are updated to pass current_cred() as the cred
> > pointer, thus maintaining the same behavior. The following callers
> > are modified to pass NULL as the cred pointer instead:
> > 1. arch/powerpc/xmon/xmon.c
> >    Seems to be some interactive debugging facility. It appears that
> >    the lockdown hook is called from interrupt context here, so it
> >    should be more appropriate to request a global lockdown decision.
> > 2. fs/tracefs/inode.c:tracefs_create_file()
> >    Here the call is used to prevent creating new tracefs entries when
> >    the kernel is locked down. Assumes that locking down is one-way -
> >    i.e. if the hook returns non-zero once, it will never return zero
> >    again, thus there is no point in creating these files. Also, the
> >    hook is often called by a module's init function when it is loaded
> >    by userspace, where it doesn't make much sense to do a check
> >    against the current task's creds, since the task itself doesn't
> >    actually use the tracing functionality (i.e. doesn't breach
> >    lockdown), just indirectly makes some new tracepoints available to
> >    whoever is authorized to use them.
> > 3. net/xfrm/xfrm_user.c:copy_to_user_*()
> >    Here a cryptographic secret is redacted based on the value
> >    returned from the hook. There are two possible actions that may
> >    lead here:
> >    a) A netlink message XFRM_MSG_GETSA with NLM_F_DUMP set - here the
> >       task context is relevant, since the dumped data is sent back to
> >       the current task.
> >    b) When adding/deleting/updating an SA via XFRM_MSG_xxxSA, the
> >       dumped SA is broadcast to tasks subscribed to XFRM events -
> >       here the current task context is not relevant as it doesn't
> >       represent the tasks that could potentially see the secret.
> >    It doesn't seem worth it to try to keep using the current task's
> >    context in the a) case, since the eventual data leak can be
> >    circumvented anyway via b), plus there is no way for the task to
> >    indicate that it doesn't care about the actual key value, so the
> >    check could generate a lot of "false alert" denials with SELinux.
> >    Thus, let's pass NULL instead of current_cred() here faute de
> >    mieux.
> >
> > Improvements-suggested-by: Casey Schaufler
> > Improvements-suggested-by: Paul Moore
> > Fixes: 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown")
> > Acked-by: Dan Williams [cxl]
> > Acked-by: Steffen Klassert [xfrm]
> > Signed-off-by: Ondrej Mosnacek
> > ---
> >
> > v4:
> > - rebase on top of TODO
> > - fix rebase conflicts:
> >   * drivers/cxl/pci.c
> >     - trivial: the lockdown reason was corrected in mainline
> >   * kernel/bpf/helpers.c, kernel/trace/bpf_trace.c
> >     - trivial: LOCKDOWN_BPF_READ was renamed to
> >       LOCKDOWN_BPF_READ_KERNEL in mainline
> >   * kernel/power/hibernate.c
> >     - trivial: !secretmem_active() was added to the condition in
> >       hibernation_available()
> > - cover new security_locked_down() call in kernel/bpf/helpers.c
> >   (LOCKDOWN_BPF_WRITE_USER in BPF_FUNC_probe_write_user case)
> >
> > v3: https://lore.kernel.org/lkml/20210616085118.1141101-1-omosn...@redhat.com/
> > - add the cred argument to security_locked_down() and adapt all callers
> > - keep using current_cred() in BPF, as the hook calls have been shifted
> >   to program load time (commit ff40e51043af ("bpf, lockdown, audit: Fix
> >   buggy SELinux lockdown permission checks"))
> > - in SELinux, don't ignore hook calls where cred == NULL, but use
> >   SECINITSID_KERNEL as the subject instead
> > - update explanations in the commit message
> >
> > v2: https://lore.kernel.org/lkml/20210517092006.803332-1-omosn...@redhat.com/
> > - change to a single hook based on suggestions by Casey Schaufler
> >
> > v1:
Re: [PATCH v6 4/4] powerpc/64s: Initialize and use a temporary mm for patching on Radix
On Thu, Sep 16, 2021 at 10:40 AM Christopher M. Riedl wrote:
>
> On Tue Sep 14, 2021 at 11:24 PM CDT, Jordan Niethe wrote:
> > On Sat, Sep 11, 2021 at 12:39 PM Christopher M. Riedl wrote:
> > > ...
> > > +/*
> > > + * This can be called for kernel text or a module.
> > > + */
> > > +static int map_patch_mm(const void *addr, struct patch_mapping *patch_mapping)
> > > +{
> > > +       struct page *page;
> > > +       struct mm_struct *patching_mm = __this_cpu_read(cpu_patching_mm);
> > > +       unsigned long patching_addr = __this_cpu_read(cpu_patching_addr);
> > > +
> > > +       if (is_vmalloc_or_module_addr(addr))
> > > +               page = vmalloc_to_page(addr);
> > > +       else
> > > +               page = virt_to_page(addr);
> > > +
> > > +       patch_mapping->ptep = get_locked_pte(patching_mm, patching_addr,
> > > +                                            &patch_mapping->ptl);
> > > +       if (unlikely(!patch_mapping->ptep)) {
> > > +               pr_warn("map patch: failed to allocate pte for patching\n");
> > > +               return -1;
> > > +       }
> > > +
> > > +       set_pte_at(patching_mm, patching_addr, patch_mapping->ptep,
> > > +                  pte_mkdirty(mk_pte(page, PAGE_KERNEL)));
> >
> > I think because switch_mm_irqs_off() will not necessarily have a
> > barrier so a ptesync would be needed.
> > A spurious fault here from __patch_instruction() would not be handled
> > correctly.
>
> Sorry I don't quite follow - can you explain this to me in a bit more
> detail?

radix__set_pte_at() skips calling ptesync as an optimization. If there
is no ordering between changing the pte and then accessing the page with
__patch_instruction(), a spurious fault could be raised. I think such a
fault would end up causing bad_kernel_fault() to return true and would
not be fixed up. I thought there might be a barrier in
switch_mm_irqs_off() that would provide this ordering but afaics that is
not always the case. So I think that we need to have a ptesync after
set_pte_at().
Re: [PATCH v6 4/4] powerpc/64s: Initialize and use a temporary mm for patching on Radix
On Thu, Sep 16, 2021 at 10:38 AM Christopher M. Riedl wrote: > > On Sat Sep 11, 2021 at 4:14 AM CDT, Jordan Niethe wrote: > > On Sat, Sep 11, 2021 at 12:39 PM Christopher M. Riedl > > wrote: > > > > > > When code patching a STRICT_KERNEL_RWX kernel the page containing the > > > address to be patched is temporarily mapped as writeable. Currently, a > > > per-cpu vmalloc patch area is used for this purpose. While the patch > > > area is per-cpu, the temporary page mapping is inserted into the kernel > > > page tables for the duration of patching. The mapping is exposed to CPUs > > > other than the patching CPU - this is undesirable from a hardening > > > perspective. Use a temporary mm instead which keeps the mapping local to > > > the CPU doing the patching. > > > > > > Use the `poking_init` init hook to prepare a temporary mm and patching > > > address. Initialize the temporary mm by copying the init mm. Choose a > > > randomized patching address inside the temporary mm userspace address > > > space. The patching address is randomized between PAGE_SIZE and > > > DEFAULT_MAP_WINDOW-PAGE_SIZE. > > > > > > Bits of entropy with 64K page size on BOOK3S_64: > > > > > > bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE) > > > > > > PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB > > > bits of entropy = log2(128TB / 64K) > > > bits of entropy = 31 > > > > > > The upper limit is DEFAULT_MAP_WINDOW due to how the Book3s64 Hash MMU > > > operates - by default the space above DEFAULT_MAP_WINDOW is not > > > available. Currently the Hash MMU does not use a temporary mm so > > > technically this upper limit isn't necessary; however, a larger > > > randomization range does not further "harden" this overall approach and > > > future work may introduce patching with a temporary mm on Hash as well. > > > > > > Randomization occurs only once during initialization at boot for each > > > possible CPU in the system. 
> > > > > > Introduce two new functions, map_patch_mm() and unmap_patch_mm(), to > > > respectively create and remove the temporary mapping with write > > > permissions at patching_addr. Map the page with PAGE_KERNEL to set > > > EAA[0] for the PTE which ignores the AMR (so no need to unlock/lock > > > KUAP) according to PowerISA v3.0b Figure 35 on Radix. > > > > > > Based on x86 implementation: > > > > > > commit 4fc19708b165 > > > ("x86/alternatives: Initialize temporary mm for patching") > > > > > > and: > > > > > > commit b3fd8e83ada0 > > > ("x86/alternatives: Use temporary mm for text poking") > > > > > > Signed-off-by: Christopher M. Riedl > > > > > > --- > > > > > > v6: * Small clean-ups (naming, formatting, style, etc). > > > * Call stop_using_temporary_mm() before pte_unmap_unlock() after > > >patching. > > > * Replace BUG_ON()s in poking_init() w/ WARN_ON()s. > > > > > > v5: * Only support Book3s64 Radix MMU for now. > > > * Use a per-cpu datastructure to hold the patching_addr and > > >patching_mm to avoid the need for a synchronization lock/mutex. > > > > > > v4: * In the previous series this was two separate patches: one to init > > >the temporary mm in poking_init() (unused in powerpc at the time) > > >and the other to use it for patching (which removed all the > > >per-cpu vmalloc code). Now that we use poking_init() in the > > >existing per-cpu vmalloc approach, that separation doesn't work > > >as nicely anymore so I just merged the two patches into one. > > > * Preload the SLB entry and hash the page for the patching_addr > > >when using Hash on book3s64 to avoid taking an SLB and Hash fault > > >during patching. The previous implementation was a hack which > > >changed current->mm to allow the SLB and Hash fault handlers to > > >work with the temporary mm since both of those code-paths always > > >assume mm == current->mm. 
> > > * Also (hmm - seeing a trend here) with the book3s64 Hash MMU we > > >have to manage the mm->context.active_cpus counter and mm cpumask > > >since they determine (via mm_is_thread_local()) if the TLB flush > > >in pte_clear() is local or not - it should always be local when > > >we're using the temporary mm. On book3s64's Radix MMU we can > > >just call local_flush_tlb_mm(). > > > * Use HPTE_USE_KERNEL_KEY on Hash to avoid costly lock/unlock of > > >KUAP. > > > --- > > > arch/powerpc/lib/code-patching.c | 119 +-- > > > 1 file changed, 112 insertions(+), 7 deletions(-) > > > > > > diff --git a/arch/powerpc/lib/code-patching.c > > > b/arch/powerpc/lib/code-patching.c > > > index e802e42c2789..af8e2a02a9dd 100644 > > > --- a/arch/powerpc/lib/code-patching.c > > > +++ b/arch/powerpc/lib/code-patching.c > > > @@ -11,6 +11,7 @@ > > > #include > > > #include > > > #include > > > +#include > > > > > > #include > > >
Re: [PATCH v6 4/4] powerpc/64s: Initialize and use a temporary mm for patching on Radix
On Tue Sep 14, 2021 at 11:24 PM CDT, Jordan Niethe wrote:
> On Sat, Sep 11, 2021 at 12:39 PM Christopher M. Riedl wrote:
> > ...
> > +/*
> > + * This can be called for kernel text or a module.
> > + */
> > +static int map_patch_mm(const void *addr, struct patch_mapping *patch_mapping)
> > +{
> > +       struct page *page;
> > +       struct mm_struct *patching_mm = __this_cpu_read(cpu_patching_mm);
> > +       unsigned long patching_addr = __this_cpu_read(cpu_patching_addr);
> > +
> > +       if (is_vmalloc_or_module_addr(addr))
> > +               page = vmalloc_to_page(addr);
> > +       else
> > +               page = virt_to_page(addr);
> > +
> > +       patch_mapping->ptep = get_locked_pte(patching_mm, patching_addr,
> > +                                            &patch_mapping->ptl);
> > +       if (unlikely(!patch_mapping->ptep)) {
> > +               pr_warn("map patch: failed to allocate pte for patching\n");
> > +               return -1;
> > +       }
> > +
> > +       set_pte_at(patching_mm, patching_addr, patch_mapping->ptep,
> > +                  pte_mkdirty(mk_pte(page, PAGE_KERNEL)));
>
> I think because switch_mm_irqs_off() will not necessarily have a
> barrier so a ptesync would be needed.
> A spurious fault here from __patch_instruction() would not be handled
> correctly.

Sorry I don't quite follow - can you explain this to me in a bit more
detail?
Re: [PATCH v6 4/4] powerpc/64s: Initialize and use a temporary mm for patching on Radix
On Sat Sep 11, 2021 at 4:14 AM CDT, Jordan Niethe wrote: > On Sat, Sep 11, 2021 at 12:39 PM Christopher M. Riedl > wrote: > > > > When code patching a STRICT_KERNEL_RWX kernel the page containing the > > address to be patched is temporarily mapped as writeable. Currently, a > > per-cpu vmalloc patch area is used for this purpose. While the patch > > area is per-cpu, the temporary page mapping is inserted into the kernel > > page tables for the duration of patching. The mapping is exposed to CPUs > > other than the patching CPU - this is undesirable from a hardening > > perspective. Use a temporary mm instead which keeps the mapping local to > > the CPU doing the patching. > > > > Use the `poking_init` init hook to prepare a temporary mm and patching > > address. Initialize the temporary mm by copying the init mm. Choose a > > randomized patching address inside the temporary mm userspace address > > space. The patching address is randomized between PAGE_SIZE and > > DEFAULT_MAP_WINDOW-PAGE_SIZE. > > > > Bits of entropy with 64K page size on BOOK3S_64: > > > > bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE) > > > > PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB > > bits of entropy = log2(128TB / 64K) > > bits of entropy = 31 > > > > The upper limit is DEFAULT_MAP_WINDOW due to how the Book3s64 Hash MMU > > operates - by default the space above DEFAULT_MAP_WINDOW is not > > available. Currently the Hash MMU does not use a temporary mm so > > technically this upper limit isn't necessary; however, a larger > > randomization range does not further "harden" this overall approach and > > future work may introduce patching with a temporary mm on Hash as well. > > > > Randomization occurs only once during initialization at boot for each > > possible CPU in the system. > > > > Introduce two new functions, map_patch_mm() and unmap_patch_mm(), to > > respectively create and remove the temporary mapping with write > > permissions at patching_addr. 
Map the page with PAGE_KERNEL to set > > EAA[0] for the PTE which ignores the AMR (so no need to unlock/lock > > KUAP) according to PowerISA v3.0b Figure 35 on Radix. > > > > Based on x86 implementation: > > > > commit 4fc19708b165 > > ("x86/alternatives: Initialize temporary mm for patching") > > > > and: > > > > commit b3fd8e83ada0 > > ("x86/alternatives: Use temporary mm for text poking") > > > > Signed-off-by: Christopher M. Riedl > > > > --- > > > > v6: * Small clean-ups (naming, formatting, style, etc). > > * Call stop_using_temporary_mm() before pte_unmap_unlock() after > >patching. > > * Replace BUG_ON()s in poking_init() w/ WARN_ON()s. > > > > v5: * Only support Book3s64 Radix MMU for now. > > * Use a per-cpu datastructure to hold the patching_addr and > >patching_mm to avoid the need for a synchronization lock/mutex. > > > > v4: * In the previous series this was two separate patches: one to init > >the temporary mm in poking_init() (unused in powerpc at the time) > >and the other to use it for patching (which removed all the > >per-cpu vmalloc code). Now that we use poking_init() in the > >existing per-cpu vmalloc approach, that separation doesn't work > >as nicely anymore so I just merged the two patches into one. > > * Preload the SLB entry and hash the page for the patching_addr > >when using Hash on book3s64 to avoid taking an SLB and Hash fault > >during patching. The previous implementation was a hack which > >changed current->mm to allow the SLB and Hash fault handlers to > >work with the temporary mm since both of those code-paths always > >assume mm == current->mm. > > * Also (hmm - seeing a trend here) with the book3s64 Hash MMU we > >have to manage the mm->context.active_cpus counter and mm cpumask > >since they determine (via mm_is_thread_local()) if the TLB flush > >in pte_clear() is local or not - it should always be local when > >we're using the temporary mm. On book3s64's Radix MMU we can > >just call local_flush_tlb_mm(). 
> > * Use HPTE_USE_KERNEL_KEY on Hash to avoid costly lock/unlock of > >KUAP. > > --- > > arch/powerpc/lib/code-patching.c | 119 +-- > > 1 file changed, 112 insertions(+), 7 deletions(-) > > > > diff --git a/arch/powerpc/lib/code-patching.c > > b/arch/powerpc/lib/code-patching.c > > index e802e42c2789..af8e2a02a9dd 100644 > > --- a/arch/powerpc/lib/code-patching.c > > +++ b/arch/powerpc/lib/code-patching.c > > @@ -11,6 +11,7 @@ > > #include > > #include > > #include > > +#include > > > > #include > > #include > > @@ -103,6 +104,7 @@ static inline void stop_using_temporary_mm(struct > > temp_mm *temp_mm) > > > > static DEFINE_PER_CPU(struct vm_struct *, text_poke_area); > > static DEFINE_PER_CPU(unsigned long, cpu_patching_addr); > > +static DEFINE_PER_CPU(struct
Re: [PATCH v6 1/4] powerpc/64s: Introduce temporary mm for Radix MMU
On Sat Sep 11, 2021 at 3:26 AM CDT, Jordan Niethe wrote: > On Sat, Sep 11, 2021 at 12:35 PM Christopher M. Riedl > wrote: > > > > x86 supports the notion of a temporary mm which restricts access to > > temporary PTEs to a single CPU. A temporary mm is useful for situations > > where a CPU needs to perform sensitive operations (such as patching a > > STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing > > said mappings to other CPUs. Another benefit is that other CPU TLBs do > > not need to be flushed when the temporary mm is torn down. > > > > Mappings in the temporary mm can be set in the userspace portion of the > > address-space. > > > > Interrupts must be disabled while the temporary mm is in use. HW > > breakpoints, which may have been set by userspace as watchpoints on > > addresses now within the temporary mm, are saved and disabled when > > loading the temporary mm. The HW breakpoints are restored when unloading > > the temporary mm. All HW breakpoints are indiscriminately disabled while > > the temporary mm is in use - this may include breakpoints set by perf. > > I had thought CPUs with a DAWR might not need to do this because the > privilege level that breakpoints trigger on can be configured. But it > turns out in ptrace, etc we use HW_BRK_TYPE_PRIV_ALL. Thanks for double checking :) > > > > > Based on x86 implementation: > > > > commit cefa929c034e > > ("x86/mm: Introduce temporary mm structs") > > > > Signed-off-by: Christopher M. Riedl > > > > --- > > > > v6: * Use {start,stop}_using_temporary_mm() instead of > >{use,unuse}_temporary_mm() as suggested by Christophe. > > > > v5: * Drop support for using a temporary mm on Book3s64 Hash MMU. > > > > v4: * Pass the prev mm instead of NULL to switch_mm_irqs_off() when > >using/unusing the temp mm as suggested by Jann Horn to keep > >the context.active counter in-sync on mm/nohash. > > * Disable SLB preload in the temporary mm when initializing the > >temp_mm struct. 
> > * Include asm/debug.h header to fix build issue with > >ppc44x_defconfig. > > --- > > arch/powerpc/include/asm/debug.h | 1 + > > arch/powerpc/kernel/process.c| 5 +++ > > arch/powerpc/lib/code-patching.c | 56 > > 3 files changed, 62 insertions(+) > > > > diff --git a/arch/powerpc/include/asm/debug.h > > b/arch/powerpc/include/asm/debug.h > > index 86a14736c76c..dfd82635ea8b 100644 > > --- a/arch/powerpc/include/asm/debug.h > > +++ b/arch/powerpc/include/asm/debug.h > > @@ -46,6 +46,7 @@ static inline int debugger_fault_handler(struct pt_regs > > *regs) { return 0; } > > #endif > > > > void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk); > > +void __get_breakpoint(int nr, struct arch_hw_breakpoint *brk); > > bool ppc_breakpoint_available(void); > > #ifdef CONFIG_PPC_ADV_DEBUG_REGS > > extern void do_send_trap(struct pt_regs *regs, unsigned long address, > > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c > > index 50436b52c213..6aa1f5c4d520 100644 > > --- a/arch/powerpc/kernel/process.c > > +++ b/arch/powerpc/kernel/process.c > > @@ -865,6 +865,11 @@ static inline int set_breakpoint_8xx(struct > > arch_hw_breakpoint *brk) > > return 0; > > } > > > > +void __get_breakpoint(int nr, struct arch_hw_breakpoint *brk) > > +{ > > + memcpy(brk, this_cpu_ptr(_brk[nr]), sizeof(*brk)); > > +} > > The breakpoint code is already a little hard to follow. I'm worried > doing this might spread breakpoint handling into more places in the > future. > What about something like having a breakpoint_pause() function which > clears the hardware registers only and then a breakpoint_resume() > function that copies from current_brk[] back to the hardware > registers? > Then we don't have to make another copy of the breakpoint state. I think that sounds reasonable - I'll add those functions instead with the next spin. 
> > > + > > void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk) > > { > > memcpy(this_cpu_ptr(_brk[nr]), brk, sizeof(*brk)); > > diff --git a/arch/powerpc/lib/code-patching.c > > b/arch/powerpc/lib/code-patching.c > > index f9a3019e37b4..8d61a7d35b89 100644 > > --- a/arch/powerpc/lib/code-patching.c > > +++ b/arch/powerpc/lib/code-patching.c > > Sorry I might have missed it, but what was the reason for not putting > this stuff in mmu_context.h? x86 ended up moving this code into their code-patching file as well. I suppose nobody has thought of another use for a temporary mm like this yet :) > > > @@ -17,6 +17,9 @@ > > #include > > #include > > #include > > +#include > > +#include > > +#include > > > > static int __patch_instruction(u32 *exec_addr, struct ppc_inst instr, u32 > > *patch_addr) > > { > > @@ -45,6 +48,59 @@ int raw_patch_instruction(u32 *addr, struct ppc_inst > > instr) > > } > > > > #ifdef CONFIG_STRICT_KERNEL_RWX > > + > > +struct temp_mm { > > + struct
Re: [PATCH v3 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()
On Wed, Sep 15, 2021 at 07:18:34PM +0200, Christophe Leroy wrote:
> Could you please provide a more explicit explanation why inlining such
> a helper is considered bad practice and messy?

Tom already told you to look at the previous threads. Let's read them
together. This one, for example:

https://lore.kernel.org/lkml/ysscwvpxevxw%2f...@infradead.org/

| > To take it out of line, I'm leaning towards the latter, creating a new
| > file that is built based on the ARCH_HAS_PROTECTED_GUEST setting.
|
| Yes. In general everytime architectures have to provide the prototype
| and not just the implementation of something we end up with a giant mess
| sooner or later. In a few cases that is still warranted due to
| performance concerns, but i don't think that is the case here.

So I think what Christoph means here is that you want to have the
generic prototype defined in a header and arches get to implement it
exactly to the letter so that there's no mess. As to what mess exactly,
I'd let him explain that.

> Because as demonstrated in my previous response some days ago, taking
> that out of line ends up with unnecessarily ugly generated code and we
> don't benefit from GCC's capability to fold in and opt out unreachable
> code.

And this is real fast path where a couple of instructions matter or
what? set_memory_encrypted/_decrypted doesn't look like one to me.

> I can't see your point here. Inlining the function wouldn't add any
> ifdeffery as far as I can see.

If the function is touching defines etc, they all need to be visible. If
that function needs to call other functions - which is the case on x86,
perhaps not so much on power - then you need to either ifdef around them
or provide stubs with ifdeffery in the headers. And you need to make
them global functions instead of keeping them static to the same
compilation unit, etc, etc.

With a separate compilation unit, you don't need any of that and it is
all kept in that single file.

--
Regards/Gruss,
    Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH] swiotlb: set IO TLB segment size via cmdline
On 15.09.2021 15:37, Roman Skakun wrote:
>>> From: Roman Skakun
>>>
>>> It is possible that the default IO TLB size is not enough to fit long
>>> buffers as described here [1].
>>>
>>> This patch makes a way to set this parameter using cmdline instead of
>>> recompiling a kernel.
>>>
>>> [1] https://www.xilinx.com/support/answers/72694.html
>>
>> I'm not convinced the swiotlb use described there falls under "intended
>> use" - mapping a 1280x720 framebuffer in a single chunk?
>
> I had the same issue while mapping a DMA chunk of ~4MB for a gem fb
> when using xen vdispl.
> I got the following log:
> [  142.030421] rcar-fcp fea2f000.fcp: swiotlb buffer is full (sz:
> 3686400 bytes), total 32768 (slots), used 32 (slots)
>
> It happened when I tried to map a bounce buffer, which has a large
> size. The default size is 128(IO_TLB_SEGSIZE) * 2048(IO_TLB_SHIFT) =
> 262144 bytes, but we requested 3686400 bytes.
> When I changed IO_TLB_SEGSIZE to 2048 (2048(IO_TLB_SEGSIZE) *
> 2048(IO_TLB_SHIFT) = 4194304 bytes), it became possible to retrieve a
> bounce buffer of the requested size. After changing this value, the
> problem was gone.

But the question remains: Why does the framebuffer need to be mapped in
a single giant chunk?

>> In order to be sure to catch all uses like this one (including ones
>> which make it upstream in parallel to yours), I think you will want
>> to rename the original IO_TLB_SEGSIZE to e.g. IO_TLB_DEFAULT_SEGSIZE.
>
> I don't understand your point. Can you clarify this?

There's a concrete present example: I have a patch pending adding
another use of IO_TLB_SEGSIZE. This use would need to be replaced like
you do here in several places. The need for the additional replacement
would be quite obvious (from a build failure) if you renamed the
manifest constant. Without renaming, it'll take someone running into an
issue on a live system, which I consider far worse. This is because a
simple re-basing of one of the patches on top of the other will not
point out the need for the extra replacement, nor would a test build
(with both patches in place).

Jan
Re: [PATCH v3 0/8] Implement generic cc_platform_has() helper function
On 9/15/21 9:46 AM, Borislav Petkov wrote: Sathya, if you want to prepare the Intel variant intel_cc_platform_has() ontop of those and send it to me, that would be good because then I can integrate it all in one branch which can be used to base future work ontop. I have a Intel variant patch (please check following patch). But it includes TDX changes as well. Shall I move TDX changes to different patch and just create a separate patch for adding intel_cc_platform_has()? commit fc5f98a0ed94629d903827c5b44ee9295f835831 Author: Kuppuswamy Sathyanarayanan Date: Wed May 12 11:35:13 2021 -0700 x86/tdx: Add confidential guest support for TDX guest TDX architecture provides a way for VM guests to be highly secure and isolated (from untrusted VMM). To achieve this requirement, any data coming from VMM cannot be completely trusted. TDX guest fixes this issue by hardening the IO drivers against the attack from the VMM. So, when adding hardening fixes to the generic drivers, to protect custom fixes use cc_platform_has() API. Also add TDX guest support to cc_platform_has() API to protect the TDX specific fixes. Signed-off-by: Kuppuswamy Sathyanarayanan diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index a5b14de03458..2e78358923a1 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -871,6 +871,7 @@ config INTEL_TDX_GUEST depends on SECURITY select X86_X2APIC select SECURITY_LOCKDOWN_LSM + select ARCH_HAS_CC_PLATFORM help Provide support for running in a trusted domain on Intel processors equipped with Trusted Domain eXtensions. 
TDX is a new Intel diff --git a/arch/x86/include/asm/intel_cc_platform.h b/arch/x86/include/asm/intel_cc_platform.h new file mode 100644 index ..472c3174beac --- /dev/null +++ b/arch/x86/include/asm/intel_cc_platform.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright (C) 2021 Intel Corporation */ +#ifndef _ASM_X86_INTEL_CC_PLATFORM_H +#define _ASM_X86_INTEL_CC_PLATFORM_H + +#if defined(CONFIG_CPU_SUP_INTEL) && defined(CONFIG_ARCH_HAS_CC_PLATFORM) +bool intel_cc_platform_has(unsigned int flag); +#else +static inline bool intel_cc_platform_has(unsigned int flag) { return false; } +#endif + +#endif /* _ASM_X86_INTEL_CC_PLATFORM_H */ + diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c index 3c9bacd3c3f3..e83bc2f48efe 100644 --- a/arch/x86/kernel/cc_platform.c +++ b/arch/x86/kernel/cc_platform.c @@ -10,11 +10,16 @@ #include #include #include +#include + +#include bool cc_platform_has(enum cc_attr attr) { if (sme_me_mask) return amd_cc_platform_has(attr); + else if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) + return intel_cc_platform_has(attr); return false; } diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 8321c43554a1..ab486a3b1eb0 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include @@ -60,6 +61,21 @@ static u64 msr_test_ctrl_cache __ro_after_init; */ static bool cpu_model_supports_sld __ro_after_init; +#ifdef CONFIG_ARCH_HAS_CC_PLATFORM +bool intel_cc_platform_has(enum cc_attr attr) +{ + switch (attr) { + case CC_ATTR_GUEST_TDX: + return cpu_feature_enabled(X86_FEATURE_TDX_GUEST); + default: + return false; + } + + return false; +} +EXPORT_SYMBOL_GPL(intel_cc_platform_has); +#endif + /* * Processors which have self-snooping capability can handle conflicting * memory type across CPUs by snooping its own cache. 
However, there exists diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h index 253f3ea66cd8..e38430e6e396 100644 --- a/include/linux/cc_platform.h +++ b/include/linux/cc_platform.h @@ -61,6 +61,15 @@ enum cc_attr { * Examples include SEV-ES. */ CC_ATTR_GUEST_STATE_ENCRYPT, + + /** +* @CC_ATTR_GUEST_TDX: Trusted Domain Extension Support +* +* The platform/OS is running as a TDX guest/virtual machine. +* +* Examples include SEV-ES. +*/ + CC_ATTR_GUEST_TDX, }; #ifdef CONFIG_ARCH_HAS_CC_PLATFORM -- Sathyanarayanan Kuppuswamy Linux Kernel Developer
Re: [PATCH v3 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()
On 15/09/2021 at 12:08, Borislav Petkov wrote:
> On Wed, Sep 15, 2021 at 10:28:59AM +1000, Michael Ellerman wrote:
>> I don't love it, a new C file and an out-of-line call to then call
>> back to a static inline that for most configurations will return
>> false ... but whatever :)
>
> Yeah, hch thinks it'll cause a big mess otherwise:
>
> https://lore.kernel.org/lkml/ysscwvpxevxw%2f...@infradead.org/

Could you please provide a more explicit explanation why inlining such a
helper is considered bad practice and messy?

Because as demonstrated in my previous response some days ago, taking
that out of line ends up with unnecessarily ugly generated code and we
don't benefit from GCC's capability to fold in and opt out unreachable
code.

As pointed out by Michael, in most cases the function will just return
false, so behind the performance concern there is also the code size and
code coverage topic that has to be taken into account. And even when the
function doesn't return false, the only thing it does folds into a
single powerpc instruction, so there is really no point in making a
dedicated out-of-line function for that and suffering the cost and the
size of a function call, nor in justifying the addition of a dedicated
C file.

> I guess less ifdeffery is nice too.

I can't see your point here. Inlining the function wouldn't add any
ifdeffery as far as I can see.

So, would you mind reconsidering your approach and allowing
architectures to provide an inline implementation by just not enforcing
a generic prototype? Or otherwise provide more details and examples of
why the cons outweigh the pros?

Thanks
Christophe
Re: [PATCH v3 0/8] Implement generic cc_platform_has() helper function
On Wed, Sep 08, 2021 at 05:58:31PM -0500, Tom Lendacky wrote:
> This patch series provides a generic helper function, cc_platform_has(),
> to replace the sme_active(), sev_active(), sev_es_active() and
> mem_encrypt_active() functions.
>
> It is expected that as new confidential computing technologies are
> added to the kernel, they can all be covered by a single function call
> instead of a collection of specific function calls all called from the
> same locations.
>
> The powerpc and s390 patches have been compile tested only. Can the
> folks copied on this series verify that nothing breaks for them? Also,
> a new file, arch/powerpc/platforms/pseries/cc_platform.c, has been
> created for powerpc to hold the out of line function.

...

> Tom Lendacky (8):
>   x86/ioremap: Selectively build arch override encryption functions
>   mm: Introduce a function to check for confidential computing features
>   x86/sev: Add an x86 version of cc_platform_has()
>   powerpc/pseries/svm: Add a powerpc version of cc_platform_has()
>   x86/sme: Replace occurrences of sme_active() with cc_platform_has()
>   x86/sev: Replace occurrences of sev_active() with cc_platform_has()
>   x86/sev: Replace occurrences of sev_es_active() with cc_platform_has()
>   treewide: Replace the use of mem_encrypt_active() with
>     cc_platform_has()

Ok, modulo the minor things, the plan is to take this through tip after -rc2 releases, in order to pick up the powerpc build fix and have a clean base (-rc2) to base stuff on at the same time. Pls holler if something's still amiss.

Sathya, if you want to prepare the Intel variant intel_cc_platform_has() ontop of those and send it to me, that would be good, because then I can integrate it all in one branch which can be used to base future work ontop.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v5 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance
On Sat, 11 Sept 2021 at 03:19, Ricardo Neri wrote:
>
> When deciding to pull tasks in ASYM_PACKING, it is necessary not only to
> check for the idle state of the destination CPU, dst_cpu, but also of
> its SMT siblings.
>
> If dst_cpu is idle but its SMT siblings are busy, performance suffers
> if it pulls tasks from a medium priority CPU that does not have SMT
> siblings.
>
> Implement asym_smt_can_pull_tasks() to inspect the state of the SMT
> siblings of both dst_cpu and the CPUs in the candidate busiest group.
>
> Cc: Aubrey Li
> Cc: Ben Segall
> Cc: Daniel Bristot de Oliveira
> Cc: Dietmar Eggemann
> Cc: Mel Gorman
> Cc: Quentin Perret
> Cc: Rafael J. Wysocki
> Cc: Srinivas Pandruvada
> Cc: Steven Rostedt
> Cc: Tim Chen
> Reviewed-by: Joel Fernandes (Google)
> Reviewed-by: Len Brown
> Signed-off-by: Ricardo Neri
> ---
> Changes since v4:
>  * Use sg_lb_stats::sum_nr_running to determine the idle state of a
>    scheduling group. (Vincent, Peter)
>  * Do not consider idle CPUs in asym_smt_can_pull_tasks(). (Vincent)
>  * Updated function documentation and corrected a typo.
>
> Changes since v3:
>  * Removed the arch_asym_check_smt_siblings() hook. Discussions with the
>    powerpc folks showed that this patch should not impact them. Also, more
>    recent powerpc processors no longer use asym_packing. (PeterZ)
>  * Removed unnecessary local variable in asym_can_pull_tasks(). (Dietmar)
>  * Removed unnecessary check for local CPUs when the local group has zero
>    utilization. (Joel)
>  * Renamed asym_can_pull_tasks() as asym_smt_can_pull_tasks() to reflect
>    the fact that it deals with SMT cases.
>  * Made asym_smt_can_pull_tasks() return false for !CONFIG_SCHED_SMT so
>    that callers can deal with non-SMT cases.
>
> Changes since v2:
>  * Reworded the commit message to reflect updates in code.
>  * Corrected misrepresentation of dst_cpu as the CPU doing the load
>    balancing. (PeterZ)
>  * Removed call to arch_asym_check_smt_siblings() as it is now called in
>    sched_asym().
>
> Changes since v1:
>  * Don't bailout in update_sd_pick_busiest() if dst_cpu cannot pull
>    tasks. Instead, reclassify the candidate busiest group, as it
>    may still be selected. (PeterZ)
>  * Avoid an expensive and unnecessary call to cpumask_weight() when
>    determining if a sched_group is comprised of SMT siblings. (PeterZ)
> ---
>  kernel/sched/fair.c | 94 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 94 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 26db017c14a3..8d763dd0174b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8597,10 +8597,98 @@ group_type group_classify(unsigned int imbalance_pct,
>  	return group_has_spare;
>  }
>
> +/**
> + * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
> + * @dst_cpu: Destination CPU of the load balancing
> + * @sds:     Load-balancing data with statistics of the local group
> + * @sgs:     Load-balancing statistics of the candidate busiest group
> + * @sg:      The candidate busiest group
> + *
> + * Check the state of the SMT siblings of both @sds::local and @sg and decide
> + * if @dst_cpu can pull tasks.
> + *
> + * If @dst_cpu does not have SMT siblings, it can pull tasks if two or more of
> + * the SMT siblings of @sg are busy. If only one CPU in @sg is busy, pull tasks
> + * only if @dst_cpu has higher priority.
> + *
> + * If both @dst_cpu and @sg have SMT siblings, and @sg has exactly one more
> + * busy CPU than @sds::local, let @dst_cpu pull tasks if it has higher priority.
> + * Bigger imbalances in the number of busy CPUs will be dealt with in
> + * update_sd_pick_busiest().
> + *
> + * If @sg does not have SMT siblings, only pull tasks if all of the SMT siblings
> + * of @dst_cpu are idle and @sg has lower priority.
> + */
> +static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
> +				    struct sg_lb_stats *sgs,
> +				    struct sched_group *sg)
> +{
> +#ifdef CONFIG_SCHED_SMT
> +	bool local_is_smt, sg_is_smt;
> +	int sg_busy_cpus;
> +
> +	local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
> +	sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
> +
> +	sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
> +
> +	if (!local_is_smt) {
> +		/*
> +		 * If we are here, @dst_cpu is idle and does not have SMT
> +		 * siblings. Pull tasks if candidate group has two or more
> +		 * busy CPUs.
> +		 */
> +		if (sg_is_smt && sg_busy_cpus >= 2)

Do you really need to test sg_is_smt? If sg_busy_cpus >= 2, then sg_is_smt must be true, no?

Also, this is the default behavior, where we want to even out the number of busy CPUs. Shouldn't you return false and fall back to the default behavior?

That being said, the default
[PATCH] powerpc: warn on emulation of dcbz instruction
The dcbz instruction shouldn't be used on non-cached memory. Using it on non-cached memory results in an alignment exception and involves heavy handling.

Instead of silently emulating the instruction, which results in high performance degradation, warn whenever an alignment exception is taken due to dcbz, so that the user is made aware that the dcbz instruction has been used unexpectedly.

Reported-by: Stan Johnson
Cc: Finn Thain
Signed-off-by: Christophe Leroy
---
 arch/powerpc/kernel/align.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c
index bbb4181621dd..adc3a4a9c6e4 100644
--- a/arch/powerpc/kernel/align.c
+++ b/arch/powerpc/kernel/align.c
@@ -349,6 +349,7 @@ int fix_alignment(struct pt_regs *regs)
 		if (op.type != CACHEOP + DCBZ)
 			return -EINVAL;
 		PPC_WARN_ALIGNMENT(dcbz, regs);
+		WARN_ON_ONCE(1);
 		r = emulate_dcbz(op.ea, regs);
 	} else {
 		if (type == LARX || type == STCX)
-- 
2.31.1
[PATCH] powerpc/32s: Fix kuap_kernel_restore()
At interrupt exit, kuap_kernel_restore() calls kuap_unlock() with the value contained in regs->kuap. However, when regs->kuap contains KUAP_NONE it means that KUAP was not unlocked, so calling kuap_unlock() is irrelevant and results in jeopardising the contents of the kernel space segment registers.

So check that regs->kuap doesn't contain KUAP_NONE before calling kuap_unlock().

In the meantime, it also means that if KUAP has not been correctly locked back at interrupt exit, it must be locked before continuing. This is done by checking the content of current->thread.kuap, which was returned by kuap_get_and_assert_locked().

Fixes: 16132529cee5 ("powerpc/32s: Rework Kernel Userspace Access Protection")
Reported-by: Stan Johnson
Cc: Finn Thain
Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/book3s/32/kup.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h
index d4b145b279f6..9f38040f0641 100644
--- a/arch/powerpc/include/asm/book3s/32/kup.h
+++ b/arch/powerpc/include/asm/book3s/32/kup.h
@@ -136,6 +136,14 @@ static inline void kuap_kernel_restore(struct pt_regs *regs, unsigned long kuap)
 	if (kuap_is_disabled())
 		return;
 
+	if (unlikely(kuap != KUAP_NONE)) {
+		current->thread.kuap = KUAP_NONE;
+		kuap_lock(kuap, false);
+	}
+
+	if (likely(regs->kuap == KUAP_NONE))
+		return;
+
 	current->thread.kuap = regs->kuap;
 	kuap_unlock(regs->kuap, false);
-- 
2.31.1
Re: [PATCH] swiotlb: set IO TLB segment size via cmdline
On Wed, Sep 15, 2021 at 03:49:52PM +0200, Jan Beulich wrote: > But the question remains: Why does the framebuffer need to be mapped > in a single giant chunk? More importantly: if you use dynamic dma mappings for your framebuffer you're doing something wrong.
Re: [PATCH] swiotlb: set IO TLB segment size via cmdline
Hi Jan,

Thanks for the answer.

>> From: Roman Skakun
>>
>> It is possible when default IO TLB size is not
>> enough to fit a long buffers as described here [1].
>>
>> This patch makes a way to set this parameter
>> using cmdline instead of recompiling a kernel.
>>
>> [1] https://www.xilinx.com/support/answers/72694.html
>
> I'm not convinced the swiotlb use described there falls under "intended
> use" - mapping a 1280x720 framebuffer in a single chunk?

I had the same issue while mapping a DMA chunk of ~4MB for the gem fb when using xen vdispl. I got the following log:

[  142.030421] rcar-fcp fea2f000.fcp: swiotlb buffer is full (sz: 3686400 bytes), total 32768 (slots), used 32 (slots)

It happened when I tried to map a bounce buffer with a large size. The default maximum segment size is 128 (IO_TLB_SEGSIZE) * 2048 (1 << IO_TLB_SHIFT) = 262144 bytes, but we requested 3686400 bytes. When I change IO_TLB_SEGSIZE to 2048 (2048 * 2048 = 4194304 bytes), it becomes possible to retrieve a bounce buffer of the requested size. After changing this value, the problem is gone.

> the bottom of this page is also confusing, as following "Then we can
> confirm the modified swiotlb size in the boot log:" there is a log
> fragment showing the same original size of 64Mb.

I suspect this is a mistake in the article. According to
https://elixir.bootlin.com/linux/v5.14.4/source/kernel/dma/swiotlb.c#L214
and
https://elixir.bootlin.com/linux/v5.15-rc1/source/kernel/dma/swiotlb.c#L182
IO_TLB_SEGSIZE is not used to calculate the total size of the swiotlb area. This explains why we have the same total size before and after changing the TLB segment size.

> In order to be sure to catch all uses like this one (including ones
> which make it upstream in parallel to yours), I think you will want
> to rename the original IO_TLB_SEGSIZE to e.g. IO_TLB_DEFAULT_SEGSIZE.

I don't understand your point. Can you clarify this?
>> +	/* get max IO TLB segment size */
>> +	if (isdigit(*str)) {
>> +		tmp = simple_strtoul(str, &str, 0);
>> +		if (tmp)
>> +			io_tlb_seg_size = ALIGN(tmp, IO_TLB_SEGSIZE);
>
> From all I can tell io_tlb_seg_size wants to be a power of 2. Merely
> aligning to a multiple of IO_TLB_SEGSIZE isn't going to be enough.

Yes, right, thanks!

Cheers,
Roman.

On Tue, 14 Sep 2021 at 18:29, Jan Beulich wrote:
>
> On 14.09.2021 17:10, Roman Skakun wrote:
> > From: Roman Skakun
> >
> > It is possible when default IO TLB size is not
> > enough to fit a long buffers as described here [1].
> >
> > This patch makes a way to set this parameter
> > using cmdline instead of recompiling a kernel.
> >
> > [1] https://www.xilinx.com/support/answers/72694.html
>
> I'm not convinced the swiotlb use described there falls under "intended
> use" - mapping a 1280x720 framebuffer in a single chunk? (As an aside,
> the bottom of this page is also confusing, as following "Then we can
> confirm the modified swiotlb size in the boot log:" there is a log
> fragment showing the same original size of 64Mb.)
>
> > --- a/arch/mips/cavium-octeon/dma-octeon.c
> > +++ b/arch/mips/cavium-octeon/dma-octeon.c
> > @@ -237,7 +237,7 @@ void __init plat_swiotlb_setup(void)
> >  		swiotlbsize = 64 * (1<<20);
> >  #endif
> >  	swiotlb_nslabs = swiotlbsize >> IO_TLB_SHIFT;
> > -	swiotlb_nslabs = ALIGN(swiotlb_nslabs, IO_TLB_SEGSIZE);
> > +	swiotlb_nslabs = ALIGN(swiotlb_nslabs, swiotlb_io_seg_size());
>
> In order to be sure to catch all uses like this one (including ones
> which make it upstream in parallel to yours), I think you will want
> to rename the original IO_TLB_SEGSIZE to e.g. IO_TLB_DEFAULT_SEGSIZE.
> > > @@ -81,15 +86,30 @@ static unsigned int max_segment; > > static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT; > > > > static int __init > > -setup_io_tlb_npages(char *str) > > +setup_io_tlb_params(char *str) > > { > > + unsigned long tmp; > > + > > if (isdigit(*str)) { > > - /* avoid tail segment of size < IO_TLB_SEGSIZE */ > > - default_nslabs = > > - ALIGN(simple_strtoul(str, , 0), IO_TLB_SEGSIZE); > > + default_nslabs = simple_strtoul(str, , 0); > > } > > if (*str == ',') > > ++str; > > + > > + /* get max IO TLB segment size */ > > + if (isdigit(*str)) { > > + tmp = simple_strtoul(str, , 0); > > + if (tmp) > > + io_tlb_seg_size = ALIGN(tmp, IO_TLB_SEGSIZE); > > From all I can tell io_tlb_seg_size wants to be a power of 2. Merely > aligning to a multiple of IO_TLB_SEGSIZE isn't going to be enough. > > Jan > -- Best Regards, Roman.
[PATCH] video: fbdev: use memset_io() instead of memset()
While investigating a lockup at startup on a Powerbook 3400C, it was identified that the fbdev driver generates an alignment exception at startup:

--- interrupt: 600 at memset+0x60/0xc0
NIP: c0021414 LR: c03fc49c CTR: 7fff
REGS: ca021c10 TRAP: 0600 Tainted: G W (5.14.2-pmac-00727-g12a41fa69492)
MSR: 9032 CR: 44008442 XER: 2100
DAR: cab80020 DSISR: 00017c07
GPR00: 0007 ca021cd0 c14412e0 cab8 0010 cab8001c 0004
GPR08: 0010 7fff 84008442 c0006fb4
GPR16: 0010
GPR24: 8180 0320 c15fa400 c14d1878 c14d1800 c094e19c
NIP [c0021414] memset+0x60/0xc0
LR [c03fc49c] chipsfb_pci_init+0x160/0x580
--- interrupt: 600
[ca021cd0] [c03fc46c] chipsfb_pci_init+0x130/0x580 (unreliable)
[ca021d20] [c03a3a70] pci_device_probe+0xf8/0x1b8
[ca021d50] [c043d584] really_probe.part.0+0xac/0x388
[ca021d70] [c043d914] __driver_probe_device+0xb4/0x170
[ca021d90] [c043da18] driver_probe_device+0x48/0x144
[ca021dc0] [c043e318] __driver_attach+0x11c/0x1c4
[ca021de0] [c043ad30] bus_for_each_dev+0x88/0xf0
[ca021e10] [c043c724] bus_add_driver+0x190/0x22c
[ca021e40] [c043ee94] driver_register+0x9c/0x170
[ca021e60] [c0006c28] do_one_initcall+0x54/0x1ec
[ca021ed0] [c08246e4] kernel_init_freeable+0x1c0/0x270
[ca021f10] [c0006fdc] kernel_init+0x28/0x11c
[ca021f30] [c0017148] ret_from_kernel_thread+0x14/0x1c
Instruction dump:
7d4601a4 39490777 7d4701a4 39490888 7d4801a4 39490999 7d4901a4 39290aaa
7d2a01a4 4c00012c 4bfffe88 0fe0 <4bfffe80> 9421fff0 38210010 48001970

This is due to the 'dcbz' instruction being used on non-cached memory. The 'dcbz' instruction is used by memset() to zeroize a complete cacheline at once, and memset() is not expected to be used on non-cached memory.
When performing a 'sparse' check on the fbdev driver, it also appears that this use of memset() is unexpected:

drivers/video/fbdev/chipsfb.c:334:17: warning: incorrect type in argument 1 (different address spaces)
drivers/video/fbdev/chipsfb.c:334:17:    expected void *
drivers/video/fbdev/chipsfb.c:334:17:    got char [noderef] __iomem *screen_base
drivers/video/fbdev/chipsfb.c:334:15: warning: memset with byte count of 1048576

Use fb_memset() instead of memset(). fb_memset() is defined as memset_io() for powerpc.

Fixes: 8c8709334cec ("[PATCH] ppc32: Remove CONFIG_PMAC_PBOOK")
Reported-by: Stan Johnson
Cc: Finn Thain
Signed-off-by: Christophe Leroy
---
 drivers/video/fbdev/chipsfb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/video/fbdev/chipsfb.c b/drivers/video/fbdev/chipsfb.c
index 998067b701fa..393894af26f8 100644
--- a/drivers/video/fbdev/chipsfb.c
+++ b/drivers/video/fbdev/chipsfb.c
@@ -331,7 +331,7 @@ static const struct fb_var_screeninfo chipsfb_var = {
 
 static void init_chips(struct fb_info *p, unsigned long addr)
 {
-	memset(p->screen_base, 0, 0x100000);
+	fb_memset(p->screen_base, 0, 0x100000);
 
 	p->fix = chipsfb_fix;
 	p->fix.smem_start = addr;
-- 
2.31.1
[PATCH] serial: 8250: SERIAL_8250_FSL should not default to y when compile-testing
Commit b1442c55ce8977aa ("serial: 8250: extend compile-test coverage") added compile-test support to the Freescale 16550 driver. However, as SERIAL_8250_FSL is an invisible symbol, merely enabling COMPILE_TEST now enables this driver.

Fix this by making SERIAL_8250_FSL visible. Tighten the dependencies to prevent asking the user about this driver when configuring a kernel without appropriate Freescale SoC or ACPI support.

Fixes: b1442c55ce8977aa ("serial: 8250: extend compile-test coverage")
Signed-off-by: Geert Uytterhoeven
---
Yes, it's ugly, but I see no better solution. Do you?

 drivers/tty/serial/8250/Kconfig | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/serial/8250/Kconfig b/drivers/tty/serial/8250/Kconfig
index 808268edd2e82a45..a2978b31144e94f2 100644
--- a/drivers/tty/serial/8250/Kconfig
+++ b/drivers/tty/serial/8250/Kconfig
@@ -361,9 +361,13 @@ config SERIAL_8250_BCM2835AUX
 	  If unsure, say N.
 
 config SERIAL_8250_FSL
-	bool
+	bool "Freescale 16550-style UART support (8250 based driver)"
 	depends on SERIAL_8250_CONSOLE
-	default PPC || ARM || ARM64 || COMPILE_TEST
+	depends on FSL_SOC || ARCH_LAYERSCAPE || SOC_LS1021A || (ARM64 && ACPI) || COMPILE_TEST
+	default FSL_SOC || ARCH_LAYERSCAPE || SOC_LS1021A || (ARM64 && ACPI)
+	help
+	  Selecting this option will add support for the 16550-style serial
+	  port hardware found on Freescale SoCs.
 
 config SERIAL_8250_DW
 	tristate "Support for Synopsys DesignWare 8250 quirks"
-- 
2.25.1
Re: [PATCH] pci: Rename pcibios_add_device to match
On Tue, 2021-09-14 at 01:27 +1000, Oliver O'Halloran wrote:
> The general convention for pcibios_* hooks is that they're named after
> the corresponding pci_* function they provide a hook for. The exception
> is pcibios_add_device() which provides a hook for pci_device_add(). This
> has been irritating me for years so rename it.
>
> Also, remove the export of the microblaze version. The only caller
> must be compiled as a built-in so there's no reason for the export.
>
> Signed-off-by: Oliver O'Halloran
> ---
>  arch/microblaze/pci/pci-common.c           | 3 +--
>  arch/powerpc/kernel/pci-common.c           | 2 +-
>  arch/powerpc/platforms/powernv/pci-sriov.c | 2 +-
>  arch/s390/pci/pci.c                        | 2 +-
>  arch/sparc/kernel/pci.c                    | 2 +-
>  arch/x86/pci/common.c                      | 2 +-
>  drivers/pci/pci.c                          | 4 ++--
>  drivers/pci/probe.c                        | 4 ++--
>  include/linux/pci.h                        | 2 +-
>  9 files changed, 11 insertions(+), 12 deletions(-)
>
> .. snip ..
> diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
> index e7e6788d75a8..ded3321b7208 100644
> --- a/arch/s390/pci/pci.c
> +++ b/arch/s390/pci/pci.c
> @@ -561,7 +561,7 @@ static void zpci_cleanup_bus_resources(struct zpci_dev *zdev)
>  	zdev->has_resources = 0;
>  }
>
> -int pcibios_add_device(struct pci_dev *pdev)
> +int pcibios_device_add(struct pci_dev *pdev)
>  {
>  	struct zpci_dev *zdev = to_zpci(pdev);
>  	struct resource *res;
> .. snip ..

I agree with your assessment, this is indeed confusing. Interestingly, pcibios_release_device() also doesn't follow the convention exactly, as it is called from pci_release_dev(), but at least that isn't very confusing.

So for the arch/s390/pci bit:

Acked-by: Niklas Schnelle
Re: [PATCH v3 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()
On Wed, Sep 15, 2021 at 10:28:59AM +1000, Michael Ellerman wrote:
> I don't love it, a new C file and an out-of-line call to then call back
> to a static inline that for most configurations will return false ... but
> whatever :)

Yeah, hch thinks it'll cause a big mess otherwise:

https://lore.kernel.org/lkml/ysscwvpxevxw%2f...@infradead.org/

I guess less ifdeffery is nice too.

> Acked-by: Michael Ellerman (powerpc)

Thx.

> Yeah, fixed in mainline today, thanks for trying to cross compile :)

Always! :-)

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH trivial v2] powerpc/powernv/dump: Fix typo in comment
On Tue, 14 Sept 2021 at 14:38, Vasant Hegde wrote:
>
> Signed-off-by: Vasant Hegde

Reviewed-by: Joel Stanley

> ---
>  arch/powerpc/platforms/powernv/opal-dump.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/powernv/opal-dump.c b/arch/powerpc/platforms/powernv/opal-dump.c
> index 00c5a59d82d9..717d1d30ade5 100644
> --- a/arch/powerpc/platforms/powernv/opal-dump.c
> +++ b/arch/powerpc/platforms/powernv/opal-dump.c
> @@ -419,7 +419,7 @@ void __init opal_platform_dump_init(void)
>  	int rc;
>  	int dump_irq;
>
> -	/* ELOG not supported by firmware */
> +	/* Dump not supported by firmware */
>  	if (!opal_check_token(OPAL_DUMP_READ))
>  		return;
>
> --
> 2.31.1
Re: [PATCH] powerpc/powernv/flash: Check OPAL flash calls exist before using
On 9/15/21 11:53 AM, Michael Ellerman wrote:
> Vasant Hegde writes:
>> Currently only FSP based powernv systems support firmware update
>> interfaces. Hence check that the token OPAL_FLASH_VALIDATE exists
>> before initialising the flash driver.
>>
>> Signed-off-by: Vasant Hegde
>> ---
>>  arch/powerpc/platforms/powernv/opal-flash.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/arch/powerpc/platforms/powernv/opal-flash.c b/arch/powerpc/platforms/powernv/opal-flash.c
>> index 7e7d38b17420..05490fc22fae 100644
>> --- a/arch/powerpc/platforms/powernv/opal-flash.c
>> +++ b/arch/powerpc/platforms/powernv/opal-flash.c
>> @@ -520,6 +520,10 @@ void __init opal_flash_update_init(void)
>>  {
>>  	int ret;
>>
>> +	/* Firmware update is not supported by firmware */
>> +	if (!opal_check_token(OPAL_FLASH_VALIDATE))
>> +		return;
>> +

Michael,

> That will mean the following files no longer appear on BMC systems:
>
>   /sys/firmware/opal/image
>   /sys/firmware/opal/validate_flash
>   /sys/firmware/opal/manage_flash
>   /sys/firmware/opal/update_flash
>
> Presumably those files don't actually work correctly, but are we sure
> their mere existence isn't used by anything at all?

That's correct. We never used these files/interfaces on BMC based systems.

> We've had trouble in the past where removing sysfs files breaks tools
> unexpectedly, see smt_snooze_delay.

AFAIK only update_flash uses these interfaces on baremetal systems. This change shouldn't break update_flash, as these interfaces never worked on BMC based powernv systems.

-Vasant
Re: [PATCH] powerpc/powernv/flash: Check OPAL flash calls exist before using
Vasant Hegde writes:
> Currently only FSP based powernv systems support firmware update
> interfaces. Hence check that the token OPAL_FLASH_VALIDATE exists
> before initialising the flash driver.
>
> Signed-off-by: Vasant Hegde
> ---
>  arch/powerpc/platforms/powernv/opal-flash.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/arch/powerpc/platforms/powernv/opal-flash.c b/arch/powerpc/platforms/powernv/opal-flash.c
> index 7e7d38b17420..05490fc22fae 100644
> --- a/arch/powerpc/platforms/powernv/opal-flash.c
> +++ b/arch/powerpc/platforms/powernv/opal-flash.c
> @@ -520,6 +520,10 @@ void __init opal_flash_update_init(void)
>  {
>  	int ret;
>
> +	/* Firmware update is not supported by firmware */
> +	if (!opal_check_token(OPAL_FLASH_VALIDATE))
> +		return;
> +

That will mean the following files no longer appear on BMC systems:

  /sys/firmware/opal/image
  /sys/firmware/opal/validate_flash
  /sys/firmware/opal/manage_flash
  /sys/firmware/opal/update_flash

Presumably those files don't actually work correctly, but are we sure their mere existence isn't used by anything at all?

We've had trouble in the past where removing sysfs files breaks tools unexpectedly, see smt_snooze_delay.

cheers