[PATCH] pseries/iommu: Tweak ddw behavior in presence of pmem
Recently we discovered an issue on pseries guests that prevents pci devices from accessing pmem memory via DMA. Performing such an operation will cause PHB to freeze the corresponding partition endpoint and in some scenarios will shutdown the disk that hosts the rootfs. A fix for this is in works until then this patch proposes to disable DDW if pmem nodes are present in the device-tree. This would force all DMA from I/O adapters through default 2-GB window and prevent direct access of pmem memory which is located beyond 4-TB guest physical address. Since this change can have performance penalty for cases where its known that i/o adapters wont be performing DMA to pmem, the patch adds new args to the 'disable_ddw' kernel commanline flag with following possible values: 'default' : Enable DDW only if no Pmem nodes present in device-tree 'Yes' : Disable DDW always 'No' : Force enable DDW even if Pmem nodes present in device-tree Signed-off-by: Vaibhav Jain --- .../admin-guide/kernel-parameters.txt | 10 ++- arch/powerpc/platforms/pseries/iommu.c| 67 +-- 2 files changed, 70 insertions(+), 7 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index c07815d230bc..58e09f7a2cb9 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -901,9 +901,13 @@ The feature only exists starting from Arch Perfmon v4 (Skylake and newer). - disable_ddw [PPC/PSERIES] - Disable Dynamic DMA Window support. Use this if - to workaround buggy firmware. + disable_ddw=[PPC/PSERIES] + Controls weather Dynamic DMA Window support is enabled. + Use this if to workaround buggy firmware. Following + values are supported: +on Disable ddw always +off Enable ddw always + default Enable ddw if no Persistent memory present (default) disable_ipv6= [IPV6] See Documentation/networking/ipv6.txt. diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c index 2e0a8eab5588..97498cc25c9f 100644 --- a/arch/powerpc/platforms/pseries/iommu.c +++ b/arch/powerpc/platforms/pseries/iommu.c @@ -12,6 +12,7 @@ #include #include +#include #include #include #include @@ -755,12 +756,39 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev) pci_name(dev)); } +/* + * Following values for the variable are handled + * '-1': Force enable ddw even if Persistent memory is present + * '0' : Enable ddw if no Persistent memory present (default) + * '1' : Disable ddw always + */ static int __read_mostly disable_ddw; -static int __init disable_ddw_setup(char *str) +static int __init disable_ddw_setup(char *param) { - disable_ddw = 1; - printk(KERN_INFO "ppc iommu: disabling ddw.\n"); + bool val; + int res; + + /* Maintain old behaviour that disables DDW when flag given */ + if (!param) { + disable_ddw = 1; + return 0; + } + + res = strtobool(param, ); + + if (!res) { + if (val) { + disable_ddw = 1; + pr_info("ppc iommu: disabling ddw.\n"); + } else if (!val) { + /* Force enable of DDW even if pmem is available */ + disable_ddw = -1; + pr_info("ppc iommu: will force enable ddw.\n"); + } + } else if (strcmp(param, "default") == 0) { + disable_ddw = 0; + } return 0; } @@ -1313,6 +1341,37 @@ static struct notifier_block iommu_reconfig_nb = { .notifier_call = iommu_reconfig_notifier, }; +/* Check if DDW can be supported for this lpar */ +int ddw_supported(void) +{ + struct device_node *dn; + + if (disable_ddw == -1) /* force enable ddw */ + goto out; + + if (disable_ddw == 1) + return 0; + + /* +* Due to DMA window limitations currently DDW is not supported +* for persistent memory. This is due 1 TiB size of direct mapped +* DMA window size limitation enforce by phyp. Since pmem memory +* will be mapped at phy address > 4TiB, we cannot accmodate pmem +* in the DDW window and DMA's to/from the pmem memory will result in +* PHBs getting frozen triggering EEH. Hence for the the time being +* disable DDW in presence of a 'ibm,pmemory' node. +*/ + dn = of_find_compatible_node(NULL, NULL, "ibm,pmemory"); + if (dn) { + pr_info("IOMMU: Disabling DDW as pmem memory available\n"); + of_node_put(dn); + return 0; + } + out:
[PATCH -next 000/491] treewide: use fallthrough;
There is a new fallthrough pseudo-keyword macro that can be used to replace the various /* fallthrough */ style comments that are used to indicate a case label code block is intended to fallthrough to the next case label block. See commit 294f69e662d1 ("compiler_attributes.h: Add 'fallthrough' pseudo keyword for switch/case use") These patches are intended to allow clang to detect missing switch/case fallthrough uses. Do a depth-first pass on the MAINTAINERS file and find the various F: pattern files and convert the fallthrough comments to fallthrough; for all files matched by all F: patterns in in each section. Done via the perl script below and the previously posted cvt_fallthrough.pl script. Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe.com/ These patches are based on next-20200310 and are available in git://repo.or.cz/linux-2.6/trivial-mods.git in branch 20200310_fallthrough_2 $ cat commit_fallthrough.pl #!/usr/bin/env perl use sort 'stable'; # # Reorder a sorted array so file entries are before directory entries # depends on a trailing / for directories # so: # foo/ # foo/bar.c # becomes # foo/bar.c # foo/ # sub file_before_directory { my ($array_ref) = (@_); my $count = scalar(@$array_ref); for (my $i = 1; $i < $count; $i++) { if (substr(@$array_ref[$i - 1], -1) eq '/' && substr(@$array_ref[$i], 0, length(@$array_ref[$i - 1])) eq @$array_ref[$i - 1]) { my $string = @$array_ref[$i - 1]; @$array_ref[$i - 1] = @$array_ref[$i]; @$array_ref[$i] = $string; } } } sub uniq { my (@parms) = @_; my %saw; @parms = grep(!$saw{$_}++, @parms); return @parms; } # Get all the F: file patterns in MAINTAINERS that could be a .[ch] file my $maintainer_patterns = `grep -P '^F:\\s+' MAINTAINERS`; my @patterns = split('\n', $maintainer_patterns); s/^F:\s*// for @patterns; @patterns = grep(!/^(?:Documentation|tools|scripts)\//, @patterns); @patterns = grep(!/\.(?:dtsi?|rst|config)$/, @patterns); @patterns = sort @patterns; @patterns = sort { $b =~ tr/\//\// cmp $a =~ tr/\//\// } @patterns; file_before_directory(\@patterns); my %sections_done; foreach my $pattern (@patterns) { # Find the files the pattern matches my $pattern_files = `git ls-files -- $pattern`; my @new_patterns = split('\n', $pattern_files); $pattern_files = join(' ', @new_patterns); next if ($pattern_files =~ /^\s*$/); # Find the section the first file matches my $pattern_file = @new_patterns[0]; my $section_output = `./scripts/get_maintainer.pl --nogit --nogit-fallback --sections --pattern-depth=1 $pattern_file`; my @section = split('\n', $section_output); my $section_header = @section[0]; print("Section: <$section_header>\n"); # Skip the section if it's already done print("Already done '$section_header'\n") if ($sections_done{$section_header}); next if ($sections_done{$section_header}++); # Find all the .[ch] files in all F: lines in that section my @new_section; foreach my $line (@section) { last if ($line =~ /^\s*$/); push(@new_section, $line); } @section = grep(/^F:/, @new_section); s/^F:\s*// for @section; @section = grep(!/^(?:Documentation|tools|scripts)\//, @section); @section = grep(!/\.(?:dtsi?|rst|config)$/, @section); @section = sort @section; @section = uniq(@section); my $section_files = join(' ', @section); print("section_files: <$section_files>\n"); next if ($section_files =~ /^\s*$/); my $cvt_files = `git ls-files -- $section_files`; my @files = split('\n', $cvt_files); @files = grep(!/^(?:Documentation|tools|scripts)\//, @files); @files = grep(!/\.(?:dtsi?|rst|config)$/, @files); @files = grep(/\.[ch]$/, @files); @files = sort @files; @files = uniq(@files); $cvt_files = join(' ', @files); print("files: <$cvt_files>\n"); next if (scalar(@files) < 1); # Convert fallthroughs for all [.ch] files in the section print("doing cvt_fallthrough.pl -- $cvt_files\n"); `cvt_fallthrough.pl -- $cvt_files`; # If nothing changed, nothing to commit `git diff-index --quiet HEAD --`; next if (!$?); # Commit the changes my $fh; open($fh, "+>", "cvt_fallthrough.commit_msg") or die "$0: can't create temporary file: $!\n"; print $fh <https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git@perches.com/ EOF ; close $fh; `git commit -s -a -F cvt_fallthrough.commit_msg`; } Joe Perches (491): MELLANOX ETHERNET INNOVA DRIVERS: Use fallthrough; MARVELL OCTEONTX2 RVU ADMIN FUNCTION DRIVER: Use fallthrough; MELLANOX MLX5 core VPI driver: Use fallthrough; PERFORMANCE EVENTS SUBSYSTEM: Use fallthrough; ARM/UNI
[PATCH -next 017/491] CELL BROADBAND ENGINE ARCHITECTURE: Use fallthrough;
Convert the various uses of fallthrough comments to fallthrough; Done via script Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe.com/ Signed-off-by: Joe Perches --- arch/powerpc/platforms/cell/spufs/switch.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/cell/spufs/switch.c b/arch/powerpc/platforms/cell/spufs/switch.c index 5c3f5d0..d56b4e3 100644 --- a/arch/powerpc/platforms/cell/spufs/switch.c +++ b/arch/powerpc/platforms/cell/spufs/switch.c @@ -177,7 +177,7 @@ static inline void save_mfc_cntl(struct spu_state *csa, struct spu *spu) POLL_WHILE_FALSE((in_be64(>mfc_control_RW) & MFC_CNTL_SUSPEND_DMA_STATUS_MASK) == MFC_CNTL_SUSPEND_COMPLETE); - /* fall through */ + fallthrough; case MFC_CNTL_SUSPEND_COMPLETE: if (csa) csa->priv2.mfc_control_RW = -- 2.24.0
[PATCH -next 016/491] KERNEL VIRTUAL MACHINE FOR POWERPC (KVM/powerpc): Use fallthrough;
Convert the various uses of fallthrough comments to fallthrough; Done via script Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe.com/ Signed-off-by: Joe Perches --- arch/powerpc/kvm/book3s_32_mmu.c | 2 +- arch/powerpc/kvm/book3s_64_mmu.c | 2 +- arch/powerpc/kvm/book3s_pr.c | 2 +- arch/powerpc/kvm/booke.c | 6 +++--- arch/powerpc/kvm/powerpc.c | 1 - 5 files changed, 6 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c index f21e734..3fbd57 100644 --- a/arch/powerpc/kvm/book3s_32_mmu.c +++ b/arch/powerpc/kvm/book3s_32_mmu.c @@ -234,7 +234,7 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr, case 2: case 6: pte->may_write = true; - /* fall through */ + fallthrough; case 3: case 5: case 7: diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c index 5991332..26b8b2 100644 --- a/arch/powerpc/kvm/book3s_64_mmu.c +++ b/arch/powerpc/kvm/book3s_64_mmu.c @@ -311,7 +311,7 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, case 2: case 6: gpte->may_write = true; - /* fall through */ + fallthrough; case 3: case 5: case 7: diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index 729a0f..7db3695 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -740,7 +740,7 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu, (vcpu->arch.hflags & BOOK3S_HFLAG_SPLIT_HACK) && ((pte.raddr & SPLIT_HACK_MASK) == SPLIT_HACK_OFFS)) pte.raddr &= ~SPLIT_HACK_MASK; - /* fall through */ + fallthrough; case MSR_IR: vcpu->arch.mmu.esid_to_vsid(vcpu, eaddr >> SID_SHIFT, ); diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 7b27604..be47815 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -421,11 +421,11 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, case BOOKE_IRQPRIO_DATA_STORAGE: case BOOKE_IRQPRIO_ALIGNMENT: update_dear = true; - /* fall through */ + fallthrough; case BOOKE_IRQPRIO_INST_STORAGE: case BOOKE_IRQPRIO_PROGRAM: update_esr = true; - /* fall through */ + fallthrough; case BOOKE_IRQPRIO_ITLB_MISS: case BOOKE_IRQPRIO_SYSCALL: case BOOKE_IRQPRIO_FP_UNAVAIL: @@ -459,7 +459,7 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, case BOOKE_IRQPRIO_DECREMENTER: case BOOKE_IRQPRIO_FIT: keep_irq = true; - /* fall through */ + fallthrough; case BOOKE_IRQPRIO_EXTERNAL: case BOOKE_IRQPRIO_DBELL: allowed = vcpu->arch.shared->msr & MSR_EE; diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 1af96fb5..c242a79 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -525,7 +525,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) r = 1; break; case KVM_CAP_PPC_GUEST_DEBUG_SSTEP: - /* fall through */ case KVM_CAP_PPC_PAIRED_SINGLES: case KVM_CAP_PPC_OSI: case KVM_CAP_PPC_GET_PVINFO: -- 2.24.0
Re: [PATCH v3 21/27] powerpc/powernv/pmem: Add an IOCTL to request controller health & perf data
On Wed, 2020-03-04 at 12:06 +0100, Frederic Barrat wrote: > > Le 28/02/2020 à 07:12, Andrew Donnellan a écrit : > > On 21/2/20 2:27 pm, Alastair D'Silva wrote: > > > From: Alastair D'Silva > > > > > > When health & performance data is requested from the controller, > > > it responds with an error log containing the requested > > > information. > > > > > > This patch allows the request to me issued via an IOCTL. > > > > A better explanation would be good - this IOCTL triggers a request > > to > > the controller to collect controller health/perf data, and the > > controller will later respond with an error log that can be picked > > up > > via the error log IOCTL that you've defined earlier. > > And even more precisely (to also check my understanding): > > > this IOCTL triggers a request to > > the controller to collect controller health/perf data, and the > > controller will later respond > > by raising an interrupt to let the user app know that > > > an error log that can be picked up > > via the error log IOCTL that you've defined earlier. > > > The rest of the patch looks ok to me. > >Fred Ok -- Alastair D'Silva Open Source Developer Linux Technology Centre, IBM Australia mob: 0423 762 819
Re: [PATCH v3 20/27] powerpc/powernv/pmem: Forward events to userspace
On Wed, 2020-03-04 at 12:00 +0100, Frederic Barrat wrote: > > Le 21/02/2020 à 04:27, Alastair D'Silva a écrit : > > From: Alastair D'Silva > > > > Some of the interrupts that the card generates are better handled > > by the userspace daemon, in particular: > > Controller Hardware/Firmware Fatal > > Controller Dump Available > > Error Log available > > > > This patch allows a userspace application to register an eventfd > > with > > the driver via SCM_IOCTL_EVENTFD to receive notifications of these > > interrupts. > > > > Userspace can then identify what events have occurred by calling > > SCM_IOCTL_EVENT_CHECK and checking against the SCM_IOCTL_EVENT_FOO > > masks. > > > > Signed-off-by: Alastair D'Silva > > --- > > arch/powerpc/platforms/powernv/pmem/ocxl.c| 216 > > ++ > > .../platforms/powernv/pmem/ocxl_internal.h| 5 + > > include/uapi/nvdimm/ocxl-pmem.h | 16 ++ > > 3 files changed, 237 insertions(+) > > > > diff --git a/arch/powerpc/platforms/powernv/pmem/ocxl.c > > b/arch/powerpc/platforms/powernv/pmem/ocxl.c > > index 009d4fd29e7d..e46696d3cc36 100644 > > --- a/arch/powerpc/platforms/powernv/pmem/ocxl.c > > +++ b/arch/powerpc/platforms/powernv/pmem/ocxl.c > > @@ -10,6 +10,7 @@ > > #include > > #include > > #include > > +#include > > #include > > #include > > #include > > @@ -335,11 +336,22 @@ static void free_ocxlpmem(struct ocxlpmem > > *ocxlpmem) > > { > > int rc; > > > > + // Disable doorbells > > + (void)ocxl_global_mmio_set64(ocxlpmem->ocxl_afu, > > GLOBAL_MMIO_CHIEC, > > +OCXL_LITTLE_ENDIAN, > > +GLOBAL_MMIO_CHI_ALL); > > + > > if (ocxlpmem->nvdimm_bus) > > nvdimm_bus_unregister(ocxlpmem->nvdimm_bus); > > > > free_minor(ocxlpmem); > > > > + if (ocxlpmem->irq_addr[1]) > > + iounmap(ocxlpmem->irq_addr[1]); > > + > > + if (ocxlpmem->irq_addr[0]) > > + iounmap(ocxlpmem->irq_addr[0]); > > + > > if (ocxlpmem->cdev.owner) > > cdev_del(>cdev); > > > > @@ -443,6 +455,11 @@ static int file_release(struct inode *inode, > > struct file *file) > > { > > struct ocxlpmem *ocxlpmem = file->private_data; > > > > + if (ocxlpmem->ev_ctx) { > > + eventfd_ctx_put(ocxlpmem->ev_ctx); > > + ocxlpmem->ev_ctx = NULL; > > + } > > + > > ocxlpmem_put(ocxlpmem); > > return 0; > > } > > @@ -938,6 +955,51 @@ static int ioctl_controller_stats(struct > > ocxlpmem *ocxlpmem, > > return rc; > > } > > > > +static int ioctl_eventfd(struct ocxlpmem *ocxlpmem, > > +struct ioctl_ocxl_pmem_eventfd __user *uarg) > > +{ > > + struct ioctl_ocxl_pmem_eventfd args; > > + > > + if (copy_from_user(, uarg, sizeof(args))) > > + return -EFAULT; > > + > > + if (ocxlpmem->ev_ctx) > > + return -EINVAL; > > EBUSY? > Ok > > > + > > + ocxlpmem->ev_ctx = eventfd_ctx_fdget(args.eventfd); > > + if (!ocxlpmem->ev_ctx) > > + return -EFAULT; > > Why not use what eventfd_ctx_fdget() returned? (through some > IS_ERR() > and PTR_ERR() convolution) > Ok > > > + > > + return 0; > > +} > > + > > +static int ioctl_event_check(struct ocxlpmem *ocxlpmem, u64 __user > > *uarg) > > +{ > > + u64 val = 0; > > + int rc; > > + u64 chi = 0; > > + > > + rc = ocxlpmem_chi(ocxlpmem, ); > > + if (rc < 0) > > + return rc; > > + > > + if (chi & GLOBAL_MMIO_CHI_ELA) > > + val |= IOCTL_OCXL_PMEM_EVENT_ERROR_LOG_AVAILABLE; > > + > > + if (chi & GLOBAL_MMIO_CHI_CDA) > > + val |= IOCTL_OCXL_PMEM_EVENT_CONTROLLER_DUMP_AVAILABLE; > > + > > + if (chi & GLOBAL_MMIO_CHI_CFFS) > > + val |= IOCTL_OCXL_PMEM_EVENT_FIRMWARE_FATAL; > > + > > + if (chi & GLOBAL_MMIO_CHI_CHFS) > > + val |= IOCTL_OCXL_PMEM_EVENT_HARDWARE_FATAL; > > + > > + rc = copy_to_user((u64 __user *) uarg, , sizeof(val)); > > + > > copy_to_user doesn't return an errno. Should be: > > if (copy_to_user((u64 __user *) uarg, , sizeof(val))) > return -EFAULT; > Ok > > > + return rc; > > +} > > + > > static long file_ioctl(struct file *file, unsigned int cmd, > > unsigned long args) > > { > > struct ocxlpmem *ocxlpmem = file->private_data; > > @@ -966,6 +1028,15 @@ static long file_ioctl(struct file *file, > > unsigned int cmd, unsigned long args) > > rc = ioctl_controller_stats(ocxlpmem, > > (struct > > ioctl_ocxl_pmem_controller_stats __user *)args); > > break; > > + > > + case IOCTL_OCXL_PMEM_EVENTFD: > > + rc = ioctl_eventfd(ocxlpmem, > > + (struct ioctl_ocxl_pmem_eventfd > > __user *)args); > > + break; > > + > > + case IOCTL_OCXL_PMEM_EVENT_CHECK: > > + rc = ioctl_event_check(ocxlpmem, (u64 __user *)args); > > + break; > > } > > > > return rc; > > @@ -1107,6
Re: [PATCH] KVM: PPC: Book3S HV: Fix H_CEDE return code for nested guests
On Tue, Mar 10, 2020 at 04:11:28PM -0500, Michael Roth wrote: > The h_cede_tm kvm-unit-test currently fails when run inside an L1 guest > via the guest/nested hypervisor. > > ./run-tests.sh -v > ... > TESTNAME=h_cede_tm TIMEOUT=90s ACCEL= ./powerpc/run powerpc/tm.elf -smp > 2,threads=2 -machine cap-htm=on -append "h_cede_tm" > FAIL h_cede_tm (2 tests, 1 unexpected failures) > > While the test relates to transactional memory instructions, the actual > failure is due to the return code of the H_CEDE hypercall, which is > reported as 224 instead of 0. This happens even when no TM instructions > are issued. > > 224 is the value placed in r3 to execute a hypercall for H_CEDE, and r3 > is where the caller expects the return code to be placed upon return. > > In the case of guest running under a nested hypervisor, issuing H_CEDE > causes a return from H_ENTER_NESTED. In this case H_CEDE is > specially-handled immediately rather than later in > kvmppc_pseries_do_hcall() as with most other hcalls, but we forget to > set the return code for the caller, hence why kvm-unit-test sees the > 224 return code and reports an error. > > Guest kernels generally don't check the return value of H_CEDE, so > that likely explains why this hasn't caused issues outside of > kvm-unit-tests so far. > > Fix this by setting r3 to 0 after we finish processing the H_CEDE. > > RHBZ: 1778556 > > Fixes: 4bad77799fed ("KVM: PPC: Book3S HV: Handle hypercalls correctly when > nested") > Cc: linuxppc-...@ozlabs.org > Cc: David Gibson > Cc: Paul Mackerras > Signed-off-by: Michael Roth Reviewed-by: David Gibson > --- > arch/powerpc/kvm/book3s_hv.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index 2cefd071b848..c0c43a733830 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -3616,6 +3616,7 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 > time_limit, > if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested && > kvmppc_get_gpr(vcpu, 3) == H_CEDE) { > kvmppc_nested_cede(vcpu); > + kvmppc_set_gpr(vcpu, 3, 0); > trap = 0; > } > } else { -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature
Re: [PATCH v3] powerpc: setup_64: set up PACA earlier to avoid kcov problems
>>> So: >>> - change the test when setting up a PACA to consider the actual value of >>>the MSR rather than the CPU feature. >>> >>> - move the PACA setup to before the cpu feature parsing. >> >> Hmm. Problem is that equally we want PACA to be sane before we call too >> far into the rest of the kernel ("generic dt parsing code"). > > But currently we call into that code with no paca at all. Or rather, > with r13 pointing somewhere random that will be interpreted as being a > paca. > > This took a while for Daniel to debug because depending on how you boot > r13 contains a different junk value. That junk value may not point to > memory at all, or if it does the memory it points to may or may not send > you down the wrong path, depending on which exact bit you're looking at > in some random location. > > So this is really not about kcov from my POV, that's just how we > discovered it. Ah, yes. I agree with mpe, and reading back over my commit message I think I did a pretty poor job of explaining it. How about this for a commit message: --- powerpc: setup_64: set up PACA before parsing device tree Currently, we set up the PACA after parsing the device tree for CPU features. Before that, r13 contains random data, which means there is random data in r13 while we're running the generic dt parsing code. This random data varies depending on whether we boot through a vmlinux or a zImage: for the vmlinux case it's usually around zero, but for zImages we see random values like 912a72603d420015. This is poor practice, and can also lead to difficult-to-debug crashes. For example, when kcov is enabled, the kcov instrumentation attempts to read preempt_count out of the current task, which goes via the PACA. This then crashes in the zImage case. To resolve this: - move the PACA setup to before the cpu feature parsing. - because we no longer have access to cpu feature flags in PACA setup, change the HV feature test in the PACA setup path to consider the actual value of the MSR rather than the CPU feature. Translations get switched on once we leave early_setup, so I think we'd already catch any other cases where the PACA or task aren't set up. Boot tested on a P9 guest and host. Fixes: fb0b0a73b223 ("powerpc: Enable kcov") Cc: Andrew Donnellan Cc: sta...@vger.kernel.org Reviewed-by: Andrew Donnellan Suggested-by: Michael Ellerman Signed-off-by: Daniel Axtens --- Regards, Daniel > > cheers
Re: [PATCH net v2] ibmvnic: Do not process device remove during device reset
From: Juliet Kim Date: Tue, 10 Mar 2020 09:23:58 -0500 > The ibmvnic driver does not check the device state when the device > is removed. If the device is removed while a device reset is being > processed, the remove may free structures needed by the reset, > causing an oops. > > Fix this by checking the device state before processing device remove. > > Signed-off-by: Juliet Kim Applied, thank you.
[PATCH] KVM: PPC: Book3S HV: Fix H_CEDE return code for nested guests
The h_cede_tm kvm-unit-test currently fails when run inside an L1 guest via the guest/nested hypervisor. ./run-tests.sh -v ... TESTNAME=h_cede_tm TIMEOUT=90s ACCEL= ./powerpc/run powerpc/tm.elf -smp 2,threads=2 -machine cap-htm=on -append "h_cede_tm" FAIL h_cede_tm (2 tests, 1 unexpected failures) While the test relates to transactional memory instructions, the actual failure is due to the return code of the H_CEDE hypercall, which is reported as 224 instead of 0. This happens even when no TM instructions are issued. 224 is the value placed in r3 to execute a hypercall for H_CEDE, and r3 is where the caller expects the return code to be placed upon return. In the case of guest running under a nested hypervisor, issuing H_CEDE causes a return from H_ENTER_NESTED. In this case H_CEDE is specially-handled immediately rather than later in kvmppc_pseries_do_hcall() as with most other hcalls, but we forget to set the return code for the caller, hence why kvm-unit-test sees the 224 return code and reports an error. Guest kernels generally don't check the return value of H_CEDE, so that likely explains why this hasn't caused issues outside of kvm-unit-tests so far. Fix this by setting r3 to 0 after we finish processing the H_CEDE. RHBZ: 1778556 Fixes: 4bad77799fed ("KVM: PPC: Book3S HV: Handle hypercalls correctly when nested") Cc: linuxppc-...@ozlabs.org Cc: David Gibson Cc: Paul Mackerras Signed-off-by: Michael Roth --- arch/powerpc/kvm/book3s_hv.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 2cefd071b848..c0c43a733830 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -3616,6 +3616,7 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit, if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested && kvmppc_get_gpr(vcpu, 3) == H_CEDE) { kvmppc_nested_cede(vcpu); + kvmppc_set_gpr(vcpu, 3, 0); trap = 0; } } else { -- 2.17.1
[PATCH 2/2] powerpc/fadump: consider reserved ranges while reserving memory
Commit 0962e8004e97 ("powerpc/prom: Scan reserved-ranges node for memory reservations") enabled support to parse reserved-ranges DT node and reserve kernel memory falling in these ranges for F/W purposes. Memory reserved for FADump should not overlap with these ranges as it could corrupt memory meant for F/W or crash'ed kernel memory to be exported as vmcore. But since commit 579ca1a27675 ("powerpc/fadump: make use of memblock's bottom up allocation mode"), memblock_find_in_range() is being used to find the appropriate area to reserve memory for FADump, which can't account for reserved-ranges as these ranges are reserved only after FADump memory reservation. With reserved-ranges now being populated during early boot, look out for these memory ranges while reserving memory for FADump. Without this change, MPIPL on PowerNV systems aborts with hostboot failure, when memory reserved for FADump is less than 4096MB. Fixes: 579ca1a27675 ("powerpc/fadump: make use of memblock's bottom up allocation mode") Cc: sta...@vger.kernel.org # v5.4+ Signed-off-by: Hari Bathini --- arch/powerpc/kernel/fadump.c | 76 -- 1 file changed, 66 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 7fcf4a8f..ab83be9 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -443,10 +443,70 @@ static int __init fadump_get_boot_mem_regions(void) return ret; } +/* + * Returns true, if the given range overlaps with reserved memory ranges + * starting at idx. Also, updates idx to index of overlapping memory range + * with the given memory range. + * False, otherwise. + */ +static bool overlaps_reserved_ranges(u64 base, u64 end, int *idx) +{ + bool ret = false; + int i; + + for (i = *idx; i < reserved_mrange_info.mem_range_cnt; i++) { + u64 rbase = reserved_mrange_info.mem_ranges[i].base; + u64 rend = rbase + reserved_mrange_info.mem_ranges[i].size; + + if (end <= rbase) + break; + + if ((end > rbase) && (base < rend)) { + *idx = i; + ret = true; + break; + } + } + + return ret; +} + +/* + * Locate a suitable memory area to reserve memory for FADump. While at it, + * lookup reserved-ranges & avoid overlap with them, as they are used by F/W. + */ +static u64 __init fadump_locate_reserve_mem(u64 base, u64 size) +{ + struct fadump_memory_range *mrngs; + phys_addr_t mstart, mend; + int idx = 0; + u64 i; + + mrngs = reserved_mrange_info.mem_ranges; + for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, + , , NULL) { + pr_debug("%llu) mstart: %llx, mend: %llx, base: %llx\n", +i, mstart, mend, base); + + if (mstart > base) + base = PAGE_ALIGN(mstart); + + while ((mend > base) && ((mend - base) >= size)) { + if (!overlaps_reserved_ranges(base, base + size, )) + goto out; + + base = mrngs[idx].base + mrngs[idx].size; + base = PAGE_ALIGN(base); + } + } + +out: + return base; +} + int __init fadump_reserve_mem(void) { - u64 base, size, mem_boundary, bootmem_min, align = PAGE_SIZE; - bool is_memblock_bottom_up = memblock_bottom_up(); + u64 base, size, mem_boundary, bootmem_min; int ret = 1; if (!fw_dump.fadump_enabled) @@ -467,9 +527,9 @@ int __init fadump_reserve_mem(void) PAGE_ALIGN(fadump_calculate_reserve_size()); #ifdef CONFIG_CMA if (!fw_dump.nocma) { - align = FADUMP_CMA_ALIGNMENT; fw_dump.boot_memory_size = - ALIGN(fw_dump.boot_memory_size, align); + ALIGN(fw_dump.boot_memory_size, + FADUMP_CMA_ALIGNMENT); } #endif @@ -537,13 +597,9 @@ int __init fadump_reserve_mem(void) * Reserve memory at an offset closer to bottom of the RAM to * minimize the impact of memory hot-remove operation. */ - memblock_set_bottom_up(true); - base = memblock_find_in_range(base, mem_boundary, size, align); - - /* Restore the previous allocation mode */ - memblock_set_bottom_up(is_memblock_bottom_up); + base = fadump_locate_reserve_mem(base, size); - if (!base) { + if (base > (mem_boundary - size)) { pr_err("Failed to find memory chunk for reservation!\n"); goto error_out; }
[PATCH 1/2] powerpc/fadump: use static allocation for reserved memory ranges
At times, memory ranges have to be looked up during early boot, when kernel couldn't be initialized for dynamic memory allocation. In fact, reserved-ranges look up is needed during FADump memory reservation. Without accounting for reserved-ranges in reserving memory for FADump, MPIPL boot fails with memory corruption issues. So, extend memory ranges handling to support static allocation and populate reserved memory ranges during early boot. Fixes: dda9dbfeeb7a ("powerpc/fadump: consider reserved ranges while releasing memory") Cc: sta...@vger.kernel.org # v5.4+ Signed-off-by: Hari Bathini --- arch/powerpc/include/asm/fadump-internal.h |4 + arch/powerpc/kernel/fadump.c | 77 2 files changed, 48 insertions(+), 33 deletions(-) diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h index c814a2b..8d61c8f 100644 --- a/arch/powerpc/include/asm/fadump-internal.h +++ b/arch/powerpc/include/asm/fadump-internal.h @@ -64,12 +64,14 @@ struct fadump_memory_range { }; /* fadump memory ranges info */ +#define RNG_NAME_SZ16 struct fadump_mrange_info { - charname[16]; + charname[RNG_NAME_SZ]; struct fadump_memory_range *mem_ranges; u32 mem_ranges_sz; u32 mem_range_cnt; u32 max_mem_ranges; + boolis_static; }; /* Platform specific callback functions */ diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index ff0114a..7fcf4a8f 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -38,8 +38,17 @@ static void __init fadump_reserve_crash_area(u64 base); #ifndef CONFIG_PRESERVE_FA_DUMP static DEFINE_MUTEX(fadump_mutex); -struct fadump_mrange_info crash_mrange_info = { "crash", NULL, 0, 0, 0 }; -struct fadump_mrange_info reserved_mrange_info = { "reserved", NULL, 0, 0, 0 }; +struct fadump_mrange_info crash_mrange_info = { "crash", NULL, 0, 0, 0, false }; + +#define RESERVED_RNGS_SZ 16384 /* 16K - 128 entries */ +#define RESERVED_RNGS_CNT (RESERVED_RNGS_SZ / \ +sizeof(struct fadump_memory_range)) +static struct fadump_memory_range rngs[RESERVED_RNGS_CNT]; +struct fadump_mrange_info reserved_mrange_info = { "reserved", rngs, + RESERVED_RNGS_SZ, 0, + RESERVED_RNGS_CNT, true }; + +static void __init early_init_dt_scan_reserved_ranges(unsigned long node); #ifdef CONFIG_CMA static struct cma *fadump_cma; @@ -108,6 +117,11 @@ static int __init fadump_cma_init(void) { return 1; } int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname, int depth, void *data) { + if (depth == 0) { + early_init_dt_scan_reserved_ranges(node); + return 0; + } + if (depth != 1) return 0; @@ -726,10 +740,14 @@ void fadump_free_cpu_notes_buf(void) static void fadump_free_mem_ranges(struct fadump_mrange_info *mrange_info) { + if (mrange_info->is_static) { + mrange_info->mem_range_cnt = 0; + return; + } + kfree(mrange_info->mem_ranges); - mrange_info->mem_ranges = NULL; - mrange_info->mem_ranges_sz = 0; - mrange_info->max_mem_ranges = 0; + memset((void *)((u64)mrange_info + RNG_NAME_SZ), 0, + (sizeof(struct fadump_mrange_info) - RNG_NAME_SZ)); } /* @@ -786,6 +804,12 @@ static inline int fadump_add_mem_range(struct fadump_mrange_info *mrange_info, if (mrange_info->mem_range_cnt == mrange_info->max_mem_ranges) { int ret; + if (mrange_info->is_static) { + pr_err("Reached array size limit for %s memory ranges\n", + mrange_info->name); + return -ENOSPC; + } + ret = fadump_alloc_mem_ranges(mrange_info); if (ret) return ret; @@ -1202,20 +1226,19 @@ static void sort_and_merge_mem_ranges(struct fadump_mrange_info *mrange_info) * Scan reserved-ranges to consider them while reserving/releasing * memory for FADump. */ -static inline int fadump_scan_reserved_mem_ranges(void) +static void __init early_init_dt_scan_reserved_ranges(unsigned long node) { - struct device_node *root; const __be32 *prop; int len, ret = -1; unsigned long i; - root = of_find_node_by_path("/"); - if (!root) - return ret; + /* reserved-ranges already scanned */ + if (reserved_mrange_info.mem_range_cnt != 0) +
Re: [PATCH v4 6/8] perf/tools: Enhance JSON/metric infrastructure to handle "?"
Em Mon, Mar 09, 2020 at 11:55:50AM +0530, Kajol Jain escreveu: > Patch enhances current metric infrastructure to handle "?" in the metric > expression. The "?" can be use for parameters whose value not known while > creating metric events and which can be replace later at runtime to > the proper value. It also add flexibility to create multiple events out > of single metric event added in json file. > > Patch adds function 'arch_get_runtimeparam' which is a arch specific > function, returns the count of metric events need to be created. > By default it return 1. > > One loop is added in function 'metricgroup__add_metric', which create > multiple events at run time depend on return value of > 'arch_get_runtimeparam' and merge that event in 'group_list'. > > This infrastructure needed for hv_24x7 socket/chip level events. > "hv_24x7" chip level events needs specific chip-id to which the > data is requested. Function 'arch_get_runtimeparam' implemented > in header.c which extract number of sockets from sysfs file > "sockets" under "/sys/devices/hv_24x7/interface/". > > Signed-off-by: Kajol Jain > --- > tools/perf/arch/powerpc/util/header.c | 22 + > tools/perf/util/expr.h| 1 + > tools/perf/util/expr.l| 19 +++- > tools/perf/util/metricgroup.c | 124 -- > tools/perf/util/metricgroup.h | 1 + > tools/perf/util/stat-shadow.c | 8 ++ > 6 files changed, 148 insertions(+), 27 deletions(-) > > diff --git a/tools/perf/arch/powerpc/util/header.c > b/tools/perf/arch/powerpc/util/header.c > index 3b4cdfc5efd6..036f6b2ce202 100644 > --- a/tools/perf/arch/powerpc/util/header.c > +++ b/tools/perf/arch/powerpc/util/header.c > @@ -7,6 +7,11 @@ > #include > #include > #include "header.h" > +#include "metricgroup.h" > +#include "evlist.h" > +#include > +#include "pmu.h" > +#include > > #define mfspr(rn) ({unsigned long rval; \ >asm volatile("mfspr %0," __stringify(rn) \ > @@ -16,6 +21,8 @@ > #define PVR_VER(pvr)(((pvr) >> 16) & 0x) /* Version field */ > #define PVR_REV(pvr)(((pvr) >> 0) & 0x) /* Revison field */ > > +#define SOCKETS_INFO_FILE_PATH "/devices/hv_24x7/interface/" > + > int > get_cpuid(char *buffer, size_t sz) > { > @@ -44,3 +51,18 @@ get_cpuid_str(struct perf_pmu *pmu __maybe_unused) > > return bufp; > } > + > +int arch_get_runtimeparam(void) > +{ > + int count; > + char path[PATH_MAX]; > + char filename[] = "sockets"; > + > + snprintf(path, PATH_MAX, > + SOCKETS_INFO_FILE_PATH "%s", filename); > + > + if (sysfs__read_ull(path, (unsigned long long *)) < 0) > + return 1; > + else > + return count; Why this cast dance? We have sysfs__read_int(path, ). Also this is more compact: return sysfs__read_int(path, ) < 0 ? 1 : count; > +} > diff --git a/tools/perf/util/expr.h b/tools/perf/util/expr.h > index 9377538f4097..d17664e628db 100644 > --- a/tools/perf/util/expr.h > +++ b/tools/perf/util/expr.h > @@ -15,6 +15,7 @@ struct parse_ctx { > struct parse_id ids[MAX_PARSE_ID]; > }; > > +int expr__runtimeparam; > void expr__ctx_init(struct parse_ctx *ctx); > void expr__add_id(struct parse_ctx *ctx, const char *id, double val); > int expr__parse(double *final_val, struct parse_ctx *ctx, const char *expr); > diff --git a/tools/perf/util/expr.l b/tools/perf/util/expr.l > index 1928f2a3dddc..ec4b00671f67 100644 > --- a/tools/perf/util/expr.l > +++ b/tools/perf/util/expr.l > @@ -45,6 +45,21 @@ static char *normalize(char *str) > *dst++ = '/'; > else if (*str == '\\') > *dst++ = *++str; > +else if (*str == '?') { > + > + int size = snprintf(NULL, 0, "%d", expr__runtimeparam); TIL that C99 allows for a NULL str to format and return the number of bytes it would write if the string was large enough... wonders never cease :-) > + char * paramval = (char *)malloc(size); No need for the cast, malloc returns void *, or has that changed? 8-) and please no space before the variable name. Humm this is all complicated, why not use asprintf and have something like: > + int i = 0; > + > + if(!paramval) > + *dst++ = '0'; > + else { > + sprintf(paramval, "%d", expr__runtimeparam); > + while(i < size) > + *dst++ = paramval[i++]; > + free(paramval); > + } char *paramval; int size = asprintf(, "%d", expr__runtimeparam); if (size < 0) *dst++ = '0'; else { while (i < size)
Re: [PATCH v4 0/8] powerpc/perf: Add json file metric support for the hv_24x7 socket/chip level events
On Tue, Mar 10, 2020 at 03:18:36PM -0300, Arnaldo Carvalho de Melo wrote: > Em Mon, Mar 09, 2020 at 10:35:06AM +0100, Jiri Olsa escreveu: > > On Mon, Mar 09, 2020 at 11:55:44AM +0530, Kajol Jain wrote: > > > First patch of the patchset fix inconsistent results we are getting when > > > we run multiple 24x7 events. > > > > > > Patchset adds json file metric support for the hv_24x7 socket/chip level > > > events. "hv_24x7" pmu interface events needs system dependent parameter > > > like socket/chip/core. For example, hv_24x7 chip level events needs > > > specific chip-id to which the data is requested should be added as part > > > of pmu events. > > > > > > So to enable JSON file support to "hv_24x7" interface, patchset expose > > > total number of sockets and chips per-socket details in sysfs > > > files (sockets, chips) under "/sys/devices/hv_24x7/interface/". > > > > > > To get sockets and number of chips per sockets, patchset adds a rtas call > > > with token "PROCESSOR_MODULE_INFO" to get these details. Patchset also > > > handles partition migration case to re-init these system depended > > > parameters by adding proper calls in post_mobility_fixup() (mobility.c). > > > > > > Patch 6 & 8 of the patchset handles perf tool plumbing needed to replace > > > the "?" character in the metric expression to proper value and hv_24x7 > > > json metric file for different Socket/chip resources. > > > > > > Patch set also enable Hz/hz prinitg for --metric-only option to print > > > metric data for bus frequency. > > > > > > Applied and tested all these patches cleanly on top of jiri's flex changes > > > with the changes done by Kan Liang for "Support metric group constraint" > > > patchset and made required changes. > > > > > > Changelog: > > > v3 -> v4 > > > - Made changes suggested by jiri. > > > > could you please mention them next time? ;-) > > > > > - Apply these patch on top of Kan liang changes. > > > > Arnaldo, could you please pull the expr flex changes and Kan's > > metric group constraint changes? it's both prereq of this patchset > > Both are now in my perf/core branch, will go upstream soon, should I go > and pickup the perf tooling bits in this patchkit? Kajol mentioned there are still some issues with the global variable, I plan to check on it this week thanks, jirka
Re: [PATCH] powerpc/pseries: fix of_read_drc_info_cell() to point at next record
On 3/10/20 10:25 AM, Nathan Lynch wrote: > Tyrel Datwyler writes: >> The expectation is that when calling of_read_drc_info_cell() >> repeatedly to parse multiple drc-info records that the in/out curval >> parameter points at the start of the next record on return. However, >> the current behavior has curval still pointing at the final value of >> the record just parsed. The result of which is that if the >> ibm,drc-info property contains multiple properties the parsed value >> of the drc_type for any record after the first has the power_domain >> value of the previous record appended to the type string. >> >> Ex: observed the following 0x prepended to PHB >> >> [ 69.485037] drc-info: type: \xff\xff\xff\xffPHB, prefix: PHB , >> index_start: 0x2001 >> [ 69.485038] drc-info: suffix_start: 1, sequential_elems: 3072, >> sequential_inc: 1 >> [ 69.485038] drc-info: power-domain: 0x, last_index: 0x2c00 >> >> Fix by incrementing curval past the power_domain value to point at >> drc_type string of next record. >> >> Fixes: a29396653b8bf ("pseries/drc-info: Search DRC properties for CPU >> indexes") > > I have a different commit hash for that: > e83636ac3334 pseries/drc-info: Search DRC properties for CPU indexes Oof, looks like I grabbed the commit hash from the SLES 15 SP1 branch. You have the correct upstream commit. Fixes: e83636ac3334 ("pseries/drc-info: Search DRC properties for CPU indexes") Michael, let me know if you want me to resubmit, or if you will fixup the Fixes tag on your end? -Tyrel > >> Signed-off-by: Tyrel Datwyler > > Otherwise: > Acked-by: Nathan Lynch >
Re: [PATCH v4 0/8] powerpc/perf: Add json file metric support for the hv_24x7 socket/chip level events
Em Mon, Mar 09, 2020 at 10:35:06AM +0100, Jiri Olsa escreveu: > On Mon, Mar 09, 2020 at 11:55:44AM +0530, Kajol Jain wrote: > > First patch of the patchset fix inconsistent results we are getting when > > we run multiple 24x7 events. > > > > Patchset adds json file metric support for the hv_24x7 socket/chip level > > events. "hv_24x7" pmu interface events needs system dependent parameter > > like socket/chip/core. For example, hv_24x7 chip level events needs > > specific chip-id to which the data is requested should be added as part > > of pmu events. > > > > So to enable JSON file support to "hv_24x7" interface, patchset expose > > total number of sockets and chips per-socket details in sysfs > > files (sockets, chips) under "/sys/devices/hv_24x7/interface/". > > > > To get sockets and number of chips per sockets, patchset adds a rtas call > > with token "PROCESSOR_MODULE_INFO" to get these details. Patchset also > > handles partition migration case to re-init these system depended > > parameters by adding proper calls in post_mobility_fixup() (mobility.c). > > > > Patch 6 & 8 of the patchset handles perf tool plumbing needed to replace > > the "?" character in the metric expression to proper value and hv_24x7 > > json metric file for different Socket/chip resources. > > > > Patch set also enable Hz/hz prinitg for --metric-only option to print > > metric data for bus frequency. > > > > Applied and tested all these patches cleanly on top of jiri's flex changes > > with the changes done by Kan Liang for "Support metric group constraint" > > patchset and made required changes. > > > > Changelog: > > v3 -> v4 > > - Made changes suggested by jiri. > > could you please mention them next time? ;-) > > > - Apply these patch on top of Kan liang changes. > > Arnaldo, could you please pull the expr flex changes and Kan's > metric group constraint changes? it's both prereq of this patchset Both are now in my perf/core branch, will go upstream soon, should I go and pickup the perf tooling bits in this patchkit? - Arnaldo
Re: [PATCH 1/4] powerpc/xive: Use XIVE_BAD_IRQ instead of zero to catch non configured IPIs
On 3/10/20 4:09 PM, Greg Kurz wrote: > On Fri, 6 Mar 2020 16:01:40 +0100 > Cédric Le Goater wrote: > >> When a CPU is brought up, an IPI number is allocated and recorded >> under the XIVE CPU structure. Invalid IPI numbers are tracked with >> interrupt number 0x0. >> >> On the PowerNV platform, the interrupt number space starts at 0x10 and >> this works fine. However, on the sPAPR platform, it is possible to >> allocate the interrupt number 0x0 and this raises an issue when CPU 0 >> is unplugged. The XIVE spapr driver tracks allocated interrupt numbers >> in a bitmask and it is not correctly updated when interrupt number 0x0 >> is freed. It stays allocated and it is then impossible to reallocate. >> >> Fix by using the XIVE_BAD_IRQ value instead of zero on both platforms. >> >> Reported-by: David Gibson >> Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt >> controller") >> Cc: sta...@vger.kernel.org # v4.14+ >> Signed-off-by: Cédric Le Goater >> --- > > This looks mostly good. I'm juste wondering about potential overlooks: Yes. Thanks for doing so. There are some non-obvious use of interrupt numbers under the hood. One thing we should be adding is a comment on the different interrupt number spaces. The HW interrupt numbers, the EISN numbers found on the XIVE even queue and the Linux interrupt numbers are different spaces and have different limits. XIVE_BAD_IRQ was introduced for the EISN numbers and the patch is using this value for HW numbers. This is ok because it's more a tag than a limit. > $ git grep 'if.*hw_i' arch/powerpc/ | egrep -v 'xics|XIVE_BAD_IRQ' > > > arch/powerpc/kvm/book3s_xive.h: if (out_hw_irq) > arch/powerpc/kvm/book3s_xive.h: if (out_hw_irq) > arch/powerpc/kvm/book3s_xive_template.c:else if (hw_irq && xd->flags > & XIVE_IRQ_FLAG_EOI_FW) > arch/powerpc/sysdev/xive/common.c: else if (hw_irq && xd->flags & > XIVE_IRQ_FLAG_EOI_FW) { > > This hw_irq check in xive_do_source_eoi() for example is related to: > > /* >* Note: We pass "0" to the hw_irq argument in order to >* avoid calling into the backend EOI code which we don't >* want to do in the case of a re-trigger. Backends typically >* only do EOI for LSIs anyway. >*/ > xive_do_source_eoi(0, xd); that's a hack to skip the following part of the code in case of EOI being done through FW: else if (hw_irq && xd->flags & XIVE_IRQ_FLAG_EOI_FW) { /* * The FW told us to call it. This happens for some * interrupt sources that need additional HW whacking * beyond the ESB manipulation. For example LPC interrupts * on P9 DD1.0 needed a latch to be clared in the LPC bridge * itself. The Firmware will take care of it. */ if (WARN_ON_ONCE(!xive_ops->eoi)) return; xive_ops->eoi(hw_irq); That was the case for PHB4 LSIs on P9 DD1 only. We could probably drop the code in Linux unless similar bugs show up on new platforms. > but it can get hw_irq from: > > xive_do_source_eoi(xc->hw_ipi, >ipi_data); That part is fine. It's an IPI EOI. > > It seems that these should use XIVE_BAD_IRQ as well or I'm missing > something ? > > arch/powerpc/sysdev/xive/common.c: if (hw_irq) This tests the XIVE IPI number which is mapped to 0 for all CPUs. See xive_request_ipi() and xive_irq_domain_map() C. > arch/powerpc/sysdev/xive/common.c: if (d->domain != > xive_irq_domain || hw_irq == 0) > > > >> arch/powerpc/sysdev/xive/xive-internal.h | 7 +++ >> arch/powerpc/sysdev/xive/common.c| 12 +++- >> arch/powerpc/sysdev/xive/native.c| 4 ++-- >> arch/powerpc/sysdev/xive/spapr.c | 4 ++-- >> 4 files changed, 14 insertions(+), 13 deletions(-) >> >> diff --git a/arch/powerpc/sysdev/xive/xive-internal.h >> b/arch/powerpc/sysdev/xive/xive-internal.h >> index 59cd366e7933..382980f4de2d 100644 >> --- a/arch/powerpc/sysdev/xive/xive-internal.h >> +++ b/arch/powerpc/sysdev/xive/xive-internal.h >> @@ -5,6 +5,13 @@ >> #ifndef __XIVE_INTERNAL_H >> #define __XIVE_INTERNAL_H >> >> +/* >> + * A "disabled" interrupt should never fire, to catch problems >> + * we set its logical number to this >> + */ >> +#define XIVE_BAD_IRQ0x7fff >> +#define XIVE_MAX_IRQ(XIVE_BAD_IRQ - 1) >> + >> /* Each CPU carry one of these with various per-CPU state */ >> struct xive_cpu { >> #ifdef CONFIG_SMP >> diff --git a/arch/powerpc/sysdev/xive/common.c >> b/arch/powerpc/sysdev/xive/common.c >> index fa49193206b6..550baba98ec9 100644 >> --- a/arch/powerpc/sysdev/xive/common.c >> +++ b/arch/powerpc/sysdev/xive/common.c >> @@ -68,13 +68,6 @@ static u32 xive_ipi_irq; >> /* Xive state for each CPU */ >> static DEFINE_PER_CPU(struct xive_cpu *, xive_cpu); >> >> -/* >> - * A "disabled"
[PATCH v3] powerpc/32s: reorder Linux PTE bits to better match Hash PTE bits.
Reorder Linux PTE bits to (almost) match Hash PTE bits. RW Kernel : PP = 00 RO Kernel : PP = 00 RW User : PP = 01 RO User : PP = 11 So naturally, we should have _PAGE_USER = 0x001 _PAGE_RW = 0x002 Today 0x001 and 0x002 and _PAGE_PRESENT and _PAGE_HASHPTE which both are software only bits. Switch _PAGE_USER and _PAGE_PRESET Switch _PAGE_RW and _PAGE_HASHPTE This allows to remove a few insns. Signed-off-by: Christophe Leroy --- v3: rebased on today's powerpc/merge v2: rebased on today's powerpc/merge Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/book3s/32/hash.h | 8 arch/powerpc/kernel/head_32.S | 9 +++-- arch/powerpc/mm/book3s32/hash_low.S | 14 ++ 3 files changed, 13 insertions(+), 18 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/32/hash.h b/arch/powerpc/include/asm/book3s/32/hash.h index 2a0a467d2985..34a7215ae81e 100644 --- a/arch/powerpc/include/asm/book3s/32/hash.h +++ b/arch/powerpc/include/asm/book3s/32/hash.h @@ -17,9 +17,9 @@ * updating the accessed and modified bits in the page table tree. */ -#define _PAGE_PRESENT 0x001 /* software: pte contains a translation */ -#define _PAGE_HASHPTE 0x002 /* hash_page has made an HPTE for this pte */ -#define _PAGE_USER 0x004 /* usermode access allowed */ +#define _PAGE_USER 0x001 /* usermode access allowed */ +#define _PAGE_RW 0x002 /* software: user write access allowed */ +#define _PAGE_PRESENT 0x004 /* software: pte contains a translation */ #define _PAGE_GUARDED 0x008 /* G: prohibit speculative access */ #define _PAGE_COHERENT 0x010 /* M: enforce memory coherence (SMP systems) */ #define _PAGE_NO_CACHE 0x020 /* I: cache inhibit */ @@ -27,7 +27,7 @@ #define _PAGE_DIRTY0x080 /* C: page changed */ #define _PAGE_ACCESSED 0x100 /* R: page referenced */ #define _PAGE_EXEC 0x200 /* software: exec allowed */ -#define _PAGE_RW 0x400 /* software: user write access allowed */ +#define _PAGE_HASHPTE 0x400 /* hash_page has made an HPTE for this pte */ #define _PAGE_SPECIAL 0x800 /* software: Special page */ #ifdef CONFIG_PTE_64BIT diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S index 97c887950c3c..daaa153950c2 100644 --- a/arch/powerpc/kernel/head_32.S +++ b/arch/powerpc/kernel/head_32.S @@ -348,7 +348,7 @@ BEGIN_MMU_FTR_SECTION andis. r0, r5, (DSISR_BAD_FAULT_32S | DSISR_DABRMATCH)@h #endif bne handle_page_fault_tramp_2 /* if not, try to put a PTE */ - rlwinm r3, r5, 32 - 15, 21, 21 /* DSISR_STORE -> _PAGE_RW */ + rlwinm r3, r5, 32 - 24, 30, 30 /* DSISR_STORE -> _PAGE_RW */ bl hash_page b handle_page_fault_tramp_1 FTR_SECTION_ELSE @@ -497,7 +497,6 @@ InstructionTLBMiss: andc. r1,r1,r0/* check access & ~permission */ bne-InstructionAddressInvalid /* return if access not permitted */ /* Convert linux-style PTE to low word of PPC-style PTE */ - rlwimi r0,r0,32-2,31,31/* _PAGE_USER -> PP lsb */ ori r1, r1, 0xe06 /* clear out reserved bits */ andcr1, r0, r1 /* PP = user? 1 : 0 */ BEGIN_FTR_SECTION @@ -565,9 +564,8 @@ DataLoadTLBMiss: * we would need to update the pte atomically with lwarx/stwcx. */ /* Convert linux-style PTE to low word of PPC-style PTE */ - rlwinm r1,r0,32-9,30,30/* _PAGE_RW -> PP msb */ - rlwimi r0,r0,32-1,30,30/* _PAGE_USER -> PP msb */ - rlwimi r0,r0,32-1,31,31/* _PAGE_USER -> PP lsb */ + rlwinm r1,r0,0,30,30 /* _PAGE_RW -> PP msb */ + rlwimi r0,r0,1,30,30 /* _PAGE_USER -> PP msb */ ori r1,r1,0xe04 /* clear out reserved bits */ andcr1,r0,r1/* PP = user? rw? 1: 3: 0 */ BEGIN_FTR_SECTION @@ -645,7 +643,6 @@ DataStoreTLBMiss: * we would need to update the pte atomically with lwarx/stwcx. */ /* Convert linux-style PTE to low word of PPC-style PTE */ - rlwimi r0,r0,32-2,31,31/* _PAGE_USER -> PP lsb */ li r1,0xe06/* clear out reserved bits & PP msb */ andcr1,r0,r1/* PP = user? 1: 0 */ BEGIN_FTR_SECTION diff --git a/arch/powerpc/mm/book3s32/hash_low.S b/arch/powerpc/mm/book3s32/hash_low.S index 877d880890fe..6d236080cb1a 100644 --- a/arch/powerpc/mm/book3s32/hash_low.S +++ b/arch/powerpc/mm/book3s32/hash_low.S @@ -35,7 +35,7 @@ mmu_hash_lock: /* * Load a PTE into the hash table, if possible. * The address is in r4, and r3 contains an access flag: - * _PAGE_RW (0x400) if a write. + * _PAGE_RW (0x002) if a write. * r9 contains the SRR1 value, from which we use the MSR_PR bit. * SPRG_THREAD contains the physical address of the current task's thread. * @@ -69,7 +69,7 @@ _GLOBAL(hash_page) blt+112f
Re: [PATCH] powerpc/pseries: fix of_read_drc_info_cell() to point at next record
Tyrel Datwyler writes: > The expectation is that when calling of_read_drc_info_cell() > repeatedly to parse multiple drc-info records that the in/out curval > parameter points at the start of the next record on return. However, > the current behavior has curval still pointing at the final value of > the record just parsed. The result of which is that if the > ibm,drc-info property contains multiple properties the parsed value > of the drc_type for any record after the first has the power_domain > value of the previous record appended to the type string. > > Ex: observed the following 0x prepended to PHB > > [ 69.485037] drc-info: type: \xff\xff\xff\xffPHB, prefix: PHB , > index_start: 0x2001 > [ 69.485038] drc-info: suffix_start: 1, sequential_elems: 3072, > sequential_inc: 1 > [ 69.485038] drc-info: power-domain: 0x, last_index: 0x2c00 > > Fix by incrementing curval past the power_domain value to point at > drc_type string of next record. > > Fixes: a29396653b8bf ("pseries/drc-info: Search DRC properties for CPU > indexes") I have a different commit hash for that: e83636ac3334 pseries/drc-info: Search DRC properties for CPU indexes > Signed-off-by: Tyrel Datwyler Otherwise: Acked-by: Nathan Lynch
Re: [PATCH 2/4] powerpc/xive: Fix xmon support on the PowerNV platform
On Fri, 6 Mar 2020 16:01:41 +0100 Cédric Le Goater wrote: > The PowerNV platform has multiple IRQ chips and the xmon command > dumping the state of the XIVE interrupt should only operate on the > XIVE IRQ chip. > > Fixes: 5896163f7f91 ("powerpc/xmon: Improve output of XIVE interrupts") > Cc: sta...@vger.kernel.org # v5.4+ > Signed-off-by: Cédric Le Goater > --- Reviewed-by: Greg Kurz > arch/powerpc/sysdev/xive/common.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/arch/powerpc/sysdev/xive/common.c > b/arch/powerpc/sysdev/xive/common.c > index 550baba98ec9..8155adc2225a 100644 > --- a/arch/powerpc/sysdev/xive/common.c > +++ b/arch/powerpc/sysdev/xive/common.c > @@ -261,11 +261,15 @@ notrace void xmon_xive_do_dump(int cpu) > > int xmon_xive_get_irq_config(u32 hw_irq, struct irq_data *d) > { > + struct irq_chip *chip = irq_data_get_irq_chip(d); > int rc; > u32 target; > u8 prio; > u32 lirq; > > + if (!is_xive_irq(chip)) > + return -EINVAL; > + > rc = xive_ops->get_irq_config(hw_irq, , , ); > if (rc) { > xmon_printf("IRQ 0x%08x : no config rc=%d\n", hw_irq, rc);
Re: [PATCH 3/4] powerpc/xmon: Add source flags to output of XIVE interrupts
On Fri, 6 Mar 2020 16:01:42 +0100 Cédric Le Goater wrote: > Some firmwares or hypervisors can advertise different source > characteristics. Track their value under XMON. What we are mostly > interested in is the StoreEOI flag. > > Signed-off-by: Cédric Le Goater > --- Reviewed-by: Greg Kurz > arch/powerpc/sysdev/xive/common.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/sysdev/xive/common.c > b/arch/powerpc/sysdev/xive/common.c > index 8155adc2225a..c865ae554605 100644 > --- a/arch/powerpc/sysdev/xive/common.c > +++ b/arch/powerpc/sysdev/xive/common.c > @@ -283,7 +283,10 @@ int xmon_xive_get_irq_config(u32 hw_irq, struct irq_data > *d) > struct xive_irq_data *xd = irq_data_get_irq_handler_data(d); > u64 val = xive_esb_read(xd, XIVE_ESB_GET); > > - xmon_printf("PQ=%c%c", > + xmon_printf("flags=%c%c%c PQ=%c%c", > + xd->flags & XIVE_IRQ_FLAG_STORE_EOI ? 'S' : ' ', > + xd->flags & XIVE_IRQ_FLAG_LSI ? 'L' : ' ', > + xd->flags & XIVE_IRQ_FLAG_H_INT_ESB ? 'H' : ' ', > val & XIVE_ESB_VAL_P ? 'P' : '-', > val & XIVE_ESB_VAL_Q ? 'Q' : '-'); > }
Re: [PATCH 1/4] powerpc/xive: Use XIVE_BAD_IRQ instead of zero to catch non configured IPIs
On Fri, 6 Mar 2020 16:01:40 +0100 Cédric Le Goater wrote: > When a CPU is brought up, an IPI number is allocated and recorded > under the XIVE CPU structure. Invalid IPI numbers are tracked with > interrupt number 0x0. > > On the PowerNV platform, the interrupt number space starts at 0x10 and > this works fine. However, on the sPAPR platform, it is possible to > allocate the interrupt number 0x0 and this raises an issue when CPU 0 > is unplugged. The XIVE spapr driver tracks allocated interrupt numbers > in a bitmask and it is not correctly updated when interrupt number 0x0 > is freed. It stays allocated and it is then impossible to reallocate. > > Fix by using the XIVE_BAD_IRQ value instead of zero on both platforms. > > Reported-by: David Gibson > Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt > controller") > Cc: sta...@vger.kernel.org # v4.14+ > Signed-off-by: Cédric Le Goater > --- This looks mostly good. I'm juste wondering about potential overlooks: $ git grep 'if.*hw_i' arch/powerpc/ | egrep -v 'xics|XIVE_BAD_IRQ' arch/powerpc/kvm/book3s_xive.h: if (out_hw_irq) arch/powerpc/kvm/book3s_xive.h: if (out_hw_irq) arch/powerpc/kvm/book3s_xive_template.c:else if (hw_irq && xd->flags & XIVE_IRQ_FLAG_EOI_FW) arch/powerpc/sysdev/xive/common.c: else if (hw_irq && xd->flags & XIVE_IRQ_FLAG_EOI_FW) { This hw_irq check in xive_do_source_eoi() for example is related to: /* * Note: We pass "0" to the hw_irq argument in order to * avoid calling into the backend EOI code which we don't * want to do in the case of a re-trigger. Backends typically * only do EOI for LSIs anyway. */ xive_do_source_eoi(0, xd); but it can get hw_irq from: xive_do_source_eoi(xc->hw_ipi, >ipi_data); It seems that these should use XIVE_BAD_IRQ as well or I'm missing something ? arch/powerpc/sysdev/xive/common.c: if (hw_irq) arch/powerpc/sysdev/xive/common.c: if (d->domain != xive_irq_domain || hw_irq == 0) > arch/powerpc/sysdev/xive/xive-internal.h | 7 +++ > arch/powerpc/sysdev/xive/common.c| 12 +++- > arch/powerpc/sysdev/xive/native.c| 4 ++-- > arch/powerpc/sysdev/xive/spapr.c | 4 ++-- > 4 files changed, 14 insertions(+), 13 deletions(-) > > diff --git a/arch/powerpc/sysdev/xive/xive-internal.h > b/arch/powerpc/sysdev/xive/xive-internal.h > index 59cd366e7933..382980f4de2d 100644 > --- a/arch/powerpc/sysdev/xive/xive-internal.h > +++ b/arch/powerpc/sysdev/xive/xive-internal.h > @@ -5,6 +5,13 @@ > #ifndef __XIVE_INTERNAL_H > #define __XIVE_INTERNAL_H > > +/* > + * A "disabled" interrupt should never fire, to catch problems > + * we set its logical number to this > + */ > +#define XIVE_BAD_IRQ 0x7fff > +#define XIVE_MAX_IRQ (XIVE_BAD_IRQ - 1) > + > /* Each CPU carry one of these with various per-CPU state */ > struct xive_cpu { > #ifdef CONFIG_SMP > diff --git a/arch/powerpc/sysdev/xive/common.c > b/arch/powerpc/sysdev/xive/common.c > index fa49193206b6..550baba98ec9 100644 > --- a/arch/powerpc/sysdev/xive/common.c > +++ b/arch/powerpc/sysdev/xive/common.c > @@ -68,13 +68,6 @@ static u32 xive_ipi_irq; > /* Xive state for each CPU */ > static DEFINE_PER_CPU(struct xive_cpu *, xive_cpu); > > -/* > - * A "disabled" interrupt should never fire, to catch problems > - * we set its logical number to this > - */ > -#define XIVE_BAD_IRQ 0x7fff > -#define XIVE_MAX_IRQ (XIVE_BAD_IRQ - 1) > - > /* An invalid CPU target */ > #define XIVE_INVALID_TARGET (-1) > > @@ -1153,7 +1146,7 @@ static int xive_setup_cpu_ipi(unsigned int cpu) > xc = per_cpu(xive_cpu, cpu); > > /* Check if we are already setup */ > - if (xc->hw_ipi != 0) > + if (xc->hw_ipi != XIVE_BAD_IRQ) > return 0; > > /* Grab an IPI from the backend, this will populate xc->hw_ipi */ > @@ -1190,7 +1183,7 @@ static void xive_cleanup_cpu_ipi(unsigned int cpu, > struct xive_cpu *xc) > /* Disable the IPI and free the IRQ data */ > > /* Already cleaned up ? */ > - if (xc->hw_ipi == 0) > + if (xc->hw_ipi == XIVE_BAD_IRQ) > return; > > /* Mask the IPI */ > @@ -1346,6 +1339,7 @@ static int xive_prepare_cpu(unsigned int cpu) > if (np) > xc->chip_id = of_get_ibm_chip_id(np); > of_node_put(np); > + xc->hw_ipi = XIVE_BAD_IRQ; > > per_cpu(xive_cpu, cpu) = xc; > } > diff --git a/arch/powerpc/sysdev/xive/native.c > b/arch/powerpc/sysdev/xive/native.c > index 0ff6b739052c..50e1a8e02497 100644 > --- a/arch/powerpc/sysdev/xive/native.c > +++ b/arch/powerpc/sysdev/xive/native.c > @@ -312,7 +312,7 @@ static void xive_native_put_ipi(unsigned int cpu, struct > xive_cpu *xc) > s64 rc; > > /* Free the IPI */ > - if (!xc->hw_ipi) > + if (xc->hw_ipi ==
Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9
On Thu 27-02-20 19:26:54, Michal Hocko wrote: > [Cc ppc maintainers] [...] > Please have a look at > http://lkml.kernel.org/r/52ef4673-7292-4c4c-b459-af583951b...@linux.vnet.ibm.com > for the boot log with the debugging patch which tracks set_numa_mem. > This seems to lead to a crash in the slab allocator bebcause > node_to_mem_node(0) for memory less node resolves to the memory less > node http://lkml.kernel.org/r/dd450314-d428-6776-af07-f92c04c7b...@suse.cz. > The original report is > http://lkml.kernel.org/r/3381cd91-ab3d-4773-ba04-e7a072a63...@linux.vnet.ibm.com ping -- Michal Hocko SUSE Labs
[PATCH net v2] ibmvnic: Do not process device remove during device reset
The ibmvnic driver does not check the device state when the device is removed. If the device is removed while a device reset is being processed, the remove may free structures needed by the reset, causing an oops. Fix this by checking the device state before processing device remove. Signed-off-by: Juliet Kim --- drivers/net/ethernet/ibm/ibmvnic.c | 24 ++-- drivers/net/ethernet/ibm/ibmvnic.h | 6 +- 2 files changed, 27 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index c75239d8820f..4bd33245bad6 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -2142,6 +2142,8 @@ static void __ibmvnic_reset(struct work_struct *work) { struct ibmvnic_rwi *rwi; struct ibmvnic_adapter *adapter; + bool saved_state = false; + unsigned long flags; u32 reset_state; int rc = 0; @@ -2153,17 +2155,25 @@ static void __ibmvnic_reset(struct work_struct *work) return; } - reset_state = adapter->state; - rwi = get_next_rwi(adapter); while (rwi) { + spin_lock_irqsave(>state_lock, flags); + if (adapter->state == VNIC_REMOVING || adapter->state == VNIC_REMOVED) { + spin_unlock_irqrestore(>state_lock, flags); kfree(rwi); rc = EBUSY; break; } + if (!saved_state) { + reset_state = adapter->state; + adapter->state = VNIC_RESETTING; + saved_state = true; + } + spin_unlock_irqrestore(>state_lock, flags); + if (rwi->reset_reason == VNIC_RESET_CHANGE_PARAM) { /* CHANGE_PARAM requestor holds rtnl_lock */ rc = do_change_param_reset(adapter, rwi, reset_state); @@ -5091,6 +5101,7 @@ static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id) __ibmvnic_delayed_reset); INIT_LIST_HEAD(>rwi_list); spin_lock_init(>rwi_lock); + spin_lock_init(>state_lock); mutex_init(>fw_lock); init_completion(>init_done); init_completion(>fw_done); @@ -5163,8 +5174,17 @@ static int ibmvnic_remove(struct vio_dev *dev) { struct net_device *netdev = dev_get_drvdata(>dev); struct ibmvnic_adapter *adapter = netdev_priv(netdev); + unsigned long flags; + + spin_lock_irqsave(>state_lock, flags); + if (adapter->state == VNIC_RESETTING) { + spin_unlock_irqrestore(>state_lock, flags); + return -EBUSY; + } adapter->state = VNIC_REMOVING; + spin_unlock_irqrestore(>state_lock, flags); + rtnl_lock(); unregister_netdevice(netdev); diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h index 60eccaf91b12..f8416e1d4cf0 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.h +++ b/drivers/net/ethernet/ibm/ibmvnic.h @@ -941,7 +941,8 @@ enum vnic_state {VNIC_PROBING = 1, VNIC_CLOSING, VNIC_CLOSED, VNIC_REMOVING, -VNIC_REMOVED}; +VNIC_REMOVED, +VNIC_RESETTING}; enum ibmvnic_reset_reason {VNIC_RESET_FAILOVER = 1, VNIC_RESET_MOBILITY, @@ -1090,4 +1091,7 @@ struct ibmvnic_adapter { struct ibmvnic_tunables desired; struct ibmvnic_tunables fallback; + + /* Used for serializatin of state field */ + spinlock_t state_lock; }; -- 2.25.0
[Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
https://bugzilla.kernel.org/show_bug.cgi?id=206669 --- Comment #10 from John Paul Adrian Glaubitz (glaub...@physik.fu-berlin.de) --- (In reply to Aneesh Kumar KV from comment #9) > Also, can you try disabling THP. echo "never" > > /sys/kernel/mm/transparent_hugepage/enabled Yes. Just disabled. FWIW, the machine just crashed some minutes ago but I didn't have a serial console open to capture the trace. -- You are receiving this mail because: You are watching the assignee of the bug.
[Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
https://bugzilla.kernel.org/show_bug.cgi?id=206669 Aneesh Kumar KV (aneesh.ku...@linux.ibm.com) changed: What|Removed |Added CC||aneesh.ku...@linux.ibm.com --- Comment #9 from Aneesh Kumar KV (aneesh.ku...@linux.ibm.com) --- Also, can you try disabling THP. echo "never" > /sys/kernel/mm/transparent_hugepage/enabled -aneesh -- You are receiving this mail because: You are watching the assignee of the bug.
Re: [PATCH v2] powerpc/Makefile: Mark phony targets as PHONY
Masahiro Yamada writes: > On Fri, Mar 6, 2020 at 9:27 AM Michael Ellerman > wrote: >> >> On Wed, 2020-02-19 at 00:04:34 UTC, Michael Ellerman wrote: >> > Some of our phony targets are not marked as such. This can lead to >> > confusing errors, eg: >> > >> > $ make clean >> > $ touch install >> > $ make install >> > make: 'install' is up to date. >> > $ >> > >> > Fix it by adding them to the PHONY variable which is marked phony in >> > the top-level Makefile, or in scripts/Makefile.build for the boot >> > Makefile. >> > >> > Suggested-by: Masahiro Yamada >> > Signed-off-by: Michael Ellerman >> >> Applied to powerpc next. >> >> https://git.kernel.org/powerpc/c/d42c6d0f8d004c3661dde3c376ed637e9f292c22 >> > > You do not have to double your Signed-off-by. Oops :/ My scripts don't cope with applying my own patches very well. Will try to fix it. cheers
Re: [PATCH v3] powerpc: setup_64: set up PACA earlier to avoid kcov problems
Nicholas Piggin writes: > Daniel Axtens's on March 6, 2020 5:30 pm: >> kcov instrumentation is collected the __sanitizer_cov_trace_pc hook in >> kernel/kcov.c. The compiler inserts these hooks into every basic block >> unless kcov is disabled for that file. >> >> We then have a deep call-chain: >> - __sanitizer_cov_trace_pc calls to check_kcov_mode() >> - check_kcov_mode() (kernel/kcov.c) calls in_task() >> - in_task() (include/linux/preempt.h) calls preempt_count(). >> - preempt_count() (include/asm-generic/preempt.h) calls >> current_thread_info() >> - because powerpc has THREAD_INFO_IN_TASK, current_thread_info() >> (include/linux/thread_info.h) is defined to 'current' >> - current (arch/powerpc/include/asm/current.h) is defined to >> get_current(). >> - get_current (same file) loads an offset of r13. >> - arch/powerpc/include/asm/paca.h makes r13 a register variable >> called local_paca - it is the PACA for the current CPU, so >> this has the effect of loading the current task from PACA. >> - get_current returns the current task from PACA, >> - current_thread_info returns the task cast to a thread_info >> - preempt_count dereferences the thread_info to load preempt_count >> - that value is used by in_task and so on up the chain >> >> The problem is: >> >> - kcov instrumentation is enabled for arch/powerpc/kernel/dt_cpu_ftrs.c >> >> - even if it were not, dt_cpu_ftrs_init calls generic dt parsing code >>which should definitely have instrumentation enabled. >> >> - setup_64.c calls dt_cpu_ftrs_init before it sets up a PACA. >> >> - If we don't set up a paca, r13 will contain unpredictable data. >> >> - In a zImage compiled with kcov and KASAN, we see r13 containing a value >>that leads to dereferencing invalid memory (something like >>912a72603d420015). >> >> - Weirdly, the same kernel as a vmlinux loaded directly by qemu does not >>crash. Investigating with gdb, it seems that in the vmlinux boot case, >>r13 is near enough to zero that we just happen to be able to read that >>part of memory (we're operating with translation off at this point) and >>the current pointer also happens to land in readable memory and >>everything just works. >> >> - PACA setup refers to CPU features - setup_paca() looks at >>early_cpu_has_feature(CPU_FTR_HVMODE) >> >> There's no generic kill switch for kcov (as far as I can tell), and we >> don't want to have to turn off instrumentation in the generic dt parsing >> code (which lives outside arch/powerpc/) just because we don't have a real >> paca or task yet. >> >> So: >> - change the test when setting up a PACA to consider the actual value of >>the MSR rather than the CPU feature. >> >> - move the PACA setup to before the cpu feature parsing. > > Hmm. Problem is that equally we want PACA to be sane before we call too > far into the rest of the kernel ("generic dt parsing code"). But currently we call into that code with no paca at all. Or rather, with r13 pointing somewhere random that will be interpreted as being a paca. This took a while for Daniel to debug because depending on how you boot r13 contains a different junk value. That junk value may not point to memory at all, or if it does the memory it points to may or may not send you down the wrong path, depending on which exact bit you're looking at in some random location. So this is really not about kcov from my POV, that's just how we discovered it. cheers
Re: [PATCH] macintosh: windfarm: fix MODINFO regression
On Tue, Mar 03, 2020 at 01:50:46PM +0100, Wolfram Sang wrote: > Commit af503716ac14 made sure OF devices get an OF style modalias with > I2C events. It assumed all in-tree users were converted, yet it missed > some Macintosh drivers. > > Add an OF module device table for all windfarm drivers to make them > automatically load again. > > Fixes: af503716ac14 ("i2c: core: report OF style module alias for devices > registered via OF") > Link: https://bugzilla.kernel.org/show_bug.cgi?id=199471 > Reported-by: Erhard Furtner > Tested-by: Erhard Furtner > Signed-off-by: Wolfram Sang Thanks, Michael! Applied to for-current. signature.asc Description: PGP signature
Re: ppc32 panic on boot on linux-next
Hello, On 10.03.2020 06:44, Michael Ellerman wrote: Christophe Leroy writes: Le 07/03/2020 à 09:42, Christophe Leroy a écrit : Le 06/03/2020 à 20:05, Nick Desaulniers a écrit : As a heads up, our CI went red last night, seems like a panic from free_initmem? Is this a known issue? Thanks for the heads up. No such issue with either 8xx or book3s/32. I've now been able to reproduce it with bamboo QEMU. Reverting 2efc7c085f05 makes it disappear. I'll investigate. Ok, I found the problem. virt_to_kpte() lacks a NULL pmd check. I'll send a patch for that. However, if there is no PMD I guess this area is mapped through some kind of block mapping. Therefore it should bail out of the function through: if (v_block_mapped(address)) return 0; Can someone who knows BOOKE investigate that ? Not sure we have anyone left? cheers With latest linux-next, PPC32 Book-E boots to prompt: https://paste.debian.net/1134272/ --- Best Regards, Laurentiu
Re: [PATCH] macintosh: windfarm: fix MODINFO regression
Wolfram Sang writes: > On Tue, Mar 03, 2020 at 01:50:46PM +0100, Wolfram Sang wrote: >> Commit af503716ac14 made sure OF devices get an OF style modalias with >> I2C events. It assumed all in-tree users were converted, yet it missed >> some Macintosh drivers. >> >> Add an OF module device table for all windfarm drivers to make them >> automatically load again. >> >> Fixes: af503716ac14 ("i2c: core: report OF style module alias for devices >> registered via OF") >> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199471 >> Reported-by: Erhard Furtner >> Tested-by: Erhard Furtner >> Signed-off-by: Wolfram Sang > > Michael, I can take this via I2C again, if you ack it. Thanks, done. cheers
Re: [PATCH] macintosh: windfarm: fix MODINFO regression
Wolfram Sang writes: > Commit af503716ac14 made sure OF devices get an OF style modalias with > I2C events. It assumed all in-tree users were converted, yet it missed > some Macintosh drivers. > > Add an OF module device table for all windfarm drivers to make them > automatically load again. > > Fixes: af503716ac14 ("i2c: core: report OF style module alias for devices > registered via OF") > Link: https://bugzilla.kernel.org/show_bug.cgi?id=199471 > Reported-by: Erhard Furtner > Tested-by: Erhard Furtner > Signed-off-by: Wolfram Sang > --- > > This should also help with this: > https://lists.debian.org/debian-powerpc/2020/01/msg00062.html > Some more testing would be appreciated because lm75 also has some code > changes I can't test myself obviusly. > > By grepping, I found some more potential candidates (using a "MAC," > prefix but not defining a OF MODULE DEVICE TABLE). Does someone know > about bugreports filed for those? I don't want to change them for no > reason: > > drivers/macintosh/ams/ams-i2c.c > drivers/macintosh/therm_adt746x.c > sound/aoa/codecs/onyx.c > sound/aoa/codecs/tas.c > sound/ppc/keywest.c > > Happy hacking, > >Wolfram > > drivers/macintosh/windfarm_ad7417_sensor.c | 7 +++ > drivers/macintosh/windfarm_fcu_controls.c | 7 +++ > drivers/macintosh/windfarm_lm75_sensor.c| 16 +++- > drivers/macintosh/windfarm_lm87_sensor.c| 7 +++ > drivers/macintosh/windfarm_max6690_sensor.c | 7 +++ > drivers/macintosh/windfarm_smu_sat.c| 7 +++ > 6 files changed, 50 insertions(+), 1 deletion(-) Acked-by: Michael Ellerman (powerpc) cheers
CDC ethernet gadget: complete system freeze
We have an embedded T1042 NXP CDC ethernet gadget which seems to completely freeze when an usb0 I/F is established and one do 1 of two things: 1) reboot the connected Linux laptop -> CDC gadget appears to enter complete system freeze. 2) on laptop, ifconfig usb0 down; rmmod cdc_ether -> CDC gaget freezes I cannot finns and OOPS or other trace in console or syslog. I could really use som ideas here. Linux kernel 4.19.108 (4.19.76 has the same issue) Jocke
Re: [PATCH] i2c: powermac: correct comment about custom handling
On Tue, Feb 25, 2020 at 03:26:13PM +0100, Wolfram Sang wrote: > The comment had some flaws which are now fixed: > - the prefix is 'MAC' not 'AAPL' > - no kernel coding style and too short length > - 'we do' instead of 'we to' > > Signed-off-by: Wolfram Sang Applied to for-next, thanks! signature.asc Description: PGP signature
Re: [PATCH] macintosh: windfarm: fix MODINFO regression
On Tue, Mar 03, 2020 at 01:50:46PM +0100, Wolfram Sang wrote: > Commit af503716ac14 made sure OF devices get an OF style modalias with > I2C events. It assumed all in-tree users were converted, yet it missed > some Macintosh drivers. > > Add an OF module device table for all windfarm drivers to make them > automatically load again. > > Fixes: af503716ac14 ("i2c: core: report OF style module alias for devices > registered via OF") > Link: https://bugzilla.kernel.org/show_bug.cgi?id=199471 > Reported-by: Erhard Furtner > Tested-by: Erhard Furtner > Signed-off-by: Wolfram Sang Michael, I can take this via I2C again, if you ack it. Thanks, Wolfram signature.asc Description: PGP signature