Re: [PATCH] jump_label: align jump_entry table to at least 4-bytes
> I also checked all the other .ko files and they were properly aligned. So I > think this should hopefully work, and I like that its not a per-arch fix. > > Sachin, sorry to bother you again, but I'm hoping you can try David's latest > patch to scripts/module-common.lds, just to test in your setup. I tested the patch on 2 different systems where I ran into this problem. In both cases the system boots without any warning. A quick module load/unload test also worked correctly. Tested-by: Sachin SantThanks -Sachin
Re: [PATCH] jump_label: align jump_entry table to at least 4-bytes
Jason Baronwrites: ... > I also checked all the other .ko files and they were properly aligned. > So I think this should hopefully work, and I like that its not a > per-arch fix. > > Sachin, sorry to bother you again, but I'm hoping you can try David's > latest patch to scripts/module-common.lds, just to test in your setup. It does fix the problem. I was reproducing with crc_t10dif: [ 695.890552] [ cut here ] [ 695.890709] WARNING: CPU: 15 PID: 3019 at ../kernel/jump_label.c:287 static_key_set_entries+0x74/0xa0 [ 695.890710] Modules linked in: crc_t10dif(+) crct10dif_generic crct10dif_common ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c kvm virtio_balloon binfmt_misc autofs4 virtio_net virtio_pci virtio_ring virtio Which had: [21] __jump_table PROGBITS 0004e8 18 00 WA 0 0 1 And now has: [18] __jump_table PROGBITS 0004d0 18 00 WA 0 0 8 And all other modules have an alignment of 8 on __jump_table, as expected. I'm inclined to merge a version of the balign patch for powerpc anyway, just to be on the safe side. I guess the old code was coping fine with the unaligned keys, but it still makes me nervous. cheers
Re: [PATCH v2] powernv/opal: Handle OPAL_WRONG_STATE error from OPAL fails
Stewart Smithwrites: > Vipin K Parashar writes: >> Added check for OPAL_WRONG_STATE error code returned from OPAL. >> Currently Linux flashes "unexpected error" over console for this >> error. This will avoid throwing such message and return I/O error >> for such OPAL failures. >> >> Signed-off-by: Vipin K Parashar >> --- >> Changes in v2: >> - Added log message indicating sleeping/offline core >>for OPAL_WRONG_STATE >> >> arch/powerpc/platforms/powernv/opal.c | 5 - >> 1 file changed, 4 insertions(+), 1 deletion(-) >> >> diff --git a/arch/powerpc/platforms/powernv/opal.c >> b/arch/powerpc/platforms/powernv/opal.c >> index 86d9fde..8af230e 100644 >> --- a/arch/powerpc/platforms/powernv/opal.c >> +++ b/arch/powerpc/platforms/powernv/opal.c >> @@ -869,8 +869,11 @@ int opal_error_code(int rc) >> case OPAL_UNSUPPORTED: return -EIO; >> case OPAL_HARDWARE: return -EIO; >> case OPAL_INTERNAL_ERROR: return -EIO; >> +case OPAL_WRONG_STATE: >> +pr_notice("%s: Core sleeping/offline\n", __func__); >> +return -EIO; > > Since this is part of opal_error_code() though, this will be printed for > any OPAL call that returns that. > > Why not have the sensor code do this: > > rc = opal_sensor_read(foo) > if (rc == OPAL_WRONG_STATE) >return -EIO; > else >return oal_error_code(rc); > > ? Yeah, that's exactly what I said to do, though perhaps I didn't say it clearly enough? ... changing just opal_get_sensor_data() to handle OPAL_WRONG_STATE would be OK, with a comment explaining that we might be asked to read a sensor on an offline CPU and we aren't able to detect that. cheers
[GIT PULL] Please pull powerpc/linux.git powerpc-4.11-2 tag
Hi Linus, Please pull the second set of powerpc updates for 4.11. This includes an update of the disassembly code in xmon, from binutils. We've received permission from all the authors of the relevant binutils changes to relicense their changes to the relevant files from GPLv3 to GPLv2. That was the reason I split the pull request in two, so I could make 100% sure we had all those approvals. The rest is fairly boring arch stuff, plus a few changes to the pnv-php PCI hotplug driver, which Bjorn has said he's OK with us merging. This does generate a fairly ugly conflict in asm-offsets.c, between changes in your tree and a big patch I merged to switch that file to use the OFFSET macro. In hindsight I should have done the OFFSET change after rc1. The resolution is to drop TSK_STACK_CANARY which was removed in your tree, and then for both 32 & 64-bit accounting.user_time was renamed to accounting.utime, and accounting.system_time was renamed to accounting.stime. Stephen has the correct resolution in linux-next: http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/diff/arch/powerpc/kernel/asm-offsets.c?id=99b07077f0c2cae8d4ff7263bf45df49f79f4c0d=9f3768e02335ddd6ebe1d85d5cb3a68ee6264004 cheers The following changes since commit 438e69b52be776c035aa2a851ccc1709033d729b: powerpc/mm/radix: Skip ptesync in pte update helpers (2017-02-15 20:02:41 +1100) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-4.11-2 for you to fetch changes up to 9f3768e02335ddd6ebe1d85d5cb3a68ee6264004: powerpc: Remove leftover cputime_to_nsecs call causing build error (2017-02-23 08:33:20 +1100) powerpc updates for 4.11 part 2 Highlights include: - An update of the disassembly code used by xmon to the latest versions in binutils. We've received permission from all the authors of the relevant binutils changes to relicense their changes to the relevant files from GPLv3 to GPLv2, for inclusion in Linux. Thanks to Peter Bergner for doing the leg work to get permission from everyone. - Addition of the "architected" Power9 CPU table entry, allowing us to boot in Power9 architected mode under a hypervisor. - Updates to the Power9 PMU code. - Implementation of clear_bit_unlock_is_negative_byte() to optimise unlock_page(). - Freescale updates from Scott: "Highlights include 8xx breakpoints and perf, t1042rdb display support, and board updates." Thanks to: Al Viro, Andrew Donnellan, Aneesh Kumar K.V, Balbir Singh, Douglas Miller, Frédéric Weisbecker, Gavin Shan, Madhavan Srinivasan, Michael Roth, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Peter Bergner, Paul E. McKenney, Rashmica Gupta, Russell Currey, Sahil Mehta, Stewart Smith. Al Viro (1): powerpc/spufs: Get rid of broken fasync stuff Andrew Donnellan (1): cxl: fix nested locking hang during EEH hotplug Aneesh Kumar K.V (1): powerpc/mm/hash: Always clear UPRT and Host Radix bits when setting up CPU Balbir Singh (3): powerpc/xmon: Update ppc-dis/opc.c and ppc.h powerpc/xmon: Apply binutils changes to upgrade disassembly powerpc/xmon: Enable disassembly files (compilation changes) Christophe Leroy (4): powerpc/32: Enable HW_BREAKPOINT on BOOK3S powerpc/8xx: Implement hw_breakpoint powerpc/32: Remove FIX_SRR1 powerpc/8xx: Perf events on PPC 8xx Douglas Miller (1): powerpc/xmon: Dump memory in CPU endian format Frédéric Weisbecker (1): powerpc: Remove leftover cputime_to_nsecs call causing build error Gavin Shan (9): drivers/pci/hotplug: Handle presence detection change properly drivers/pci/hotplug: Fix initial state for empty slot drivers/pci/hotplug: Mask PDC interrupt if required pci/hotplug/pnv-php: Remove WARN_ON() in pnv_php_put_slot() pci/hotplug/pnv-php: Disable surprise hotplug capability on conflicts pci/hotplug/pnv-php: Disable MSI and PCI device properly powerpc/mm: Fix typo in set_pte_at() powerpc/kernel: Remove error message in pcibios_setup_phb_resources() powerpc/powernv: Remove unused variable in pnv_pci_sriov_disable() Jason Jin (1): powerpc/85xx: Enable display support for t1042rdb Madhavan Srinivasan (8): powerpc/perf: Factor out event_alternative function powerpc/perf: Add PM_INST_DISP event to Power9 event list powerpc/perf: Add alternative event table and function for power9 powerpc/perf: Use PM_INST_DISP for generic instructions sample powerpc/perf: Use Instruction Counter value powerpc/perf: Add restrictions to PMC5 in power9 DD1 powerpc/perf: Avoid FAB_*_MATCH checks for power9 powerpc/perf: use is_kernel_addr macro in perf_get_misc_flags() Michael Ellerman (6): powerpc/64e: Fix bogus usage of
Re: [PATCH V3 2/2] powerpc: Update to new option-vector-5 format for CAS
On Tue, Feb 28, 2017 at 05:03:48PM +1100, Suraj Jitindar Singh wrote: > On POWER9 the ibm,client-architecture-support (CAS) negotiation process > has been updated to change how the host to guest negotiation is done for > the new hash/radix mmu as well as the nest mmu, process tables and guest > translation shootdown (GTSE). > > The host tells the guest which options it supports in > ibm,arch-vec-5-platform-support. The guest then chooses a subset of these > to request in the CAS call and these are agreed to in the > ibm,architecture-vec-5 property of the chosen node. > > Thus we read ibm,arch-vec-5-platform-support and make our selection before > calling CAS. We then parse the ibm,architecture-vec-5 property of the > chosen node to check whether we should run as hash or radix. > > ibm,arch-vec-5-platform-support format: > > index value pairs:... > > index: Option vector 5 byte number > val: Some representation of supported values > > Signed-off-by: Suraj Jitindar Singh It would be good if the description mentioned the PAPR architecture change request "CAS option vector additions for P9". Apart from that, Acked-by: Paul Mackerras
Re: [RFC 3/3] cxl: Reset freeze counters before adapter PERST for flashing new image
[resending this since it didn't get delivered to the list] On Tue, 2017-02-28 at 12:52 +0530, Vaibhav Jain wrote: > The patch resets the freeze counter on eeh_pe struct for PHB > associated with the cxl pci adapter. This would enable re-flashing of > the cxl-adapter beyond the default limit of 5. > > Signed-off-by: Vaibhav Jain> --- > drivers/misc/cxl/pci.c | 15 +++ > 1 file changed, 15 insertions(+) > > diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c > index 679afc9..3b14688 100644 > --- a/drivers/misc/cxl/pci.c > +++ b/drivers/misc/cxl/pci.c > @@ -22,6 +22,7 @@ > #include > #include > #include > +#include > > #include "cxl.h" > #include > @@ -1229,6 +1230,8 @@ static void cxl_pci_remove_afu(struct cxl_afu *afu) > int cxl_pci_reset(struct cxl *adapter) > { > struct pci_dev *dev = to_pci_dev(adapter->dev.parent); > + struct eeh_dev *eehdev = pci_dev_to_eeh_dev(dev); > + struct eeh_pe *devpe = eeh_dev_to_pe(eehdev); EEH code typically uses "edev" and "pe" for these variable names > int rc; > > if (adapter->perst_same_image) { > @@ -1242,6 +1245,18 @@ int cxl_pci_reset(struct cxl *adapter) > /* the adapter is about to be reset, so ignore errors */ > cxl_data_cache_flush(adapter); > > + /* If loading a new image, reset freeze counters for the PHB > + * associated with the adapter. > + */ > + if (devpe && adapter->perst_loads_image) { > + /* Find the pe associated with the device PHB */ > + while (devpe->parent != NULL && (devpe->type & EEH_PE_PHB) == > 0) > + devpe = devpe->parent; > + > + dev_info(>dev, "Resetting freeze counters for the > PHB\n"); Would be good to mention "EEH" here to help with grepping, alternatively a similar message could be printed in eeh_pe_reset_freeze_counter() displaying the PHB information. > + eeh_pe_reset_freeze_counter(devpe); > + } > + > /* pcie_warm_reset requests a fundamental pci reset which includes a > * PERST assert/deassert. PERST triggers a loading of the image > * if "user" or "factory" is selected in sysfs */
Re: [RFC 2/3] powerpc/eeh: Introduce function eeh_pe_reset_freeze_counter
On Tue, 2017-02-28 at 12:32 +0530, Vaibhav Jain wrote: > This patch introduces function eeh_pe_reset_freeze_counter which can > be used to reset the PE's freeze count variable outside eeh code. This > is useful for devices that can acquire a different personality after > a PERST event (e.g FPGA Adapters). Presently an existing freeze > count for an adapter with personality N will be taken into account > when the adapter acquired personality N+1. > > By calling eeh_pe_reset_freeze_counter drivers can reset the freeze > counter for an adapter once it has acquired a new personality and > ideally wont be plagued by the failures similar to the one before. Same comment as before about adding () to the end of function names > > Signed-off-by: Vaibhav Jain> --- > arch/powerpc/include/asm/eeh.h | 9 + > 1 file changed, 9 insertions(+) > > diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h > index 68806be..19ac6d0 100644 > --- a/arch/powerpc/include/asm/eeh.h > +++ b/arch/powerpc/include/asm/eeh.h > @@ -266,6 +266,13 @@ struct eeh_pe *eeh_pe_get(struct eeh_dev *edev); > int eeh_add_to_parent_pe(struct eeh_dev *edev); > int eeh_rmv_from_parent_pe(struct eeh_dev *edev); > int eeh_pe_update_freeze_counter(struct eeh_pe *pe); > + > +/* Reset the PE freeze counter */ I would like to see a comment here noting that doing this is in general a bad idea, and this shouldn't be called for regular PCI devices. > +static inline void eeh_pe_reset_freeze_counter(struct eeh_pe *pe) > +{ > + pe->freeze_count = 0; > +} > + > void *eeh_pe_traverse(struct eeh_pe *root, > eeh_traverse_func fn, void *flag); > void *eeh_pe_dev_traverse(struct eeh_pe *root, > @@ -339,6 +346,8 @@ static inline int eeh_check_failure(const volatile void > __iomem *token) > return 0; > } > > +static inline void eeh_pe_reset_freeze_counter(struct eeh_pe *pe) { } > + > #define eeh_dev_check_failure(x) (0) > > static inline void eeh_addr_cache_build(void) { }
Re: [RFC 1/3] powerpc/eeh: Refactor eeh_pe_update_time_stamp to update freeze_count
On Tue, 2017-02-28 at 12:32 +0530, Vaibhav Jain wrote: > This patch introduces a new function named > eeh_pe_update_freeze_counter replacing existing function > eeh_pe_update_time_stamp. The new function also manages the value of > freeze_count along with tstamp to track the number of times the PE > froze in last one hour and if the freeze_count > eeh_max_freezes then > reports an error(-ENOTRECOVERABLE) to indicate that the PE should be > permanently disabled. You should add parens to the end of function names in commit messages, i.e. eeh_pe_update_freeze_counter(). Same goes for the title > > This patch should not introduce any behavioral change. > > Signed-off-by: Vaibhav Jain> --- > arch/powerpc/include/asm/eeh.h | 2 +- > arch/powerpc/kernel/eeh_driver.c | 20 +++- > arch/powerpc/kernel/eeh_pe.c | 50 ++- > - > 3 files changed, 37 insertions(+), 35 deletions(-) > > diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h > index 8e37b71..68806be 100644 > --- a/arch/powerpc/include/asm/eeh.h > +++ b/arch/powerpc/include/asm/eeh.h > @@ -265,7 +265,7 @@ struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb); > struct eeh_pe *eeh_pe_get(struct eeh_dev *edev); > int eeh_add_to_parent_pe(struct eeh_dev *edev); > int eeh_rmv_from_parent_pe(struct eeh_dev *edev); > -void eeh_pe_update_time_stamp(struct eeh_pe *pe); > +int eeh_pe_update_freeze_counter(struct eeh_pe *pe); > void *eeh_pe_traverse(struct eeh_pe *root, > eeh_traverse_func fn, void *flag); > void *eeh_pe_dev_traverse(struct eeh_pe *root, > diff --git a/arch/powerpc/kernel/eeh_driver.c > b/arch/powerpc/kernel/eeh_driver.c > index b948871..326b4e4 100644 > --- a/arch/powerpc/kernel/eeh_driver.c > +++ b/arch/powerpc/kernel/eeh_driver.c > @@ -739,10 +739,9 @@ static void eeh_handle_normal_event(struct eeh_pe *pe) > return; > } > > - eeh_pe_update_time_stamp(pe); > - pe->freeze_count++; > - if (pe->freeze_count > eeh_max_freezes) > - goto excess_failures; > + /* Update freeze counters and see if we have tripped max-freeze limit > */ > + if (eeh_pe_update_freeze_counter(pe) < 0) I would prefer dropping the "< 0" check here > + goto perm_error; > pr_warn("EEH: This PCI device has failed %d times in the last > hour\n", > pe->freeze_count); > > @@ -872,19 +871,6 @@ static void eeh_handle_normal_event(struct eeh_pe *pe) > > return; > > -excess_failures: > - /* > - * About 90% of all real-life EEH failures in the field > - * are due to poorly seated PCI cards. Only 10% or so are > - * due to actual, failed cards. > - */ > - pr_err("EEH: PHB#%x-PE#%x has failed %d times in the\n" > - "last hour and has been permanently disabled.\n" > - "Please try reseating or replacing it.\n", > - pe->phb->global_number, pe->addr, > - pe->freeze_count); > - goto perm_error; > - > hard_fail: > pr_err("EEH: Unable to recover from failure from PHB#%x-PE#%x.\n" > "Please try reseating or replacing it\n", > diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c > index cc4b206..cf70a8b 100644 > --- a/arch/powerpc/kernel/eeh_pe.c > +++ b/arch/powerpc/kernel/eeh_pe.c > @@ -504,30 +504,46 @@ int eeh_rmv_from_parent_pe(struct eeh_dev *edev) > } > > /** > - * eeh_pe_update_time_stamp - Update PE's frozen time stamp > + * eeh_pe_update_freeze_counter - Update PE's frozen time stamp > + * and freeze counter > * @pe: EEH PE > * > - * We have time stamp for each PE to trace its time of getting > - * frozen in last hour. The function should be called to update > - * the time stamp on first error of the specific PE. On the other > - * handle, we needn't account for errors happened in last hour. > + * We have a freeze-counter and time stamp for each PE to trace drop the hyphen between freeze and counter > + * number of times the PE was frozen in last one hour. This function this should probably "in the last hour", I know it was copied from the existing comment but since you're in here, you may as well fix it > + * updates the PE's freeze counter and if its > eeh_max_freezes then this reads awkwardly. perhaps "This function updates the PE's freeze counter and returns an error if the number of freezes is greater than eeh_max_freezes." > + * returns an error. The function should be called to once every-time > + * a specific PE freezes. a hyphen between "every time" is unnecessary > */ > -void eeh_pe_update_time_stamp(struct eeh_pe *pe) > +int eeh_pe_update_freeze_counter(struct eeh_pe *pe) > { > struct timeval tstamp; > > - if (!pe) return; > + if (!pe) > + return -EINVAL; > > - if (pe->freeze_count <= 0) { > - pe->freeze_count = 0; > - do_gettimeofday(>tstamp); > - }
Re: [PATCH v3] powerpc/powernv: add hdat attribute to sysfs
On 27/02/17 21:56, Michael Ellerman wrote: +/* HDAT attribute for sysfs */ +static struct bin_attribute hdat_attr = { + .attr = {.name = "hdat", .mode = 0444}, ajd and oohal report to my office. ...ACK... :/ -- Andrew Donnellan OzLabs, ADL Canberra andrew.donnel...@au1.ibm.com IBM Australia Limited
Re: [PATCH] powerpc: booke: fix boot crash due to null hugepd
On Tue, 2017-02-28 at 14:55 +, Laurentiu Tudor wrote: > Hi, > > Some more information on the crash, inline. > > On 02/17/2017 02:18 PM, Aneesh Kumar K.V wrote: > > > > laurentiu.tu...@nxp.com writes: > > > > > > > > From: Laurentiu Tudor> > > > > > On 32-bit book-e machines, hugepd_ok() does not take > > > into account null hugepd values, causing this crash at boot: > > > > > > Unable to handle kernel paging request for data at address 0x8000 > > > Faulting instruction address: 0xc00182a8 > > > Oops: Kernel access of bad area, sig: 11 [#1] > > > SMP NR_CPUS=24 > > > CoreNet Generic > > > Modules linked in: > > > CPU: 1 PID: 1 Comm: swapper/0 Tainted: GW 4.10.0-rc8- > > > 00016-g69b1f87 #11 > > > task: e505 task.stack: e5058000 > > > NIP: c00182a8 LR: c001829c CTR: 7ffe > > > REGS: e5059c50 TRAP: 0300 Tainted: GW(4.10.0-rc8- > > > 00016-g69b1f87) > > > MSR: 00021002 > > > CR: 88428e82 XER: > > > DEAR: 8000 ESR: > > > GPR00: c0107510 e5059d00 e505 8000 bff1 e5059d0c e5059d08 > > > 2017 > > > GPR08: 28428e82 c00027d0 > > > > > > GPR16: 88a28e82 2000 48422e82 88a28e84 > > > dd004000 > > > GPR24: e5059e38 bff1 dd004000 0001 00029002 > > > bff1 > > > NIP [c00182a8] follow_huge_addr+0x38/0xf0 > > > LR [c001829c] follow_huge_addr+0x2c/0xf0 > > > Call Trace: > > > [e5059d00] [e5059d00] 0xe5059d00 (unreliable) > > > [e5059d20] [c0107510] follow_page_mask+0x40/0x3c0 > > > [e5059d80] [c0107958] __get_user_pages+0xc8/0x420 > > > [e5059de0] [c010817c] get_user_pages_remote+0x8c/0x230 > > > [e5059e30] [c013f170] copy_strings+0x110/0x3a0 > > > [e5059ea0] [c013f42c] copy_strings_kernel+0x2c/0x50 > > > [e5059ec0] [c0141324] do_execveat_common+0x474/0x620 > > > [e5059f10] [c01414fc] do_execve+0x2c/0x40 > > > [e5059f20] [c0001f68] try_to_run_init_process+0x18/0x60 > > > [e5059f30] [c000289c] kernel_init+0xcc/0x120 > > > [e5059f40] [c000f1e8] ret_from_kernel_thread+0x5c/0x64 > > > Instruction dump: > > > bfc10018 7c9f2378 90010024 7fc000a6 7c000146 80630020 38a1000c 38c10008 > > > 4bfff869 2c03 41c20090 81210008 <8143> 81630004 3860ffea > > > 2f89 > > > ---[ end trace 4bf94e15fd9fa824 ]--- > > > > Which code path is that. That null should be filtered by the if > > (pmd_none(pmd)) check in find_linux_pte_or_hugepte right ? > The crash happens when __find_linux_pte_or_hugepte() calls hugepd_ok(), > on this line [1]. It's triggered when __find_linux_pte_or_hugepte() is > first called, when the kernel tries to spawn the init process. The input > effective address (ea arg) is bff1. This is the call stack: What is the pmd value? There's a pmd_none() check before that line. That said, regardless of what's going wrong here, it would be simpler and more robust if is_hugepd() returned false for empty ptes rather than assuming the caller explicitly checked pmd_none(). -Scott
Re: [PATCH] jump_label: align jump_entry table to at least 4-bytes
On 02/28/2017 03:15 PM, David Daney wrote: On 02/28/2017 11:34 AM, Jason Baron wrote: On 02/28/2017 02:22 PM, David Daney wrote: On 02/28/2017 11:05 AM, David Daney wrote: On 02/28/2017 10:39 AM, Jason Baron wrote: [...] I suspect that the alignment of the __jump_table section in the .ko files is not correct, and you are seeing some sort of problem due to that. Hi, Yes, if you look at the trace that Sachin sent the module being loaded that does the WARN_ON() is nfsd.ko. That module from Sachin's trace has: [31] __jump_table PROGBITS 03fd77 c0 18 WAM 0 0 1 The problem is then the section alignment (last column) for power. On mips with no patches applied, we get: [17] __jump_table PROGBITS 00d2c0 48 00 WA 0 0 8 Look, proper alignment! The question I have is why do the power ".llong" and ".long" assembler directives not force section alignment? Is there an alternative that could be used that would result in the proper alignment? Would ".word" work? If not, then I would say patch only power with your balign thing. 8-byte alignment for 64-bit kernel, 4-byte alignment for 32-bit kernel I think the proper fix is either: A) Modify scripts/module-common.lds to force __jump_table alignment for all architectures. B) Add arch/powerpc/kernel/module.lds to force __jump_table alignment for powerpc only. David. Ok, I can try adding it to the linger script. FWIW, here is my before and after with the .balign thing for the nfsd.ko module on powperc (using a cross-compiler): before: [31] __jump_table PROGBITS 03ee3e f0 00 WA 0 0 1 after: [31] __jump_table PROGBITS 03ee40 f0 00 WA 0 0 4 Try the (lightly tested) attached. If it works and Steven likes it, perhaps someone can merge it. David. So before your module.lds script: # powerpc64-linux-readelf -eW fs/nfsd/nfsd.o | grep jump [31] __jump_table PROGBITS 03edfe f0 00 WA 0 0 1 # powerpc64-linux-readelf -eW fs/nfsd/nfsd.ko | grep jump [32] __jump_table PROGBITS 044046 f0 00 WA 0 0 1 With your patch: # powerpc64-linux-readelf -eW fs/nfsd/nfsd.o | grep jump [31] __jump_table PROGBITS 03edfe f0 00 WA 0 0 1 # powerpc64-linux-readelf -eW fs/nfsd/nfsd.ko | grep jump [18] __jump_table PROGBITS 03e358 f0 00 WA 0 0 8 I also checked all the other .ko files and they were properly aligned. So I think this should hopefully work, and I like that its not a per-arch fix. Sachin, sorry to bother you again, but I'm hoping you can try David's latest patch to scripts/module-common.lds, just to test in your setup. Thanks, -Jason
Re: [PATCH V3 1/2] powerpc: Parse the command line before calling CAS
On 28/02/17 17:03, Suraj Jitindar Singh wrote: > On POWER9 the hypervisor requires the guest to decide whether it would > like to use a hash or radix mmu model at the time it calls > ibm,client-architecture-support (CAS) based on what the hypervisor has > said it's allowed to do. It is possible to disable radix by passing > "disable_radix" on the command line. The next patch will add support for > the new CAS format, thus we need to parse the command line before calling > CAS so we can correctly select which mmu we would like to use. > > Signed-off-by: Suraj Jitindar Singh> Reviewed-by: Paul Mackerras > > --- > > V1 -> V3: > - Reword commit message for clarity. No functional change > --- Acked-by: Balbir Singh
Re: [PATCH v2] powernv/opal: Handle OPAL_WRONG_STATE error from OPAL fails
Vipin K Parasharwrites: > Added check for OPAL_WRONG_STATE error code returned from OPAL. > Currently Linux flashes "unexpected error" over console for this > error. This will avoid throwing such message and return I/O error > for such OPAL failures. > > Signed-off-by: Vipin K Parashar > --- > Changes in v2: > - Added log message indicating sleeping/offline core >for OPAL_WRONG_STATE > > arch/powerpc/platforms/powernv/opal.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/platforms/powernv/opal.c > b/arch/powerpc/platforms/powernv/opal.c > index 86d9fde..8af230e 100644 > --- a/arch/powerpc/platforms/powernv/opal.c > +++ b/arch/powerpc/platforms/powernv/opal.c > @@ -869,8 +869,11 @@ int opal_error_code(int rc) > case OPAL_UNSUPPORTED: return -EIO; > case OPAL_HARDWARE: return -EIO; > case OPAL_INTERNAL_ERROR: return -EIO; > + case OPAL_WRONG_STATE: > + pr_notice("%s: Core sleeping/offline\n", __func__); > + return -EIO; Since this is part of opal_error_code() though, this will be printed for any OPAL call that returns that. Why not have the sensor code do this: rc = opal_sensor_read(foo) if (rc == OPAL_WRONG_STATE) return -EIO; else return oal_error_code(rc); ? > default: > - pr_err("%s: unexpected OPAL error %d\n", __func__, rc); > + pr_err("%s: Unexpected OPAL error %d\n", __func__, rc); Do we need this? -- Stewart Smith OPAL Architect, IBM.
Re: [Patch v4] powerpc/powernv: add hdat attribute to sysfs
Matt Brownwrites: > The HDAT data area is consumed by skiboot and turned into a device-tree. > In some cases we would like to look directly at the HDAT, so this patch > adds a sysfs node to allow it to be viewed. This is not possible through > /dev/mem as it is reserved memory which is stopped by the /dev/mem filter. > > Signed-off-by: Matt Brown > --- > Changes between v3 and v4: > - changed sysfs attribute permissions from 0444 to 0400 > - fixed makefile to be on same line > - fixed authorship/copyright info > - re-ordered includes > - changed hdat_info struct to a static struct > > --- > arch/powerpc/include/asm/opal.h| 1 + > arch/powerpc/platforms/powernv/Makefile| 2 +- > arch/powerpc/platforms/powernv/opal-hdat.c | 60 > ++ > arch/powerpc/platforms/powernv/opal.c | 2 + > 4 files changed, 64 insertions(+), 1 deletion(-) > create mode 100644 arch/powerpc/platforms/powernv/opal-hdat.c > > diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h > index 5c7db0f..b26944e 100644 > --- a/arch/powerpc/include/asm/opal.h > +++ b/arch/powerpc/include/asm/opal.h > @@ -277,6 +277,7 @@ extern int opal_async_comp_init(void); > extern int opal_sensor_init(void); > extern int opal_hmi_handler_init(void); > extern int opal_event_init(void); > +extern void opal_hdat_sysfs_init(void); > > extern int opal_machine_check(struct pt_regs *regs); > extern bool opal_mce_check_early_recovery(struct pt_regs *regs); > diff --git a/arch/powerpc/platforms/powernv/Makefile > b/arch/powerpc/platforms/powernv/Makefile > index b5d98cb..3826b70 100644 > --- a/arch/powerpc/platforms/powernv/Makefile > +++ b/arch/powerpc/platforms/powernv/Makefile > @@ -2,7 +2,7 @@ obj-y += setup.o opal-wrappers.o opal.o > opal-async.o idle.o > obj-y+= opal-rtc.o opal-nvram.o opal-lpc.o > opal-flash.o > obj-y+= rng.o opal-elog.o opal-dump.o > opal-sysparam.o opal-sensor.o > obj-y+= opal-msglog.o opal-hmi.o opal-power.o > opal-irqchip.o > -obj-y+= opal-kmsg.o > +obj-y+= opal-kmsg.o opal-hdat.o > > obj-$(CONFIG_SMP)+= smp.o subcore.o subcore-asm.o > obj-$(CONFIG_PCI)+= pci.o pci-ioda.o npu-dma.o > diff --git a/arch/powerpc/platforms/powernv/opal-hdat.c > b/arch/powerpc/platforms/powernv/opal-hdat.c > new file mode 100644 > index 000..19647fd > --- /dev/null > +++ b/arch/powerpc/platforms/powernv/opal-hdat.c > @@ -0,0 +1,60 @@ > +/* > + * PowerNV OPAL HDAT interface > + * > + * Copyright 2017, Matt Brown, IBM Corp. > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public License > + * as published by the Free Software Foundation; either version > + * 2 of the License, or (at your option) any later version. > + */ > + > +#include > +#include > +#include > +#include > + > +static struct { > + char *base; > + u64 size; > +} hdat_info; > + > +/* Read function for HDAT attribute in sysfs */ > +static ssize_t hdat_read(struct file *file, struct kobject *kobj, > + struct bin_attribute *bin_attr, char *to, > + loff_t pos, size_t count) > +{ > + if (!hdat_info.base) > + return -ENODEV; > + > + return memory_read_from_buffer(to, count, , hdat_info.base, > + hdat_info.size); > +} > + > +/* HDAT attribute for sysfs */ > +static struct bin_attribute hdat_attr = { > + .attr = {.name = "hdat", .mode = 0400}, > + .read = hdat_read > +}; Why not BIN_ATTR_RO (like opal_export_symmap() does) ? (I have comments/thoughts on the OPAL side as well) -- Stewart Smith OPAL Architect, IBM.
Re: [PATCH] jump_label: align jump_entry table to at least 4-bytes
On 02/28/2017 11:34 AM, Jason Baron wrote: On 02/28/2017 02:22 PM, David Daney wrote: On 02/28/2017 11:05 AM, David Daney wrote: On 02/28/2017 10:39 AM, Jason Baron wrote: [...] I suspect that the alignment of the __jump_table section in the .ko files is not correct, and you are seeing some sort of problem due to that. Hi, Yes, if you look at the trace that Sachin sent the module being loaded that does the WARN_ON() is nfsd.ko. That module from Sachin's trace has: [31] __jump_table PROGBITS 03fd77 c0 18 WAM 0 0 1 The problem is then the section alignment (last column) for power. On mips with no patches applied, we get: [17] __jump_table PROGBITS 00d2c0 48 00 WA 0 0 8 Look, proper alignment! The question I have is why do the power ".llong" and ".long" assembler directives not force section alignment? Is there an alternative that could be used that would result in the proper alignment? Would ".word" work? If not, then I would say patch only power with your balign thing. 8-byte alignment for 64-bit kernel, 4-byte alignment for 32-bit kernel I think the proper fix is either: A) Modify scripts/module-common.lds to force __jump_table alignment for all architectures. B) Add arch/powerpc/kernel/module.lds to force __jump_table alignment for powerpc only. David. Ok, I can try adding it to the linger script. FWIW, here is my before and after with the .balign thing for the nfsd.ko module on powperc (using a cross-compiler): before: [31] __jump_table PROGBITS 03ee3e f0 00 WA 0 0 1 after: [31] __jump_table PROGBITS 03ee40 f0 00 WA 0 0 4 Try the (lightly tested) attached. If it works and Steven likes it, perhaps someone can merge it. David. >From 89d4deafbc920351b93afb1ac4b4124995e1f19d Mon Sep 17 00:00:00 2001 From: David DaneyDate: Tue, 28 Feb 2017 12:10:02 -0800 Subject: [PATCH] module: set __jump_table alignment to 8 For powerpc the __jump_table section in modules is not aligned, this causes an OOPS when loading a module containing a __jump_table. Fix by forcing __jump_table to 8, which is the same alignment used for this section in the kernel proper. Signed-off-by: David Daney --- scripts/module-common.lds | 2 ++ 1 file changed, 2 insertions(+) diff --git a/scripts/module-common.lds b/scripts/module-common.lds index 73a2c7d..53234e8 100644 --- a/scripts/module-common.lds +++ b/scripts/module-common.lds @@ -19,4 +19,6 @@ SECTIONS { . = ALIGN(8); .init_array 0 : { *(SORT(.init_array.*)) *(.init_array) } + + __jump_table 0 : ALIGN(8) { KEEP(*(__jump_table)) } } -- 2.9.3
Re: [PATCH] jump_label: align jump_entry table to at least 4-bytes
On 02/28/2017 02:22 PM, David Daney wrote: > On 02/28/2017 11:05 AM, David Daney wrote: >> On 02/28/2017 10:39 AM, Jason Baron wrote: >>> > [...] I suspect that the alignment of the __jump_table section in the .ko files is not correct, and you are seeing some sort of problem due to that. >>> >>> Hi, >>> >>> Yes, if you look at the trace that Sachin sent the module being loaded >>> that does the WARN_ON() is nfsd.ko. >>> >>> That module from Sachin's trace has: >>> >>> [31] __jump_table PROGBITS 03fd77 c0 >>> 18 WAM 0 0 1 >> >> The problem is then the section alignment (last column) for power. >> >> On mips with no patches applied, we get: >> >> [17] __jump_table PROGBITS 00d2c0 48 >> 00 WA 0 0 8 >> >> Look, proper alignment! >> >> The question I have is why do the power ".llong" and ".long" assembler >> directives not force section alignment? Is there an alternative that >> could be used that would result in the proper alignment? Would ".word" >> work? >> >> If not, then I would say patch only power with your balign thing. 8-byte >> alignment for 64-bit kernel, 4-byte alignment for 32-bit kernel >> > > I think the proper fix is either: > > A) Modify scripts/module-common.lds to force __jump_table alignment for > all architectures. > > B) Add arch/powerpc/kernel/module.lds to force __jump_table alignment > for powerpc only. > > David. > > Ok, I can try adding it to the linger script. FWIW, here is my before and after with the .balign thing for the nfsd.ko module on powperc (using a cross-compiler): before: [31] __jump_table PROGBITS 03ee3e f0 00 WA 0 0 1 after: [31] __jump_table PROGBITS 03ee40 f0 00 WA 0 0 4 Thanks, -Jason > >> >>> >>> So its not the size but rather the start offset '03fd77', that is the >>> problem here. That is what the WARN_ON triggers on, that the start of >>> the table is not 4-byte aligned. >>> >>> Using a ppc cross-compiler and the ENTSIZE patch that line does not >>> change, however if I use the initial patch posted in this thread, the >>> start does align to 4-bytes and thus the warning goes away, as Sachin >>> verified. In fact, without the patch I found several modules that don't >>> start at the proper alignment, however with the patch that started this >>> thread they were all properly aligned. >>> >>> In terms of the '.balign' causing holes, we originally added the >>> '_ASM_ALIGN' to x86 for precisely this reason. See commit: >>> ef64789 jump label: Add _ASM_ALIGN for x86 and x86_64 and discussion. >>> >>> In addition, we have a lot of runtime with the .balign in the tree and >>> I'm not aware of any holes in the table. I think the code would blow up >>> pretty badly if there were. >>> >>> A number of arches were already using the '.balign', and the patch I >>> proposed simply added it to remaining ones, now that we added a >>> WARN_ON() to catch this condition. >>> >>> Thanks, >>> >>> -Jason >>> >>> >>> >>> >> >> >> ___ >> linux-arm-kernel mailing list >> linux-arm-ker...@lists.infradead.org >> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel >
Re: [PATCH] jump_label: align jump_entry table to at least 4-bytes
On 02/28/2017 11:05 AM, David Daney wrote: On 02/28/2017 10:39 AM, Jason Baron wrote: [...] I suspect that the alignment of the __jump_table section in the .ko files is not correct, and you are seeing some sort of problem due to that. Hi, Yes, if you look at the trace that Sachin sent the module being loaded that does the WARN_ON() is nfsd.ko. That module from Sachin's trace has: [31] __jump_table PROGBITS 03fd77 c0 18 WAM 0 0 1 The problem is then the section alignment (last column) for power. On mips with no patches applied, we get: [17] __jump_table PROGBITS 00d2c0 48 00 WA 0 0 8 Look, proper alignment! The question I have is why do the power ".llong" and ".long" assembler directives not force section alignment? Is there an alternative that could be used that would result in the proper alignment? Would ".word" work? If not, then I would say patch only power with your balign thing. 8-byte alignment for 64-bit kernel, 4-byte alignment for 32-bit kernel I think the proper fix is either: A) Modify scripts/module-common.lds to force __jump_table alignment for all architectures. B) Add arch/powerpc/kernel/module.lds to force __jump_table alignment for powerpc only. David. So its not the size but rather the start offset '03fd77', that is the problem here. That is what the WARN_ON triggers on, that the start of the table is not 4-byte aligned. Using a ppc cross-compiler and the ENTSIZE patch that line does not change, however if I use the initial patch posted in this thread, the start does align to 4-bytes and thus the warning goes away, as Sachin verified. In fact, without the patch I found several modules that don't start at the proper alignment, however with the patch that started this thread they were all properly aligned. In terms of the '.balign' causing holes, we originally added the '_ASM_ALIGN' to x86 for precisely this reason. See commit: ef64789 jump label: Add _ASM_ALIGN for x86 and x86_64 and discussion. In addition, we have a lot of runtime with the .balign in the tree and I'm not aware of any holes in the table. I think the code would blow up pretty badly if there were. A number of arches were already using the '.balign', and the patch I proposed simply added it to remaining ones, now that we added a WARN_ON() to catch this condition. Thanks, -Jason ___ linux-arm-kernel mailing list linux-arm-ker...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Re: [PATCH] jump_label: align jump_entry table to at least 4-bytes
On 02/28/2017 10:39 AM, Jason Baron wrote: On 02/28/2017 01:16 PM, David Daney wrote: On 02/28/2017 08:21 AM, Steven Rostedt wrote: On Tue, 28 Feb 2017 10:25:46 +0530 Sachin Santwrote: File: ./net/ipv4/xfrm4_input.o [12] __jump_table PROGBITS 000639 18 18 WAM 0 0 1 File: ./net/ipv4/udplite.o File: ./net/ipv4/xfrm4_output.o [ 9] __jump_table PROGBITS 000481 18 18 WAM 0 0 1 Looks like there's some issues right there. Those look good to me 18/18 = 1 with no remainder. The odd numbers are the offset of the section in the ELF file. If you look at the stack trace, it seems that it is during module loading. Are the primitives for generating the tables doing something different for the module case? I am not familiar enough with the powerpc ABIs to know. Try this: $ perl -n -e 's/\[ /\[/; my @f = split " "; print hex($f[5]) % 0x18 if $#f > 5; print $_' <~/jump_table.log There are no entries with size that is not a multiple of 0x18. I think my patch to add the ENTSIZE is not doing anything here. I suspect that the alignment of the __jump_table section in the .ko files is not correct, and you are seeing some sort of problem due to that. Hi, Yes, if you look at the trace that Sachin sent the module being loaded that does the WARN_ON() is nfsd.ko. That module from Sachin's trace has: [31] __jump_table PROGBITS 03fd77 c0 18 WAM 0 0 1 The problem is then the section alignment (last column) for power. On mips with no patches applied, we get: [17] __jump_table PROGBITS 00d2c0 48 00 WA 0 0 8 Look, proper alignment! The question I have is why do the power ".llong" and ".long" assembler directives not force section alignment? Is there an alternative that could be used that would result in the proper alignment? Would ".word" work? If not, then I would say patch only power with your balign thing. 8-byte alignment for 64-bit kernel, 4-byte alignment for 32-bit kernel So its not the size but rather the start offset '03fd77', that is the problem here. That is what the WARN_ON triggers on, that the start of the table is not 4-byte aligned. Using a ppc cross-compiler and the ENTSIZE patch that line does not change, however if I use the initial patch posted in this thread, the start does align to 4-bytes and thus the warning goes away, as Sachin verified. In fact, without the patch I found several modules that don't start at the proper alignment, however with the patch that started this thread they were all properly aligned. In terms of the '.balign' causing holes, we originally added the '_ASM_ALIGN' to x86 for precisely this reason. See commit: ef64789 jump label: Add _ASM_ALIGN for x86 and x86_64 and discussion. In addition, we have a lot of runtime with the .balign in the tree and I'm not aware of any holes in the table. I think the code would blow up pretty badly if there were. A number of arches were already using the '.balign', and the patch I proposed simply added it to remaining ones, now that we added a WARN_ON() to catch this condition. Thanks, -Jason
Re: [PATCH] jump_label: align jump_entry table to at least 4-bytes
On 02/28/2017 01:16 PM, David Daney wrote: On 02/28/2017 08:21 AM, Steven Rostedt wrote: On Tue, 28 Feb 2017 10:25:46 +0530 Sachin Santwrote: File: ./net/ipv4/xfrm4_input.o [12] __jump_table PROGBITS 000639 18 18 WAM 0 0 1 File: ./net/ipv4/udplite.o File: ./net/ipv4/xfrm4_output.o [ 9] __jump_table PROGBITS 000481 18 18 WAM 0 0 1 Looks like there's some issues right there. Those look good to me 18/18 = 1 with no remainder. The odd numbers are the offset of the section in the ELF file. If you look at the stack trace, it seems that it is during module loading. Are the primitives for generating the tables doing something different for the module case? I am not familiar enough with the powerpc ABIs to know. Try this: $ perl -n -e 's/\[ /\[/; my @f = split " "; print hex($f[5]) % 0x18 if $#f > 5; print $_' <~/jump_table.log There are no entries with size that is not a multiple of 0x18. I think my patch to add the ENTSIZE is not doing anything here. I suspect that the alignment of the __jump_table section in the .ko files is not correct, and you are seeing some sort of problem due to that. Hi, Yes, if you look at the trace that Sachin sent the module being loaded that does the WARN_ON() is nfsd.ko. That module from Sachin's trace has: [31] __jump_table PROGBITS 03fd77 c0 18 WAM 0 0 1 So its not the size but rather the start offset '03fd77', that is the problem here. That is what the WARN_ON triggers on, that the start of the table is not 4-byte aligned. Using a ppc cross-compiler and the ENTSIZE patch that line does not change, however if I use the initial patch posted in this thread, the start does align to 4-bytes and thus the warning goes away, as Sachin verified. In fact, without the patch I found several modules that don't start at the proper alignment, however with the patch that started this thread they were all properly aligned. In terms of the '.balign' causing holes, we originally added the '_ASM_ALIGN' to x86 for precisely this reason. See commit: ef64789 jump label: Add _ASM_ALIGN for x86 and x86_64 and discussion. In addition, we have a lot of runtime with the .balign in the tree and I'm not aware of any holes in the table. I think the code would blow up pretty badly if there were. A number of arches were already using the '.balign', and the patch I proposed simply added it to remaining ones, now that we added a WARN_ON() to catch this condition. Thanks, -Jason
Re: [PATCH] jump_label: align jump_entry table to at least 4-bytes
On 02/28/2017 08:21 AM, Steven Rostedt wrote: On Tue, 28 Feb 2017 10:25:46 +0530 Sachin Santwrote: File: ./net/ipv4/xfrm4_input.o [12] __jump_table PROGBITS 000639 18 18 WAM 0 0 1 File: ./net/ipv4/udplite.o File: ./net/ipv4/xfrm4_output.o [ 9] __jump_table PROGBITS 000481 18 18 WAM 0 0 1 Looks like there's some issues right there. Those look good to me 18/18 = 1 with no remainder. The odd numbers are the offset of the section in the ELF file. If you look at the stack trace, it seems that it is during module loading. Are the primitives for generating the tables doing something different for the module case? I am not familiar enough with the powerpc ABIs to know. Try this: $ perl -n -e 's/\[ /\[/; my @f = split " "; print hex($f[5]) % 0x18 if $#f > 5; print $_' <~/jump_table.log There are no entries with size that is not a multiple of 0x18. I think my patch to add the ENTSIZE is not doing anything here. I suspect that the alignment of the __jump_table section in the .ko files is not correct, and you are seeing some sort of problem due to that. -- Steve
Re: [PATCH 5/5] powerpc/64s: fix POWER9 machine check handler from stop state
Hi Nick, On Fri, Feb 17, 2017 at 12:09 AM, Nicholas Pigginwrote: > The ISA specifies power save wakeup can cause a machine check interrupt. > The machine check handler currently has code to handle that for POWER8, > but POWER9 crashes when trying to execute the P8 style sleep > instructions. > > So queue up the machine check, then call into the idle code to wake up > as the system reset interrupt does, rather than attempting to sleep > again without going through the main idle path. > > Signed-off-by: Nicholas Piggin > --- > arch/powerpc/kernel/exceptions-64s.S | 71 > ++-- > 1 file changed, 36 insertions(+), 35 deletions(-) > > diff --git a/arch/powerpc/kernel/exceptions-64s.S > b/arch/powerpc/kernel/exceptions-64s.S > index 5f775783f744..0388843c8d12 100644 > --- a/arch/powerpc/kernel/exceptions-64s.S > +++ b/arch/powerpc/kernel/exceptions-64s.S > @@ -329,6 +329,35 @@ EXC_COMMON_BEGIN(machine_check_common) > /* restore original r1. */ \ > ld r1,GPR1(r1) > > +#ifdef CONFIG_PPC_P7_NAP > +EXC_COMMON_BEGIN(machine_check_idle_common) > + bl machine_check_queue_event > + /* > +* Queue the machine check, then reload SRR1 and use it to set > +* CR3 according to pnv_powersave_wakeup convention. > +*/ > + ld r12,_MSR(r1) > + rlwinm r11,r12,47-31,30,31 > + cmpwi cr3,r11,2 > + > + /* > +* Now have to make SRR1 wake up reason look like a system reset > +* interrupt. Put 0xf in there, which is reserved (and does not > +* match HMI). The only places where the wakeup reason is presently checked on the way out of idle-exit are in KVM guest entry path in kvmppc_check_wake_reason() and on the CPU-Hotplug exit path pnv_smp_cpu_kill_self(). In both places we do a positive check for EE, Doorbell, HVEE . The kvm case is also interested in HMI. We ignore all the other reasons at the moment. So this should be fine. > +*/ > + li r11,0xf > + insrdi r12,r11,4,45 Shouldn't this be insrdi r12,r11,4,42? The exception bits are 42:45. I always have trouble wrapping my head around these nifty rotate-shift-mask-insert instructions. So I might as well be wrong! > + mtspr r12,SPRN_SRR1 > + std r12,_MSR(r1) > + > + /* > +* Decrement MCE nesting after finishing with the stack. > +*/ > + lhz r11,PACA_IN_MCE(r13) > + subir11,r11,1 > + sth r11,PACA_IN_MCE(r13) > + b pnv_powersave_wakeup > +#endif > /* > * Handle machine check early in real mode. We come here with > * ME=1, MMU (IR=0 and DR=0) off and using MC emergency stack. > @@ -341,6 +370,7 @@ EXC_COMMON_BEGIN(machine_check_handle_early) > bl machine_check_early > std r3,RESULT(r1) /* Save result */ > ld r12,_MSR(r1) > + > #ifdef CONFIG_PPC_P7_NAP > /* > * Check if thread was in power saving mode. We come here when any > @@ -351,43 +381,14 @@ EXC_COMMON_BEGIN(machine_check_handle_early) > * > * Go back to nap/sleep/winkle mode again if (b) is true. > */ > - rlwinm. r11,r12,47-31,30,31 /* Was it in power saving mode? */ > - beq 4f /* No, it wasn't */ > - /* Thread was in power saving mode. Go back to nap again. */ > - cmpwi r11,2 > - blt 3f > - /* Supervisor/Hypervisor state loss */ > - li r0,1 > - stb r0,PACA_NAPSTATELOST(r13) > -3: bl machine_check_queue_event > - MACHINE_CHECK_HANDLER_WINDUP > - GET_PACA(r13) > - ld r1,PACAR1(r13) > - /* > -* Check what idle state this CPU was in and go back to same mode > -* again. > -*/ > - lbz r3,PACA_THREAD_IDLE_STATE(r13) > - cmpwi r3,PNV_THREAD_NAP > - bgt 10f > - IDLE_STATE_ENTER_SEQ(PPC_NAP) > - /* No return */ > -10: > - cmpwi r3,PNV_THREAD_SLEEP > - bgt 2f > - IDLE_STATE_ENTER_SEQ(PPC_SLEEP) > - /* No return */ > - > -2: > - /* > -* Go back to winkle. Please note that this thread was woken up in > -* machine check from winkle and have not restored the per-subcore > -* state. > -*/ > - IDLE_STATE_ENTER_SEQ(PPC_WINKLE) > - /* No return */ > + BEGIN_FTR_SECTION > + rlwinm. r11,r12,47-31,30,31 > + beq-4f > + BRANCH_TO_COMMON(r10, machine_check_idle_common) > 4: > + END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) > #endif > + > /* > * Check if we are coming from hypervisor userspace. If yes then we > * continue in host kernel in V mode to deliver the MC event. > -- > 2.11.0 > Otherwise, the patch looks correct to me. Reviewed-by: Gautham R. Shenoy --
Re: [PATCH] jump_label: align jump_entry table to at least 4-bytes
On Tue, 28 Feb 2017 10:25:46 +0530 Sachin Santwrote: > File: ./net/ipv4/xfrm4_input.o > [12] __jump_table PROGBITS 000639 18 18 > WAM 0 0 1 > File: ./net/ipv4/udplite.o > File: ./net/ipv4/xfrm4_output.o > [ 9] __jump_table PROGBITS 000481 18 18 > WAM 0 0 1 Looks like there's some issues right there. -- Steve
Re: [PATCH 2/5] powerpc/64: stop using bit in HSPRG0 to test winkle
On Tue, 28 Feb 2017 20:38:19 +0530 Gautham R Shenoywrote: > Hi Nick, > > On Fri, Feb 17, 2017 at 12:08 AM, Nicholas Piggin wrote: > > The POWER8 idle code has a neat trick of programming the power on engine > > to restore a low bit into HSPRG0, so idle wakeup code can test and see > > if it has been programmed this way and therefore lost all state. Restore > > time can be reduced if winkle has not been reached. > > > > > However this messes with our r13 PACA pointer, and requires HSPRG0 to > > be written to. It also optimizes the slowest and most uncommon case at > > the expense of another SPR write in the common nap state wakeup. > > Actually, this optimization was needed to reduce the guest entry time > for a KVM guest vcore. > > While running KVM on POWER8, the host is in SMT=1 mode with the > secondary threads > being hotplugged out and supposed to have entered the winkle state. > Since the primary thread is still running, the core would still be in Nap. > > However, each time a guest vcore is scheduled on the core, the > secondary threads are sent an IPI to wake them up from the idle state. > > On waking up, they use the last bit of HSPRG0 and conclude that they > don't need to restore resources before executing the guest entry code. > > Thus, the HSPRG0 trick help the secondary threads avoid spending > extra cycles restoring the SLBs and per-thread SPRs. Ah, thanks. Until now I didn't really understand why that case was optimimized. My mistake and apologies for assuming it was not a good reason :) > > Remove this complexity and assume winkle sleeps always require a state > > restore. This speedup could be made entirely contained within the winkle > > idle code by counting per-core winkles and setting a thread bitmap when > > all have gone to winkle. > > This is a good idea! We are anyway taking the lock in pnv_wakeup_tb_loss() > to check if we are the first thread in the core waking up from a deep-state. > > We can check another bitmap if we are the first thread waking up from winkle > and set cr4 to the result of that comparison while holding the lock. > > That shouldn't cost us much, at least for the fake-wakeup-from-winkle > for KVM case alluded to above. > > Otherwise, the patch looks good to me. Okay, I had a half-done patch for this. I'll finish it and send it out for review. Thanks, Nick
Re: [PATCH 4/5] powerpc/64s: consolidate pnv_restore_hyp_resource into pnv_powersave_wakeup
Hi Nick, On Fri, Feb 17, 2017 at 12:09 AM, Nicholas Pigginwrote: > There is only one caller, so this reduces spaghetti of subsequent > callees returning into the caller. > > Signed-off-by: Nicholas Piggin This patch is good! Reviewed-by: Gautham R. Shenoy -- Thanks and Regards gautham.
Re: [PATCH 3/5] powerpc/64s: use alternative feature patching
Hi Nick, On Fri, Feb 17, 2017 at 12:08 AM, Nicholas Pigginwrote: > This reduces the number of nops for POWER8 > > Signed-off-by: Nicholas Piggin This change looks ok to me. Reviewed-by: Gautham R. Shenoy > --- > arch/powerpc/kernel/idle_book3s.S | 15 +++ > 1 file changed, 7 insertions(+), 8 deletions(-) > > diff --git a/arch/powerpc/kernel/idle_book3s.S > b/arch/powerpc/kernel/idle_book3s.S > index 1271344e5523..ab15dee371c9 100644 > --- a/arch/powerpc/kernel/idle_book3s.S > +++ b/arch/powerpc/kernel/idle_book3s.S > @@ -417,13 +417,8 @@ BEGIN_FTR_SECTION > rldicl r5,r5,4,60 > cmpdcr4,r5,r4 > bge cr4,pnv_wakeup_tb_loss > - /* > -* Waking up without hypervisor state loss. Return to > -* reset vector > -*/ > - blr > > -END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300) > +FTR_SECTION_ELSE > > /* > * POWER ISA 2.07 or less. > @@ -440,9 +435,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300) > * indicates we are waking with hypervisor state loss from nap. > */ > bgt cr3,. > +ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300) > > - blr /* Return back to System Reset vector from where > - pnv_restore_hyp_resource was invoked */ > + /* > +* Waking up without hypervisor state loss. Return to > +* reset vector > +*/ > + blr > > /* > * Called if waking up from idle state which can cause either partial or > -- > 2.11.0 > -- Thanks and Regards gautham.
Re: [PATCH 2/5] powerpc/64: stop using bit in HSPRG0 to test winkle
Hi Nick, On Fri, Feb 17, 2017 at 12:08 AM, Nicholas Pigginwrote: > The POWER8 idle code has a neat trick of programming the power on engine > to restore a low bit into HSPRG0, so idle wakeup code can test and see > if it has been programmed this way and therefore lost all state. Restore > time can be reduced if winkle has not been reached. > > However this messes with our r13 PACA pointer, and requires HSPRG0 to > be written to. It also optimizes the slowest and most uncommon case at > the expense of another SPR write in the common nap state wakeup. Actually, this optimization was needed to reduce the guest entry time for a KVM guest vcore. While running KVM on POWER8, the host is in SMT=1 mode with the secondary threads being hotplugged out and supposed to have entered the winkle state. Since the primary thread is still running, the core would still be in Nap. However, each time a guest vcore is scheduled on the core, the secondary threads are sent an IPI to wake them up from the idle state. On waking up, they use the last bit of HSPRG0 and conclude that they don't need to restore resources before executing the guest entry code. Thus, the HSPRG0 trick help the secondary threads avoid spending extra cycles restoring the SLBs and per-thread SPRs. > > Remove this complexity and assume winkle sleeps always require a state > restore. This speedup could be made entirely contained within the winkle > idle code by counting per-core winkles and setting a thread bitmap when > all have gone to winkle. This is a good idea! We are anyway taking the lock in pnv_wakeup_tb_loss() to check if we are the first thread in the core waking up from a deep-state. We can check another bitmap if we are the first thread waking up from winkle and set cr4 to the result of that comparison while holding the lock. That shouldn't cost us much, at least for the fake-wakeup-from-winkle for KVM case alluded to above. Otherwise, the patch looks good to me. > > Signed-off-by: Nicholas Piggin -- Thanks and Regards gautham.
Re: [PATCH] powerpc: booke: fix boot crash due to null hugepd
Hi, Some more information on the crash, inline. On 02/17/2017 02:18 PM, Aneesh Kumar K.V wrote: > laurentiu.tu...@nxp.com writes: > >> From: Laurentiu Tudor>> >> On 32-bit book-e machines, hugepd_ok() does not take >> into account null hugepd values, causing this crash at boot: >> >> Unable to handle kernel paging request for data at address 0x8000 >> Faulting instruction address: 0xc00182a8 >> Oops: Kernel access of bad area, sig: 11 [#1] >> SMP NR_CPUS=24 >> CoreNet Generic >> Modules linked in: >> CPU: 1 PID: 1 Comm: swapper/0 Tainted: GW >> 4.10.0-rc8-00016-g69b1f87 #11 >> task: e505 task.stack: e5058000 >> NIP: c00182a8 LR: c001829c CTR: 7ffe >> REGS: e5059c50 TRAP: 0300 Tainted: GW >> (4.10.0-rc8-00016-g69b1f87) >> MSR: 00021002 >>CR: 88428e82 XER: >> DEAR: 8000 ESR: >> GPR00: c0107510 e5059d00 e505 8000 bff1 e5059d0c e5059d08 >> 2017 >> GPR08: 28428e82 c00027d0 >> >> GPR16: 88a28e82 2000 48422e82 88a28e84 >> dd004000 >> GPR24: e5059e38 bff1 dd004000 0001 00029002 >> bff1 >> NIP [c00182a8] follow_huge_addr+0x38/0xf0 >> LR [c001829c] follow_huge_addr+0x2c/0xf0 >> Call Trace: >> [e5059d00] [e5059d00] 0xe5059d00 (unreliable) >> [e5059d20] [c0107510] follow_page_mask+0x40/0x3c0 >> [e5059d80] [c0107958] __get_user_pages+0xc8/0x420 >> [e5059de0] [c010817c] get_user_pages_remote+0x8c/0x230 >> [e5059e30] [c013f170] copy_strings+0x110/0x3a0 >> [e5059ea0] [c013f42c] copy_strings_kernel+0x2c/0x50 >> [e5059ec0] [c0141324] do_execveat_common+0x474/0x620 >> [e5059f10] [c01414fc] do_execve+0x2c/0x40 >> [e5059f20] [c0001f68] try_to_run_init_process+0x18/0x60 >> [e5059f30] [c000289c] kernel_init+0xcc/0x120 >> [e5059f40] [c000f1e8] ret_from_kernel_thread+0x5c/0x64 >> Instruction dump: >> bfc10018 7c9f2378 90010024 7fc000a6 7c000146 80630020 38a1000c 38c10008 >> 4bfff869 2c03 41c20090 81210008 <8143> 81630004 3860ffea 2f89 >> ---[ end trace 4bf94e15fd9fa824 ]--- > > > Which code path is that. That null should be filtered by the if > (pmd_none(pmd)) check in find_linux_pte_or_hugepte right ? The crash happens when __find_linux_pte_or_hugepte() calls hugepd_ok(), on this line [1]. It's triggered when __find_linux_pte_or_hugepte() is first called, when the kernel tries to spawn the init process. The input effective address (ea arg) is bff1. This is the call stack: [e5059cd0] [c0017b60] __find_linux_pte_or_hugepte+0x60/0x120 (unreliable) [e5059d00] [c001832c] follow_huge_addr+0x2c/0xf0 [e5059d20] [c0107590] follow_page_mask+0x40/0x3c0 [e5059d80] [c01079d8] __get_user_pages+0xc8/0x420 [e5059de0] [c01081fc] get_user_pages_remote+0x8c/0x230 [e5059e30] [c013f210] copy_strings+0x110/0x3a0 [e5059ea0] [c013f4cc] copy_strings_kernel+0x2c/0x50 [e5059ec0] [c01413c4] do_execveat_common+0x474/0x620 [e5059f10] [c014159c] do_execve+0x2c/0x40 [e5059f20] [c0001f68] try_to_run_init_process+0x18/0x60 [e5059f30] [c000289c] kernel_init+0xcc/0x120 [e5059f40] [c000f1e8] ret_from_kernel_thread+0x5c/0x64 Thanks in advance for any pointers. [1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/mm/hugetlbpage.c#n918 --- Best Regards, Laurentiu
Re: [PATCH v3 1/2] powerpc: split ftrace bits into a separate file
On Tue, 28 Feb 2017 15:04:15 +1100 Michael Ellermanwrote: kernel/trace/ftrace.c more obvious. > > I don't know if it's really worth keeping the names the same across > arches, especially as we already have: > > arch/arm64/kernel/entry-ftrace.S > arch/arm/kernel/entry-ftrace.S > arch/blackfin/kernel/ftrace-entry.S > arch/metag/kernel/ftrace_stub.S > > But we can rename it if you feel strongly about it. Hmm, perhaps "entry-ftrace.S" would be the better name. I never liked the "mcount.S" name. -- Steve
Re: [bug report] v4.10.1 Oops on P1010 - huge tlb - there's already a fix ...
Hi Wolfgang, Thanks for reporting! On 02/28/2017 01:47 PM, Wolfgang Ocker wrote:> On Tue, 2017-02-28 at 10:57 +0100, Wolfgang Ocker wrote: >> With kernel v4.10.1 and huge tlb enabled (CONFIG_HUGETLBFS=y) I see >> the >> following oops on a P1010: > > Just saw that there is already a fix: > The fix is not accepted but you can use it until we come up with a better one. Please note that 4.7 & 4.9 stable kernels are also impacted so if you plan to use them you'll need the fix. --- Best Regards, Laurentiu
Re: [bug report] v4.10.1 Oops on P1010 - huge tlb - there's already a fix ...
On Tue, 2017-02-28 at 10:57 +0100, Wolfgang Ocker wrote: > With kernel v4.10.1 and huge tlb enabled (CONFIG_HUGETLBFS=y) I see > the > following oops on a P1010: Just saw that there is already a fix: https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-February/154204.html Thanks!
Re: [PATCH 1/5] powerpc/64s: move remaining system reset idle code into idle_book3s.S
Hi Nick, This patch is fine by me. Reviewed-by: Gautham R. ShenoyOn Fri, Feb 17, 2017 at 12:08 AM, Nicholas Piggin wrote: > Should be no functional change. > > Signed-off-by: Nicholas Piggin > --- > arch/powerpc/kernel/exceptions-64s.S | 26 +--- > arch/powerpc/kernel/idle_book3s.S| 39 > +++- > 2 files changed, 35 insertions(+), 30 deletions(-) > > diff --git a/arch/powerpc/kernel/exceptions-64s.S > b/arch/powerpc/kernel/exceptions-64s.S > index 0e1350b30160..d0d89047befe 100644 > --- a/arch/powerpc/kernel/exceptions-64s.S > +++ b/arch/powerpc/kernel/exceptions-64s.S > @@ -130,31 +130,7 @@ EXC_VIRT_NONE(0x4100, 0x4200) > > #ifdef CONFIG_PPC_P7_NAP > EXC_COMMON_BEGIN(system_reset_idle_common) > -BEGIN_FTR_SECTION > - GET_PACA(r13) /* Restore HSPRG0 to get the winkle bit in r13 */ > -END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300) > - bl pnv_restore_hyp_resource > - > - li r0,PNV_THREAD_RUNNING > - stb r0,PACA_THREAD_IDLE_STATE(r13) /* Clear thread state */ > - > -#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE > - li r0,KVM_HWTHREAD_IN_KERNEL > - stb r0,HSTATE_HWTHREAD_STATE(r13) > - /* Order setting hwthread_state vs. testing hwthread_req */ > - sync > - lbz r0,HSTATE_HWTHREAD_REQ(r13) > - cmpwi r0,0 > - beq 1f > - b kvm_start_guest > -1: > -#endif > - > - /* Return SRR1 from power7_nap() */ > - mfspr r3,SPRN_SRR1 > - blt cr3,2f > - b pnv_wakeup_loss > -2: b pnv_wakeup_noloss > + b pnv_powersave_wakeup > #endif > > EXC_COMMON_BEGIN(system_reset_common) > diff --git a/arch/powerpc/kernel/idle_book3s.S > b/arch/powerpc/kernel/idle_book3s.S > index 72dac0b58061..ea3562f83c57 100644 > --- a/arch/powerpc/kernel/idle_book3s.S > +++ b/arch/powerpc/kernel/idle_book3s.S > @@ -115,7 +115,7 @@ core_idle_lock_held: > * > * Address to 'rfid' to in r5 > */ > -_GLOBAL(pnv_powersave_common) > +pnv_powersave_common: > /* Use r3 to pass state nap/sleep/winkle */ > /* NAP is a state loss, we create a regs frame on the > * stack, fill it up with the state we care about and > @@ -365,6 +365,34 @@ _GLOBAL(power9_idle_stop) > LOAD_REG_ADDR(r5,power_enter_stop) > b pnv_powersave_common > /* No return */ > + > +.global pnv_powersave_wakeup > +pnv_powersave_wakeup: > +BEGIN_FTR_SECTION > + GET_PACA(r13) /* Restore HSPRG0 to get the winkle bit in r13 */ > +END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300) > + bl pnv_restore_hyp_resource > + > + li r0,PNV_THREAD_RUNNING > + stb r0,PACA_THREAD_IDLE_STATE(r13) /* Clear thread state */ > + > +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE > + li r0,KVM_HWTHREAD_IN_KERNEL > + stb r0,HSTATE_HWTHREAD_STATE(r13) > + /* Order setting hwthread_state vs. testing hwthread_req */ > + sync > + lbz r0,HSTATE_HWTHREAD_REQ(r13) > + cmpwi r0,0 > + beq 1f > + b kvm_start_guest > +1: > +#endif > + > + /* Return SRR1 from power7_nap() */ > + mfspr r3,SPRN_SRR1 > + blt cr3,pnv_wakeup_noloss > + b pnv_wakeup_loss > + > /* > * Called from reset vector. Check whether we have woken up with > * hypervisor state loss. If yes, restore hypervisor state and return > @@ -373,7 +401,7 @@ _GLOBAL(power9_idle_stop) > * r13 - Contents of HSPRG0 > * cr3 - set to gt if waking up with partial/complete hypervisor state loss > */ > -_GLOBAL(pnv_restore_hyp_resource) > +pnv_restore_hyp_resource: > BEGIN_FTR_SECTION > ld r2,PACATOC(r13); > /* > @@ -436,7 +464,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300) > * cr3 - gt if waking up with partial/complete hypervisor state loss > * cr4 - gt or eq if waking up from complete hypervisor state loss. > */ > -_GLOBAL(pnv_wakeup_tb_loss) > +pnv_wakeup_tb_loss: > ld r1,PACAR1(r13) > /* > * Before entering any idle state, the NVGPRs are saved in the stack > @@ -640,7 +668,8 @@ fastsleep_workaround_at_exit: > * R3 here contains the value that will be returned to the caller > * of power7_nap. > */ > -_GLOBAL(pnv_wakeup_loss) > +.global pnv_wakeup_loss > +pnv_wakeup_loss: > ld r1,PACAR1(r13) > BEGIN_FTR_SECTION > CHECK_HMI_INTERRUPT > @@ -660,7 +689,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) > * R3 here contains the value that will be returned to the caller > * of power7_nap. > */ > -_GLOBAL(pnv_wakeup_noloss) > +pnv_wakeup_noloss: > lbz r0,PACA_NAPSTATELOST(r13) > cmpwi r0,0 > bne pnv_wakeup_loss > -- > 2.11.0 > -- Thanks and Regards gautham.
[bug report] v4.10.1 Oops on P1010 - huge tlb
With kernel v4.10.1 and huge tlb enabled (CONFIG_HUGETLBFS=y) I see the following oops on a P1010: Freeing unused kernel memory: 428K This architecture does not have kernel memory protection. Unable to handle kernel paging request for data at address 0x8000 Faulting instruction address: 0xc001b3a0 Oops: Kernel access of bad area, sig: 11 [#1] PREEMPT SMP NR_CPUS=8 DSP01 Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW 4.10.1-weo-00024-g4de55c3 #1 task: eb0e4000 task.stack: eb0e2000 NIP: c001b3a0 LR: c001b390 CTR: 7ffe REGS: eb0e3c50 TRAP: 0300 Tainted: GW (4.10.1-weo-00024-g4de55c3) MSR: 00021000CR: 88428022 XER: DEAR: 8000 ESR: GPR00: c001b390 eb0e3d00 eb0e4000 8000 bff1 eb0e3d0c eb0e3d08 GPR08: c097 0080 28428022 c0002a5c GPR16: eb0e3e48 0001 88002022 2000 48422022 88002024 ef0a8000 GPR24: eb0e3e48 bff1 0001 bff1 00029000 ffea NIP [c001b3a0] follow_huge_addr+0x54/0x114 LR [c001b390] follow_huge_addr+0x44/0x114 Call Trace: [eb0e3d00] [c001b390] follow_huge_addr+0x44/0x114 (unreliable) [eb0e3d30] [c014ffbc] follow_page_mask+0x4c/0x4a4 [eb0e3d90] [c01504e4] __get_user_pages+0xd0/0x3d8 [eb0e3df0] [c0150ca0] get_user_pages_remote+0x98/0x238 [eb0e3e40] [c018abf4] copy_strings+0x108/0x2bc [eb0e3ea0] [c018ade0] copy_strings_kernel+0x38/0x50 [eb0e3ec0] [c018ce00] do_execveat_common+0x468/0x6c4 [eb0e3f10] [c018d094] do_execve+0x38/0x48 [eb0e3f20] [c00021cc] try_to_run_init_process+0x24/0x64 [eb0e3f30] [c0002b24] kernel_init+0xc8/0x11c [eb0e3f40] [c00112d0] ret_from_kernel_thread+0x5c/0x64 --- interrupt: 0 at (null) LR = (null) Instruction dump: 7fc000a6 7c000146 480edde9 807f0020 7fa4eb78 38a1000c 38c10008 4bfff851 2c03 418200bc 81410008 3be0ffea <80c3> 80e30004 2f8a 409e002c ---[ end trace 79d29a00645dff1b ]--- Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b Line numbers: NIP [c001b3a0] follow_huge_addr+0x54/0x114 linux/include/linux/compiler.h:243 [eb0e3d00] [c001b390] follow_huge_addr+0x44/0x114 (unreliable) linux/arch/powerpc/mm/hugetlbpage.c:635 [eb0e3d30] [c014ffbc] follow_page_mask+0x4c/0x4a4 linux/mm/gup.c:238 The problem does not occur with 4.9.x. It does not occure with 4.10.1 and CONFIG_HUGETLBFS disabled. The problem does not occur after reverting the following patches: "powerpc: get hugetlbpage handling more generic" (03bb2d65900c) "powerpc/8xx: Implement support of hugepages" (4b9142869947) "powerpc: Fix pgtable pmd cache init" (bf5ca68dd2ee) "powerpc/mm/hugetlb: Don't panic when we don't find the default huge page size" (ff8b85796dad) "powerpc/mm: Fix little-endian 4K hugetlb" (20717e1ff526) Best regards, Wolfgang
Re: [PATCH] staging: fsl-mc: fix warning in DT ranges parser
Hi Arnd, On 02/27/2017 10:42 PM, Arnd Bergmann wrote: > The fsl-mc-bus driver in staging contains a copy of the standard > 'ranges' property parsing algorithm with a hack to treat a missing > property the same way as an empty one. This code produces false-positive > warnings for me in an allmodconfig build: > > drivers/staging/fsl-mc/bus/fsl-mc-bus.c: In function 'fsl_mc_bus_probe': > drivers/staging/fsl-mc/bus/fsl-mc-bus.c:645:6: error: 'mc_size_cells' may be > used uninitialized in this function [-Werror=maybe-uninitialized] > drivers/staging/fsl-mc/bus/fsl-mc-bus.c:682:8: error: 'mc_addr_cells' may be > used uninitialized in this function [-Werror=maybe-uninitialized] > drivers/staging/fsl-mc/bus/fsl-mc-bus.c:644:6: note: 'mc_addr_cells' was > declared here > drivers/staging/fsl-mc/bus/fsl-mc-bus.c:684:8: error: 'paddr_cells' may be > used uninitialized in this function [-Werror=maybe-uninitialized] > drivers/staging/fsl-mc/bus/fsl-mc-bus.c:643:6: note: 'paddr_cells' was > declared here > > To avoid the warnings, I'm simplifying the argument handling to pass > the number of valid ranges in the property as the function return code > rather than passing it by reference. With this change, gcc can see that > we don't evaluate the cell numbers for an missing ranges property. > > Signed-off-by: Arnd BergmannLooks good to me, i've tested it and did not see any issues, so here's an: Acked-by: Laurentiu Tudor --- Thanks & Best Regards, Laurentiu
Re: [PATCH] kbuild: avoid unrecognized option error for external DTC
Ben Hutchingswrites: > [ Unknown signature status ] > On Mon, 2017-02-27 at 14:40 +0900, Masahiro Yamada wrote: >> Since commit 6b22b3d1614a ("kbuild: Allow using host dtc instead of >> kernel's copy"), it is possible to use an external dtc. In this >> case, we do not know which options are supported on it. >> >> Commit bc553986a2f7 ("dtc: turn off dtc unit address warnings by >> default") gives -Wno-unit_address_vs_reg, but this options is only >> recognized by v1.4.2 or later. >> >> If an older version is specified, the build fails: > > But the option to use an external dtc was intended to allow testing of > newer versions. If there's no reason to use this option to run an > older version, why bother trying to support that? +1 That's a lot of added complexity, when the answer could just be "use the kernel dtc". cheers
[PATCH v2] powernv/opal: Handle OPAL_WRONG_STATE error from OPAL fails
Added check for OPAL_WRONG_STATE error code returned from OPAL. Currently Linux flashes "unexpected error" over console for this error. This will avoid throwing such message and return I/O error for such OPAL failures. Signed-off-by: Vipin K Parashar--- Changes in v2: - Added log message indicating sleeping/offline core for OPAL_WRONG_STATE arch/powerpc/platforms/powernv/opal.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c index 86d9fde..8af230e 100644 --- a/arch/powerpc/platforms/powernv/opal.c +++ b/arch/powerpc/platforms/powernv/opal.c @@ -869,8 +869,11 @@ int opal_error_code(int rc) case OPAL_UNSUPPORTED: return -EIO; case OPAL_HARDWARE: return -EIO; case OPAL_INTERNAL_ERROR: return -EIO; + case OPAL_WRONG_STATE: + pr_notice("%s: Core sleeping/offline\n", __func__); + return -EIO; default: - pr_err("%s: unexpected OPAL error %d\n", __func__, rc); + pr_err("%s: Unexpected OPAL error %d\n", __func__, rc); return -EIO; } } -- 2.7.4
Re: [PATCH] powernv/opal: Handle OPAL_WRONG_STATE error from OPAL fails
Thanks!! for review. Sending out v2 with suggested changes. On Thursday 23 February 2017 09:22 AM, Stewart Smith wrote: Michael Ellermanwrites: Stewart Smith writes: Vipin K Parashar writes: On Monday 13 February 2017 06:13 AM, Michael Ellerman wrote: Vipin K Parashar writes: OPAL returns OPAL_WRONG_STATE for XSCOM operations done to read any core FIR which is sleeping, offline. OK. Do we know why Linux is causing that to happen? This issue is originally seen upon running STAF (Software Test Automation Framework) stress tests and off-lining some cores with stress tests running. It can also be re-created after off-lining few cores and following one of below methods. 1. Executing Linux "sensors" command 2. Reading contents of file /sys/class/hwmon/hwmon0/tempX_input, where X is offline CPU. Its "opal_get_sensor_data" Linux API that that triggers OPAL call "opal_sensor_read", performing XSCOM ops here. If core is found sleeping/offline Linux throws up "opal_error_code: Unexpected OPAL error" error onto console. Currently Linux isn't aware about OPAL_WRONG_STATE return code from OPAL. Thus it prints "Unexpected OPAL error" message, same as it would log for any unknown OPAL return codes. Seeing this error over console has been a concern for Test and would puzzle real user as well. This patch makes Linux aware about OPAL_WRONG_STATE return code from OPAL and stops printing "Unexpected OPAL error" message onto console for OPAL fails with OPAL_WRONG_STATE Ahh... so this is a DTS sensor, which indeed is just XSCOMs and we return the xscom_read return code in event of error. I would argue that converting to EIO in that instance is probably correct... or EAGAIN? EAGAIN may be more correct in the situation where the core is just sleeping. What kind of offlining are you doing? Arguably, the correct behaviour would be to remove said sensors when the core is offline. Right, that would be ideal. There appear to be at least two other hwmon drivers that are CPU hotplug aware (coretemp and via-cputemp). But perhaps it's not possible to work out which sensors are attached to which CPU etc., I haven't looked in detail. Each core-temp@ sensor has a ibm,pir property, so linking back to what core shouldn't be too hard. For mem-temp@ sensors, we have the chip-id. In that case changing just opal_get_sensor_data() to handle OPAL_WRONG_STATE would be OK, with a comment explaining that we might be asked to read a sensor on an offline CPU and we aren't able to detect that. Agree.
Re: [PATCH 1/3] powerpc/64s: fix handling of non-synchronous machine checks
On Tue, 28 Feb 2017 11:27:29 +0530 Mahesh Jagannath Salgaonkarwrote: > On 02/28/2017 07:30 AM, Nicholas Piggin wrote: > > A synchronous machine check is an exception raised by the attempt to > > execute the current instruction. If the error can't be corrected, it > > can make sense to SIGBUS the currently running process. > > > > In other cases, the error condition is not related to the current > > instruction, so killing the current process is not the right thing to > > do. > > > > Today, all machine checks are MCE_SEV_ERROR_SYNC, so this has no > > practical change. It will be used to handle POWER9 asynchronous > > machine checks. > > > > Signed-off-by: Nicholas Piggin > > --- > > arch/powerpc/platforms/powernv/opal.c | 21 ++--- > > 1 file changed, 6 insertions(+), 15 deletions(-) > > > > diff --git a/arch/powerpc/platforms/powernv/opal.c > > b/arch/powerpc/platforms/powernv/opal.c > > index 86d9fde93c17..e0f856bfbfe8 100644 > > --- a/arch/powerpc/platforms/powernv/opal.c > > +++ b/arch/powerpc/platforms/powernv/opal.c > > @@ -395,7 +395,6 @@ static int opal_recover_mce(struct pt_regs *regs, > > struct machine_check_event *evt) > > { > > int recovered = 0; > > - uint64_t ea = get_mce_fault_addr(evt); > > > > if (!(regs->msr & MSR_RI)) { > > /* If MSR_RI isn't set, we cannot recover */ > > @@ -404,26 +403,18 @@ static int opal_recover_mce(struct pt_regs *regs, > > } else if (evt->disposition == MCE_DISPOSITION_RECOVERED) { > > /* Platform corrected itself */ > > recovered = 1; > > - } else if (ea && !is_kernel_addr(ea)) { > > + } else if (evt->severity == MCE_SEV_FATAL) { > > + /* Fatal machine check */ > > + pr_err("Machine check interrupt is fatal\n"); > > + recovered = 0; > > Setting recovered = 0 would trigger kernel panic. Should we panic the > kernel for asynchronous errors ? If it's not recoverable, I don't see what other option we have. SRR0 is meaningless for async machine checks. So it's much the same thing we do as if we don't have a process to kill or were running in kernel when a synchronous MCE occurred. Thanks, Nick