RE: [PATCH][v2] powerpc/fsl-booke: Add T1040D4RDB/T1042D4RDB board support
> -----Original Message----- > From: Wood Scott-B07421 > Sent: Friday, July 17, 2015 1:06 AM > To: Jain Priyanka-B32167 > Cc: linuxppc-dev@lists.ozlabs.org > Subject: Re: [PATCH][v2] powerpc/fsl-booke: Add T1040D4RDB/T1042D4RDB > board support > > On Thu, 2015-07-16 at 04:34 -0500, Jain Priyanka-B32167 wrote: > > > > -----Original Message----- > > From: Wood Scott-B07421 > > Sent: Wednesday, July 15, 2015 11:17 PM > > To: Jain Priyanka-B32167 > > Cc: linuxppc-dev@lists.ozlabs.org > > Subject: Re: [PATCH][v2] powerpc/fsl-booke: Add > T1040D4RDB/T1042D4RDB > > board support > > > > On Wed, 2015-07-15 at 15:00 +0530, Priyanka Jain wrote: > > > T1040D4RDB/T1042D4RDB are Freescale Reference Design Board which > can > > > support T1040/T1042 QorIQ Power Architecture™ processor respectively > > > > > > T1040D4RDB/T1042D4RDB board Overview > > > - > > > - SERDES Connections, 8 lanes supporting: > > > - PCI > > > - SGMII > > > - SATA 2.0 > > > - QSGMII(only for T1040D4RDB) > > > - DDR Controller > > > - Supports rates of up to 1600 MHz data-rate > > > - Supports one DDR4 UDIMM > > > -IFC/Local Bus > > > - NAND flash: 1GB 8-bit NAND flash > > > - NOR: 128MB 16-bit NOR Flash > > > - Ethernet > > > - Two on-board RGMII 10/100/1G ethernet ports. > > > - PHY #0 remains powered up during deep-sleep > > > - CPLD > > > - Clocks > > > - System and DDR clock (SYSCLK, “DDRCLK”) > > > - SERDES clocks > > > - Power Supplies > > > - USB > > > - Supports two USB 2.0 ports with integrated PHYs > > > - Two type A ports with 5V@1.5A per port.
> > > - SDHC > > > - SDHC/SDXC connector > > > - SPI > > > - On-board 64MB SPI flash > > > - I2C > > > - Devices connected: EEPROM, thermal monitor, VID controller > > > - Other IO > > > - Two Serial ports > > > - ProfiBus port > > > > > > Add support for T1040/T1042D4RDB board: > > > -add device tree > > > -Add entry in corenet_generic.c > > > > > > Signed-off-by: Priyanka Jain > > > --- > > > Changes for v2: > > > Incorporated Scott's comments on device tree > > > > You didn't respond to the comments on the CPLD node. > > [Priyanka] > > T1042D4RDB, T1040D4RDB are derivatives of same board , CPLD is same > > for both. > > So, I have moved below node having compatible and reg field together > > in t104xd4rdb.dtsi. > > Is this fine? > > cpld@3,0 { > > compatible = "fsl,t1040d4rdb-cpld"; > > reg = <3 0 0x300>; > > }; > > If the CPLD image is exactly the same on both, this is fine. > > > > +i2c@118100{ > > > + mux@77{ > > > + compatible = "nxp,pca9546"; > > > + reg = <0x77>; > > > + #address-cells = <1>; > > > + #size-cells = <0>; > > > + }; > > > + }; > > > > A mux with no nodes under it (and yet it has #address-cells/#size-cells)? > > What is it multiplexing? > > [Priyanka]: PCA9546 is i2c mux device , to which other i2c devices > > (up-to 8 > > ) can be further connected on output channels On T104xD4RDB, channel > > 0, 1, 3 line are connected to PEX device, Channel 2 to hdmi interface > > (initialization is done in u-boot only), other channels are grounded. > > So, as such Linux is not using the second level I2C devices connected > > on this MUX device. So, I have not shown next level hierarchy. > > Should I replace 'mux' with some other name? . Please suggest. > > The device tree describes the hardware, not just what Linux uses... but what > I don't understand is why you describe the mux at all if you're not going to > describe what goes underneath it. 
> [Jain Priyanka-B32167] : Does the snippet below look OK?
i2c@118100{ + i2c@77{ + compatible = "nxp,pca9546"; + reg = <0x77>; + #address-cells = <1>; + #size-cells = <0>; + }; + }; > -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
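For context on the "what is it multiplexing?" question: a PCA9546 is an I2C switch whose downstream channels are connected by writing a single control byte to it. The C sketch below only builds that byte. It assumes the four-channel register layout from NXP's PCA9546A datasheet (bit n enables channel n), and the helper names are hypothetical, not a kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed from the PCA9546A datasheet: four downstream channels, and
 * bit n of the one-byte control register enables channel n. */
#define PCA9546_NUM_CHANNELS 4

/* Control byte that connects exactly one downstream channel. */
static uint8_t pca9546_ctrl_one(unsigned int chan)
{
	assert(chan < PCA9546_NUM_CHANNELS);
	return (uint8_t)(1u << chan);
}

/* Control byte that connects several channels at once (the part is a
 * switch, so more than one channel may be enabled); 0 disconnects all. */
static uint8_t pca9546_ctrl_mask(const unsigned int *chans, int n)
{
	uint8_t ctrl = 0;

	for (int i = 0; i < n; i++)
		ctrl |= pca9546_ctrl_one(chans[i]);
	return ctrl;
}
```

With the wiring described in the thread (channels 0, 1 and 3 to the PEX device, channel 2 to HDMI), selecting only the HDMI branch would be a write of 0x04, and enabling the three PEX channels together would be 0x0B.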
[PATCH V9 10/11] powerpc/eeh: Support error recovery for VF PE
Unlike a PCI-bus-based PE, a PE for VFs has no primary bus, and the primary bus is what PCI hotplug is implemented on. This patch supports error recovery for VF PEs, in particular PCI hotplug: hotplug on a VF PE is implemented per VF rather than per PCI bus.

[gwshan: changelog and code refactoring]
Signed-off-by: Wei Yang
Acked-by: Gavin Shan
--- arch/powerpc/include/asm/eeh.h |1 + arch/powerpc/kernel/eeh.c|8 +++ arch/powerpc/kernel/eeh_driver.c | 100 ++ arch/powerpc/kernel/eeh_pe.c |3 +- 4 files changed, 90 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h index 331c856..ea1f13c4 100644 --- a/arch/powerpc/include/asm/eeh.h +++ b/arch/powerpc/include/asm/eeh.h @@ -142,6 +142,7 @@ struct eeh_dev { struct pci_controller *phb; /* Associated PHB */ struct pci_dn *pdn; /* Associated PCI device node */ struct pci_dev *pdev; /* Associated PCI device*/ + int in_error; /* Error flag for eeh_dev */ struct pci_dev *physfn; /* Associated PF PORT */ struct pci_bus *bus;/* PCI bus for partial hotplug */ }; diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index af9b597..28e4d73 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -1227,6 +1227,14 @@ void eeh_remove_device(struct pci_dev *dev) * from the parent PE during the BAR resotre. */ edev->pdev = NULL; + + /* +* The flag "in_error" is used to trace EEH devices for VFs +* in error state or not. It's set in eeh_report_error(). If +* it's not set, eeh_report_{reset,resume}() won't be called +* for the VF EEH device.
+*/ + edev->in_error = 0; dev->dev.archdata.edev = NULL; if (!(edev->pe->state & EEH_PE_KEEP)) eeh_rmv_from_parent_pe(edev); diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 89eb4bc..99868e2 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -211,6 +211,7 @@ static void *eeh_report_error(void *data, void *userdata) if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc; if (*res == PCI_ERS_RESULT_NONE) *res = rc; + edev->in_error = 1; eeh_pcid_put(dev); return NULL; } @@ -282,7 +283,8 @@ static void *eeh_report_reset(void *data, void *userdata) if (!driver->err_handler || !driver->err_handler->slot_reset || - (edev->mode & EEH_DEV_NO_HANDLER)) { + (edev->mode & EEH_DEV_NO_HANDLER) || + (!edev->in_error)) { eeh_pcid_put(dev); return NULL; } @@ -339,14 +341,16 @@ static void *eeh_report_resume(void *data, void *userdata) if (!driver->err_handler || !driver->err_handler->resume || - (edev->mode & EEH_DEV_NO_HANDLER)) { + (edev->mode & EEH_DEV_NO_HANDLER) || + (!edev->in_error)) { edev->mode &= ~EEH_DEV_NO_HANDLER; - eeh_pcid_put(dev); - return NULL; + goto out; } driver->err_handler->resume(dev); +out: + edev->in_error = 0; eeh_pcid_put(dev); return NULL; } @@ -386,12 +390,38 @@ static void *eeh_report_failure(void *data, void *userdata) return NULL; } +static void *eeh_add_virt_device(void *data, void *userdata) +{ + struct pci_driver *driver; + struct eeh_dev *edev = (struct eeh_dev *)data; + struct pci_dev *dev = eeh_dev_to_pci_dev(edev); + struct pci_dn *pdn = eeh_dev_to_pdn(edev); + + if (!(edev->physfn)) { + pr_warn("%s: EEH dev %04x:%02x:%02x.%01x not for VF\n", + __func__, edev->phb->global_number, pdn->busno, + PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn)); + return NULL; + } + + driver = eeh_pcid_get(dev); + if (driver) { + eeh_pcid_put(dev); + if (driver->err_handler) + return NULL; + } + + pci_iov_virtfn_add(edev->physfn, pdn->vf_index, 0); + return NULL; +} + static void *eeh_rmv_device(void 
*data, void *userdata) { struct pci_driver *driver; struct eeh_dev *edev = (struct eeh_dev *)data; struct pci_dev *dev = eeh_dev_to_pci_dev(edev); int *removed = (int *)userdata; + struct pci_dn *pdn = eeh_dev_to_pdn(edev); /* * Actually, we should remove the PCI bridges as well. @@ -416,7 +446,7 @@ static void *eeh_rmv_device(void *data, void *userdata) driver = eeh_pcid_get(dev); if (driver) { eeh_pcid_put(dev); - if (driver->err_handler) + if (removed && driver
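The in_error handshake in this patch behaves like a small state machine. The sketch below is a simplified, hypothetical model (struct vf_edev and the model_report_*() helpers are illustrative names, and the driver/err_handler checks are omitted), showing why a VF that the PF driver re-created after error_detected receives neither slot_reset nor resume.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, cut-down stand-in for struct eeh_dev. */
struct vf_edev {
	bool in_error;   /* set during the error_detected pass */
	int resets;      /* slot_reset callbacks delivered */
	int resumes;     /* resume callbacks delivered */
};

/* Mirrors eeh_report_error(): mark the device as part of this recovery. */
static void model_report_error(struct vf_edev *e)
{
	e->in_error = true;
}

/* Mirrors eeh_report_reset(): skip devices that never saw the error,
 * e.g. VFs re-created by the PF driver in a fresh state. */
static void model_report_reset(struct vf_edev *e)
{
	if (!e->in_error)
		return;
	e->resets++;
}

/* Mirrors eeh_report_resume(): deliver resume only to flagged devices,
 * then clear the flag on every path (the "out:" label in the patch). */
static void model_report_resume(struct vf_edev *e)
{
	if (e->in_error)
		e->resumes++;
	e->in_error = false;
}
```

Clearing the flag unconditionally at the end of the resume pass means the next EEH event starts from a clean state for every device.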
[PATCH V9 11/11] powerpc/powernv: compound PE for VFs
When the VF BAR size is larger than 64MB, VFs are grouped by M64 BAR, which means the VFs in one group must form a compound PE. This patch links those VF PEs into a compound PE in that case.

[gwshan: code refactoring for a bit]
Signed-off-by: Wei Yang
Acked-by: Gavin Shan
--- arch/powerpc/platforms/powernv/pci-ioda.c | 46 + arch/powerpc/platforms/powernv/pci.c | 17 +-- 2 files changed, 56 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 5738d31..d1530cb 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -1359,9 +1359,20 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs) } list_for_each_entry_safe(pe, pe_n, &phb->ioda.pe_list, list) { + struct pnv_ioda_pe *s, *sn; if (pe->parent_dev != pdev) continue; + if ((pe->flags & PNV_IODA_PE_MASTER) && + (pe->flags & PNV_IODA_PE_VF)) { + list_for_each_entry_safe(s, sn, &pe->slaves, list) { + pnv_pci_ioda2_release_dma_pe(pdev, s); + list_del(&s->list); + pnv_ioda_deconfigure_pe(phb, s); + pnv_ioda_free_pe(phb, s->pe_number); + } + } + pnv_pci_ioda2_release_dma_pe(pdev, pe); /* Remove from list */ @@ -1414,7 +1425,7 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs) struct pci_bus *bus; struct pci_controller *hose; struct pnv_phb *phb; - struct pnv_ioda_pe *pe; + struct pnv_ioda_pe *pe, *master_pe; int pe_num; u16 vf_index; struct pci_dn *pdn; @@ -1456,10 +1467,13 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs) continue; } - /* Put PE to the list */ - mutex_lock(&phb->ioda.pe_list_mutex); - list_add_tail(&pe->list, &phb->ioda.pe_list); - mutex_unlock(&phb->ioda.pe_list_mutex); + /* Put PE to the list, or postpone it for compound PEs */ + if ((pdn->m64_per_iov != M64_PER_IOV) || + (num_vfs <= M64_PER_IOV)) { + mutex_lock(&phb->ioda.pe_list_mutex); + list_add_tail(&pe->list, &phb->ioda.pe_list); +
mutex_unlock(&phb->ioda.pe_list_mutex); + } pnv_pci_ioda2_setup_dma_pe(phb, pe); } @@ -1472,10 +1486,32 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs) vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov; for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++) { + master_pe = NULL; + for (vf_index = vf_group * vf_per_group; vf_index < (vf_group + 1) * vf_per_group && vf_index < num_vfs; vf_index++) { + + /* +* Figure out the master PE and put all slave +* PEs to master PE's list. +*/ + pe = &phb->ioda.pe_array[pdn->offset + vf_index]; + if (!master_pe) { + pe->flags |= PNV_IODA_PE_MASTER; + INIT_LIST_HEAD(&pe->slaves); + master_pe = pe; + mutex_lock(&phb->ioda.pe_list_mutex); + list_add_tail(&pe->list, &phb->ioda.pe_list); + mutex_unlock(&phb->ioda.pe_list_mutex); + } else { + pe->flags |= PNV_IODA_PE_SLAVE; + pe->master = master_pe; + list_add_tail(&pe->list, + &master_pe->slaves); + } + for (vf_index1 = vf_group * vf_per_group; vf_index1 < (vf_group + 1) * vf_per_group && vf_index1 < num_vfs; diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c index 0e4f42e..f3aead0 100644 --- a/arch/powerpc/platforms/powernv/pci.c +++ b/arch/powerpc/platforms/powernv/pci.c @@ -739,7 +739,7 @@ void pnv_pci_dma_dev_setup(struct pci_dev *pdev) struct pci_controller *hose = pci_bus_to_
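The grouping arithmetic in pnv_ioda_setup_vf_PE() decides which VF becomes the master of each compound PE. The sketch below reproduces only that index calculation. It assumes M64_PER_IOV is 4 (its value in the powernv headers of this period, stated here as an assumption) and that the VF BAR is large enough that pdn->m64_per_iov equals M64_PER_IOV; the helper names are hypothetical.

```c
#include <assert.h>

/* Assumed value of M64_PER_IOV in the powernv code of this period. */
#define M64_PER_IOV 4

/* Smallest power of two >= v (v >= 1), like roundup_pow_of_two(). */
static unsigned int roundup_pow2(unsigned int v)
{
	unsigned int p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* VF index of the master PE for vf_index, following the loop in
 * pnv_ioda_setup_vf_PE(): VFs are split into groups of vf_per_group,
 * the first VF of each group is the master, the rest are its slaves. */
static unsigned int master_vf_index(unsigned int vf_index,
				    unsigned int num_vfs)
{
	unsigned int vf_per_group;

	assert(vf_index < num_vfs);
	if (num_vfs <= M64_PER_IOV)
		return vf_index;   /* no compound PE: each VF stands alone */

	vf_per_group = roundup_pow2(num_vfs) / M64_PER_IOV;
	return (vf_index / vf_per_group) * vf_per_group;
}
```

For example, with 16 VFs each group holds 4 VFs, so VF 5 is a slave of master VF 4; with 6 VFs the count rounds up to 8, giving groups of 2, and VF 3 is a slave of master VF 2.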
[PATCH V9 04/11] powerpc/pci: Remove VFs prior to PF
As commit ac205b7bb72f ("PCI: make sriov work with hotplug remove") indicates, VFs, which might be attached to the same PCI bus as their PF, should be removed before the PF. Otherwise, PCI hot unplug on that bus would crash the kernel. This patch applies the same pattern to the PowerPC PCI hotplug path.

[gwshan: changelog]
Signed-off-by: Wei Yang
Acked-by: Gavin Shan
--- arch/powerpc/kernel/pci-hotplug.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c index 7f9ed0c..59c4361 100644 --- a/arch/powerpc/kernel/pci-hotplug.c +++ b/arch/powerpc/kernel/pci-hotplug.c @@ -55,7 +55,7 @@ void pcibios_remove_pci_devices(struct pci_bus *bus) pr_debug("PCI: Removing devices on bus %04x:%02x\n", pci_domain_nr(bus), bus->number); - list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) { + list_for_each_entry_safe_reverse(dev, tmp, &bus->devices, bus_list) { pr_debug(" Removing %s...\n", pci_name(dev)); pci_stop_and_remove_bus_device(dev); } -- 1.7.9.5
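Switching to list_for_each_entry_safe_reverse() works because of list order. The sketch below assumes, as the referenced commit does, that VFs are appended to the bus device list after their PF; struct fake_dev and remove_all() are hypothetical stand-ins that check whether a PF would ever be removed while one of its VFs is still present.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a pci_dev on bus->devices. */
struct fake_dev {
	bool is_virtfn;
	int physfn;      /* list index of the PF for a VF, -1 for a PF */
	bool removed;
};

/* Remove every device, walking the list forwards or backwards.
 * Returns false as soon as a PF would be removed while one of its VFs
 * is still present, which is the situation the patch avoids. */
static bool remove_all(struct fake_dev *devs, int n, bool reverse)
{
	for (int k = 0; k < n; k++) {
		int i = reverse ? n - 1 - k : k;

		if (!devs[i].is_virtfn) {
			for (int j = 0; j < n; j++)
				if (devs[j].is_virtfn && devs[j].physfn == i &&
				    !devs[j].removed)
					return false;
		}
		devs[i].removed = true;
	}
	return true;
}
```

With a list of one PF followed by two of its VFs, the reverse walk removes both VFs first and succeeds, while the forward walk reaches the PF first and fails.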
[PATCH V9 09/11] powerpc/powernv: Support PCI config restore for VFs
After a PE reset, the OPAL API opal_pci_reinit() is called on every device in the PE to reinitialize it. However, VFs are not visible to the skiboot firmware, so we have to implement functions similar to those in skiboot to reinitialize VFs after a reset on a VF PE.

[gwshan: changelog and code refactoring]
Signed-off-by: Wei Yang
Acked-by: Gavin Shan
--- arch/powerpc/include/asm/pci-bridge.h|1 + arch/powerpc/platforms/powernv/eeh-powernv.c | 70 +- arch/powerpc/platforms/powernv/pci.c | 18 +++ 3 files changed, 88 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index 7a72f68..c927d5b 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -220,6 +220,7 @@ struct pci_dn { #define IODA_INVALID_M64 (-1) int m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV]; #endif /* CONFIG_PCI_IOV */ + int mps; #endif struct list_head child_list; struct list_head list; diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c index 8d88be1..b09c0d1 100644 --- a/arch/powerpc/platforms/powernv/eeh-powernv.c +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c @@ -1616,6 +1616,67 @@ static int pnv_eeh_next_error(struct eeh_pe **pe) return ret; } +static int pnv_eeh_restore_vf_config(struct pci_dn *pdn) +{ + struct eeh_dev *edev = pdn_to_eeh_dev(pdn); + u32 devctl, cmd, cap2, aer_capctl; + int old_mps; + + /* Restore MPS */ + if (edev->pcie_cap) { + old_mps = (ffs(pdn->mps) - 8) << 5; + eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL, +2, &devctl); + devctl &= ~PCI_EXP_DEVCTL_PAYLOAD; + devctl |= old_mps; + eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL, + 2, devctl); + } + + /* Disable Completion Timeout */ + if (edev->pcie_cap) { + eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCAP2, +4, &cap2); + if (cap2 & 0x10) { + eeh_ops->read_config(pdn, + edev->pcie_cap + PCI_EXP_DEVCTL2, + 4, &cap2); + cap2 |=
0x10; + eeh_ops->write_config(pdn, + edev->pcie_cap + PCI_EXP_DEVCTL2, + 4, cap2); + } + } + + /* Enable SERR and parity checking */ + eeh_ops->read_config(pdn, PCI_COMMAND, 2, &cmd); + cmd |= (PCI_COMMAND_PARITY | PCI_COMMAND_SERR); + eeh_ops->write_config(pdn, PCI_COMMAND, 2, cmd); + + /* Enable report various errors */ + if (edev->pcie_cap) { + eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL, + 2, &devctl); + devctl &= ~PCI_EXP_DEVCTL_CERE; + devctl |= (PCI_EXP_DEVCTL_NFERE | + PCI_EXP_DEVCTL_FERE | + PCI_EXP_DEVCTL_URRE); + eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL, + 2, devctl); + } + + /* Enable ECRC generation and check */ + if (edev->pcie_cap && edev->aer_cap) { + eeh_ops->read_config(pdn, edev->aer_cap + PCI_ERR_CAP, + 4, &aer_capctl); + aer_capctl |= (PCI_ERR_CAP_ECRC_GENE | PCI_ERR_CAP_ECRC_CHKE); + eeh_ops->write_config(pdn, edev->aer_cap + PCI_ERR_CAP, + 4, aer_capctl); + } + + return 0; +} + static int pnv_eeh_restore_config(struct pci_dn *pdn) { struct eeh_dev *edev = pdn_to_eeh_dev(pdn); @@ -1626,7 +1687,14 @@ static int pnv_eeh_restore_config(struct pci_dn *pdn) return -EEXIST; phb = edev->phb->private_data; - ret = opal_pci_reinit(phb->opal_id, + /* +* We have to restore the PCI config space after reset since the +* firmware can't see SRIOV VFs. +*/ + if (edev->physfn) + ret = pnv_eeh_restore_vf_config(pdn); + else + ret = opal_pci_reinit(phb->opal_id, OPAL_REINIT_PCI_DEV, edev->config_addr); if (ret) { pr_warn("%s: Can't reinit PCI dev 0x%x (%lld)\n", diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c index 765d8ed..0e4f42e 100644 --- a/arch/powerpc/platforms/powernv/pci.c +++ b/arch/powerpc/platforms/powernv/pci.c @@ -788,6 +788,24 @@ static void pnv_p7ioc_rc_quirk(struct pci_dev *dev) } DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_IBM, 0x3b9, pnv_p7ioc_rc_quirk); +#ifdef
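The MPS restore in pnv_eeh_restore_vf_config() relies on the expression (ffs(pdn->mps) - 8) << 5. The sketch below isolates that encoding, assuming pdn->mps holds the Max Payload Size in bytes (which is what the ffs() arithmetic implies) and using 0x00e0 as the value of PCI_EXP_DEVCTL_PAYLOAD; the helper names are hypothetical.

```c
#include <assert.h>
#include <strings.h>  /* ffs() */

/* Same value as PCI_EXP_DEVCTL_PAYLOAD (DEVCTL bits 7:5). */
#define DEVCTL_PAYLOAD_MASK 0x00e0u

/* Encode an MPS in bytes (power of two, 128..4096) into the DEVCTL
 * payload field: 128 -> 0, 256 -> 1, ... 4096 -> 5, shifted to bit 5. */
static unsigned int mps_to_devctl(int mps_bytes)
{
	assert(mps_bytes >= 128 && mps_bytes <= 4096);
	assert((mps_bytes & (mps_bytes - 1)) == 0);
	return (unsigned int)(ffs(mps_bytes) - 8) << 5;
}

/* Read-modify-write as in the patch: clear the old field, insert new. */
static unsigned int devctl_set_mps(unsigned int devctl, int mps_bytes)
{
	devctl &= ~DEVCTL_PAYLOAD_MASK;
	devctl |= mps_to_devctl(mps_bytes);
	return devctl;
}
```

ffs() returns the 1-based position of the lowest set bit, so ffs(128) is 8 and the subtraction maps 128 bytes to encoding 0.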
[PATCH V9 08/11] powerpc/powernv: Support EEH reset for VF PE
PEs for VFs don't have a primary bus, so they need their own reset backend for use during EEH recovery. This patch implements the reset backend for a VF PE by issuing an FLR or AF FLR to the VFs contained in the PE.

[gwshan: changelog and code refactoring]
Signed-off-by: Wei Yang
Acked-by: Gavin Shan
--- arch/powerpc/include/asm/eeh.h |1 + arch/powerpc/platforms/powernv/eeh-powernv.c | 134 +- 2 files changed, 134 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h index ec21f8f..331c856 100644 --- a/arch/powerpc/include/asm/eeh.h +++ b/arch/powerpc/include/asm/eeh.h @@ -136,6 +136,7 @@ struct eeh_dev { int pcix_cap; /* Saved PCIx capability*/ int pcie_cap; /* Saved PCIe capability*/ int aer_cap;/* Saved AER capability */ + int af_cap; /* Saved AF capability */ struct eeh_pe *pe; /* Associated PE*/ struct list_head list; /* Form link list in the PE */ struct pci_controller *phb; /* Associated PHB */ diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c index e9aec1d..8d88be1 100644 --- a/arch/powerpc/platforms/powernv/eeh-powernv.c +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c @@ -404,6 +404,7 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data) edev->pcix_cap = pnv_eeh_find_cap(pdn, PCI_CAP_ID_PCIX); edev->pcie_cap = pnv_eeh_find_cap(pdn, PCI_CAP_ID_EXP); edev->aer_cap = pnv_eeh_find_ecap(pdn, PCI_EXT_CAP_ID_ERR); + edev->af_cap = pnv_eeh_find_cap(pdn, PCI_CAP_ID_AF); if ((edev->class_code >> 8) == PCI_CLASS_BRIDGE_PCI) { edev->mode |= EEH_DEV_BRIDGE; if (edev->pcie_cap) { @@ -893,6 +894,127 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option) return 0; } +static void pnv_eeh_wait_for_pending(struct pci_dn *pdn, int pos, +u16 mask, bool af_flr_rst) +{ + struct eeh_dev *edev = pdn_to_eeh_dev(pdn); + int status, i; + + /* Wait for Transaction Pending bit to be cleared */ + for (i = 0; i < 4; i++) { +
eeh_ops->read_config(pdn, pos, 2, &status); + if (!(status & mask)) + return; + + msleep((1 << i) * 100); + } + + pr_warn("%s: Pending transaction while issuing %s FLR to " + "%04x:%02x:%02x.%01x\n", + __func__, af_flr_rst ? "AF" : "", + edev->phb->global_number, pdn->busno, + PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn)); +} + +static int pnv_eeh_do_flr(struct pci_dn *pdn, int option) +{ + struct eeh_dev *edev = pdn_to_eeh_dev(pdn); + u32 reg; + + if (!edev->pcie_cap) + return -ENOTTY; + + eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCAP, 4, &reg); + if (!(reg & PCI_EXP_DEVCAP_FLR)) + return -ENOTTY; + + switch (option) { + case EEH_RESET_HOT: + case EEH_RESET_FUNDAMENTAL: + pnv_eeh_wait_for_pending(pdn, edev->pcie_cap + PCI_EXP_DEVSTA, +PCI_EXP_DEVSTA_TRPND, false); + eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL, +4, &reg); + reg |= PCI_EXP_DEVCTL_BCR_FLR; + eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL, + 4, reg); + msleep(EEH_PE_RST_HOLD_TIME); + break; + case EEH_RESET_DEACTIVATE: + eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL, +4, &reg); + reg &= ~PCI_EXP_DEVCTL_BCR_FLR; + eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL, + 4, reg); + msleep(EEH_PE_RST_SETTLE_TIME); + break; + } + + return 0; +} + +static int pnv_eeh_do_af_flr(struct pci_dn *pdn, int option) +{ + struct eeh_dev *edev = pdn_to_eeh_dev(pdn); + u32 cap; + + if (!edev->af_cap) + return -ENOTTY; + + eeh_ops->read_config(pdn, edev->af_cap + PCI_AF_CAP, 1, &cap); + if (!(cap & PCI_AF_CAP_TP) || !(cap & PCI_AF_CAP_FLR)) + return -ENOTTY; + + switch (option) { + case EEH_RESET_HOT: + case EEH_RESET_FUNDAMENTAL: + /* +* Wait for Transaction Pending bit to clear. A word-aligned +* test is used, so we use the control offset rather than status +* and shift the test bit to match. +*/ + pnv_eeh_wait_for_pending(pdn, edev->af_cap
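pnv_eeh_wait_for_pending() polls with an exponential backoff: up to four reads, sleeping (1 << i) * 100 ms after each read that still shows the pending bit. The sketch below simulates that loop without sleeping so the timing can be checked; read_status_fn, struct fake_status and the other names are hypothetical.

```c
#include <assert.h>

/* Callback returning the current value of the status register. */
typedef unsigned int (*read_status_fn)(void *ctx);

/* Simulated version of the polling loop: up to four reads, "sleeping"
 * (1 << i) * 100 ms after each read that still shows a bit in mask.
 * Returns the simulated ms slept before the bit cleared, or -1 if it
 * never cleared (the point where the patch prints its warning). */
static int wait_for_pending_sim(read_status_fn rd, void *ctx,
				unsigned int mask)
{
	int slept_ms = 0;

	for (int i = 0; i < 4; i++) {
		if (!(rd(ctx) & mask))
			return slept_ms;
		slept_ms += (1 << i) * 100;   /* msleep() in the real code */
	}
	return -1;
}

/* Hypothetical fake device: reports `mask` for the first busy_reads reads. */
struct fake_status {
	int busy_reads;
	unsigned int mask;
};

static unsigned int fake_read_status(void *ctx)
{
	struct fake_status *s = ctx;

	if (s->busy_reads > 0) {
		s->busy_reads--;
		return s->mask;
	}
	return 0;
}
```

In the worst case the loop sleeps 100 + 200 + 400 + 800 = 1500 ms before giving up and warning.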
[PATCH V9 07/11] powerpc/eeh: Create PE for VFs
The current EEH recovery code assumes that every PE has a primary bus. Unfortunately, that's not true for VF PEs, which generally contain one VF or, in the VF-group case, several. This patch creates PEs for VFs in the weak function pcibios_bus_add_device(). These VF PEs are identified by the newly introduced flag EEH_PE_VF so that they can be handled differently during EEH recovery.

[gwshan: changelog and code refactoring]
Signed-off-by: Wei Yang
Acked-by: Gavin Shan
--- arch/powerpc/include/asm/eeh.h |1 + arch/powerpc/kernel/eeh_pe.c | 10 -- arch/powerpc/platforms/powernv/eeh-powernv.c | 16 3 files changed, 25 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h index 6c383ad..ec21f8f 100644 --- a/arch/powerpc/include/asm/eeh.h +++ b/arch/powerpc/include/asm/eeh.h @@ -72,6 +72,7 @@ struct pci_dn; #define EEH_PE_PHB (1 << 1)/* PHB PE*/ #define EEH_PE_DEVICE (1 << 2)/* Device PE */ #define EEH_PE_BUS (1 << 3)/* Bus PE*/ +#define EEH_PE_VF (1 << 4)/* VF PE */ #define EEH_PE_ISOLATED (1 << 0)/* Isolated PE */ #define EEH_PE_RECOVERING (1 << 1)/* Recovering PE*/ diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c index 35f0b62..260a701 100644 --- a/arch/powerpc/kernel/eeh_pe.c +++ b/arch/powerpc/kernel/eeh_pe.c @@ -299,7 +299,10 @@ static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev) * EEH device already having associated PE, but * the direct parent EEH device doesn't have yet. */ - pdn = pdn ? pdn->parent : NULL; + if (edev->physfn) + pdn = pci_get_pdn(edev->physfn); + else + pdn = pdn ?
pdn->parent : NULL; while (pdn) { /* We're poking out of PCI territory */ parent = pdn_to_eeh_dev(pdn); @@ -382,7 +385,10 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev) } /* Create a new EEH PE */ - pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE); + if (edev->physfn) + pe = eeh_pe_alloc(edev->phb, EEH_PE_VF); + else + pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE); if (!pe) { pr_err("%s: out of memory!\n", __func__); return -ENOMEM; diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c index 5cf5e6e..e9aec1d 100644 --- a/arch/powerpc/platforms/powernv/eeh-powernv.c +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c @@ -1524,6 +1524,22 @@ static struct eeh_ops pnv_eeh_ops = { .restore_config = pnv_eeh_restore_config }; +void pcibios_bus_add_device(struct pci_dev *pdev) +{ + struct pci_dn *pdn = pci_get_pdn(pdev); + + if (!pdev->is_virtfn) + return; + + /* +* The following operations will fail if VF's sysfs files +* aren't created or its resources aren't finalized. +*/ + eeh_add_device_early(pdn); + eeh_add_device_late(pdev); + eeh_sysfs_add_device(pdev); +} + /** * eeh_powernv_init - Register platform dependent EEH operations * -- 1.7.9.5
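The two decisions this patch changes, where the upward search for a parent PE starts and which flag a new PE gets, can be modelled in isolation. In the sketch below, struct dn, struct edev_model and the helpers are hypothetical stand-ins: the real code stores physfn as a struct pci_dev and converts it with pci_get_pdn(), while the model stores the PF's node directly. It also assumes the VF's node hangs off the same upstream node as its PF, so starting at the PF makes the PF's PE (if it has one) the parent of the new VF PE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: a pci_dn-like node and an eeh_dev-like record. */
struct dn {
	struct dn *parent;        /* upstream node */
};

struct edev_model {
	struct dn *pdn;           /* this device's node (may be NULL) */
	struct dn *physfn_pdn;    /* PF's node for a VF, NULL otherwise */
};

enum pe_kind { PE_KIND_DEVICE, PE_KIND_VF };

/* Starting node for the upward parent-PE search in the patched
 * eeh_pe_get_parent(): a VF starts at its PF, others at their parent. */
static struct dn *parent_search_start(const struct edev_model *e)
{
	if (e->physfn_pdn)
		return e->physfn_pdn;
	return e->pdn ? e->pdn->parent : NULL;
}

/* PE flag chosen by the patched eeh_add_to_parent_pe(). */
static enum pe_kind new_pe_kind(const struct edev_model *e)
{
	return e->physfn_pdn ? PE_KIND_VF : PE_KIND_DEVICE;
}
```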
[PATCH V9 06/11] powerpc/powernv: EEH device for VF
VFs and their corresponding pci_dn instances are created and released dynamically as their PF's SRIOV capability is enabled and disabled. This patch creates and releases the EEH devices for VFs when their pci_dn instances are created and released, so EEH devices and pci_dn instances share the same life cycle. A VF's EEH device is identified by struct eeh_dev::physfn.

[gwshan: changelog and removed CONFIG_PCI_IOV]
Signed-off-by: Wei Yang
Acked-by: Gavin Shan
--- arch/powerpc/include/asm/eeh.h |1 + arch/powerpc/kernel/pci_dn.c | 12 2 files changed, 13 insertions(+) diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h index c5eb86f..6c383ad 100644 --- a/arch/powerpc/include/asm/eeh.h +++ b/arch/powerpc/include/asm/eeh.h @@ -140,6 +140,7 @@ struct eeh_dev { struct pci_controller *phb; /* Associated PHB */ struct pci_dn *pdn; /* Associated PCI device node */ struct pci_dev *pdev; /* Associated PCI device*/ + struct pci_dev *physfn; /* Associated PF PORT */ struct pci_bus *bus;/* PCI bus for partial hotplug */ }; diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c index f771130..f0ddde7 100644 --- a/arch/powerpc/kernel/pci_dn.c +++ b/arch/powerpc/kernel/pci_dn.c @@ -180,7 +180,9 @@ static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent, struct pci_dn *add_dev_pci_data(struct pci_dev *pdev) { #ifdef CONFIG_PCI_IOV + struct pci_controller *hose = pci_bus_to_host(pdev->bus); struct pci_dn *parent, *pdn; + struct eeh_dev *edev; int i; /* Only support IOV for now */ @@ -206,6 +208,9 @@ struct pci_dn *add_dev_pci_data(struct pci_dev *pdev) __func__, i); return NULL; } + eeh_dev_init(pdn, hose); + edev = pdn_to_eeh_dev(pdn); + edev->physfn = pdev; } #endif /* CONFIG_PCI_IOV */ @@ -254,10 +259,17 @@ void remove_dev_pci_data(struct pci_dev *pdev) for (i = 0; i < pci_sriov_get_totalvfs(pdev); i++) { list_for_each_entry_safe(pdn, tmp, &parent->child_list, list) { + struct eeh_dev *edev; if (pdn->busno !=
pci_iov_virtfn_bus(pdev, i) || pdn->devfn != pci_iov_virtfn_devfn(pdev, i)) continue; + edev = pdn_to_eeh_dev(pdn); + if (edev) { + pdn->edev = NULL; + kfree(edev); + } + if (!list_empty(&pdn->list)) list_del(&pdn->list); -- 1.7.9.5
[PATCH V9 05/11] powerpc/eeh: Cache only BARs, not windows or IOV BARs
The EEH address cache, which helps locate the PCI device for a given (physical) MMIO address, didn't cover PCI bridges. It also shouldn't return the PF for an address within the PF's IOV BARs; the VF should be returned instead. This patch restricts the address cache to the first seven resources (the six standard BARs plus the expansion ROM) for both purposes. As a result, the type check in eeh_addr_cache_insert_dev() can be removed, since bridge windows are no longer cached.

[gwshan: changelog]
Signed-off-by: Wei Yang
Acked-by: Gavin Shan
--- arch/powerpc/kernel/eeh_cache.c |6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c index a1e86e1..e6887f0 100644 --- a/arch/powerpc/kernel/eeh_cache.c +++ b/arch/powerpc/kernel/eeh_cache.c @@ -196,7 +196,7 @@ static void __eeh_addr_cache_insert_dev(struct pci_dev *dev) } /* Walk resources on this device, poke them into the tree */ - for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) { + for (i = 0; i <= PCI_ROM_RESOURCE; i++) { resource_size_t start = pci_resource_start(dev,i); resource_size_t end = pci_resource_end(dev,i); unsigned long flags = pci_resource_flags(dev,i); @@ -222,10 +222,6 @@ void eeh_addr_cache_insert_dev(struct pci_dev *dev) { unsigned long flags; - /* Ignore PCI bridges */ - if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE) - return; - spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); __eeh_addr_cache_insert_dev(dev); spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); -- 1.7.9.5
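Changing the loop bound from DEVICE_COUNT_RESOURCE to PCI_ROM_RESOURCE works because of how struct pci_dev orders its resources. The sketch below assumes the index layout of include/linux/pci.h in kernels of this period with CONFIG_PCI_IOV=y (0..5 standard BARs, 6 ROM, 7..12 VF BARs, 13 and up bridge windows); the DEMO_ constants and helpers are hypothetical, not kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Resource index layout assumed here (include/linux/pci.h of this period,
 * CONFIG_PCI_IOV=y): 0..5 standard BARs, 6 expansion ROM, 7..12 SR-IOV
 * VF BARs, 13 and up bridge windows. */
#define DEMO_ROM_RESOURCE      6
#define DEMO_BRIDGE_RESOURCES 13

enum res_kind { RES_BAR, RES_ROM, RES_IOV_BAR, RES_BRIDGE_WINDOW };

static enum res_kind classify_resource(int i)
{
	if (i < DEMO_ROM_RESOURCE)
		return RES_BAR;
	if (i == DEMO_ROM_RESOURCE)
		return RES_ROM;
	if (i < DEMO_BRIDGE_RESOURCES)
		return RES_IOV_BAR;
	return RES_BRIDGE_WINDOW;
}

/* True if the patched loop (i <= PCI_ROM_RESOURCE) visits index i. */
static bool eeh_cache_covers(int i)
{
	return i >= 0 && i <= DEMO_ROM_RESOURCE;
}
```

Under this layout the patched loop stops before both the IOV BARs and the bridge windows, which is why the separate bridge-class check is no longer needed.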
[PATCH V9 03/11] powerpc/pci: Cache VF index in pci_dn
This patch caches the VF index in pci_dn, which can be used to calculate the VF's bus, device and function numbers. That information helps locate the VF's PCI device instance when hotplug is needed during EEH recovery.

Signed-off-by: Wei Yang
Acked-by: Gavin Shan
--- arch/powerpc/include/asm/pci-bridge.h |1 + arch/powerpc/kernel/pci_dn.c |4 +++- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index 712add5..7a72f68 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -210,6 +210,7 @@ struct pci_dn { #define IODA_INVALID_PE (-1) #ifdef CONFIG_PPC_POWERNV int pe_number; + int vf_index; /* VF index in the PF */ #ifdef CONFIG_PCI_IOV u16 vfs_expanded; /* number of VFs IOV BAR expanded */ u16 num_vfs;/* number of VFs enabled*/ diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c index b3b4df9..f771130 100644 --- a/arch/powerpc/kernel/pci_dn.c +++ b/arch/powerpc/kernel/pci_dn.c @@ -139,6 +139,7 @@ struct pci_dn *pci_get_pdn(struct pci_dev *pdev) #ifdef CONFIG_PCI_IOV static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent, struct pci_dev *pdev, + int vf_index, int busno, int devfn) { struct pci_dn *pdn; @@ -157,6 +158,7 @@ static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent, pdn->parent = parent; pdn->busno = busno; pdn->devfn = devfn; + pdn->vf_index = vf_index; #ifdef CONFIG_PPC_POWERNV pdn->pe_number = IODA_INVALID_PE; #endif @@ -196,7 +198,7 @@ struct pci_dn *add_dev_pci_data(struct pci_dev *pdev) return NULL; for (i = 0; i < pci_sriov_get_totalvfs(pdev); i++) { - pdn = add_one_dev_pci_data(parent, NULL, + pdn = add_one_dev_pci_data(parent, NULL, i, pci_iov_virtfn_bus(pdev, i), pci_iov_virtfn_devfn(pdev, i)); if (!pdn) { -- 1.7.9.5
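Given the cached VF index, the bus and devfn follow from the PF's SR-IOV First VF Offset and VF Stride. The sketch below is modelled on pci_iov_virtfn_bus()/pci_iov_virtfn_devfn() in drivers/pci/iov.c; struct pf_sriov, the helpers and the DEMO_ macros are hypothetical stand-ins.

```c
#include <assert.h>

/* Hypothetical summary of the PF fields the calculation needs. */
struct pf_sriov {
	int bus;      /* PF bus number */
	int devfn;    /* PF device/function */
	int offset;   /* SR-IOV First VF Offset */
	int stride;   /* SR-IOV VF Stride */
};

/* Low part of the VF's routing ID relative to the PF's bus; anything
 * above bit 7 carries into the bus number. */
static int vf_rid_low(const struct pf_sriov *pf, int vf_index)
{
	return pf->devfn + pf->offset + pf->stride * vf_index;
}

static int vf_bus(const struct pf_sriov *pf, int vf_index)
{
	return pf->bus + (vf_rid_low(pf, vf_index) >> 8);
}

static int vf_devfn(const struct pf_sriov *pf, int vf_index)
{
	return vf_rid_low(pf, vf_index) & 0xff;
}

/* Same split as the kernel's PCI_SLOT()/PCI_FUNC() macros. */
#define DEMO_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define DEMO_FUNC(devfn) ((devfn) & 0x07)
```

For a PF at bus 1, devfn 0 with offset 0x80 and stride 2, VF 3 lands at 01:10.6, and VF 64 spills onto bus 2.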
[PATCH V9 01/11] PCI/IOV: Rename and export virtfn_add/virtfn_remove
During EEH recovery, hotplug is applied to devices that have no driver or whose driver doesn't support EEH. However, that hotplug is implemented on top of the PCI bus and can't be applied to a VF directly. This patch renames virtfn_{add,remove}() to pci_iov_virtfn_{add,remove}() and exports them so that they can be used for PCI hotplug during EEH recovery.

[gwshan: changelog]
Signed-off-by: Wei Yang
Reviewed-by: Gavin Shan
Acked-by: Bjorn Helgaas
--- drivers/pci/iov.c | 10 +- include/linux/pci.h |8 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index ee0ebff..cc941dd 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -108,7 +108,7 @@ resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno) return dev->sriov->barsz[resno - PCI_IOV_RESOURCES]; } -static int virtfn_add(struct pci_dev *dev, int id, int reset) +int pci_iov_virtfn_add(struct pci_dev *dev, int id, int reset) { int i; int rc = -ENOMEM; @@ -183,7 +183,7 @@ failed: return rc; } -static void virtfn_remove(struct pci_dev *dev, int id, int reset) +void pci_iov_virtfn_remove(struct pci_dev *dev, int id, int reset) { char buf[VIRTFN_ID_LEN]; struct pci_dev *virtfn; @@ -320,7 +320,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn) } for (i = 0; i < initial; i++) { - rc = virtfn_add(dev, i, 0); + rc = pci_iov_virtfn_add(dev, i, 0); if (rc) goto failed; } @@ -332,7 +332,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn) failed: for (j = 0; j < i; j++) - virtfn_remove(dev, j, 0); + pci_iov_virtfn_remove(dev, j, 0); iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE); pci_cfg_access_lock(dev); @@ -361,7 +361,7 @@ static void sriov_disable(struct pci_dev *dev) return; for (i = 0; i < iov->num_VFs; i++) - virtfn_remove(dev, i, 0); + pci_iov_virtfn_remove(dev, i, 0); pcibios_sriov_disable(dev); diff --git a/include/linux/pci.h b/include/linux/pci.h index 8a0321a..3fed437 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1668,6 +1668,8
@@ int pci_iov_virtfn_devfn(struct pci_dev *dev, int id); int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn); void pci_disable_sriov(struct pci_dev *dev); +int pci_iov_virtfn_add(struct pci_dev *dev, int id, int reset); +void pci_iov_virtfn_remove(struct pci_dev *dev, int id, int reset); int pci_num_vf(struct pci_dev *dev); int pci_vfs_assigned(struct pci_dev *dev); int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs); @@ -1685,6 +1687,12 @@ static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id) static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn) { return -ENODEV; } static inline void pci_disable_sriov(struct pci_dev *dev) { } +static inline int pci_iov_virtfn_add(struct pci_dev *dev, int id, int reset) +{ + return -ENOSYS; +} +static inline void pci_iov_virtfn_remove(struct pci_dev *dev, int id, int reset) +{ } static inline int pci_num_vf(struct pci_dev *dev) { return 0; } static inline int pci_vfs_assigned(struct pci_dev *dev) { return 0; } -- 1.7.9.5
[PATCH V9 02/11] PCI: Add pcibios_bus_add_device() weak function
This patch adds a weak function pcibios_bus_add_device() so that
arch-dependent code can do proper setup. For example, powerpc could set
up EEH-related resources.

Signed-off-by: Wei Yang
Acked-by: Bjorn Helgaas
---
 drivers/pci/bus.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 6fbd3f2..b7e30a7 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -267,6 +267,7 @@ bool pci_bus_clip_resource(struct pci_dev *dev, int idx)
 
 void __weak pcibios_resource_survey_bus(struct pci_bus *bus) { }
 
+void __weak pcibios_bus_add_device(struct pci_dev *dev) { }
 /**
  * pci_bus_add_device - start driver for a single device
  * @dev: device to add
@@ -277,6 +278,8 @@ void pci_bus_add_device(struct pci_dev *dev)
 {
 	int retval;
 
+	pcibios_bus_add_device(dev);
+
 	/*
 	 * Can not put in pci_device_add yet because resources
 	 * are not assigned yet for some devices.
-- 
1.7.9.5
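The weak-default hook pattern this patch relies on can be shown in userspace. This is a minimal sketch assuming GCC or Clang, where `__attribute__((weak))` is what the kernel's `__weak` macro expands to; arch_bus_add_device_hook() and generic_bus_add_device() are hypothetical stand-ins, not kernel functions.

```c
#include <assert.h>

/* Counts how often the generic default ran; for illustration only. */
static int default_hook_calls;

/*
 * Weak default, in the spirit of pcibios_bus_add_device() in
 * drivers/pci/bus.c. An architecture that needs extra per-device setup
 * links in a strong definition with the same signature, and the linker
 * then uses that one instead of this empty default.
 */
__attribute__((weak)) void arch_bus_add_device_hook(int devfn)
{
	(void)devfn;
	default_hook_calls++;
}

/* Generic code calls the hook unconditionally, before its own work. */
int generic_bus_add_device(int devfn)
{
	arch_bus_add_device_hook(devfn);
	/* ... generic registration and driver binding would follow ... */
	return 0;
}
```

With no strong definition linked in, one call to generic_bus_add_device() runs the default once. The generic caller never needs an #ifdef for the architecture.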
[PATCH V9 00/11] VF EEH on Power8
This patchset enables EEH on SR-IOV VFs. The general idea is to create a proper VF edev and VF PE and handle them correctly. Unlike a bus PE, a VF PE contains just one VF, which makes EEH error handling on a VF PE different in three ways.

First, removing and re-enumerating a VF relies on its PF. A VF is tightly bound to its PF, so it is not appropriate to enumerate a VF through the usual scan procedure. That's why virtfn_add/virtfn_remove are exported in this patch set.

Second, VF reset/restore is done in kernel space. Firmware is not aware of the VF, so the usual reset function implemented in firmware will not work. One of the patches imitates the reset/restore function in kernel space.

Third, a VF may be removed during the PF's error_detected callback. In that case, the original error_detected->slot_reset->resume sequence does not apply to the removed VFs, since the PF re-creates them in a fresh state. A flag is introduced in eeh_dev to mark that the eeh_dev is in an error state, so we can track whether the device needs to be reset.

This has been tested both on the host and in a guest on Power8 with the latest kernel.
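The third point, tracking per-device error state so that VFs re-created during error_detected are not reset again, can be modelled in a few lines. This is a hypothetical sketch: struct model_dev and the helper names are illustrative only and are not the real eeh_dev layout or flag name.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for an eeh_dev; not the kernel struct. */
struct model_dev {
	bool present;   /* device currently exists */
	bool in_error;  /* set when the PE froze while this device existed */
	int resets;     /* how many times slot_reset ran on it */
};

/* Step 1: the PE freezes; every existing device is marked as in error. */
static void mark_pe_error(struct model_dev *devs, int n)
{
	for (int i = 0; i < n; i++)
		if (devs[i].present)
			devs[i].in_error = true;
}

/*
 * Step 2: the PF driver's error_detected() may tear down and re-create
 * its VFs. A re-created VF starts in a fresh state, so its flag is clear.
 */
static void recreate_vf(struct model_dev *vf)
{
	vf->present = true;
	vf->in_error = false;
	vf->resets = 0;
}

/* Step 3: slot_reset/resume run only for devices still flagged. */
static int reset_flagged(struct model_dev *devs, int n)
{
	int count = 0;

	for (int i = 0; i < n; i++) {
		if (devs[i].present && devs[i].in_error) {
			devs[i].resets++;
			devs[i].in_error = false;
			count++;
		}
	}
	return count;
}
```

With a PF and two VFs, where one VF is re-created in step 2, only the two flagged devices go through the reset path.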
v9:
  * split pcibios_bus_add_device() into a separate patch
  * Bjorn acked the PCI part and agreed this patch set to be merged from ppc tree
  * rebased on mpe/linux.git next branch
v8:
  * fix on checking the return value of pnv_eeh_do_flr()
  * introduced a weak function pcibios_bus_add_device() to create PE for VFs
v7:
  * fix compile error when PCI_IOV is not set
v6:
  * code / commit log refactor by Gavin
v5:
  * remove the compound field, iterate on Master VF PE instead
  * some code refine on PCI config restore and reset on VF
      the wait time for assert and deassert
      PCI device address format
      check on edev->pcie_cap and edev->aer_cap before access them
v4:
  * refine the change logs, comment and code style
  * change pnv_pci_fixup_vf_eeh() to pnv_eeh_vf_final_fixup() and remove the
    CONFIG_PCI_IOV macro
  * reorder patch 5/6 to make the logic more reasonable
  * remove remove_dev_pci_data()
  * remove the EEH_DEV_VF flag, use edev->physfn to identify a VF EEH DEV and
    remove related CONFIG_PCI_IOV macro
  * add the option for VF reset
  * fix the pnv_eeh_cfg_blocked() logic
  * replace pnv_pci_cfg_{read,write} with eeh_ops->{read,write}_config in
    pnv_eeh_vf_restore_config()
  * rename pnv_eeh_vf_restore_config() to pnv_eeh_restore_vf_config()
  * rename pnv_pci_fixup_vf_caps() to pnv_pci_vf_header_fixup() and move it to
    arch/powerpc/platforms/powernv/pci.c
  * add a field compound in pnv_ioda_pe to link compound PEs
  * handle compound PE for VF PEs
v3:
  * add back vf_index in pci_dn to track the VF's index
  * rename ppdev in eeh_dev to physfn for consistency
  * move edev->physfn assignment before dev->dev.archdata.edev is set
  * move pnv_pci_fixup_vf_eeh() and pnv_pci_fixup_vf_caps() to eeh-powernv.c
  * more clear and detail in commit log and comment in code
  * merge eeh_rmv_virt_device() with eeh_rmv_device()
  * move the cfg_blocked check logic from pnv_eeh_read/write_config() to
    pnv_eeh_cfg_blocked()
  * move the vf reset/restore logic into its own patch, two patches are created.
      powerpc/powernv: Support PCI config restore for VFs
      powerpc/powernv: Support EEH reset for VFs
  * simplify the vf reset logic
v2:
  * add prefix pci_iov_ to virtfn_add/virtfn_remove
  * use EEH_DEV_VF as a flag for a VF's eeh_dev
  * use eeh_dev instead of edev in change log
  * remove vf_index in eeh_dev, calculate it from pdn->busno and devfn
  * do eeh_add_device_late() and eeh_sysfs_add_device() both after pci_dev is
    well initialized
  * do FLR to reset a VF PE
  * imitate the restore function in FW for VF
  * remove the reverse order patch, since it is still under discussion

Wei Yang (11):
  PCI/IOV: Rename and export virtfn_add/virtfn_remove
  PCI: Add pcibios_bus_add_device() weak function
  powerpc/pci: Cache VF index in pci_dn
  powerpc/pci: Remove VFs prior to PF
  powerpc/eeh: Cache only BARs, not windows or IOV BARs
  powerpc/powernv: EEH device for VF
  powerpc/eeh: Create PE for VFs
  powerpc/powernv: Support EEH reset for VF PE
  powerpc/powernv: Support PCI config restore for VFs
  powerpc/eeh: Support error recovery for VF PE
  powerpc/powernv: compound PE for VFs

 arch/powerpc/include/asm/eeh.h               |   4 +
 arch/powerpc/include/asm/pci-bridge.h        |   2 +
 arch/powerpc/kernel/eeh.c                    |   8 +
 arch/powerpc/kernel/eeh_cache.c              |   6 +-
 arch/powerpc/kernel/eeh_driver.c             | 100 +---
 arch/powerpc/kernel/eeh_pe.c                 |  13 +-
 arch/powerpc/kernel/pci-hotplug.c            |   2 +-
 arch/powerpc/kernel/pci_dn.c                 |  16 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c | 220 +-
 arch/powerpc/platforms/powernv/pci-ioda.c
Re: [PATCH v5 3/3] leds/powernv: Add driver for PowerNV platform
On 07/16/2015 02:17 PM, Michael Ellerman wrote:
> On Thu, 2015-07-16 at 10:27 +0200, Jacek Anaszewski wrote:
>> On 07/16/2015 08:54 AM, Vasant Hegde wrote:
> +static enum led_brightness powernv_led_get(struct led_classdev *led_cdev)
> +{
> +char *loc_code;
> +int rc, led_type;
> +__be64 led_mask, led_value, max_led_type;
> +
> +led_type = powernv_get_led_type(led_cdev);
> +if (led_type == -1)
> +return LED_OFF;
> +
> +loc_code = powernv_get_location_code(led_cdev);
> +if (!loc_code)
> +return LED_OFF;
> +
> +/* Fetch all LED status */
> +led_mask = cpu_to_be64(0);
> +led_value = cpu_to_be64(0);
> +max_led_type = cpu_to_be64(OPAL_SLOT_LED_TYPE_MAX);
> +
> +rc = opal_leds_get_ind(loc_code, &led_mask, &led_value, &max_led_type);
> +if (rc != OPAL_SUCCESS && rc != OPAL_PARTIAL) {
> +dev_err(led_cdev->dev,
> +"%s: OPAL get led call failed [rc=%d]\n",
> +__func__, rc);
> +goto led_fail;
> +}
> +
> +led_mask = be64_to_cpu(led_mask);
> +led_value = be64_to_cpu(led_value);
be64_to_cpu result should be assigned to the variable of u64/s64 type.
>>>
>>> PowerNV platform is capable of running both big/little endian mode.. But
>>> presently our firmware is big endian. These variable contains big endian
>>> values.
>>> Hence I have created as __be64 .. (This is the convention we follow in other
>>> places as well).
>>
>> It is correct that the argument is of __be64 type, but be64_to_cpu
>> returns u64 type, whereas you assign it to __be64.
>
> Yeah that's wrong. You are using led_mask etc as __be64 when you pass them to
> firmware, which is correct, but then you're also using them as the lvalue of
> be64_to_cpu() which returns a u64.
>

Yep. Got it.

> Sparse should warn you about that if you use it, please do.
>
> $ apt-get install sparse
> $ cd kernel
> $ make C=2 CF=-D__CHECK_ENDIAN__
>

Thanks!
-Vasant
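The typing rule the reviewers describe (the big-endian buffer stays big-endian, and the converted host-order value goes into a separate u64) can be sketched in userspace. In this sketch a plain byte array stands in for `__be64` so the conversion is explicit on any host. It also assumes OPAL_SLOT_LED_STATE_ON is bit value 1. be64_raw, be64_to_host() and led_is_on() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __be64: raw firmware bytes, most significant first. */
typedef struct { uint8_t b[8]; } be64_raw;

/* Convert firmware (big-endian) bytes into a host-order integer. */
static uint64_t be64_to_host(be64_raw v)
{
	uint64_t x = 0;

	for (int i = 0; i < 8; i++)
		x = (x << 8) | v.b[i];
	return x;
}

/*
 * Shaped like the reviewed powernv_led_get(): the raw BE buffers are what
 * firmware fills in, and the host-order results live in separate u64s.
 * Returns 1 for on, 0 for off, -1 if status is not available.
 */
static int led_is_on(be64_raw raw_mask, be64_raw raw_value, int led_type)
{
	uint64_t mask = be64_to_host(raw_mask);   /* u64, not __be64 */
	uint64_t value = be64_to_host(raw_value); /* u64, not __be64 */

	if (!((mask >> led_type) & 1))  /* assumed OPAL_SLOT_LED_STATE_ON == 1 */
		return -1;
	return (int)((value >> led_type) & 1);
}
```

In the kernel the same split would be a `__be64` passed to opal_leds_get_ind() and a `u64` receiving be64_to_cpu(). Sparse with `__CHECK_ENDIAN__` flags the mixed form.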
Re: [PATCH v5 3/3] leds/powernv: Add driver for PowerNV platform
On 07/16/2015 01:57 PM, Jacek Anaszewski wrote: > Hi Vasan, > Hello Jacek, .../... >> >> I have added as >> - compatible : "ibm,opal-v3-led". > > Please retain "Should be :". > Done. .../... >>> >>> Please parse the led type once upon initialization and add related >>> property to the struct powernv_led_data that will hold the value. >> >> I thought we can get location code and type using class dev name itself. >> Hence I >> didn't add these two properties to structure.. > > This way you are doing extra work for parsing the name each time > the brightness is set. Agreed. I have added them to structure now. > >> Do you want me to add them to structure itself? > > Yes, please add them. Done. > >>> +loc_code = powernv_get_location_code(led_cdev); +if (!loc_code) +return; >>> >>> The same situation as in case of led type. >>> +/* Prepare for the OPAL call */ +max_led_type = cpu_to_be64(OPAL_SLOT_LED_TYPE_MAX); >>> >>> This value could be also calculated only once. >> >> Yeah. May be I can move this to powernv_leds_priv structure. 
>> >>> +led_mask = OPAL_SLOT_LED_STATE_ON << led_type; +if (value) +led_value = led_mask; + +/* OPAL async call */ +token = opal_async_get_token_interruptible(); +if (token < 0) { +if (token != -ERESTARTSYS) +dev_err(led_cdev->dev, +"%s: Couldn't get OPAL async token\n", +__func__); +goto out_loc; +} + +rc = opal_leds_set_ind(token, loc_code, + led_mask, led_value, &max_led_type); +if (rc != OPAL_ASYNC_COMPLETION) { +dev_err(led_cdev->dev, +"%s: OPAL set LED call failed for %s [rc=%d]\n", +__func__, loc_code, rc); +goto out_token; +} + +rc = opal_async_wait_response(token, &msg); +if (rc) { +dev_err(led_cdev->dev, +"%s: Failed to wait for the async response [rc=%d]\n", +__func__, rc); +goto out_token; +} + +rc = be64_to_cpu(msg.params[1]); +if (rc != OPAL_SUCCESS) +dev_err(led_cdev->dev, +"%s : OAPL async call returned failed [rc=%d]\n", +__func__, rc); + +out_token: +opal_async_release_token(token); + +out_loc: +kfree(loc_code); +} + +/* + * This function fetches the LED state for a given LED type for + * mentioned LED classdev structure. + */ +static enum led_brightness powernv_led_get(struct led_classdev *led_cdev) +{ +char *loc_code; +int rc, led_type; +__be64 led_mask, led_value, max_led_type; + +led_type = powernv_get_led_type(led_cdev); +if (led_type == -1) +return LED_OFF; + +loc_code = powernv_get_location_code(led_cdev); +if (!loc_code) +return LED_OFF; + +/* Fetch all LED status */ +led_mask = cpu_to_be64(0); +led_value = cpu_to_be64(0); +max_led_type = cpu_to_be64(OPAL_SLOT_LED_TYPE_MAX); + +rc = opal_leds_get_ind(loc_code, &led_mask, &led_value, &max_led_type); +if (rc != OPAL_SUCCESS && rc != OPAL_PARTIAL) { +dev_err(led_cdev->dev, +"%s: OPAL get led call failed [rc=%d]\n", +__func__, rc); +goto led_fail; +} + +led_mask = be64_to_cpu(led_mask); +led_value = be64_to_cpu(led_value); >>> >>> be64_to_cpu result should be assigned to the variable of u64/s64 type. >> >> PowerNV platform is capable of running both big/little endian mode.. 
But >> presently our firmware is big endian. These variable contains big endian >> values. >> Hence I have created as __be64 .. (This is the convention we follow in other >> places as well). > > It is correct that the argument is of __be64 type, but be64_to_cpu > returns u64 type, whereas you assign it to __be64. > Got it .. Fixed. >>> +/* LED status available */ +if (!((led_mask >> led_type) & OPAL_SLOT_LED_STATE_ON)) { +dev_err(led_cdev->dev, +"%s: LED status not available for %s\n", +__func__, led_cdev->name); +goto led_fail; +} + +/* LED status value */ +if ((led_value >> led_type) & OPAL_SLOT_LED_STATE_ON) { +kfree(loc_code); +return LED_FULL; +} + +led_fail: +kfree(loc_code); +return LED_OFF; +} + +/* Execute LED set task for given led classdev */ +static void powernv_deferred_led_set(struct work_struct *work) +{ +struct powernv_led_data *powernv_led = +container_of(work, struct powernv_led_
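Jacek's request to parse the location code and LED type once, at initialization, and cache them in the per-LED structure could look roughly like the sketch below. The `"<location-code>:<type>"` name format, the type strings, the enum values and every identifier here are assumptions for illustration, not the driver's actual definitions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical LED type indices; the driver's real values come from OPAL. */
enum led_kind {
	LED_KIND_INVALID = -1,
	LED_KIND_ATTENTION,
	LED_KIND_IDENTIFY,
	LED_KIND_FAULT,
};

/* Per-LED data: location code and type are parsed once, then cached. */
struct led_data {
	char loc_code[64];
	enum led_kind kind;
};

static enum led_kind parse_kind(const char *s)
{
	if (strcmp(s, "attention") == 0)
		return LED_KIND_ATTENTION;
	if (strcmp(s, "identify") == 0)
		return LED_KIND_IDENTIFY;
	if (strcmp(s, "fault") == 0)
		return LED_KIND_FAULT;
	return LED_KIND_INVALID;
}

/* Split an (assumed) "<location-code>:<type>" name once, at init time. */
static int led_data_init(struct led_data *d, const char *name)
{
	const char *colon = strrchr(name, ':');
	size_t len;

	if (!colon)
		return -1;
	len = (size_t)(colon - name);
	if (len == 0 || len >= sizeof(d->loc_code))
		return -1;
	memcpy(d->loc_code, name, len);
	d->loc_code[len] = '\0';
	d->kind = parse_kind(colon + 1);
	return d->kind == LED_KIND_INVALID ? -1 : 0;
}

/* The get/set callbacks then read the cached field instead of re-parsing. */
static enum led_kind led_data_kind(const struct led_data *d)
{
	return d->kind;
}
```

The set and get paths then cost a field read. Any string work happens once, when the LED is registered.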
Re: BUG: perf error on syscalls for powerpc64.
On 07/17/2015 12:07, Michael Ellerman wrote:
> On Fri, 2015-07-17 at 09:27 +0800, Zumeng Chen wrote:
>> On 07/16/2015 17:04, Michael Ellerman wrote:
>>> On Thu, 2015-07-16 at 13:57 +0800, Zumeng Chen wrote:
>>>> Hi All,
>>>>
>>>> 1028ccf5 did a change for sys_call_table from a pointer to an array of
>>>> unsigned long, I think it's not proper, here is my reason:
>>>>
>>>> sys_call_table defined as a label in assembler should be pointer array
>>>> rather than an array as described in 1028ccf5. If we defined it as an
>>>> array, then arch_syscall_addr will return the address of sys_call_table[],
>>>> actually the content of sys_call_table[] is demanded by arch_syscall_addr.
>>>> so 'perf list' will ignore all syscalls since find_syscall_meta will
>>>> return null in init_ftrace_syscalls because of the wrong arch_syscall_addr.
>>>>
>>>> Did I miss something, or Gcc compiler has done something newer ?
>>> Hi Zumeng,
>>>
>>> It works for me with the code as it is in mainline.
>>>
>>> I don't quite follow your explanation, so if you're seeing a bug please send
>>> some information about what you're actually seeing. And include the disassembly
>>> of arch_syscall_addr() and your compiler version etc.
>> Hi Michael,
> Hi Zumeng,
>> Yeah, it seems it was not a good explanation, I'll explain more this time:
>>
>> 1. Whatever we declare sys_call_table as at the C level, it is actually a
>> pointer to sys_call_table rather than sys_call_table itself at the assembly level.
> No it's not a pointer.

Then what is the second one in the following:

zchen@pek-yocto-build2:$ cat System.map |grep sys_call_table
c0009590 T .sys_call_table    <-this is a real sys_call_table.
c14e1b48 D sys_call_table     <-this should be referred by arch_syscall_addr

The c14e1b48[0] = c0009590

> A pointer is a location in memory that contains the address of another
> location in memory.

Yeah, this definition is right.

>> arch/powerpc/kernel/systbl.S
>> 47 .globl sys_call_table <--- see here
>> 48 sys_call_table:
> Which gives us a .o that looks like:
>
> :
>    0: R_PPC64_ADDR64 sys_restart_syscall
>    8: R_PPC64_ADDR64 sys_restart_syscall
>   10: R_PPC64_ADDR64 sys_exit
>   18: R_PPC64_ADDR64 sys_exit
>
> ie. at the location in memory called sys_call_table we have *the contents of
> the syscall table*. We do not have *the address* of the syscall table.
>
> You can also see in the System.map:
>
> c0bb0798 R sys_call_table
> c0bb1e58 r cache_type_info

Please refer to `cat System.map` above

> ie. sys_call_table occupies 5824 bytes. If it was a pointer it would only
> occupy 8 bytes.
>
> Compare to SYS_CALL_TABLE, which *is* a pointer.
>
> c1172bf8 d SYS_CALL_TABLE
> c1172c00 d exception_marker
>
> Note, 8 bytes.
>
> Finally if you look at a running system using xmon:
>
> 0:mon> d $sys_call_table
> c08f0798 c00a85a0 c00a85a0 ||
> c08f07a8 c0099b40 c0099b40 |...@...@|

This is right sys_call_table. but not what I'm talking about.
What I'm talking about is that the definition of sys_call_table by that
commit will incur the following result:

sys_call_table[0]= 0xc14e1b48[0] = c0009590

> 0:mon> la c00a85a0
> c00a85a0: .sys_restart_syscall+0x0/0x40
> 0:mon> la c0099b40
> c0099b40: .SyS_exit+0x0/0x20
> 0:mon> d $SYS_CALL_TABLE
> c0ec68f8 c08f0798 7265677368657265 |regshere|
>          ^ this is the address of sys_call_table
>
> As another example, see hcall_real_table, which is basically identical, and is
> also declared as an array in C.
>> 3. What I have seen in 3.14.x kernel,
>> ==
>> And so far, no more difference to 4.x kernel from me about this part if
>> I'm right.
>>
>> *) With 1028ccf5
>>
>> perf list|grep -i syscall got me nothing.
>>
>>
>> *) Without 1028ccf5
>> root@localhost:~# perf list|grep -i syscall
>>    syscalls:sys_enter_socket                  [Tracepoint event]
>>    syscalls:sys_exit_socket                   [Tracepoint event]
>>    syscalls:sys_enter_socketpair              [Tracepoint event]
>>    syscalls:sys_exit_socketpair               [Tracepoint event]
>>    syscalls:sys_enter_bind                    [Tracepoint event]
>>    syscalls:sys_exit_bind                     [Tracepoint event]
>>    syscalls:sys_enter_listen                  [Tracepoint event]
>>    syscalls:sys_exit_listen                   [Tracepoint event]
>>    ... ...
> I don't know why that
Re: [PATCH] powerpc: Use hardware RNG for arch_get_random_seed_* not arch_get_random_*
On Thu, 2015-07-16 at 22:12 +1000, Paul Mackerras wrote:
> The hardware RNG on POWER8 and POWER7+ can be relatively slow, since
> it can only supply one 64-bit value per microsecond. Currently we
> read it in arch_get_random_long(), but that slows down reading from
> /dev/urandom since the code in random.c calls arch_get_random_long()
> for every longword read from /dev/urandom.
>
> Since the hardware RNG supplies high-quality entropy on every read, it
> matches the semantics of arch_get_random_seed_long() better than those
> of arch_get_random_long(). Therefore this commit makes the code use
> the hardware RNG only for arch_get_random_seed_{long,int} and not for
> arch_get_random_{long,int}.
>
> Signed-off-by: Paul Mackerras

Yep seems sensible.

Can you resend and CC some of the random folks, just in case they care.

eg: ty...@mit.edu, keesc...@chromium.org, h...@linux.intel.com.

cheers
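The split Paul describes can be modelled in userspace. The fast per-word hook reports that no arch RNG is available, so the caller falls back to its software generator. Only the seeding hook touches the slow hardware source. This is a sketch with hypothetical names; hw_rng_read() is a deterministic stub, not the POWER hardware RNG interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of (slow) hardware reads so far; lets the sketch show the cost. */
static unsigned long hw_rng_reads;

/*
 * Hypothetical stand-in for the hardware RNG, which the patch says can
 * supply one 64-bit value per microsecond. Deterministic here.
 */
static bool hw_rng_read(unsigned long *v)
{
	hw_rng_reads++;
	*v = 0x9e3779b9UL ^ hw_rng_reads;
	return true;
}

/*
 * Modelled on arch_get_random_long(): called for every word read from
 * /dev/urandom, so it must be cheap. Returning false means "no arch RNG",
 * and the caller uses its own generator.
 */
static bool model_get_random_long(unsigned long *v)
{
	(void)v;
	return false;
}

/*
 * Modelled on arch_get_random_seed_long(): called only when seeding, so
 * slow but high-quality entropy fits here.
 */
static bool model_get_random_seed_long(unsigned long *v)
{
	return hw_rng_read(v);
}
```

A thousand reads through the fast hook cost no hardware accesses. A single seed request costs one.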
Re: [RFC PATCH 01/12] powerpc/kernel: Get pt_regs from r9 before calling do_syscall_trace_enter()
On Fri, 2015-07-17 at 08:40 +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2015-07-15 at 17:37 +1000, Michael Ellerman wrote:
> > To call do_syscall_trace_enter() we need pt_regs in r3, but we don't need
> > to recalculate it based on r1, it's already in r9.
> >
> > Signed-off-by: Michael Ellerman
>
> Is there any performance difference ?

No. I'm not going to bother measuring it :)

> I find the addi a bit more robust in case the code gets moved around or
> the "previous" code gets changed to either not use r9 or clobber it,
> which would have the potential to
> introduce a subtle bug ...

Yeah true.

There is an "invariant" in that entry code that r9 contains pt_regs, you can
see for example the DTL code goes to pains to ensure it puts pt_regs back in
r9 after it clobbers it. As does the current syscall_dotrace.

But looking closer I don't see where we actually use that (prior to this
patch).

So yeah I'll drop this and send a clean up to just get rid of all the r9
reloading.

cheers
Re: BUG: perf error on syscalls for powerpc64.
On Fri, 2015-07-17 at 09:27 +0800, Zumeng Chen wrote:
> On 07/16/2015 17:04, Michael Ellerman wrote:
> > On Thu, 2015-07-16 at 13:57 +0800, Zumeng Chen wrote:
> >> Hi All,
> >>
> >> 1028ccf5 did a change for sys_call_table from a pointer to an array of
> >> unsigned long, I think it's not proper, here is my reason:
> >>
> >> sys_call_table defined as a label in assembler should be pointer array
> >> rather than an array as described in 1028ccf5. If we defined it as an
> >> array, then arch_syscall_addr will return the address of sys_call_table[],
> >> actually the content of sys_call_table[] is demanded by arch_syscall_addr.
> >> so 'perf list' will ignore all syscalls since find_syscall_meta will
> >> return null
> >> in init_ftrace_syscalls because of the wrong arch_syscall_addr.
> >>
> >> Did I miss something, or Gcc compiler has done something newer ?
> > Hi Zumeng,
> >
> > It works for me with the code as it is in mainline.
> >
> > I don't quite follow your explanation, so if you're seeing a bug please send
> > some information about what you're actually seeing. And include the
> > disassembly
> > of arch_syscall_addr() and your compiler version etc.
>
> Hi Michael,

Hi Zumeng,

> Yeah, it seems it was not a good explanation, I'll explain more this time:
>
> 1. Whatever we declare sys_call_table as at the C level, it is actually a
> pointer to sys_call_table rather than sys_call_table itself at the assembly level.

No it's not a pointer.

A pointer is a location in memory that contains the address of another
location in memory.

> arch/powerpc/kernel/systbl.S
> 47 .globl sys_call_table <--- see here
> 48 sys_call_table:

Which gives us a .o that looks like:

:
   0: R_PPC64_ADDR64 sys_restart_syscall
   8: R_PPC64_ADDR64 sys_restart_syscall
  10: R_PPC64_ADDR64 sys_exit
  18: R_PPC64_ADDR64 sys_exit

ie. at the location in memory called sys_call_table we have *the contents of
the syscall table*. We do not have *the address* of the syscall table.

You can also see in the System.map:

c0bb0798 R sys_call_table
c0bb1e58 r cache_type_info

ie. sys_call_table occupies 5824 bytes. If it was a pointer it would only
occupy 8 bytes.

Compare to SYS_CALL_TABLE, which *is* a pointer.

c1172bf8 d SYS_CALL_TABLE
c1172c00 d exception_marker

Note, 8 bytes.

Finally if you look at a running system using xmon:

0:mon> d $sys_call_table
c08f0798 c00a85a0 c00a85a0 ||
c08f07a8 c0099b40 c0099b40 |...@...@|
0:mon> la c00a85a0
c00a85a0: .sys_restart_syscall+0x0/0x40
0:mon> la c0099b40
c0099b40: .SyS_exit+0x0/0x20
0:mon> d $SYS_CALL_TABLE
c0ec68f8 c08f0798 7265677368657265 |regshere|
         ^ this is the address of sys_call_table

As another example, see hcall_real_table, which is basically identical, and is
also declared as an array in C.

> 3. What I have seen in 3.14.x kernel,
> ==
> And so far, no more difference to 4.x kernel from me about this part if
> I'm right.
>
> *) With 1028ccf5
>
> perf list|grep -i syscall got me nothing.
>
>
> *) Without 1028ccf5
> root@localhost:~# perf list|grep -i syscall
>    syscalls:sys_enter_socket                  [Tracepoint event]
>    syscalls:sys_exit_socket                   [Tracepoint event]
>    syscalls:sys_enter_socketpair              [Tracepoint event]
>    syscalls:sys_exit_socketpair               [Tracepoint event]
>    syscalls:sys_enter_bind                    [Tracepoint event]
>    syscalls:sys_exit_bind                     [Tracepoint event]
>    syscalls:sys_enter_listen                  [Tracepoint event]
>    syscalls:sys_exit_listen                   [Tracepoint event]
>    ... ...

I don't know why that's happening.

Please just test 4.2-rc2 for now, so that there are not too many variables.

Assuming you have CONFIG_FTRACE_SYSCALLS=y, you can see the tracepoints in
debugfs with:

$ ls -la /sys/kernel/debug/tracing/events/syscalls
total 0
drwxr-xr-x 596 root root 0 Jul 17 13:11 .
drwxr-xr-x  45 root root 0 Jul 17 13:11 ..
-rw-r--r--   1 root root 0 Jul 17 13:33 enable
-rw-r--r--   1 root root 0 Jul 17 13:11 filter
drwxr-xr-x   2 root root 0 Jul 17 13:11 sys_enter_accept
drwxr-xr-x   2 root root 0 Jul 17 13:11 sys_enter_accept4
drwxr-xr-x   2 root root 0 Jul 17 13:11 sys_enter_access
drwxr-xr-x   2 root root 0 Jul 17 13:11 sys_enter_add_key
...

cheers
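Michael's distinction between a symbol that *is* the table and a variable that *holds the table's address* can be reproduced in plain C. In this sketch, table plays the role of the systbl.S label and table_ptr plays the role of SYS_CALL_TABLE. The names and the two dummy handlers are hypothetical. Indexing through the pointer needs one extra load, like the additional `ld r9,0(r9)` in the disassembly Zumeng posted for the pointer declaration.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*syscall_fn)(void);

static long sys_zero(void) { return 0; }
static long sys_one(void)  { return 1; }

/* Like the systbl.S label: the symbol *is* the table, contents included. */
static syscall_fn table[2] = { sys_zero, sys_one };

/* Like SYS_CALL_TABLE: a separate variable that holds the table's address. */
static syscall_fn *table_ptr = table;

/* Array access: the table's address is a link-time constant, one load. */
static syscall_fn lookup_via_array(size_t nr)
{
	return table[nr];
}

/* Pointer access: load the pointer variable first, then the entry. */
static syscall_fn lookup_via_pointer(size_t nr)
{
	return table_ptr[nr];
}
```

Both lookups return the same handler here only because each declaration matches what the symbol really is. The size difference is the one System.map shows: the array occupies every entry, and the pointer occupies one pointer's worth.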
Re: [PATCH V3 2/2] powerpc/kexec: Reset HILE before entering target kernel
On Fri, 2015-07-17 at 11:53 +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2015-07-10 at 15:19 +1000, Samuel Mendoza-Jonas wrote:
> > +#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_POWERNV)
> > +	li	r3,(FW_FEATURE_OPAL >> 16)
> > +	rldicr	r3,r3,16,63
> > +	and.	r3,r3,r26
> > +	cmpwi	r3,0
> > +	beq	99f
>
> If FW_FEATURE_OPAL is 0x8000 then the li will sign extend.
>
> The rldicr has a mask of all F's so it will keep all the bits you
> don't care about.

../..

Even better, you should be able to just do it all in C in
pnv_kexec_cpu_down(), after we wait for secondaries to be in OPAL. At that
point interrupts are already off, so it should be all good.
Re: BUG: perf error on syscalls for powerpc64.
Excerpts from Sukadev Bhattiprolu's message of 2015-07-17 11:51:04 +1000:
> Are you seeing this on big-endian or little-endian system?
>
> IIRC, I saw the opposite behavior on an LE system a few months ago.
> i.e. without 1028ccf5, 'perf list|grep syscall' failed.
>
> Applying 1028ccf5, seemed to fix it.

You could be on to something there - IIRC the ABI was changed for LE to
remove the dot symbols. Might be worth testing on both.

Cheers,
-Ian
Re: [PATCH V3 2/2] powerpc/kexec: Reset HILE before entering target kernel
On Fri, 2015-07-10 at 15:19 +1000, Samuel Mendoza-Jonas wrote:
> +#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_POWERNV)
> +	li	r3,(FW_FEATURE_OPAL >> 16)
> +	rldicr	r3,r3,16,63
> +	and.	r3,r3,r26
> +	cmpwi	r3,0
> +	beq	99f

If FW_FEATURE_OPAL is 0x8000 then the li will sign extend.

The rldicr has a mask of all F's so it will keep all the bits you
don't care about.

So together, you'll get compares happening on bits above the 16 you care
about that might change the result of your comparison incorrectly.

Since FW_FEATURE_* bits aren't ABI, they can change, so we don't want to
impose a constraint on them.

Thus I would recommend using an rldicl r3,r3,16,48 (aka srdi r3,r3,48)
instead which is going to clear all bits above 0x.

Now, that being said, FW_FEATURE_* can be 64-bit and this isn't perf
critical so why not just load the full 64-bit constant into r3 and be done
with it ? There's a macro to do that:

	LOAD_REG_IMMEDIATE(r3,FW_FEATURE_OPAL)

Cheers,
Ben.
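Ben's point about the `li`/`rldicr` pair can be checked numerically by emulating each instruction in C. This sketch assumes a hypothetical FW_FEATURE_OPAL of 0x80000000, chosen so that the shifted immediate is 0x8000 as in his example; the real value is not given in this thread. It models `li` as a sign-extending load of a 16-bit immediate and `rldicr ...,63` as a pure 64-bit rotate. It models `cmpwi` as a compare of the low 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature value, chosen so that (value >> 16) == 0x8000. */
#define HYPO_FW_FEATURE_OPAL 0x80000000ULL

/* li rD,imm: the 16-bit immediate is sign-extended to 64 bits. */
static uint64_t ppc_li(uint16_t imm)
{
	return (imm & 0x8000) ? (0xFFFFFFFFFFFF0000ULL | imm) : imm;
}

/*
 * rldicr rA,rS,sh,63: rotate left by sh. With an end bit of 63 the mask
 * is all ones, so bits that wrap around are kept rather than cleared.
 */
static uint64_t ppc_rotl64(uint64_t x, unsigned int sh)
{
	sh &= 63;
	return sh ? (x << sh) | (x >> (64 - sh)) : x;
}

/* The patch's sequence: li / rldicr / and. / cmpwi (low 32 bits only). */
static int opal_check_patch(uint64_t features)
{
	uint64_t r3 = ppc_li((uint16_t)(HYPO_FW_FEATURE_OPAL >> 16));

	r3 = ppc_rotl64(r3, 16);
	r3 &= features;
	return (uint32_t)r3 != 0;
}

/* LOAD_REG_IMMEDIATE-style: use the exact 64-bit constant, full-width test. */
static int opal_check_full_constant(uint64_t features)
{
	return (features & HYPO_FW_FEATURE_OPAL) != 0;
}
```

Under these assumptions the rotated mask becomes 0xFFFFFFFF8000FFFF, so any unrelated feature bit in the low 16 bits makes the patch's sequence report OPAL. Loading the full constant does not have that problem.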
Re: BUG: perf error on syscalls for powerpc64.
Zumeng Chen [zumeng.c...@gmail.com] wrote:
| 3. What I have seen in 3.14.x kernel,
| ==
| And so far, no more difference to 4.x kernel from me about this part if
| I'm right.
|
| *) With 1028ccf5
|
| perf list|grep -i syscall got me nothing.
|
|
| *) Without 1028ccf5
| root@localhost:~# perf list|grep -i syscall
|    syscalls:sys_enter_socket                  [Tracepoint event]
|    syscalls:sys_exit_socket                   [Tracepoint event]
|    syscalls:sys_enter_socketpair              [Tracepoint event]
|    syscalls:sys_exit_socketpair               [Tracepoint event]
|    syscalls:sys_enter_bind                    [Tracepoint event]
|    syscalls:sys_exit_bind                     [Tracepoint event]
|    syscalls:sys_enter_listen                  [Tracepoint event]
|    syscalls:sys_exit_listen                   [Tracepoint event]
|    ... ...

Are you seeing this on big-endian or little-endian system?

IIRC, I saw the opposite behavior on an LE system a few months ago.
i.e. without 1028ccf5, 'perf list|grep syscall' failed.

Applying 1028ccf5, seemed to fix it.

Sukadev
Re: BUG: perf error on syscalls for powerpc64.
On 07/16/2015 17:04, Michael Ellerman wrote:
> On Thu, 2015-07-16 at 13:57 +0800, Zumeng Chen wrote:
>> Hi All,
>>
>> 1028ccf5 did a change for sys_call_table from a pointer to an array of
>> unsigned long, I think it's not proper, here is my reason:
>>
>> sys_call_table defined as a label in assembler should be pointer array
>> rather than an array as described in 1028ccf5. If we defined it as an
>> array, then arch_syscall_addr will return the address of sys_call_table[],
>> actually the content of sys_call_table[] is demanded by arch_syscall_addr.
>> so 'perf list' will ignore all syscalls since find_syscall_meta will
>> return null
>> in init_ftrace_syscalls because of the wrong arch_syscall_addr.
>>
>> Did I miss something, or Gcc compiler has done something newer ?
> Hi Zumeng,
>
> It works for me with the code as it is in mainline.
>
> I don't quite follow your explanation, so if you're seeing a bug please send
> some information about what you're actually seeing. And include the
> disassembly
> of arch_syscall_addr() and your compiler version etc.

Hi Michael,

Yeah, it seems it was not a good explanation, I'll explain more this time:

1. Whatever we declare sys_call_table as at the C level, it is actually a
pointer to sys_call_table rather than sys_call_table itself at the assembly level.

arch/powerpc/kernel/systbl.S
47 .globl sys_call_table <--- see here
48 sys_call_table:

So if you want to declare sys_call_table as an array, then I think it's very
clear what we'll get when we do sys_call_table[i].

2. Disassembly difference of arch_syscall_addr with or without 1028ccf5

*) With 1028ccf5
-
Dump of assembler code for function arch_syscall_addr:
522	{
523		return (unsigned long)sys_call_table[nr];
   0xc0df53d4 <+0>:	addis	r10,r2,-13
   0xc0df53d8 <+4>:	addi	r9,r10,3488
   0xc0df53dc <+8>:	rldicr	r3,r3,3,60

524	}
   0xc0df53e0 <+12>:	ldx	r3,r9,r3
   0xc0df53e4 <+16>:	blr

*) Without 1028ccf5
---
Dump of assembler code for function arch_syscall_addr:
522	{
523		return (unsigned long)sys_call_table[nr];
   0xc0df53d0 <+0>:	addis	r10,r2,-13
   0xc0df53d4 <+4>:	addi	r9,r10,3488
   0xc0df53d8 <+8>:	rldicr	r3,r3,3,60
   0xc0df53dc <+12>:	ld	r9,0(r9)	<--only this is different

524	}
   0xc0df53e0 <+16>:	ldx	r3,r9,r3
   0xc0df53e4 <+20>:	blr
End of assembler dump.

3. What I have seen in 3.14.x kernel,
==
And so far, no more difference to 4.x kernel from me about this part if
I'm right.

*) With 1028ccf5

perf list|grep -i syscall got me nothing.


*) Without 1028ccf5
root@localhost:~# perf list|grep -i syscall
   syscalls:sys_enter_socket                  [Tracepoint event]
   syscalls:sys_exit_socket                   [Tracepoint event]
   syscalls:sys_enter_socketpair              [Tracepoint event]
   syscalls:sys_exit_socketpair               [Tracepoint event]
   syscalls:sys_enter_bind                    [Tracepoint event]
   syscalls:sys_exit_bind                     [Tracepoint event]
   syscalls:sys_enter_listen                  [Tracepoint event]
   syscalls:sys_exit_listen                   [Tracepoint event]
   ... ...

Cheers,
Zumeng
>
> cheers
>
Re: [PATCH 2/2] powerpc/powernv: Double VF BAR size for compound PE
On Fri, Jul 17, 2015 at 10:14:43AM +1000, Gavin Shan wrote:
>When VF BAR size is equal to 128MB or bigger than that, we extend
>the corresponding PF's IOV BAR to cover number of total VFs supported
>by the PF. Otherwise, we extend the PF's IOV BAR to cover 256 VFs.
>For the former case, we have to create compound PE, which includes
>4 VFs. Those 4 VFs included in the compound PE can't be passed through
>to different guests, which isn't good.
>
>The gate (128MB) was chosen based on the assumption that each PHB
>supports 64GB M64 space and one PF's IOV BAR can be extended to be
>as huge as 1/4 of that, which is 16GB. However, the IOV BAR can be
>extended to half of PHB's M64 window when the PF sits behind the
>root port. In that case, the gate can be enlarged to be 256MB to
>avoid compound PE as we can.
>
>Signed-off-by: Gavin Shan
>---
> arch/powerpc/platforms/powernv/pci-ioda.c | 21 ++++++++++++++++-----
> 1 file changed, 16 insertions(+), 5 deletions(-)
>
>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>index 6ec62b9..5b2e88f 100644
>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>@@ -2721,6 +2721,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
> 	struct resource *res;
> 	int i;
> 	resource_size_t size;
>+	resource_size_t limit;
> 	struct pci_dn *pdn;
> 	int mul, total_vfs;
>
>@@ -2730,6 +2731,18 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
> 	hose = pci_bus_to_host(pdev->bus);
> 	phb = hose->private_data;
>
>+	/*
>+	 * When the PF sits behind root port, the IOV BAR can
>+	 * consume half of the PHB's M64 window. Otherwise,
>+	 * 1/4 of the PHB's M64 window can be consumed to the
>+	 * maximal degree.
>+	 */
>+	if (!pci_is_root_bus(pdev->bus) &&
>+	    pci_is_root_bus(pdev->bus->self->bus))
>+		limit = 128;
>+	else
>+		limit = 256;
>+

I sent it too fast. The limit should be reversed: 256 when the PF sits
behind the root port, and 128 otherwise. I will send a follow-up v2 after
waiting a couple of days in case there are more comments on this revision.

> 	pdn = pci_get_pdn(pdev);
> 	pdn->vfs_expanded = 0;
>
>@@ -2748,11 +2761,9 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
> 		}
>
> 		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>-
>-		/* bigger than 64M */
>-		if (size > (1 << 26)) {
>-			dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than 64M, roundup power2\n",
>-				 i, res);
>+		if (size >= (limit * 0x100000)) {
>+			dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than %lldMB, roundup power2\n",
>+				 i, res, limit);
> 			pdn->m64_per_iov = M64_PER_IOV;
> 			mul = roundup_pow_of_two(total_vfs);
> 			break;

Thanks,
Gavin
[PATCH 0/2] powerpc/powernv: Avoid compound PE for VF
When the VF BAR size is 128MB or larger, the IOV BAR is extended to cover the maximal number of VFs supported by the PF, not 256. Also, one of the PHB's M64 BARs is picked to cover the VF BARs of 4 contiguous VFs, but that M64 BAR is configured as being owned by a single PE. As a result, those 4 VFs have 4 separate PEs from the perspective of PCI config and DMA, but a single shared PE from the MMIO perspective. Once we have a compound PE, the 4 VFs included in it can't be passed through to separate guests with the VFIO infrastructure.

The above gate (128MB) was chosen based on the assumption that one IOV BAR can consume 1/4 of the PHB's M64 window, which is 16GB. However, it can consume as much as half of the window (32GB) when the PF sits behind the root port. Accordingly, the gate can be doubled to 256MB in order to avoid compound PEs where possible.

Gavin Shan (2):
  powerpc/powernv: Fix alignment for IOV BAR
  powerpc/powernv: Double VF BAR size for compound PE

 arch/powerpc/platforms/powernv/pci-ioda.c | 56 +--
 1 file changed, 45 insertions(+), 11 deletions(-)

-- 
2.1.0
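The arithmetic behind the gate can be checked directly. With a 64GB M64 window and an IOV BAR expanded to 256 VFs, a quarter of the window allows 64MB per VF and half allows 128MB. BAR sizes are powers of two, so the gate is the next size up: 128MB or 256MB. The sketch below takes those figures from the cover letter and uses the gate orientation from Gavin's follow-up correction earlier in the thread (256MB when the PF sits behind the root port). It reads the size check as limit times 1MB. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_1M (1ULL << 20)
#define SZ_1G (1ULL << 30)

/* Assumptions from the cover letter: 64GB of M64 space per PHB and an
 * IOV BAR expanded to cover 256 VFs. */
#define PHB_M64_SIZE (64 * SZ_1G)
#define EXPANDED_VFS 256

/* Largest per-VF BAR that fits when the IOV BAR may use 1/divisor of
 * the PHB's M64 window. */
static uint64_t max_vf_bar(unsigned int divisor)
{
	return PHB_M64_SIZE / divisor / EXPANDED_VFS;
}

/* Gate in MB, oriented as in the follow-up correction: 256MB when the PF
 * sits behind the root port (half the window), 128MB otherwise. */
static uint64_t gate_mb(bool behind_root_port)
{
	return behind_root_port ? 256 : 128;
}

/* A VF BAR at or above the gate cannot use the 256-VF layout, so the
 * code falls back to the compound-PE path. */
static bool needs_compound_pe(uint64_t vf_bar_size, bool behind_root_port)
{
	return vf_bar_size >= gate_mb(behind_root_port) * SZ_1M;
}
```

A 128MB VF BAR forces a compound PE for a PF deeper in the hierarchy, but not for one directly behind the root port.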
[PATCH 2/2] powerpc/powernv: Double VF BAR size for compound PE
When the VF BAR size is 128MB or bigger, we extend the corresponding PF's IOV BAR to cover the total number of VFs supported by the PF. Otherwise, we extend the PF's IOV BAR to cover 256 VFs. In the former case, we have to create a compound PE, which includes 4 VFs. Those 4 VFs included in the compound PE can't be passed through to different guests, which isn't good. The gate (128MB) was chosen based on the assumption that each PHB supports 64GB of M64 space and one PF's IOV BAR can be extended to as much as 1/4 of that, which is 16GB. However, the IOV BAR can be extended to half of the PHB's M64 window when the PF sits behind the root port. In that case, the gate can be enlarged to 256MB to avoid compound PEs where possible. Signed-off-by: Gavin Shan --- arch/powerpc/platforms/powernv/pci-ioda.c | 21 - 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 6ec62b9..5b2e88f 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -2721,6 +2721,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev) struct resource *res; int i; resource_size_t size; + resource_size_t limit; struct pci_dn *pdn; int mul, total_vfs; @@ -2730,6 +2731,18 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev) hose = pci_bus_to_host(pdev->bus); phb = hose->private_data; + /* +* When the PF seats behind root port, the IOV BAR can +* consume half of the PHB's M64 window. Otherwise, +* 1/4 of the PHB's M64 window can be consumed to the +* maximal degree.
+*/ + if (!pci_is_root_bus(pdev->bus) && + pci_is_root_bus(pdev->bus->self->bus)) + limit = 128; + else + limit = 256; + pdn = pci_get_pdn(pdev); pdn->vfs_expanded = 0; @@ -2748,11 +2761,9 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev) } size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES); - - /* bigger than 64M */ - if (size > (1 << 26)) { - dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than 64M, roundup power2\n", -i, res); + if (size >= (limit * 0x10)) { + dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than %lldMB, roundup power2\n", +i, res, limit); pdn->m64_per_iov = M64_PER_IOV; mul = roundup_pow_of_two(total_vfs); break; -- 2.1.0
[PATCH 1/2] powerpc/powernv: Fix alignment for IOV BAR
The IOV BAR is extended to cover either 256 VFs or the number of supported VFs, and its alignment is the IOV BAR size, which is usually huge and bigger than the M64 segment size (256MB). That means the IOV BAR is expected to be assigned at the beginning of the PHB's M64 window, ahead of the other M64 BARs of PCI devices hooked to the PCI bus behind the root port. Those other M64 BARs actually need the M64 segment size, rather than the huge IOV BAR size, as their required alignment. This patch returns the M64 segment size as the alignment if the IOV BAR size is bigger than it and the PF sits behind the root port. Otherwise, the IOV BAR size is returned as before. This saves a lot of M64 space: as much as 16GB in some cases I observed. Signed-off-by: Gavin Shan --- arch/powerpc/platforms/powernv/pci-ioda.c | 35 +-- 1 file changed, 29 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index fdafbac..6ec62b9 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -2961,16 +2961,39 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus, static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev, int resno) { + struct pci_controller *hose = pci_bus_to_host(pdev->bus); + struct pnv_phb *phb = hose->private_data; struct pci_dn *pdn = pci_get_pdn(pdev); - resource_size_t align, iov_align; + resource_size_t align; + resource_size_t m64_segsz = phb->ioda.m64_segsize; - iov_align = resource_size(&pdev->resource[resno]); - if (iov_align) - return iov_align; + /* +* When PF is the only one adapter under the PHB, the IOV BAR +* is expected to be assigned prior to any other M64 BARs. To +* have M64 segment size, which is usually smaller than IOV +* BAR size, as the alignment to avoid wasting M64 space to +* satisfy the alignment required by other M64 BARs.
+*/ + align = resource_size(&pdev->resource[resno]); + if (align) { + if (!pci_bus_is_root(pdev->bus) && + pci_bus_is_root(pdev->bus->self->bus)) + align = min(align, m64_segsz); + else + align = max(align, m64_segsz); + + return align; + } align = pci_iov_resource_size(pdev, resno); - if (pdn->vfs_expanded) - return pdn->vfs_expanded * align; + if (pdn->vfs_expanded) { + align = pdn->vfs_expanded * align; + if (!pci_bus_is_root(pdev->bus) && + pci_bus_is_root(pdev->bus->self->bus)) + align = min(align, m64_segsz); + else + align = max(align, m64_segsz); + } return align; } -- 2.1.0
[PATCH] powerpc/fsl-booke-64: Allow booting from the secondary thread
According to Yuantian, this is needed for forthcoming power management patches -- IIRC, for resuming from certain deep sleep states. This also allows SMP kernels to work as kdump crash kernels. While crash kernels don't really need to be SMP, this prevents things from breaking if a user does it anyway (which is not something you want to only find out once the main kernel has crashed in the field, especially if whether it works or not depends on which cpu crashed). Signed-off-by: Scott Wood Cc: Tang Yuantian --- I'm sending this before the rest of the kexec patches, since Yuantian needs it as a prerequisite. Yuantian, if you explain the issue more I can improve the commit message of this patch. --- arch/powerpc/platforms/85xx/smp.c | 27 --- 1 file changed, 20 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c index b8b8216..c2ded03 100644 --- a/arch/powerpc/platforms/85xx/smp.c +++ b/arch/powerpc/platforms/85xx/smp.c @@ -173,15 +173,22 @@ static inline u32 read_spin_table_addr_l(void *spin_table) static void wake_hw_thread(void *info) { void fsl_secondary_thread_init(void); - unsigned long imsr1, inia1; + unsigned long imsr, inia; int nr = *(const int *)info; - imsr1 = MSR_KERNEL; - inia1 = *(unsigned long *)fsl_secondary_thread_init; - - mttmr(TMRN_IMSR1, imsr1); - mttmr(TMRN_INIA1, inia1); - mtspr(SPRN_TENS, TEN_THREAD(1)); + imsr = MSR_KERNEL; + inia = *(unsigned long *)fsl_secondary_thread_init; + + if (cpu_thread_in_core(nr) == 0) { + /* For when we boot on a secondary thread with kdump */ + mttmr(TMRN_IMSR0, imsr); + mttmr(TMRN_INIA0, inia); + mtspr(SPRN_TENS, TEN_THREAD(0)); + } else { + mttmr(TMRN_IMSR1, imsr); + mttmr(TMRN_INIA1, inia); + mtspr(SPRN_TENS, TEN_THREAD(1)); + } smp_generic_kick_cpu(nr); } @@ -224,6 +231,12 @@ static int smp_85xx_kick_cpu(int nr) smp_call_function_single(primary, wake_hw_thread, &nr, 0); return 0; + } else if (cpu_thread_in_core(boot_cpuid) != 0 && + 
cpu_first_thread_sibling(boot_cpuid) == nr) { + if (WARN_ON_ONCE(!cpu_has_feature(CPU_FTR_SMT))) + return -ENOENT; + + smp_call_function_single(boot_cpuid, wake_hw_thread, &nr, 0); } #endif -- 2.1.4
Re: [RFC PATCH 02/12] powerpc/kernel: Switch to using MAX_ERRNO
On Wed, 2015-07-15 at 17:37 +1000, Michael Ellerman wrote: > Currently on powerpc we have our own #define for the highest (negative) > errno value, called _LAST_ERRNO. This is defined to be 516, for reasons > which are not clear. > > The generic code, and x86, use MAX_ERRNO, which is defined to be 4095. > > In particular seccomp uses MAX_ERRNO to restrict the value that a > seccomp filter can return. > > Currently with the mismatch between _LAST_ERRNO and MAX_ERRNO, a seccomp > tracer wanting to return 600, expecting it to be seen as an error, would > instead find on powerpc that userspace sees a successful syscall with a > return value of 600. > > To avoid this inconsistency, switch powerpc to use MAX_ERRNO. > > We are somewhat confident that generic syscalls that can return a > non-error value above negative MAX_ERRNO have already been updated to > use force_successful_syscall_return(). > > I have also checked all the powerpc specific syscalls, and believe that > none of them expect to return a non-error value between -MAX_ERRNO and > -516. So this change should be safe ... 
> > Signed-off-by: Michael Ellerman Acked-by: Benjamin Herrenschmidt > --- > arch/powerpc/include/uapi/asm/errno.h | 2 -- > arch/powerpc/kernel/entry_32.S| 3 ++- > arch/powerpc/kernel/entry_64.S| 5 +++-- > 3 files changed, 5 insertions(+), 5 deletions(-) > > diff --git a/arch/powerpc/include/uapi/asm/errno.h > b/arch/powerpc/include/uapi/asm/errno.h > index 8c145fd17d86..e8b6b5f7de7c 100644 > --- a/arch/powerpc/include/uapi/asm/errno.h > +++ b/arch/powerpc/include/uapi/asm/errno.h > @@ -6,6 +6,4 @@ > #undef EDEADLOCK > #define EDEADLOCK 58 /* File locking deadlock error */ > > -#define _LAST_ERRNO 516 > - > #endif /* _ASM_POWERPC_ERRNO_H */ > diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S > index 46fc0f4d8982..67ecdf61f4e3 100644 > --- a/arch/powerpc/kernel/entry_32.S > +++ b/arch/powerpc/kernel/entry_32.S > @@ -20,6 +20,7 @@ > */ > > #include > +#include > #include > #include > #include > @@ -354,7 +355,7 @@ ret_from_syscall: > SYNC > MTMSRD(r10) > lwz r9,TI_FLAGS(r12) > - li r8,-_LAST_ERRNO > + li r8,-MAX_ERRNO > andi. > r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK) > bne-syscall_exit_work > cmplw 0,r3,r8 > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S > index 0796c487d3db..8292581a42f1 100644 > --- a/arch/powerpc/kernel/entry_64.S > +++ b/arch/powerpc/kernel/entry_64.S > @@ -19,6 +19,7 @@ > */ > > #include > +#include > #include > #include > #include > @@ -207,7 +208,7 @@ system_call: /* label this so stack > traces look sane */ > #endif /* CONFIG_PPC_BOOK3E */ > > ld r9,TI_FLAGS(r12) > - li r11,-_LAST_ERRNO > + li r11,-MAX_ERRNO > andi. > r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK) > bne-syscall_exit_work > cmpld r3,r11 > @@ -279,7 +280,7 @@ syscall_exit_work: > beq+0f > REST_NVGPRS(r1) > b 2f > -0: cmpld r3,r11 /* r10 is -LAST_ERRNO */ > +0: cmpld r3,r11 /* r11 is -MAX_ERRNO */ > blt+1f > andi. 
r0,r9,_TIF_NOERROR > bne-1f
Re: [RFC PATCH 01/12] powerpc/kernel: Get pt_regs from r9 before calling do_syscall_trace_enter()
On Wed, 2015-07-15 at 17:37 +1000, Michael Ellerman wrote: > To call do_syscall_trace_enter() we need pt_regs in r3, but we don't need > to recalculate it based on r1, it's already in r9. > > Signed-off-by: Michael Ellerman Is there any performance difference? I find the addi a bit more robust in case the code gets moved around or the "previous" code gets changed to either not use r9 or clobber it, which would have the potential to introduce a subtle bug ... Ben. > --- > arch/powerpc/kernel/entry_64.S | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S > index 579e0f9a2d57..0796c487d3db 100644 > --- a/arch/powerpc/kernel/entry_64.S > +++ b/arch/powerpc/kernel/entry_64.S > @@ -243,7 +243,9 @@ syscall_error: > /* Traced system call support */ > syscall_dotrace: > bl save_nvgprs > - addi r3,r1,STACK_FRAME_OVERHEAD > + > + /* Get pt_regs into r3 */ > + mr r3, r9 > bl do_syscall_trace_enter > /* >* Restore argument registers possibly just changed.
Re: [PATCH v3 7/8] perf: Define PMU_TXN_READ interface
On Tue, Jul 14, 2015 at 08:01:54PM -0700, Sukadev Bhattiprolu wrote: > +/* > + * Use the transaction interface to read the group of events in @leader. > + * PMUs like the 24x7 counters in Power, can use this to queue the events > + * in the ->read() operation and perform the actual read in ->commit_txn. > + * > + * Other PMUs can ignore the ->start_txn and ->commit_txn and read each > + * PMU directly in the ->read() operation. > + */ > +static int perf_event_read_group(struct perf_event *leader) > +{ > + int ret; > + struct perf_event *sub; > + struct pmu *pmu; > + > + pmu = leader->pmu; > + > + pmu->start_txn(pmu, PERF_PMU_TXN_READ); > + > + perf_event_read(leader); There should be a lockdep assert with that list iteration. > + list_for_each_entry(sub, &leader->sibling_list, group_entry) > + perf_event_read(sub); > + > + ret = pmu->commit_txn(pmu); > + > + return ret; > +}
Re: [PATCH v3 5/8] perf: Split perf_event_read_value()
Peter Zijlstra [pet...@infradead.org] wrote: | On Tue, Jul 14, 2015 at 08:01:52PM -0700, Sukadev Bhattiprolu wrote: | > Move the part of perf_event_read_value() that computes the event | > counts and event times into a new function, perf_event_compute(). | > | > This would allow us to call perf_event_compute() independently. | > | > Signed-off-by: Sukadev Bhattiprolu | > | > Changelog[v3] | > Rather than move perf_event_read() into callers and then | > rename, just move the computations into a separate function | > (redesign to address comment from Peter Zijlstra). | > --- | | Changelog[] bits go here, below the '---' where they get discarded. Sorry. Will fix it. | | > kernel/events/core.c | 37 - | > 1 file changed, 24 insertions(+), 13 deletions(-) | > | > diff --git a/kernel/events/core.c b/kernel/events/core.c | > index 44fb89d..b1e9a42 100644 | > --- a/kernel/events/core.c | > +++ b/kernel/events/core.c | > @@ -3704,6 +3704,29 @@ static int perf_release(struct inode *inode, struct file *file) | > return 0; | > } | > | > +static u64 perf_event_compute(struct perf_event *event, u64 *enabled, | > + u64 *running) | | This is a horrible name, 'compute' what? We are aggregating event counts and time for children. Would perf_event_aggregate() or perf_event_aggregate_children() be better? | | > +{ | > + struct perf_event *child; | > + u64 total; | > + | > + total = perf_event_count(event); | > + | > + *enabled += event->total_time_enabled + | > + atomic64_read(&event->child_total_time_enabled); | > + *running += event->total_time_running + | > + atomic64_read(&event->child_total_time_running); | > + | | lockdep_assert_held(&event->child_mutex); OK. Thanks for the comments. Sukadev
Re: [PATCH v3 3/8] perf: Add a flags parameter to pmu txn interfaces
Peter Zijlstra [pet...@infradead.org] wrote: | On Tue, Jul 14, 2015 at 08:01:50PM -0700, Sukadev Bhattiprolu wrote: | > @@ -1604,6 +1613,12 @@ static void power_pmu_start_txn(struct pmu *pmu) | > static void power_pmu_cancel_txn(struct pmu *pmu) | > { | > struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events); | > + int txn_flags; | > + | > + txn_flags = cpuhw->txn_flags; | > + cpuhw->txn_flags = 0; | > + if (cpuhw->txn_flags & ~PERF_PMU_TXN_ADD) | > + return; | | That seems, unintentional? ;-) Argh. Thanks for catching it. Sukadev
Re: [PATCH v3 1/1] KVM: PPC: Book3S: correct width in XER handling
On 05/27/2015 01:56 AM, Sam Bobroff wrote: > In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64 > bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is > accessed as such. > > This patch corrects places where it is accessed as a 32 bit field by a > 64 bit kernel. In some cases this is via a 32 bit load or store > instruction which, depending on endianness, will cause either the > lower or upper 32 bits to be missed. In another case it is cast as a > u32, causing the upper 32 bits to be cleared. > > This patch corrects those places by extending the access methods to > 64 bits. > > Signed-off-by: Sam Bobroff Reviewed-by: Thomas Huth Actually this patch also fixes a bug where SLOF sometimes crashes when a vCPU gets kicked out of kernel mode (see the following URL for details: https://bugzilla.redhat.com/show_bug.cgi?id=1178502 ), and I've just tested that this bug no longer occurs with this patch, so also: Tested-by: Thomas Huth
Re: [PATCH v3 5/8] perf: Split perf_event_read_value()
On Tue, Jul 14, 2015 at 08:01:52PM -0700, Sukadev Bhattiprolu wrote: > Move the part of perf_event_read_value() that computes the event > counts and event times into a new function, perf_event_compute(). > > This would allow us to call perf_event_compute() independently. > > Signed-off-by: Sukadev Bhattiprolu > > Changelog[v3] > Rather than move perf_event_read() into callers and then > rename, just move the computations into a separate function > (redesign to address comment from Peter Zijlstra). > --- Changelog[] bits go here, below the '---' where they get discarded. > kernel/events/core.c | 37 - > 1 file changed, 24 insertions(+), 13 deletions(-) > > diff --git a/kernel/events/core.c b/kernel/events/core.c > index 44fb89d..b1e9a42 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -3704,6 +3704,29 @@ static int perf_release(struct inode *inode, struct > file *file) > return 0; > } > > +static u64 perf_event_compute(struct perf_event *event, u64 *enabled, > + u64 *running) This is a horrible name, 'compute' what? > +{ > + struct perf_event *child; > + u64 total; > + > + total = perf_event_count(event); > + > + *enabled += event->total_time_enabled + > + atomic64_read(&event->child_total_time_enabled); > + *running += event->total_time_running + > + atomic64_read(&event->child_total_time_running); > + lockdep_assert_held(&event->child_mutex); > + list_for_each_entry(child, &event->child_list, child_list) { > + perf_event_read(child); > + total += perf_event_count(child); > + *enabled += child->total_time_enabled; > + *running += child->total_time_running; > + } > + > + return total; > +} > + > /* > * Remove all orphanes events from the context. > */
Re: [PATCH v3 1/1] KVM: PPC: Book3S: correct width in XER handling
On 27/05/2015 01:56, Sam Bobroff wrote: > In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64 > bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is > accessed as such. > > This patch corrects places where it is accessed as a 32 bit field by a > 64 bit kernel. In some cases this is via a 32 bit load or store > instruction which, depending on endianness, will cause either the > lower or upper 32 bits to be missed. In another case it is cast as a > u32, causing the upper 32 bits to be cleared. > > This patch corrects those places by extending the access methods to > 64 bits. > > Signed-off-by: Sam Bobroff > --- > > v3: > Adjust booke set/get xer to match book3s. > > v2: > > Also extend kvmppc_book3s_shadow_vcpu.xer to 64 bit. > > arch/powerpc/include/asm/kvm_book3s.h |4 ++-- > arch/powerpc/include/asm/kvm_book3s_asm.h |2 +- > arch/powerpc/include/asm/kvm_booke.h |4 ++-- > arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 +++--- > arch/powerpc/kvm/book3s_segment.S |4 ++-- > 5 files changed, 10 insertions(+), 10 deletions(-) > > diff --git a/arch/powerpc/include/asm/kvm_book3s.h > b/arch/powerpc/include/asm/kvm_book3s.h > index b91e74a..05a875a 100644 > --- a/arch/powerpc/include/asm/kvm_book3s.h > +++ b/arch/powerpc/include/asm/kvm_book3s.h > @@ -225,12 +225,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) > return vcpu->arch.cr; > } > > -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val) > +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val) > { > vcpu->arch.xer = val; > } > > -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu) > +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu) > { > return vcpu->arch.xer; > } > diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h > b/arch/powerpc/include/asm/kvm_book3s_asm.h > index 5bdfb5d..c4ccd2d 100644 > --- a/arch/powerpc/include/asm/kvm_book3s_asm.h > +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h > @@ -112,7 +112,7 @@ struct 
kvmppc_book3s_shadow_vcpu { > bool in_use; > ulong gpr[14]; > u32 cr; > - u32 xer; > + ulong xer; > ulong ctr; > ulong lr; > ulong pc; > diff --git a/arch/powerpc/include/asm/kvm_booke.h > b/arch/powerpc/include/asm/kvm_booke.h > index 3286f0d..bc6e29e 100644 > --- a/arch/powerpc/include/asm/kvm_booke.h > +++ b/arch/powerpc/include/asm/kvm_booke.h > @@ -54,12 +54,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) > return vcpu->arch.cr; > } > > -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val) > +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val) > { > vcpu->arch.xer = val; > } > > -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu) > +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu) > { > return vcpu->arch.xer; > } > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > index 4d70df2..d75be59 100644 > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > @@ -870,7 +870,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) > blt hdec_soon > > ld r6, VCPU_CTR(r4) > - lwz r7, VCPU_XER(r4) > + ld r7, VCPU_XER(r4) > > mtctr r6 > mtxer r7 > @@ -1103,7 +1103,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) > mfctr r3 > mfxer r4 > std r3, VCPU_CTR(r9) > - stw r4, VCPU_XER(r9) > + std r4, VCPU_XER(r9) > > /* If this is a page table miss then see if it's theirs or ours */ > cmpwi r12, BOOK3S_INTERRUPT_H_DATA_STORAGE > @@ -1675,7 +1675,7 @@ kvmppc_hdsi: > bl kvmppc_msr_interrupt > fast_interrupt_c_return: > 6: ld r7, VCPU_CTR(r9) > - lwz r8, VCPU_XER(r9) > + ld r8, VCPU_XER(r9) > mtctr r7 > mtxer r8 > mr r4, r9 > diff --git a/arch/powerpc/kvm/book3s_segment.S > b/arch/powerpc/kvm/book3s_segment.S > index acee37c..ca8f174 100644 > --- a/arch/powerpc/kvm/book3s_segment.S > +++ b/arch/powerpc/kvm/book3s_segment.S > @@ -123,7 +123,7 @@ no_dcbz32_on: > PPC_LL r8, SVCPU_CTR(r3) > PPC_LL r9, SVCPU_LR(r3) > lwz r10, SVCPU_CR(r3) > - lwz r11, 
SVCPU_XER(r3) > + PPC_LL r11, SVCPU_XER(r3) > > mtctr r8 > mtlr r9 > @@ -237,7 +237,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) > mfctr r8 > mflr r9 > > - stw r5, SVCPU_XER(r13) > + PPC_STL r5, SVCPU_XER(r13) > PPC_STL r6, SVCPU_FAULT_DAR(r13) > stw r7, SVCPU_FAULT_DSISR(r13) > PPC_STL r8, SVCPU_CTR(r13) > Reviewed-by: Laurent Vivier
Re: [PATCH v3 3/8] perf: Add a flags parameter to pmu txn interfaces
On Tue, Jul 14, 2015 at 08:01:50PM -0700, Sukadev Bhattiprolu wrote: > +DEFINE_PER_CPU(int, nop_txn_flags); > + > +static int nop_txn_flags_get_and_clear(void) > +{ > + int *flagsp; > + int flags; > + > + flagsp = &get_cpu_var(nop_txn_flags); > + > + flags = *flagsp; > + *flagsp = 0; > + > + put_cpu_var(nop_txn_flags); > + > + return flags; > +} > + > +static void nop_txn_flags_set(int flags) > +{ > + int *flagsp; > + > + flagsp = &get_cpu_var(nop_txn_flags); > + *flagsp = flags; > + put_cpu_var(nop_txn_flags); > +} That's really horrible, see below: > +static void perf_pmu_start_txn(struct pmu *pmu, int flags) > { __this_cpu_write(nop_txn_flags, flags); > + > + if (flags & ~PERF_PMU_TXN_ADD) > + return; > + > perf_pmu_disable(pmu); > } > > static int perf_pmu_commit_txn(struct pmu *pmu) > { int flags = __this_cpu_read(nop_txn_flags); __this_cpu_write(nop_txn_flags, 0); > + > + if (flags & ~PERF_PMU_TXN_ADD) > + return 0; > + > perf_pmu_enable(pmu); > return 0; > } > > static void perf_pmu_cancel_txn(struct pmu *pmu) > { int flags = __this_cpu_read(nop_txn_flags); __this_cpu_write(nop_txn_flags, 0); > + > + if (flags & ~PERF_PMU_TXN_ADD) > + return; > + > perf_pmu_enable(pmu); > }
Re: [PATCH v2] powerpc/dts: Add and fix 1588 timer node for eTSEC
On Wed, 2015-07-15 at 21:37 -0500, Lu Yangbo-B47093 wrote: > Any comments? > Thanks. Sorry, I must have missed this on my last time through the patch queue. I see you've decimalized the fiper and max-adj properties, which is good... but does it really make sense for tmr-add? I'm not familiar with what this value represents, but the numbers look more natural as hex (e.g. 0xaaab versus 2863311531). > > diff --git a/arch/powerpc/boot/dts/p2020rdb-pc.dtsi > > b/arch/powerpc/boot/dts/p2020rdb-pc.dtsi > > index c21d1c7..363172d 100644 > > --- a/arch/powerpc/boot/dts/p2020rdb-pc.dtsi > > +++ b/arch/powerpc/boot/dts/p2020rdb-pc.dtsi > > @@ -215,12 +215,12 @@ > > }; > > > > ptp_clock@24e00{ > > - fsl,tclk-period = <5>; > > - fsl,tmr-prsc = <200>; > > - fsl,tmr-add = <0xCCCD>; > > - fsl,tmr-fiper1 = <0x3B9AC9FB>; > > - fsl,tmr-fiper2 = <0x0001869B>; > > - fsl,max-adj = <24999>; > > + fsl,tclk-period = <5>; > > + fsl,tmr-prsc= <2>; > > + fsl,tmr-add = <2863311531>; > > + fsl,tmr-fiper1 = <5>; > > + fsl,tmr-fiper2 = <0>; > > + fsl,max-adj = <2>; > > }; And here, you're changing the value of fsl,tmr-add and fsl,max-adj. Why? -Scott
Re: powerpc/corenet: enable eSDHC
OK, thanks. Assuming no similar issues when testing, I'll apply this patch the next time I do a batch of patch application. Any thoughts regarding better error handling? -Scott On Wed, 2015-07-15 at 21:37 -0500, Lu Yangbo-B47093 wrote: > Hi Scott, > > Now the patch below has been merged on > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git. > This issue should have been resolved. > Thanks. > > > commit 5fd26c7ecb32082745b0bd33c8e35badd1cb5a91 > Author: Ulf Hansson > Date: Fri Jun 5 11:40:08 2015 +0200 > > mmc: sdhci: Restore behavior while creating OCR mask > > > > -Original Message- > > From: Wood Scott-B07421 > > Sent: Tuesday, June 09, 2015 4:31 AM > > To: Lu Yangbo-B47093 > > Cc: linuxppc-dev@lists.ozlabs.org; linux-...@vger.kernel.org > > Subject: Re: powerpc/corenet: enable eSDHC > > > > On Mon, 2015-06-08 at 05:12 -0500, Lu Yangbo-B47093 wrote: > > > Thanks a lot, Scott. > > > And now a patch was merged on > > > git://git.linaro.org/people/ulf.hansson/mmc.git next branch to fix > > > this issue. > > > It should be no problem. > > > > Assuming that patch fixes it and gets pulled for 4.2, this config patch > > can go in for 4.3. > > > > That said, it would be nice if, apart from fixing the problem itself, the > > MMC code failed more gracefully instead of endlessly repeating and > > filling up the log/console. > > > > -Scott >
Re: [PATCH v3 3/8] perf: Add a flags parameter to pmu txn interfaces
On Tue, Jul 14, 2015 at 08:01:50PM -0700, Sukadev Bhattiprolu wrote: > @@ -1604,6 +1613,12 @@ static void power_pmu_start_txn(struct pmu *pmu) > static void power_pmu_cancel_txn(struct pmu *pmu) > { > struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events); > + int txn_flags; > + > + txn_flags = cpuhw->txn_flags; > + cpuhw->txn_flags = 0; > + if (cpuhw->txn_flags & ~PERF_PMU_TXN_ADD) > + return; That seems, unintentional? ;-)
Re: [PATCH v5 1/2] perf,kvm/ppc: Add kvm_perf.h for powerpc
On Thu, 2015-07-16 at 21:18 +0530, Hemant Kumar wrote: > To analyze the exit events with perf, we need kvm_perf.h to be added in > the arch/powerpc directory, where the kvm tracepoints needed to trace > the KVM exit events are defined. > > This patch adds "kvm_perf_book3s.h" to indicate that the tracepoints are > book3s specific. Generic "kvm_perf.h" then can just include > "kvm_perf_book3s.h". > > Signed-off-by: Hemant Kumar > --- > Changes: > - Not exporting the exit reasons compared to previous patchset (suggested > by Paul) > > arch/powerpc/include/uapi/asm/kvm_perf.h| 6 ++ > arch/powerpc/include/uapi/asm/kvm_perf_book3s.h | 14 ++ > 2 files changed, 20 insertions(+) > create mode 100644 arch/powerpc/include/uapi/asm/kvm_perf.h > create mode 100644 arch/powerpc/include/uapi/asm/kvm_perf_book3s.h > > diff --git a/arch/powerpc/include/uapi/asm/kvm_perf.h > b/arch/powerpc/include/uapi/asm/kvm_perf.h > new file mode 100644 > index 000..5ed2ff3 > --- /dev/null > +++ b/arch/powerpc/include/uapi/asm/kvm_perf.h > @@ -0,0 +1,6 @@ > +#ifndef _ASM_POWERPC_KVM_PERF_H > +#define _ASM_POWERPC_KVM_PERF_H > + > +#include > + > +#endif > diff --git a/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h > b/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h > new file mode 100644 > index 000..8c8d8c2 > --- /dev/null > +++ b/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h > @@ -0,0 +1,14 @@ > +#ifndef _ASM_POWERPC_KVM_PERF_BOOK3S_H > +#define _ASM_POWERPC_KVM_PERF_BOOK3S_H > + > +#include > + > +#define DECODE_STR_LEN 20 > + > +#define VCPU_ID "vcpu_id" > + > +#define KVM_ENTRY_TRACE "kvm_hv:kvm_guest_enter" > +#define KVM_EXIT_TRACE "kvm_hv:kvm_guest_exit" > +#define KVM_EXIT_REASON "trap" > + > +#endif /* _ASM_POWERPC_KVM_PERF_BOOK3S_H */ Again, why is book3s stuff being presented via uapi as generic with generic symbol names? -Scott
Re: [PATCH][v2] powerpc/fsl-booke: Add T1040D4RDB/T1042D4RDB board support
On Thu, 2015-07-16 at 04:34 -0500, Jain Priyanka-B32167 wrote: > > -Original Message- > From: Wood Scott-B07421 > Sent: Wednesday, July 15, 2015 11:17 PM > To: Jain Priyanka-B32167 > Cc: linuxppc-dev@lists.ozlabs.org > Subject: Re: [PATCH][v2] powerpc/fsl-booke: Add T1040D4RDB/T1042D4RDB board > support > > On Wed, 2015-07-15 at 15:00 +0530, Priyanka Jain wrote: > > T1040D4RDB/T1042D4RDB are Freescale Reference Design Board which can > > support T1040/T1042 QorIQ Power Architecture™ processor respectively > > > > T1040D4RDB/T1042D4RDB board Overview > > - > > - SERDES Connections, 8 lanes supporting: > > - PCI > > - SGMII > > - SATA 2.0 > > - QSGMII(only for T1040D4RDB) > > - DDR Controller > > - Supports rates of up to 1600 MHz data-rate > > - Supports one DDR4 UDIMM > > -IFC/Local Bus > > - NAND flash: 1GB 8-bit NAND flash > > - NOR: 128MB 16-bit NOR Flash > > - Ethernet > > - Two on-board RGMII 10/100/1G ethernet ports. > > - PHY #0 remains powered up during deep-sleep > > - CPLD > > - Clocks > > - System and DDR clock (SYSCLK, “DDRCLK”) > > - SERDES clocks > > - Power Supplies > > - USB > > - Supports two USB 2.0 ports with integrated PHYs > > - Two type A ports with 5V@1.5Aperport. > > - SDHC > > - SDHC/SDXC connector > > - SPI > > - On-board 64MB SPI flash > > - I2C > > - Devices connected: EEPROM, thermal monitor, VID controller > > - Other IO > > - Two Serial ports > > - ProfiBus port > > > > Add support for T1040/T1042D4RDB board: > > -add device tree > > -Add entry in corenet_generic.c > > > > Signed-off-by: Priyanka Jain > > --- > > Changes for v2: > > Incorporated Scott's comments on device tree > > You didn't respond to the comments on the CPLD node. > [Priyanka] > T1042D4RDB, T1040D4RDB are derivatives of same board , CPLD is same for > both. > So, I have moved below node having compatible and reg field together in > t104xd4rdb.dtsi. > Is this fine? 
> cpld@3,0 { > compatible = "fsl,t1040d4rdb-cpld"; > reg = <3 0 0x300>; > }; If the CPLD image is exactly the same on both, this is fine. > > +i2c@118100{ > > + mux@77{ > > + compatible = "nxp,pca9546"; > > + reg = <0x77>; > > + #address-cells = <1>; > > + #size-cells = <0>; > > + }; > > + }; > > A mux with no nodes under it (and yet it has #address-cells/#size-cells)? > What is it multiplexing? > [Priyanka]: PCA9546 is i2c mux device , to which other i2c devices (up-to 8 > ) can be further connected on output channels > On T104xD4RDB, channel 0, 1, 3 line are connected to PEX device, Channel 2 > to hdmi interface (initialization is done in u-boot only), other channels > are grounded. So, as such Linux is not using the second level I2C devices > connected on this MUX device. So, I have not shown next level hierarchy. > Should I replace 'mux' with some other name? . Please suggest. The device tree describes the hardware, not just what Linux uses... but what I don't understand is why you describe the mux at all if you're not going to describe what goes underneath it. -Scott
[PATCH] dt-bindings: powerpc: adapt mpc5121-psc document to reality
The drivers support the MPC5125 in addition to the MPC5121, and the PSC's SPI mode is supported as well. Additionally, some minor corrections are made. Signed-off-by: Uwe Kleine-König --- Hello, I sent a patch adding mpc5125 support to the mpc512x driver, and Mark asked for the new compatible to be documented. While at it, I updated the document a bit more; note that the SPI support for mpc5125 depends on my patch, which isn't mainline yet. Best regards Uwe .../bindings/powerpc/fsl/mpc5121-psc.txt | 24 -- 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/Documentation/devicetree/bindings/powerpc/fsl/mpc5121-psc.txt b/Documentation/devicetree/bindings/powerpc/fsl/mpc5121-psc.txt index 8832e8798912..647817527c88 100644 --- a/Documentation/devicetree/bindings/powerpc/fsl/mpc5121-psc.txt +++ b/Documentation/devicetree/bindings/powerpc/fsl/mpc5121-psc.txt @@ -6,14 +6,14 @@ PSC in UART mode For PSC in UART mode the needed PSC serial devices are specified by fsl,mpc5121-psc-uart nodes in the fsl,mpc5121-immr SoC node. Additionally the PSC FIFO -Controller node fsl,mpc5121-psc-fifo is requered there: +Controller node fsl,mpc5121-psc-fifo is required there: -fsl,mpc5121-psc-uart nodes +fsl,mpc512x-psc-uart nodes -- Required properties : - - compatible : Should contain "fsl,mpc5121-psc-uart" and "fsl,mpc5121-psc" - - cell-index : Index of the PSC in hardware + - compatible : Should contain "fsl,-psc-uart" and "fsl,-psc" + Supported s: mpc5121, mpc5125 - reg : Offset and length of the register set for the PSC device - interrupts : where a is the interrupt number of the PSC FIFO Controller and b is a field that represents an @@ -25,12 +25,21 @@ Recommended properties : - fsl,rx-fifo-size : the size of the RX fifo slice (a multiple of 4) - fsl,tx-fifo-size : the size of the TX fifo slice (a multiple of 4) +PSC in SPI mode +--- -fsl,mpc5121-psc-fifo node +Similar to the UART mode a PSC can be operated in SPI mode. The compatible used +for that is fsl,mpc5121-psc-spi.
It requires a fsl,mpc5121-psc-fifo as well. +The required and recommended properties are identical to the +fsl,mpc5121-psc-uart nodes, just use spi instead of uart in the compatible +string. + +fsl,mpc512x-psc-fifo node - Required properties : - - compatible : Should be "fsl,mpc5121-psc-fifo" + - compatible : Should be "fsl,-psc-fifo" + Supported s: mpc5121, mpc5125 - reg : Offset and length of the register set for the PSC FIFO Controller - interrupts : where a is the interrupt number of the @@ -39,6 +48,9 @@ Required properties : - interrupt-parent : the phandle for the interrupt controller that services interrupts for this device. +Recommended properties : + - clocks : specifies the clock needed to operate the fifo controller + - clock-names : name(s) for the clock(s) listed in clocks Example for a board using PSC0 and PSC1 devices in serial mode: -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: BUG: sleeping function called from ras_epow_interrupt context
On 07/16/2015 01:23 AM, Thomas Huth wrote: > On 07/15/2015 09:58 PM, Nathan Fontenot wrote: >> On 07/15/2015 09:35 AM, Thomas Huth wrote: >>> On 07/14/2015 11:22 PM, Benjamin Herrenschmidt wrote: On Tue, 2015-07-14 at 20:43 +0200, Thomas Huth wrote: > Any suggestions how to fix this? Simply revert 587f83e8dd50d? Use > mdelay() instead of msleep() in rtas_busy_delay()? Something more > fancy? A proper fix would be more fancy, the get_sensor should happen in a kernel thread instead. >>> >>> I'm not very familiar with this stuff, but isn't the EPOW interrupt >>> something that is very time-critical? Moving parts of the handler into a >>> kernel thread then does not sound like a very good idea to me... >>> >>> Another question: Can it happen at all that this get-sensor call results >>> in a sleep condition? Looking at commit ID >>> 81b73dd92b97423b8f5324a59044da478c04f4c4 ("Fix might-sleep warning on >>> removing cpus"), which apparently fixed a similar issue for CPU >>> hot-plugging, indicates that at least some of the rtas calls are never >>> returning the busy code? In that case we could fix this by introducing a >>> similar rtas_get_sensor_fast() function? (or simply revert 587f83e8dd50d >>> which would be quite similar, I think) >>> >> >> Looking at the PAPR, the get-sensor-state rtas call for the EPOW sensor >> is listed as a fast call and should not return a busy indication. > > Great, good to know, thanks for looking that up! So IMHO we should > either introduce a rtas_get_sensor_fast() function or revert > 587f83e8dd50d ... any preferences? Shall I come up with a patch? > From a quick look at the kernel, I find only three places where rtas_get_sensor is called. The EPOW sensor call you point out here is the only one I find for a sensor that should not return a busy indication. Reverting commit 587f83e8dd50d would solve the issue, but it would not help any future users of a fast get-sensor call.
I have no objection to a patch adding rtas_get_sensor_fast(). -Nathan >> I'm curious as to why we're getting a busy return indication when >> making this call. > > Looking at the code again, rtas_busy_delay() likely never slept ... it's > likely just the "might_sleep()" annotation in that function that causes > the BUG. > > Thomas
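Below is a minimal userspace model of the rtas_get_sensor_fast() idea discussed above. It issues the get-sensor-state call once, with no rtas_busy_delay() retry loop and therefore nothing that might sleep. A busy status is treated as unexpected, because PAPR lists this call as fast for the EPOW sensor. The status values and the fake_rtas_get_sensor_state() stub are assumptions made for illustration, and returning -EBUSY rather than retrying is a design choice of this sketch, not the kernel's rtas.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Status codes modelled on RTAS conventions (assumed values). */
#define SKETCH_RTAS_BUSY      (-2)
#define SKETCH_EXT_DELAY_MIN  9900
#define SKETCH_EXT_DELAY_MAX  9905

/* Hypothetical firmware stub: tests set these to simulate outcomes. */
static int sketch_next_status = 0;
static int sketch_next_state = 0;

static int fake_rtas_get_sensor_state(int sensor, int index, int *state)
{
	(void)sensor;
	(void)index;
	*state = sketch_next_state;
	return sketch_next_status;
}

static int rtas_status_is_busy(int rc)
{
	return rc == SKETCH_RTAS_BUSY ||
	       (rc >= SKETCH_EXT_DELAY_MIN && rc <= SKETCH_EXT_DELAY_MAX);
}

/* Fast variant: exactly one call, never sleeps and never retries. */
static int sketch_get_sensor_fast(int sensor, int index, int *state)
{
	int rc;

	assert(state != NULL);
	rc = fake_rtas_get_sensor_state(sensor, index, state);
	if (rtas_status_is_busy(rc)) {
		/* Should not happen for a fast sensor: report, don't wait. */
		fprintf(stderr, "sensor %d unexpectedly busy (rc=%d)\n", sensor, rc);
		return -EBUSY;
	}
	return rc < 0 ? -EIO : rc;
}
```

Because there is no delay loop, such a helper is safe to call from interrupt context such as ras_epow_interrupt().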
[PATCH v5 2/2] perf/kvm: Support HCALL events
PowerPC provides hcall events that also provide insight into guest behaviour. Enhance perf kvm stat to record and analyze hcall events.

- To trace hcall events: perf kvm stat record
- To show the results: perf kvm stat report --event=hcall

The result shows the number of hypervisor calls made by the guest, grouped by hcall type, along with their frequency.

This patch makes use of two additional tracepoints, "kvm_hv:kvm_hcall_enter" and "kvm_hv:kvm_hcall_exit". Mapping the hcall codes to their names requires a table, which this patch adds in book3s_hcalls.h.

Note that this patch depends on "perf,kvm/ppc: Add hcall related info to kvm_perf.h", which adds the hcall-related tracepoints to kvm_perf.h so that "perf kvm stat" knows about them.

A sample output:

# pgrep qemu
19378
60515

2 VMs running.

# perf kvm stat record -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 4.153 MB perf.data.guest (39624 samples) ]

# perf kvm stat report -p 60515 --event=hcall

Analyze events for pid(s) 60515, all VCPUs:

     HCALL-EVENTS  Samples  Samples%    Time%   Min Time   Max Time   Avg time

     H_VIO_SIGNAL     1034    38.44%   15.77%     0.36us     1.59us     0.44us ( +- 0.66% )
       H_SEND_CRQ      652    24.24%   10.97%     0.39us     1.84us     0.49us ( +- 1.20% )
            H_IPI      523    19.44%   62.05%     1.35us    19.70us     3.44us ( +- 2.88% )
  H_PUT_TERM_CHAR      411    15.28%    8.03%     0.38us     3.77us     0.57us ( +- 1.61% )
  H_GET_TERM_CHAR       50     1.86%    0.99%     0.40us     0.98us     0.57us ( +- 3.37% )
            H_EOI       20     0.74%    2.19%     2.22us     4.72us     3.17us ( +- 5.96% )

Total Samples:2690, Total events handled time:2896.94us.

Signed-off-by: Hemant Kumar
---
This patch has a direct dependency on: http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg91605.html

Changes:
- Added definitions for the hcall code to hcall reason mapping on the userspace side.
tools/perf/arch/powerpc/util/book3s_hcalls.h | 123 +++ tools/perf/arch/powerpc/util/kvm-stat.c | 64 ++ 2 files changed, 187 insertions(+) create mode 100644 tools/perf/arch/powerpc/util/book3s_hcalls.h diff --git a/tools/perf/arch/powerpc/util/book3s_hcalls.h b/tools/perf/arch/powerpc/util/book3s_hcalls.h new file mode 100644 index 000..3d50def --- /dev/null +++ b/tools/perf/arch/powerpc/util/book3s_hcalls.h @@ -0,0 +1,123 @@ +#ifndef ARCH_PERF_BOOK3S_HCALLS_H +#define ARCH_PERF_BOOK3S_HCALLS_H + +/* + * PowerPC HCALL codes : hcall name to reason mapping + */ +#define kvm_trace_symbol_hcall \ + {0x4,"H_REMOVE"}, \ + {0x8,"H_ENTER"},\ + {0xc,"H_READ"}, \ + {0x10,"H_CLEAR_MOD"}, \ + {0x14,"H_CLEAR_REF"}, \ + {0x18,"H_PROTECT"}, \ + {0x1c,"H_GET_TCE"}, \ + {0x20,"H_PUT_TCE"}, \ + {0x24,"H_SET_SPRG0"}, \ + {0x28,"H_SET_DABR"},\ + {0x2c,"H_PAGE_INIT"}, \ + {0x30,"H_SET_ASR"}, \ + {0x34,"H_ASR_ON"}, \ + {0x38,"H_ASR_OFF"}, \ + {0x3c,"H_LOGICAL_CI_LOAD"}, \ + {0x40,"H_LOGICAL_CI_STORE"},\ + {0x44,"H_LOGICAL_CACHE_LOAD"}, \ + {0x48,"H_LOGICAL_CACHE_STORE"}, \ + {0x4c,"H_LOGICAL_ICBI"},\ + {0x50,"H_LOGICAL_DCBF"},\ + {0x54,"H_GET_TERM_CHAR"}, \ + {0x58,"H_PUT_TERM_CHAR"}, \ + {0x5c,"H_REAL_TO_LOGICAL"}, \ + {0x60,"H_HYPERVISOR_DATA"}, \ + {0x64,"H_EOI"}, \ + {0x68,"H_CPPR"},\ + {0x6c,"H_IPI"}, \ + {0x70,"H_IPOLL"}, \ + {0x74,"H_XIRR"},\ + {0x78,"H_MIGRATE_DMA"}, \ + {0x7c,"H_PERFMON"}, \ + {0xdc,"H_REGISTER_VPA"},\ + {0xe0,"H_CEDE"},\ + {0xe4,"H_CONFER"}, \ + {0xe8,"H_PROD"},\ + {0xec,"H_GET_PPP"}, \ + {0xf0,"H_SET_PPP"}, \ + {0xf4,"H_PURR"},\
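book3s_hcalls.h lists hcall codes as {code, name} pairs that perf kvm stat uses to label events. The Samples% column in the sample report is each hcall's share of the total sample count: 1034 H_VIO_SIGNAL samples out of 2690 gives 38.44%. The standalone sketch below covers both pieces, using a few entries copied from kvm_trace_symbol_hcall. The struct, the NULL-name sentinel and the function names are simplified stand-ins, not perf's own definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sketch_hcall {
	unsigned long code;
	const char *name;
};

/* A few {code, name} pairs copied from kvm_trace_symbol_hcall. */
#define sketch_symbol_hcall \
	{0x54, "H_GET_TERM_CHAR"}, \
	{0x58, "H_PUT_TERM_CHAR"}, \
	{0x64, "H_EOI"},           \
	{0x6c, "H_IPI"}

/* Expand the list into an array terminated by a NULL-name sentinel. */
static const struct sketch_hcall sketch_hcalls[] = {
	sketch_symbol_hcall,
	{ 0, NULL }
};

static const char *sketch_hcall_name(unsigned long code)
{
	const struct sketch_hcall *h;

	for (h = sketch_hcalls; h->name; h++)
		if (h->code == code)
			return h->name;
	return "UNKNOWN";
}

/* Reverse lookup; returns 0 if the name is not in the table. */
static unsigned long sketch_hcall_code(const char *name)
{
	const struct sketch_hcall *h;

	for (h = sketch_hcalls; h->name; h++)
		if (strcmp(h->name, name) == 0)
			return h->code;
	return 0;
}

/* Samples% in hundredths of a percent, rounded to nearest (3844 == 38.44%). */
static unsigned long sketch_pct_x100(unsigned long samples, unsigned long total)
{
	assert(total != 0);
	return (samples * 10000 + total / 2) / total;
}
```

Integer hundredths keep the rounding exact and match the two-decimal percentages that perf prints.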
[PATCH v5 1/2] perf/kvm: Port perf kvm stat to powerpc
From: Srikar Dronamraju

perf kvm can be used to analyze guest exit reasons. This support already exists for x86; this patch ports it to powerpc.

- To trace KVM events: perf kvm stat record
  If many guests are running, a specific guest can be tracked by using --pid, as in: perf kvm stat record --pid
- To see the results: perf kvm stat report

The result shows the number of exits (from the guest context to the host/hypervisor context), grouped by exit reason, along with their frequency.

To analyze the different exits, group them and present them to the user (in a slightly descriptive way), we need a mapping between the "exit code" (dumped in the kvm_guest_exit tracepoint data) and its related interrupt vector description (exit reason). This patch adds this mapping in book3s_exits.h.

It records the two available KVM tracepoints: "kvm_hv:kvm_guest_exit" and "kvm_hv:kvm_guest_enter".

Note that this patch has a direct dependency on "perf,kvm/ppc: Add kvm_perf.h for powerpc", which adds kvm_perf.h, where the KVM tracepoints required by "perf kvm stat" are defined.

Here is a sample output:

# pgrep qemu
19378
60515

2 Guests are running on the host.

# perf kvm stat record -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 4.153 MB perf.data.guest (39624 samples) ]

# perf kvm stat report -p 60515

Analyze events for pid(s) 60515, all VCPUs:

         VM-EXIT  Samples  Samples%    Time%   Min Time     Max Time    Avg time

  H_DATA_STORAGE     5006    35.30%    0.13%     1.94us      49.46us     12.37us ( +- 0.52% )
  HV_DECREMENTER     4457    31.43%    0.02%     0.72us      16.14us      1.91us ( +- 0.96% )
         SYSCALL     2690    18.97%    0.10%     2.84us     528.24us     18.29us ( +- 3.75% )
  RETURN_TO_HOST     1789    12.61%   99.76%     1.58us  672791.91us  27470.23us ( +- 3.00% )
        EXTERNAL      240     1.69%    0.00%     0.69us      10.67us      1.33us ( +- 5.34% )

Total Samples:14182, Total events handled time:49264158.30us.
Signed-off-by: Srikar Dronamraju Signed-off-by: Hemant Kumar --- This patch has a direct dependency on: http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg91603.html Changes : - Added exit reasons definitions(unlikely to change) in the userspace side. tools/perf/arch/powerpc/Makefile| 1 + tools/perf/arch/powerpc/util/Build | 1 + tools/perf/arch/powerpc/util/book3s_exits.h | 33 + tools/perf/arch/powerpc/util/kvm-stat.c | 33 + 4 files changed, 68 insertions(+) create mode 100644 tools/perf/arch/powerpc/util/book3s_exits.h create mode 100644 tools/perf/arch/powerpc/util/kvm-stat.c diff --git a/tools/perf/arch/powerpc/Makefile b/tools/perf/arch/powerpc/Makefile index 7fbca17..21322e0 100644 --- a/tools/perf/arch/powerpc/Makefile +++ b/tools/perf/arch/powerpc/Makefile @@ -1,3 +1,4 @@ ifndef NO_DWARF PERF_HAVE_DWARF_REGS := 1 endif +HAVE_KVM_STAT_SUPPORT := 1 diff --git a/tools/perf/arch/powerpc/util/Build b/tools/perf/arch/powerpc/util/Build index 7b8b0d1..c8fe207 100644 --- a/tools/perf/arch/powerpc/util/Build +++ b/tools/perf/arch/powerpc/util/Build @@ -1,5 +1,6 @@ libperf-y += header.o libperf-y += sym-handling.o +libperf-y += kvm-stat.o libperf-$(CONFIG_DWARF) += dwarf-regs.o libperf-$(CONFIG_DWARF) += skip-callchain-idx.o diff --git a/tools/perf/arch/powerpc/util/book3s_exits.h b/tools/perf/arch/powerpc/util/book3s_exits.h new file mode 100644 index 000..94c58f4 --- /dev/null +++ b/tools/perf/arch/powerpc/util/book3s_exits.h @@ -0,0 +1,33 @@ +#ifndef ARCH_PERF_BOOK3S_EXITS_H +#define ARCH_PERF_BOOK3S_EXITS_H + +/* + * PowerPC Interrupt vectors : exit code to name mapping + */ + +#define kvm_trace_symbol_exit \ + {0x0, "RETURN_TO_HOST"}, \ + {0x100, "SYSTEM_RESET"}, \ + {0x200, "MACHINE_CHECK"}, \ + {0x300, "DATA_STORAGE"}, \ + {0x380, "DATA_SEGMENT"}, \ + {0x400, "INST_STORAGE"}, \ + {0x480, "INST_SEGMENT"}, \ + {0x500, "EXTERNAL"}, \ + {0x501, "EXTERNAL_LEVEL"}, \ + {0x502, "EXTERNAL_HV"}, \ + {0x600, "ALIGNMENT"}, \ + {0x700, "PROGRAM"}, \ + {0x800, 
"FP_UNAVAIL"}, \ + {0x900, "DECREMENTER"}, \ + {0x980, "HV_DECREMENTER"}, \ + {0xc00, "SYSCALL"}, \ + {0xd00, "TRACE"}, \ + {0xe00, "H_DATA_STORAGE"}, \ + {0xe20, "H_INST_STORAGE"}, \ + {0xe40, "H_EMUL_ASSIST"}, \ + {0xf00, "PERFMON"}, \ + {0xf20, "ALTIVEC"}, \ + {0xf40, "VSX"} + +#endif diff --git a/tools/perf/arch/powerpc/util/kvm-stat.c b/tools/perf/arch/powerpc/util/kvm-stat.c new file mode 100644 index 000..d0e1930 --- /dev/null +++ b/tools/perf/arch/powerpc/util/kvm-stat.c @@ -0,0 +1,33 @@ +#include "../../util/kvm-stat.h" +#include "book3s_exits.h" + +define_exit_reasons_table(hv_exit_reasons, kvm_trace_symbol_exit); + +static struct kvm_events_ops
[PATCH v5 2/2] perf,kvm/ppc: Add hcall related info to kvm_perf.h
To analyze hcalls with perf, the hcall-related tracepoint information needs to be exported. This patch adds the hcall tracepoints "kvm_hv:kvm_hcall_enter" and "kvm_hv:kvm_hcall_exit" to kvm_perf.h, so that perf knows which tracepoints to look for when "perf kvm stat record" is used to collect guest hcall statistics. Signed-off-by: Hemant Kumar --- Changes: - Compared to the previous version, the hcall-related codes and names are no longer exported through uapi. arch/powerpc/include/uapi/asm/kvm_perf_book3s.h | 4 1 file changed, 4 insertions(+) diff --git a/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h b/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h index 8c8d8c2..1378a8d 100644 --- a/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h +++ b/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h @@ -11,4 +11,8 @@ #define KVM_EXIT_TRACE "kvm_hv:kvm_guest_exit" #define KVM_EXIT_REASON "trap" +#define KVM_HCALL_ENTRY_TRACE "kvm_hv:kvm_hcall_enter" +#define KVM_HCALL_EXIT_TRACE "kvm_hv:kvm_hcall_exit" +#define KVM_HCALL_REASON "req" + #endif /* _ASM_POWERPC_KVM_PERF_BOOK3S_H */ -- 1.9.3
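The KVM_HCALL_ENTRY_TRACE and KVM_HCALL_EXIT_TRACE strings follow the usual "subsystem:event" tracepoint naming, which corresponds to the events/<subsystem>/<event> layout in tracefs. The sketch below shows how such a name splits into its two parts. split_tracepoint() is a hypothetical helper for illustration, not a perf function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KVM_HCALL_ENTRY_TRACE "kvm_hv:kvm_hcall_enter"
#define KVM_HCALL_EXIT_TRACE  "kvm_hv:kvm_hcall_exit"

/* Hypothetical helper: split "sys:name" into caller-provided buffers.
 * Returns 0 on success, -1 if there is no ':' or a buffer is too small. */
static int split_tracepoint(const char *tp, char *sys, size_t sys_len,
			    char *name, size_t name_len)
{
	const char *colon;
	size_t n;

	assert(tp && sys && name);
	colon = strchr(tp, ':');
	if (!colon)
		return -1;

	n = (size_t)(colon - tp);
	if (n + 1 > sys_len || strlen(colon + 1) + 1 > name_len)
		return -1;

	memcpy(sys, tp, n);
	sys[n] = '\0';
	strcpy(name, colon + 1);
	return 0;
}
```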
Re: [PATCH V8 06/10] powerpc/eeh: Create PE for VFs
On Thu, Jun 18, 2015 at 04:06:41PM +0800, Wei Yang wrote: > Current EEH recovery code works with the assumption: the PE has primary > bus. Unfortunately, that's not true for VF PEs, which generally contains > one or multiple VFs (for VF group case). > > The patch introduces a weak function pcibios_bus_add_device() which is > called by pci_bus_add_device(). In this function, we creates PEs for VFs. > Those PEs for VFs are identified with newly introduced flag EEH_PE_VF so > that we handle them differently during EEH recovery. > > [gwshan: changelog and code refactoring] > Signed-off-by: Wei Yang > Acked-by: Gavin Shan > --- > arch/powerpc/include/asm/eeh.h |1 + > arch/powerpc/kernel/eeh_pe.c | 10 -- > arch/powerpc/platforms/powernv/eeh-powernv.c | 16 > drivers/pci/bus.c|2 ++ > 4 files changed, 27 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h > index 1b3614d..c1fde48 100644 > --- a/arch/powerpc/include/asm/eeh.h > +++ b/arch/powerpc/include/asm/eeh.h > @@ -70,6 +70,7 @@ struct pci_dn; > #define EEH_PE_PHB (1 << 1)/* PHB PE*/ > #define EEH_PE_DEVICE(1 << 2)/* Device PE */ > #define EEH_PE_BUS (1 << 3)/* Bus PE*/ > +#define EEH_PE_VF(1 << 4)/* VF PE */ > > #define EEH_PE_ISOLATED (1 << 0)/* Isolated PE > */ > #define EEH_PE_RECOVERING(1 << 1)/* Recovering PE*/ > diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c > index 35f0b62..260a701 100644 > --- a/arch/powerpc/kernel/eeh_pe.c > +++ b/arch/powerpc/kernel/eeh_pe.c > @@ -299,7 +299,10 @@ static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev > *edev) >* EEH device already having associated PE, but >* the direct parent EEH device doesn't have yet. >*/ > - pdn = pdn ? pdn->parent : NULL; > + if (edev->physfn) > + pdn = pci_get_pdn(edev->physfn); > + else > + pdn = pdn ? 
pdn->parent : NULL; > while (pdn) { > /* We're poking out of PCI territory */ > parent = pdn_to_eeh_dev(pdn); > @@ -382,7 +385,10 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev) > } > > /* Create a new EEH PE */ > - pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE); > + if (edev->physfn) > + pe = eeh_pe_alloc(edev->phb, EEH_PE_VF); > + else > + pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE); > if (!pe) { > pr_err("%s: out of memory!\n", __func__); > return -ENOMEM; > diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c > b/arch/powerpc/platforms/powernv/eeh-powernv.c > index ce738ab..4ec1d2e 100644 > --- a/arch/powerpc/platforms/powernv/eeh-powernv.c > +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c > @@ -1520,6 +1520,22 @@ static struct eeh_ops pnv_eeh_ops = { > .restore_config = pnv_eeh_restore_config > }; > > +void pcibios_bus_add_device(struct pci_dev *pdev) > +{ > + struct pci_dn *pdn = pci_get_pdn(pdev); > + > + if (!pdev->is_virtfn) > + return; > + > + /* > + * The following operations will fail if VF's sysfs files > + * aren't created or its resources aren't finalized. > + */ > + eeh_add_device_early(pdn); > + eeh_add_device_late(pdev); > + eeh_sysfs_add_device(pdev); > +} > + > /** > * eeh_powernv_init - Register platform dependent EEH operations > * > diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c > index 90fa3a7..960577f 100644 > --- a/drivers/pci/bus.c > +++ b/drivers/pci/bus.c > @@ -267,6 +267,7 @@ bool pci_bus_clip_resource(struct pci_dev *dev, int idx) > > void __weak pcibios_resource_survey_bus(struct pci_bus *bus) { } > > +void __weak pcibios_bus_add_device(struct pci_dev *dev) { } > /** > * pci_bus_add_device - start driver for a single device > * @dev: device to add > @@ -277,6 +278,7 @@ void pci_bus_add_device(struct pci_dev *dev) > { > int retval; > > + pcibios_bus_add_device(dev); Add a blank line here. > /* >* Can not put in pci_device_add yet because resources >* are not assigned yet for some devices. 
Please put the drivers/pci/bus.c change in a separate patch so it's clear that we're changing the PCI core here, not just the powerpc code. That will also make it possible to revert the powerpc change if necessary without breaking any other pcibios_bus_add_device() users that may be added. You can add my Acked-by: to the new drivers/pci/bus.c patch. Bjorn ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
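Bjorn's request relies on the weak-symbol pattern. The PCI core provides an empty __weak pcibios_bus_add_device(), and an architecture can supply a strong definition that replaces it at link time. The userspace sketch below shows the pattern with the GCC/Clang weak attribute, which is what the kernel's __weak expands to. The fake_* names are hypothetical. A strong override would live in a separate object file.

```c
#include <assert.h>
#include <stddef.h>

struct fake_pci_dev {
	int is_virtfn;
	int added;
};

/* Weak default, mirroring "void __weak pcibios_bus_add_device(...) { }":
 * a no-op unless another object file provides a strong definition. */
__attribute__((weak)) void fake_pcibios_bus_add_device(struct fake_pci_dev *dev)
{
	(void)dev;
}

void fake_pci_bus_add_device(struct fake_pci_dev *dev)
{
	assert(dev != NULL);
	fake_pcibios_bus_add_device(dev);	/* arch hook runs first */

	/* Stand-in for the rest of pci_bus_add_device(). */
	dev->added = 1;
}
```

A powernv-style override would be a plain, non-weak `void fake_pcibios_bus_add_device(struct fake_pci_dev *dev)` in another .c file that returns early when `!dev->is_virtfn`. Keeping the core hook in its own patch means that override can be reverted without touching other users.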
[PATCH v5 1/2] perf,kvm/ppc: Add kvm_perf.h for powerpc
To analyze the exit events with perf, we need kvm_perf.h to be added in the arch/powerpc directory, where the kvm tracepoints needed to trace the KVM exit events are defined. This patch adds "kvm_perf_book3s.h" to indicate that the tracepoints are book3s specific. Generic "kvm_perf.h" then can just include "kvm_perf_book3s.h". Signed-off-by: Hemant Kumar --- Changes: - Not exporting the exit reasons compared to previous patchset (suggested by Paul) arch/powerpc/include/uapi/asm/kvm_perf.h| 6 ++ arch/powerpc/include/uapi/asm/kvm_perf_book3s.h | 14 ++ 2 files changed, 20 insertions(+) create mode 100644 arch/powerpc/include/uapi/asm/kvm_perf.h create mode 100644 arch/powerpc/include/uapi/asm/kvm_perf_book3s.h diff --git a/arch/powerpc/include/uapi/asm/kvm_perf.h b/arch/powerpc/include/uapi/asm/kvm_perf.h new file mode 100644 index 000..5ed2ff3 --- /dev/null +++ b/arch/powerpc/include/uapi/asm/kvm_perf.h @@ -0,0 +1,6 @@ +#ifndef _ASM_POWERPC_KVM_PERF_H +#define _ASM_POWERPC_KVM_PERF_H + +#include + +#endif diff --git a/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h b/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h new file mode 100644 index 000..8c8d8c2 --- /dev/null +++ b/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h @@ -0,0 +1,14 @@ +#ifndef _ASM_POWERPC_KVM_PERF_BOOK3S_H +#define _ASM_POWERPC_KVM_PERF_BOOK3S_H + +#include + +#define DECODE_STR_LEN 20 + +#define VCPU_ID "vcpu_id" + +#define KVM_ENTRY_TRACE "kvm_hv:kvm_guest_enter" +#define KVM_EXIT_TRACE "kvm_hv:kvm_guest_exit" +#define KVM_EXIT_REASON "trap" + +#endif /* _ASM_POWERPC_KVM_PERF_BOOK3S_H */ -- 1.9.3 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2 11/20] tty/hvc: xen: Use xen page definition
On Thu, 9 Jul 2015, Julien Grall wrote: > The console ring is always based on the page granularity of Xen. > > Signed-off-by: Julien Grall > Cc: Greg Kroah-Hartman > Cc: Jiri Slaby > Cc: David Vrabel > Cc: Stefano Stabellini > Cc: Boris Ostrovsky > Cc: linuxppc-dev@lists.ozlabs.org Reviewed-by: Stefano Stabellini > drivers/tty/hvc/hvc_xen.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c > index a9d837f..2135944 100644 > --- a/drivers/tty/hvc/hvc_xen.c > +++ b/drivers/tty/hvc/hvc_xen.c > @@ -230,7 +230,7 @@ static int xen_hvm_console_init(void) > if (r < 0 || v == 0) > goto err; > mfn = v; > - info->intf = xen_remap(mfn << PAGE_SHIFT, PAGE_SIZE); > + info->intf = xen_remap(mfn << XEN_PAGE_SHIFT, XEN_PAGE_SIZE); > if (info->intf == NULL) > goto err; > info->vtermno = HVC_COOKIE; > @@ -392,7 +392,7 @@ static int xencons_connect_backend(struct xenbus_device > *dev, > if (xen_pv_domain()) > mfn = virt_to_mfn(info->intf); > else > - mfn = __pa(info->intf) >> PAGE_SHIFT; > + mfn = __pa(info->intf) >> XEN_PAGE_SHIFT; > ret = gnttab_alloc_grant_references(1, &gref_head); > if (ret < 0) > return ret; > @@ -476,7 +476,7 @@ static int xencons_resume(struct xenbus_device *dev) > struct xencons_info *info = dev_get_drvdata(&dev->dev); > > xencons_disconnect_backend(info); > - memset(info->intf, 0, PAGE_SIZE); > + memset(info->intf, 0, XEN_PAGE_SIZE); > return xencons_connect_backend(dev, info); > } > > -- > 2.1.4 > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
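The switch to XEN_PAGE_SHIFT matters because Xen frame numbers always count 4 KiB frames, while the kernel's PAGE_SHIFT can be larger. The sketch below shows the address computation from xen_hvm_console_init() and its inverse from xencons_connect_backend(). The shift values are assumptions for illustration: 12 for Xen and 16 for a 64 KiB-page guest.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_XEN_PAGE_SHIFT    12	/* Xen granularity: 4 KiB */
#define SKETCH_XEN_PAGE_SIZE     (1ULL << SKETCH_XEN_PAGE_SHIFT)
#define SKETCH_KERNEL_PAGE_SHIFT 16	/* e.g. a 64 KiB-page kernel */

/* Physical address of a Xen frame: always use the Xen shift. */
static uint64_t xen_frame_to_phys(uint64_t mfn)
{
	return mfn << SKETCH_XEN_PAGE_SHIFT;
}

/* The pre-patch computation, only correct when PAGE_SHIFT == 12. */
static uint64_t frame_to_phys_kernel_shift(uint64_t mfn)
{
	return mfn << SKETCH_KERNEL_PAGE_SHIFT;
}

/* Inverse, as in the __pa(info->intf) >> XEN_PAGE_SHIFT change. */
static uint64_t phys_to_xen_frame(uint64_t phys)
{
	assert((phys & (SKETCH_XEN_PAGE_SIZE - 1)) == 0);	/* frame-aligned */
	return phys >> SKETCH_XEN_PAGE_SHIFT;
}
```

With a 64 KiB kernel page size, the old PAGE_SHIFT-based code would map frame 0x1234 at 0x12340000 instead of 0x1234000.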
Re: [PATCH] ipmi/powernv: Fix potential invalid pointer dereference
OK, this looks fine. A couple of questions: Do I need to send this upstream right now? How well has this been tested? Do you want this backported to 4.0 stable? -corey On 07/16/2015 06:16 AM, Neelesh Gupta wrote: > If the OPAL call to receive the ipmi message fails, then we free up the > smi message and return. But, the driver still holds the reference to > old smi message in the 'cur_msg' which can potentially be accessed later > and freed again leading to kernel oops. To fix it up, > > The kernel driver should reset the 'cur_msg' and send reply to the user > in addition to freeing the message. > > Signed-off-by: Neelesh Gupta > --- > drivers/char/ipmi/ipmi_powernv.c | 13 ++--- > 1 file changed, 10 insertions(+), 3 deletions(-) > > diff --git a/drivers/char/ipmi/ipmi_powernv.c > b/drivers/char/ipmi/ipmi_powernv.c > index 9b409c0..637486d 100644 > --- a/drivers/char/ipmi/ipmi_powernv.c > +++ b/drivers/char/ipmi/ipmi_powernv.c > @@ -143,9 +143,16 @@ static int ipmi_powernv_recv(struct ipmi_smi_powernv > *smi) > pr_devel("%s: -> %d (size %lld)\n", __func__, > rc, rc == 0 ? size : 0); > if (rc) { > - spin_unlock_irqrestore(&smi->msg_lock, flags); > - ipmi_free_smi_msg(msg); > - return 0; > + /* If came via the poll, and response was not yet ready */ > + if (rc == OPAL_EMPTY) { > + spin_unlock_irqrestore(&smi->msg_lock, flags); > + return 0; > + } else { > + smi->cur_msg = NULL; > + spin_unlock_irqrestore(&smi->msg_lock, flags); > + send_error_reply(smi, msg, IPMI_ERR_UNSPECIFIED); > + return 0; > + } > } > > if (size < sizeof(*opal_msg)) { >
Re: [PATCH] spi: mpc512x-psc: add support for Freescale MPC5125
On Wed, Jul 15, 2015 at 09:40:19AM +0200, Uwe Kleine-König wrote: > On Tue, Jul 14, 2015 at 10:54:42AM +0100, Mark Brown wrote: > > > static const struct of_device_id mpc512x_psc_spi_of_match[] = { > > > - { .compatible = "fsl,mpc5121-psc-spi", }, > > > + { .compatible = "fsl,mpc5121-psc-spi", .data = (void *)TYPE_MPC5121 }, > > > + { .compatible = "fsl,mpc5125-psc-spi", .data = (void *)TYPE_MPC5125 }, > > > {}, > > The code seems fine but this should update the binding document to > > include the new compatible string. > I don't find fsl,mpc5121-psc-spi documented either. The best I found is > ocumentation/devicetree/bindings/powerpc/fsl/mpc5121-psc.txt which > describes fsl,mpc5121-psc-uart and fsl,mpc5121-psc. OK, then please add a basic binding document. The point is that new bindings should be documented; if people have been lax about this in the past, that does involve a bit of cleanup.
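The match table quoted above attaches a per-SoC type to each compatible string through .data. The userspace sketch below shows that lookup. It uses a simplified stand-in for struct of_device_id and a plain strcmp() match. The kernel's real matching (of_match_device() and related helpers) does more; for example, it considers every string in a node's compatible list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum psc_type { TYPE_MPC5121, TYPE_MPC5125 };

/* Simplified stand-in for struct of_device_id. */
struct sketch_of_id {
	const char *compatible;
	const void *data;
};

static const struct sketch_of_id mpc512x_psc_spi_ids[] = {
	{ "fsl,mpc5121-psc-spi", (const void *)(uintptr_t)TYPE_MPC5121 },
	{ "fsl,mpc5125-psc-spi", (const void *)(uintptr_t)TYPE_MPC5125 },
	{ NULL, NULL },	/* sentinel */
};

/* Returns 0 and sets *type on a match, -1 otherwise. */
static int sketch_match_type(const char *compatible, enum psc_type *type)
{
	const struct sketch_of_id *id;

	assert(compatible && type);
	for (id = mpc512x_psc_spi_ids; id->compatible; id++) {
		if (strcmp(id->compatible, compatible) == 0) {
			*type = (enum psc_type)(uintptr_t)id->data;
			return 0;
		}
	}
	return -1;
}
```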
[PATCH] powerpc: Use hardware RNG for arch_get_random_seed_* not arch_get_random_*
The hardware RNG on POWER8 and POWER7+ can be relatively slow, since it can only supply one 64-bit value per microsecond. Currently we read it in arch_get_random_long(), but that slows down reading from /dev/urandom since the code in random.c calls arch_get_random_long() for every longword read from /dev/urandom. Since the hardware RNG supplies high-quality entropy on every read, it matches the semantics of arch_get_random_seed_long() better than those of arch_get_random_long(). Therefore this commit makes the code use the hardware RNG only for arch_get_random_seed_{long,int} and not for arch_get_random_{long,int}. Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/archrandom.h | 26 +- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/include/asm/archrandom.h b/arch/powerpc/include/asm/archrandom.h index 0cc6eed..a4c3f54 100644 --- a/arch/powerpc/include/asm/archrandom.h +++ b/arch/powerpc/include/asm/archrandom.h @@ -7,13 +7,22 @@ static inline int arch_get_random_long(unsigned long *v) { + return 0; +} + +static inline int arch_get_random_int(unsigned int *v) +{ + return 0; +} + +static inline int arch_get_random_seed_long(unsigned long *v) +{ if (ppc_md.get_random_long) return ppc_md.get_random_long(v); return 0; } - -static inline int arch_get_random_int(unsigned int *v) +static inline int arch_get_random_seed_int(unsigned int *v) { unsigned long val; int rc; @@ -27,22 +36,13 @@ static inline int arch_get_random_int(unsigned int *v) static inline int arch_has_random(void) { - return !!ppc_md.get_random_long; -} - -static inline int arch_get_random_seed_long(unsigned long *v) -{ - return 0; -} -static inline int arch_get_random_seed_int(unsigned int *v) -{ return 0; } + static inline int arch_has_random_seed(void) { - return 0; + return !!ppc_md.get_random_long; } - #endif /* CONFIG_ARCH_RANDOM */ #ifdef CONFIG_PPC_POWERNV -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org 
https://lists.ozlabs.org/listinfo/linuxppc-dev
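After this change, only the seed variants consult the platform hook, and the int variant is derived from the long one by truncation. The userspace model below shows that split, with a stubbed hook in place of ppc_md.get_random_long. The sketch_ names are not the kernel's. One detail worth checking: the int seed variant has to be built on the seed long variant. Building it on arch_get_random_long(), which now always returns 0, would make it always fail.

```c
#include <assert.h>
#include <stddef.h>

/* Stubbed platform hook standing in for ppc_md.get_random_long. */
static int (*sketch_get_random_long_hook)(unsigned long *v);

static int sketch_arch_get_random_long(unsigned long *v)
{
	(void)v;
	return 0;	/* no longer backed by the hardware RNG */
}

static int sketch_arch_get_random_seed_long(unsigned long *v)
{
	if (sketch_get_random_long_hook)
		return sketch_get_random_long_hook(v);
	return 0;
}

static int sketch_arch_get_random_seed_int(unsigned int *v)
{
	unsigned long val;
	int rc;

	assert(v != NULL);
	/* Must use the seed variant: the plain long variant is now a stub. */
	rc = sketch_arch_get_random_seed_long(&val);
	if (rc)
		*v = (unsigned int)val;	/* keep the low 32 bits */
	return rc;
}

static int sketch_arch_has_random_seed(void)
{
	return sketch_get_random_long_hook != NULL;
}

/* Deterministic fake hardware RNG for testing. */
static int fake_hw_rng(unsigned long *v)
{
	*v = 0x12345678UL;
	return 1;
}
```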
[PATCH 13/23] powerpc/time: Migrate to new 'set-state' interface
Migrate the powerpc driver to the new 'set-state' interface provided by the clockevents core; the earlier 'set-mode' interface is now marked obsolete. This also enables us to implement callbacks for new states of clockevent devices, for example ONESHOT_STOPPED. We weren't doing anything in ->set_mode(ONESHOT), so set_state_oneshot() isn't implemented. Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Viresh Kumar --- arch/powerpc/kernel/time.c | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c index 43922509a483..1be1092c7204 100644 --- a/arch/powerpc/kernel/time.c +++ b/arch/powerpc/kernel/time.c @@ -99,16 +99,17 @@ static struct clocksource clocksource_timebase = { static int decrementer_set_next_event(unsigned long evt, struct clock_event_device *dev); -static void decrementer_set_mode(enum clock_event_mode mode, -struct clock_event_device *dev); +static int decrementer_shutdown(struct clock_event_device *evt); struct clock_event_device decrementer_clockevent = { - .name = "decrementer", - .rating = 200, - .irq= 0, - .set_next_event = decrementer_set_next_event, - .set_mode = decrementer_set_mode, - .features = CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_C3STOP, + .name = "decrementer", + .rating = 200, + .irq= 0, + .set_next_event = decrementer_set_next_event, + .set_state_shutdown = decrementer_shutdown, + .tick_resume= decrementer_shutdown, + .features = CLOCK_EVT_FEAT_ONESHOT | + CLOCK_EVT_FEAT_C3STOP, }; EXPORT_SYMBOL(decrementer_clockevent); @@ -862,11 +863,10 @@ static int decrementer_set_next_event(unsigned long evt, return 0; } -static void decrementer_set_mode(enum clock_event_mode mode, -struct clock_event_device *dev) +static int decrementer_shutdown(struct clock_event_device *dev) { - if (mode != CLOCK_EVT_MODE_ONESHOT) - decrementer_set_next_event(DECREMENTER_MAX, dev); + decrementer_set_next_event(DECREMENTER_MAX, dev);
+ return 0; } /* Interrupt handler for the timer broadcast IPI */ -- 2.4.0
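With the set-state interface, the device supplies one callback per state instead of a single set_mode() switch. The patch points both ->set_state_shutdown and ->tick_resume at decrementer_shutdown(), which parks the decrementer at its maximum count. The userspace model below shows that wiring. The struct is a cut-down stand-in for struct clock_event_device, and SKETCH_DECREMENTER_MAX (0x7fffffff) is an assumed value.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_DECREMENTER_MAX 0x7fffffffUL	/* assumed maximum count */

/* Cut-down stand-in for struct clock_event_device. */
struct sketch_clockevent {
	const char *name;
	int (*set_next_event)(unsigned long evt, struct sketch_clockevent *dev);
	int (*set_state_shutdown)(struct sketch_clockevent *dev);
	int (*tick_resume)(struct sketch_clockevent *dev);
	unsigned long programmed;	/* last value written to the "hardware" */
};

static int sketch_set_next_event(unsigned long evt, struct sketch_clockevent *dev)
{
	dev->programmed = evt;
	return 0;
}

/* Shutdown: push the next interrupt as far out as possible. */
static int sketch_decrementer_shutdown(struct sketch_clockevent *dev)
{
	assert(dev != NULL);
	sketch_set_next_event(SKETCH_DECREMENTER_MAX, dev);
	return 0;
}

static struct sketch_clockevent sketch_decrementer = {
	.name               = "decrementer",
	.set_next_event     = sketch_set_next_event,
	.set_state_shutdown = sketch_decrementer_shutdown,
	.tick_resume        = sketch_decrementer_shutdown,
};
```

As the commit message notes, set_state_oneshot() is left unimplemented because the old set_mode() did nothing for ONESHOT.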
[PATCH] ipmi/powernv: Fix potential invalid pointer dereference
If the OPAL call to receive the IPMI message fails, we free the smi message and return. However, the driver still holds a reference to the old smi message in 'cur_msg', which can later be accessed and freed again, leading to a kernel oops. To fix this, the driver should reset 'cur_msg' and send an error reply to the user in addition to freeing the message. Signed-off-by: Neelesh Gupta --- drivers/char/ipmi/ipmi_powernv.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/char/ipmi/ipmi_powernv.c b/drivers/char/ipmi/ipmi_powernv.c index 9b409c0..637486d 100644 --- a/drivers/char/ipmi/ipmi_powernv.c +++ b/drivers/char/ipmi/ipmi_powernv.c @@ -143,9 +143,16 @@ static int ipmi_powernv_recv(struct ipmi_smi_powernv *smi) pr_devel("%s: -> %d (size %lld)\n", __func__, rc, rc == 0 ? size : 0); if (rc) { - spin_unlock_irqrestore(&smi->msg_lock, flags); - ipmi_free_smi_msg(msg); - return 0; + /* If came via the poll, and response was not yet ready */ + if (rc == OPAL_EMPTY) { + spin_unlock_irqrestore(&smi->msg_lock, flags); + return 0; + } else { + smi->cur_msg = NULL; + spin_unlock_irqrestore(&smi->msg_lock, flags); + send_error_reply(smi, msg, IPMI_ERR_UNSPECIFIED); + return 0; + } } if (size < sizeof(*opal_msg)) {
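The ownership rule behind the fix is as follows. On a real error, the message must be detached from smi->cur_msg before it is handed back, so nothing can reach it later through a stale pointer. An OPAL_EMPTY result just leaves the message pending. The userspace model below illustrates that rule and omits locking. The status values, struct layout and reply counter are assumptions for illustration, not the driver's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_OPAL_SUCCESS 0
#define SKETCH_OPAL_EMPTY   (-16)	/* assumed value: "no response yet" */

struct sketch_msg {
	int id;
};

struct sketch_smi {
	struct sketch_msg *cur_msg;	/* message awaiting a response */
	int error_replies;		/* error replies delivered to the user */
};

/* Stand-in for send_error_reply(): the message now belongs to the caller. */
static void sketch_send_error_reply(struct sketch_smi *smi, struct sketch_msg *msg)
{
	(void)msg;
	smi->error_replies++;
}

/* Handle the status of an OPAL receive for the pending message. */
static int sketch_recv(struct sketch_smi *smi, int rc)
{
	struct sketch_msg *msg = smi->cur_msg;

	assert(msg != NULL);
	if (rc == SKETCH_OPAL_SUCCESS)
		return 1;	/* normal completion path, omitted here */

	if (rc == SKETCH_OPAL_EMPTY)
		return 0;	/* polled too early: keep waiting */

	/* Real failure: drop the reference first, then report the error. */
	smi->cur_msg = NULL;
	sketch_send_error_reply(smi, msg);
	return 0;
}
```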
[PATCH v5 3/7] powerpc/powernv: Nest PMU detection and device tree parser
Create a file "nest-pmu.c" to contain the nest PMU related functions: code to detect nest PMU support, and a parser to collect per-chip reserved memory region information from the device tree (DT). Detection works by looking for the property "ibm,ima-chip" in the DT.

For the Nest PMU, the device tree has two sets of information:
1) Per-chip reserved memory region for the nest PMU counter collection area.
2) Supported Nest PMUs and events.

The device tree layout for the Nest PMU is as follows:

/  -- DT root folder
|
-nest-ima  -- Nest PMU folder
   |
   -ima-chip@  -- Per-chip folder for reserved region information
   |   |
   |   -ibm,chip-id  -- Chip id
   |   -ibm,ima-chip
   |   -reg  -- HOMER PORE Nest Counter collection Address (RA)
   |   -size  -- size to map in kernel space
   |
   -Alink_BW  -- Nest PMU folder
       |
       -Alink0  -- Nest PMU Alink Event file
       -scale.Alink0.scale  -- Event scale file
       -unit.Alink0.unit  -- Event unit file
       -device_type  -- "nest-ima-unit" marker

A subsequent patch will parse the next part of the DT to find the various Nest PMUs and their events. Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Anton Blanchard Cc: Sukadev Bhattiprolu Cc: Anshuman Khandual Cc: Stephane Eranian Signed-off-by: Madhavan Srinivasan --- arch/powerpc/perf/Makefile | 2 +- arch/powerpc/perf/nest-pmu.c | 85 2 files changed, 86 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/perf/nest-pmu.c diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile index f9c083a5652a..6da656b50e3c 100644 --- a/arch/powerpc/perf/Makefile +++ b/arch/powerpc/perf/Makefile @@ -5,7 +5,7 @@ obj-$(CONFIG_PERF_EVENTS) += callchain.o obj-$(CONFIG_PPC_PERF_CTRS)+= core-book3s.o bhrb.o obj64-$(CONFIG_PPC_PERF_CTRS) += power4-pmu.o ppc970-pmu.o power5-pmu.o \ power5+-pmu.o power6-pmu.o power7-pmu.o \ - power8-pmu.o + power8-pmu.o nest-pmu.o obj32-$(CONFIG_PPC_PERF_CTRS) += mpc7450-pmu.o obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c new file mode 100644 index
..e7d45ed4922d --- /dev/null +++ b/arch/powerpc/perf/nest-pmu.c @@ -0,0 +1,85 @@ +/* + * Nest Performance Monitor counter support for POWER8 processors. + * + * Copyright (C) 2015 Madhavan Srinivasan, IBM Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published + * by the Free Software Foundation. + */ + +#include "nest-pmu.h" + +static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS]; + +static int nest_ima_dt_parser(void) +{ + const __be32 *gcid; + const __be64 *chip_ima_reg; + const __be64 *chip_ima_size; + struct device_node *dev; + struct perchip_nest_info *p8ni; + int idx; + + /* +* "nest-ima" folder contains two things, +* a) per-chip reserved memory region for Nest PMU Counter data +* b) Support Nest PMU units and their event files +*/ + for_each_node_with_property(dev, "ibm,ima-chip") { + gcid = of_get_property(dev, "ibm,chip-id", NULL); + chip_ima_reg = of_get_property(dev, "reg", NULL); + chip_ima_size = of_get_property(dev, "size", NULL); + + if ((!gcid) || (!chip_ima_reg) || (!chip_ima_size)) { + pr_err("Nest_PMU: device %s missing property\n", + dev->full_name); + return -ENODEV; + } + + /* chip id to save reserve memory region */ + idx = (uint32_t)be32_to_cpup(gcid); + + /* +* Using a local variable to make it compact and +* easier to read +*/ + p8ni = &p8_nest_perchip_info[idx]; + p8ni->pbase = be64_to_cpup(chip_ima_reg); + p8ni->size = be64_to_cpup(chip_ima_size); + p8ni->vbase = (uint64_t) phys_to_virt(p8ni->pbase); + } + + return 0; +} + +static int __init nest_pmu_init(void) +{ + int ret = -ENODEV; + + /* +* Lets do this only if we are hypervisor +*/ + if (!cur_cpu_spec->oprofile_cpu_type || + !(strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power8") == 0) || + !cpu_has_feature(CPU_FTR_HVMODE)) + return ret; + + /* +* Nest PMU information is grouped under "nest-ima" node +* of the top-level device-tree directory. 
Detect Nest PMU +* by the "ibm,ima-chip" property. +*/ + if (!of_find_node_with_property(NULL, "ibm,ima-chip")) +
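The parser in this patch reads `ibm,chip-id`, `reg` and `size` from each node carrying `ibm,ima-chip` and, in the portion shown, uses the chip id directly as an index into `p8_nest_perchip_info[P8_NEST_MAX_CHIPS]`. The userspace sketch below models that step on raw big-endian property bytes (DT cells are always big-endian) and adds a bounds check on the chip id. `struct chip_region`, `record_chip()` and the byte-array inputs are hypothetical stand-ins, not kernel APIs.

```c
#include <stdint.h>

#define MAX_CHIPS 32 /* mirrors P8_NEST_MAX_CHIPS from the patch */

/* Hypothetical userspace stand-in for struct perchip_nest_info. */
struct chip_region {
	uint64_t pbase; /* physical base of the counter area ("reg") */
	uint64_t size;  /* bytes to map ("size") */
	int valid;
};

/* Decode one big-endian 32-bit DT cell. */
static uint32_t be32_decode(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode two consecutive big-endian cells as one 64-bit value. */
static uint64_t be64_decode(const uint8_t *p)
{
	return ((uint64_t)be32_decode(p) << 32) | be32_decode(p + 4);
}

/*
 * Record one "ibm,ima-chip" node. Returns 0 on success, or -1 if the
 * chip id would index past the end of the table.
 */
static int record_chip(struct chip_region *tbl, const uint8_t *chip_id_prop,
		       const uint8_t *reg_prop, const uint8_t *size_prop)
{
	uint32_t id = be32_decode(chip_id_prop);

	if (id >= MAX_CHIPS)
		return -1;
	tbl[id].pbase = be64_decode(reg_prop);
	tbl[id].size = be64_decode(size_prop);
	tbl[id].valid = 1;
	return 0;
}
```

In the kernel, the equivalent check would compare the decoded id against `P8_NEST_MAX_CHIPS` before touching the array.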
[PATCH v5 4/7] powerpc/powernv: detect supported nest pmus and its events
Parse device tree to detect supported nest pmu units. Traverse through each nest pmu unit folder to find supported events and corresponding unit/scale files (if any). The nest unit event file from DT, will contain the offset in the reserved memory region to get the counter data for a given event. Kernel code uses this offset as event configuration value. Device tree parser code also looks for scale/unit in the file name and passes on the file as an event attr for perf tool to use in the post processing. Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Anton Blanchard Cc: Sukadev Bhattiprolu Cc: Anshuman Khandual Cc: Stephane Eranian Signed-off-by: Madhavan Srinivasan --- arch/powerpc/perf/nest-pmu.c | 126 ++- 1 file changed, 125 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c index e7d45ed4922d..c4c08e4dee55 100644 --- a/arch/powerpc/perf/nest-pmu.c +++ b/arch/powerpc/perf/nest-pmu.c @@ -11,6 +11,121 @@ #include "nest-pmu.h" static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS]; +static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS]; + +static int nest_event_info(struct property *pp, char *name, + struct nest_ima_events *p8_events, int string, u32 val) +{ + char *buf; + + /* memory for event name */ + buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL); + if (!buf) + return -ENOMEM; + + strncpy(buf, name, strlen(name)); + p8_events->ev_name = buf; + + /* memory for content */ + buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL); + if (!buf) + return -ENOMEM; + + if (string) { + /* string content*/ + if (!pp->value || + (strnlen(pp->value, pp->length) == pp->length) || + (pp->length > P8_NEST_MAX_PMU_NAME_LEN)) + return -EINVAL; + + strncpy(buf, (const char *)pp->value, pp->length); + } else + sprintf(buf, "event=0x%x", val); + + p8_events->ev_value = buf; + return 0; +} + +static int nest_pmu_create(struct device_node *dev, int pmu_index) +{ + struct nest_ima_events 
**p8_events_arr, *p8_events; + struct nest_pmu *pmu_ptr; + struct property *pp; + char *buf, *start; + const __be32 *lval; + u32 val; + int idx = 0, ret; + + if (!dev) + return -EINVAL; + + /* memory for nest pmus */ + pmu_ptr = kzalloc(sizeof(struct nest_pmu), GFP_KERNEL); + if (!pmu_ptr) + return -ENOMEM; + + /* Needed for hotplug/migration */ + per_nest_pmu_arr[pmu_index] = pmu_ptr; + + /* memory for nest pmu events */ + p8_events_arr = kzalloc((sizeof(struct nest_ima_events) * 64), + GFP_KERNEL); + if (!p8_events_arr) + return -ENOMEM; + p8_events = (struct nest_ima_events *)p8_events_arr; + + /* +* Loop through each property +*/ + for_each_property_of_node(dev, pp) { + start = pp->name; + + if (!strcmp(pp->name, "name")) { + if (!pp->value || + (strnlen(pp->value, pp->length) == pp->length) || + (pp->length > P8_NEST_MAX_PMU_NAME_LEN)) + return -EINVAL; + + buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL); + if (!buf) + return -ENOMEM; + + /* Save the name to register it later */ + sprintf(buf, "Nest_%s", (char *)pp->value); + pmu_ptr->pmu.name = (char *)buf; + continue; + } + + /* Skip these, we dont need it */ + if (!strcmp(pp->name, "phandle") || + !strcmp(pp->name, "device_type") || + !strcmp(pp->name, "linux,phandle")) + continue; + + if (strncmp(pp->name, "unit.", 5) == 0) { + /* Skip first few chars in the name */ + start += 5; + ret = nest_event_info(pp, start, p8_events++, 1, 0); + } else if (strncmp(pp->name, "scale.", 6) == 0) { + /* Skip first few chars in the name */ + start += 6; + ret = nest_event_info(pp, start, p8_events++, 1, 0); + } else { + lval = of_get_property(dev, pp->name, NULL); + val = (uint32_t)be32_to_cpup(lval); + + ret = nest_event_info(pp, start, p8_events++, 0, val); + } + + if (ret) + return ret; + +
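`nest_event_info()` turns each DT property into an event name/value pair: `unit.` and `scale.` properties keep their string value with the prefix stripped (so `unit.Alink0.unit` becomes the `Alink0.unit` attribute), and every other property becomes `event=0x<offset>`. A userspace sketch of that mapping, assuming the property name, string value and numeric offset have already been extracted; `map_event_prop()` is a hypothetical helper, and it uses `snprintf()` so that an over-long name is reported rather than silently cut.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 256 /* mirrors P8_NEST_MAX_PMU_NAME_LEN */

/*
 * Map one DT property to a perf event attribute (hypothetical helper):
 *   "unit.X.unit"   -> name "X.unit",  value = string property
 *   "scale.X.scale" -> name "X.scale", value = string property
 *   anything else   -> name as-is,     value = "event=0x<offset>"
 * Returns 0 on success, -1 if either output would be truncated.
 */
static int map_event_prop(const char *prop, const char *str_val,
			  uint32_t offset, char *name, size_t name_len,
			  char *value, size_t value_len)
{
	const char *start = prop;
	int is_string = 0;
	int n;

	if (strncmp(prop, "unit.", 5) == 0) {
		start += 5;
		is_string = 1;
	} else if (strncmp(prop, "scale.", 6) == 0) {
		start += 6;
		is_string = 1;
	}

	n = snprintf(name, name_len, "%s", start);
	if (n < 0 || (size_t)n >= name_len)
		return -1;

	if (is_string)
		n = snprintf(value, value_len, "%s", str_val);
	else
		n = snprintf(value, value_len, "event=0x%x",
			     (unsigned int)offset);
	if (n < 0 || (size_t)n >= value_len)
		return -1;
	return 0;
}
```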
[PATCH v5 6/7] powerpc/powernv: generic nest pmu event functions
Add set of generic nest pmu related event functions to be used by each nest pmu. Add code to register nest pmus. Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Anton Blanchard Cc: Sukadev Bhattiprolu Cc: Anshuman Khandual Cc: Stephane Eranian Signed-off-by: Madhavan Srinivasan --- arch/powerpc/perf/nest-pmu.c | 105 +++ 1 file changed, 105 insertions(+) diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c index f3418bdec1cd..2ebd0508e9b3 100644 --- a/arch/powerpc/perf/nest-pmu.c +++ b/arch/powerpc/perf/nest-pmu.c @@ -24,6 +24,101 @@ static struct attribute_group p8_nest_format_group = { .attrs = p8_nest_format_attrs, }; +static int p8_nest_event_init(struct perf_event *event) +{ + int chip_id; + + if (event->attr.type != event->pmu->type) + return -ENOENT; + + /* Sampling not supported yet */ + if (event->hw.sample_period) + return -EINVAL; + + /* unsupported modes and filters */ + if (event->attr.exclude_user || + event->attr.exclude_kernel || + event->attr.exclude_hv || + event->attr.exclude_idle || + event->attr.exclude_host || + event->attr.exclude_guest) + return -EINVAL; + + if (event->cpu < 0) + return -EINVAL; + + chip_id = topology_physical_package_id(event->cpu); + event->hw.event_base = event->attr.config + + p8_nest_perchip_info[chip_id].vbase; + + return 0; +} + +static void p8_nest_read_counter(struct perf_event *event) +{ + uint64_t *addr; + u64 data = 0; + + addr = (u64 *)event->hw.event_base; + data = __be64_to_cpu(*addr); + local64_set(&event->hw.prev_count, data); +} + +static void p8_nest_perf_event_update(struct perf_event *event) +{ + u64 counter_prev, counter_new, final_count; + uint64_t *addr; + + addr = (uint64_t *)event->hw.event_base; + counter_prev = local64_read(&event->hw.prev_count); + counter_new = __be64_to_cpu(*addr); + final_count = counter_new - counter_prev; + + local64_set(&event->hw.prev_count, counter_new); + local64_add(final_count, &event->count); +} + +static void 
p8_nest_event_start(struct perf_event *event, int flags) +{ + event->hw.state = 0; + p8_nest_read_counter(event); +} + +static void p8_nest_event_stop(struct perf_event *event, int flags) +{ + if (flags & PERF_EF_UPDATE) + p8_nest_perf_event_update(event); +} + +static int p8_nest_event_add(struct perf_event *event, int flags) +{ + if (flags & PERF_EF_START) + p8_nest_event_start(event, flags); + + return 0; +} + +/* + * Populate pmu ops in the structure + */ +static int update_pmu_ops(struct nest_pmu *pmu) +{ + if (!pmu) + return -EINVAL; + + pmu->pmu.task_ctx_nr = perf_invalid_context; + pmu->pmu.event_init = p8_nest_event_init; + pmu->pmu.add = p8_nest_event_add; + pmu->pmu.del = p8_nest_event_stop; + pmu->pmu.start = p8_nest_event_start; + pmu->pmu.stop = p8_nest_event_stop; + pmu->pmu.read = p8_nest_perf_event_update; + pmu->pmu.attr_groups = pmu->attr_groups; + + return 0; +} + + static int nest_event_info(struct property *pp, char *name, struct nest_ima_events *p8_events, int string, u32 val) { @@ -189,6 +284,16 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index) update_events_in_group( (struct nest_ima_events *)p8_events_arr, idx, pmu_ptr); + update_pmu_ops(pmu_ptr); + /* Register the pmu */ + ret = perf_pmu_register(&pmu_ptr->pmu, pmu_ptr->pmu.name, -1); + if (ret) { + pr_err("Nest PMU %s Register failed\n", pmu_ptr->pmu.name); + return ret; + } + + pr_info("%s performance monitor hardware support registered\n", + pmu_ptr->pmu.name); return 0; } -- 1.9.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
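The start/update pair above follows the usual pattern for a free-running counter: `start` snapshots the current value into `hw.prev_count`, and each update adds `new - prev` to `event->count` and moves the snapshot forward. Below is a userspace model of that arithmetic, assuming the counter is a big-endian 64-bit word in memory as in the patch; plain `uint64_t` fields stand in for the kernel's `local64_t`, and all names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical userspace model of one counting event. */
struct counter_event {
	uint64_t prev;  /* raw value at last read (like hw.prev_count) */
	uint64_t count; /* accumulated delta (like event->count) */
};

/* Decode the big-endian 64-bit counter the PORE engine writes. */
static uint64_t load_be64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Like p8_nest_event_start(): remember the current raw value. */
static void event_start(struct counter_event *ev, const uint8_t *mem)
{
	ev->prev = load_be64(mem);
}

/* Like p8_nest_perf_event_update(): accumulate the delta since last read. */
static void event_update(struct counter_event *ev, const uint8_t *mem)
{
	uint64_t now = load_be64(mem);

	/* Unsigned subtraction also yields the right delta across a 64-bit wrap. */
	ev->count += now - ev->prev;
	ev->prev = now;
}
```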
[PATCH v5 5/7] powerpc/powernv: add event attribute and group to nest pmu
Add code to create event/format attributes and attribute groups for each nest pmu. Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Anton Blanchard Cc: Sukadev Bhattiprolu Cc: Anshuman Khandual Cc: Stephane Eranian Signed-off-by: Madhavan Srinivasan --- arch/powerpc/perf/nest-pmu.c | 65 1 file changed, 65 insertions(+) diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c index c4c08e4dee55..f3418bdec1cd 100644 --- a/arch/powerpc/perf/nest-pmu.c +++ b/arch/powerpc/perf/nest-pmu.c @@ -13,6 +13,17 @@ static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS]; static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS]; +PMU_FORMAT_ATTR(event, "config:0-20"); +static struct attribute *p8_nest_format_attrs[] = { + &format_attr_event.attr, + NULL, +}; + +static struct attribute_group p8_nest_format_group = { + .name = "format", + .attrs = p8_nest_format_attrs, +}; + static int nest_event_info(struct property *pp, char *name, struct nest_ima_events *p8_events, int string, u32 val) { @@ -46,6 +57,56 @@ static int nest_event_info(struct property *pp, char *name, return 0; } +/* + * Populate event name and string in attribute + */ +static struct attribute *dev_str_attr(const char *name, const char *str) +{ + struct perf_pmu_events_attr *attr; + + attr = kzalloc(sizeof(*attr), GFP_KERNEL); + + sysfs_attr_init(&attr->attr.attr); + + attr->event_str = str; + attr->attr.attr.name = name; + attr->attr.attr.mode = 0444; + attr->attr.show = perf_event_sysfs_show; + + return &attr->attr.attr; +} + +static int update_events_in_group( + struct nest_ima_events *p8_events, int idx, struct nest_pmu *pmu) +{ + struct attribute_group *attr_group; + struct attribute **attrs; + int i; + + /* +* Allocate memory for both event attribute group and for +* event attributes array. 
+*/ + attr_group = kzalloc(((sizeof(struct attribute *) * (idx + 1)) + + sizeof(*attr_group)), GFP_KERNEL); + if (!attr_group) + return -ENOMEM; + + /* +* Assign memory for event attribute array +*/ + attrs = (struct attribute **)(attr_group + 1); + attr_group->name = "events"; + attr_group->attrs = attrs; + + for (i = 0; i < idx; i++, p8_events++) + attrs[i] = dev_str_attr((char *)p8_events->ev_name, + (char *)p8_events->ev_value); + + pmu->attr_groups[0] = attr_group; + return 0; +} + static int nest_pmu_create(struct device_node *dev, int pmu_index) { struct nest_ima_events **p8_events_arr, *p8_events; @@ -93,6 +154,7 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index) /* Save the name to register it later */ sprintf(buf, "Nest_%s", (char *)pp->value); pmu_ptr->pmu.name = (char *)buf; + pmu_ptr->attr_groups[1] = &p8_nest_format_group; continue; } @@ -124,6 +186,9 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index) idx++; } + update_events_in_group( + (struct nest_ima_events *)p8_events_arr, idx, pmu_ptr); + return 0; } -- 1.9.1
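`update_events_in_group()` makes one allocation that holds the `attribute_group` header followed by `idx + 1` attribute pointers, then points `attrs` at `attr_group + 1`. Because the memory is zeroed, the extra slot is already the NULL terminator that sysfs expects. A userspace sketch of the same layout trick with `calloc()`; the two structs are simplified stand-ins for the kernel types and `alloc_group_with_attrs()` is a hypothetical name.

```c
#include <stdlib.h>

/* Simplified stand-ins for the kernel's struct attribute / attribute_group. */
struct attribute {
	const char *name;
};

struct attribute_group {
	const char *name;
	struct attribute **attrs; /* NULL-terminated array */
};

/*
 * One allocation: the group header, then (n + 1) pointers. calloc()
 * zeroes everything, so attrs[n] is already the NULL terminator, and a
 * single free() of the group releases the array as well.
 */
static struct attribute_group *alloc_group_with_attrs(const char *name,
						      size_t n)
{
	struct attribute_group *grp;

	grp = calloc(1, sizeof(*grp) + (n + 1) * sizeof(struct attribute *));
	if (!grp)
		return NULL;

	grp->name = name;
	/* The pointer array starts right after the header. */
	grp->attrs = (struct attribute **)(grp + 1);
	return grp;
}
```

The same reasoning applies to `dev_str_attr()` in the patch: its `kzalloc()` result would want the same NULL check that `attr_group` gets.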
[PATCH v5 1/7] powerpc/powernv: Data structure and macros definition
Create new header file "nest-pmu.h" to add the data structures and macros needed for the nest pmu support. Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Anton Blanchard Cc: Sukadev Bhattiprolu Cc: Anshuman Khandual Cc: Stephane Eranian Signed-off-by: Madhavan Srinivasan --- arch/powerpc/perf/nest-pmu.h | 54 1 file changed, 54 insertions(+) create mode 100644 arch/powerpc/perf/nest-pmu.h diff --git a/arch/powerpc/perf/nest-pmu.h b/arch/powerpc/perf/nest-pmu.h new file mode 100644 index ..28e3c6e024a6 --- /dev/null +++ b/arch/powerpc/perf/nest-pmu.h @@ -0,0 +1,54 @@ +/* + * Nest Performance Monitor counter support for POWER8 processors. + * + * Copyright (C) 2015 Madhavan Srinivasan, IBM Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published + * by the Free Software Foundation. + */ + +#include +#include +#include +#include +#include + +#define P8_NEST_MAX_CHIPS 32 +#define P8_NEST_MAX_PMUS 32 +#define P8_NEST_MAX_PMU_NAME_LEN 256 +#define P8_NEST_MAX_EVENTS_SUPPORTED 256 +#define P8_NEST_ENGINE_START 1 +#define P8_NEST_ENGINE_STOP0 +#define P8_NEST_MODE_PRODUCTION1 + +/* + * Structure to hold per chip specific memory address + * information for nest pmus. Nest Counter data are exported + * in per-chip reserved memory region by the PORE Engine. + */ +struct perchip_nest_info { + uint32_t chip_id; + uint64_t pbase; + uint64_t vbase; + uint32_t size; +}; + +/* + * Place holder for nest pmu events and values. + */ +struct nest_ima_events { + const char *ev_name; + const char *ev_value; +}; + +/* + * Device tree parser code detects nest pmu support and + * registers new nest pmus. This structure will + * hold the pmu functions and attrs for each nest pmu and + * will be referenced at the time of pmu registration. 
+ */ +struct nest_pmu { + struct pmu pmu; + const struct attribute_group *attr_groups[4]; +}; -- 1.9.1
[PATCH v5 0/7]powerpc/powernv: Nest Instrumentation support
This patchset enables Nest Instrumentation support on powerpc. POWER8 has per-chip Nest Instrumentation, which provides various per-chip metrics like memory, powerbus, Xlink and Alink bandwidth.

Nest Instrumentation provides an interface (via the PORE Engine) to configure and move the nest counter data to memory. On the kernel side, the OPAL call interface is used to activate/deactivate the PORE Engine for nest data collection.

At boot, OPAL detects the feature, initializes it and passes the nest units and other related information, such as the memory region and the supported events, to the kernel via the device tree. Kernel code then parses the device tree for nest pmu support and registers the nest pmus with the events available.

The PORE Engine collects and accumulates nest counter data in a per-chip reserved memory region, hence the device tree also exports the per-chip nest accumulation memory region. Individual event offsets are used as event configuration values.

Here is a sample perf usage to explain the interface.

#./perf list
  iTLB-load-misses                   [Hardware cache event]
  Nest_Alink_BW/Alink0/              [Kernel PMU event]
  Nest_Alink_BW/Alink1/              [Kernel PMU event]
  Nest_Alink_BW/Alink2/              [Kernel PMU event]
  Nest_MCS_Read_BW/MCS_00/           [Kernel PMU event]
  Nest_MCS_Read_BW/MCS_01/           [Kernel PMU event]
  Nest_MCS_Read_BW/MCS_02/           [Kernel PMU event]
  Nest_MCS_Read_BW/MCS_03/           [Kernel PMU event]
  Nest_MCS_Write_BW/MCS_00/          [Kernel PMU event]
  Nest_MCS_Write_BW/MCS_01/          [Kernel PMU event]
  Nest_MCS_Write_BW/MCS_02/          [Kernel PMU event]
  Nest_MCS_Write_BW/MCS_03/          [Kernel PMU event]
  Nest_PowerBus_BW/External/         [Kernel PMU event]
  Nest_PowerBus_BW/Internal/         [Kernel PMU event]
  Nest_Xlink_BW/Xlink0/              [Kernel PMU event]
  Nest_Xlink_BW/Xlink1/              [Kernel PMU event]
  Nest_Xlink_BW/Xlink2/              [Kernel PMU event]
  rNNN                               [Raw hardware event descriptor]
  cpu/t1=v1[,t2=v2,t3 ...]/modifier  [Raw hardware event descriptor]
  .

# ./perf stat -e 'Nest_Xlink_BW/Xlink1/' -a -A sleep 1

 Performance counter stats for 'system wide':

CPU0     15,913.18 MiB  Nest_Xlink_BW/Xlink1/
CPU32    11,955.88 MiB  Nest_Xlink_BW/Xlink1/
CPU64    11,042.43 MiB  Nest_Xlink_BW/Xlink1/
CPU96    14,065.27 MiB  Nest_Xlink_BW/Xlink1/

       1.001062038 seconds time elapsed

# ./perf stat -e 'Nest_Alink_BW/Alink0/,Nest_Alink_BW/Alink1/,Nest_Alink_BW/Alink2/' -a -A -I 1000 sleep 5

 Performance counter stats for 'system wide':

CPU0          0.00 MiB  Nest_Alink_BW/Alink0/  (100.00%)
CPU32         0.00 MiB  Nest_Alink_BW/Alink0/  (100.00%)
CPU64         0.00 MiB  Nest_Alink_BW/Alink0/  (100.00%)
CPU96         0.00 MiB  Nest_Alink_BW/Alink0/  (100.00%)
CPU0      1,430.43 MiB  Nest_Alink_BW/Alink1/  (100.00%)
CPU32       320.99 MiB  Nest_Alink_BW/Alink1/  (100.00%)
CPU64     3,443.83 MiB  Nest_Alink_BW/Alink1/  (100.00%)
CPU96     1,904.41 MiB  Nest_Alink_BW/Alink1/  (100.00%)
CPU0      2,856.85 MiB  Nest_Alink_BW/Alink2/
CPU32         7.50 MiB  Nest_Alink_BW/Alink2/
CPU64     4,034.29 MiB  Nest_Alink_BW/Alink2/
CPU96       288.49 MiB  Nest_Alink_BW/Alink2/
.

OPAL side patches are posted on the skiboot mailing list.

Changelog from v4:
1) Variable name changes for consistency and added more comments
2) Added sysfs_attr_init to keep lockdep happy
3) Updated OPAL call interface changes and added code to handle the failure case
4) Added new macro "P8_NEST_MODE_PRODUCTION" to specify the PORE Engine mode
5) Modified the nest_pmu_cpumask_init function to return a value to the nest pmu init function in case of OPAL call failure

Changelog from v3:
No logic change, just a rebase to the latest upstream kernel.

Changelog from v2:
1) Changed variable and macro names to be consistent
2) Made changes to the commit message and code comments
3) Moved "format attribute" related code from patch 6 to 5
4) Added check for the pmu register function
5) Changed cpu_init and cpu_exit functions to use the first online cpu of the chip, thereby making the code a lot simpler

Changelog from v1:
1) No logic changes, re-ordered patches to make each patch compile without error
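The cover letter says individual event offsets are used as event configuration values, and patch 5/7 exports the format attribute `event` as `config:0-20`, so perf places the offset in bits 0..20 of `attr.config`. A small sketch of how such a value could be turned into a counter address, assuming a per-chip base like `vbase` in the patch; masking to the 21-bit field is a choice made here for illustration (the patch's `p8_nest_event_init()` adds `attr.config` unmasked), and the helper names are hypothetical.

```c
#include <stdint.h>

/* Field width from PMU_FORMAT_ATTR(event, "config:0-20"): 21 bits. */
#define NEST_EVENT_BITS 21
#define NEST_EVENT_MASK ((UINT64_C(1) << NEST_EVENT_BITS) - 1)

/* Extract the event offset that perf encoded into config bits 0..20. */
static uint64_t nest_event_field(uint64_t config)
{
	return config & NEST_EVENT_MASK;
}

/* Counter address = per-chip base + event offset (hypothetical helper). */
static uint64_t nest_counter_addr(uint64_t chip_vbase, uint64_t config)
{
	return chip_vbase + nest_event_field(config);
}
```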
[PATCH v5 7/7] powerpc/powernv: nest pmu cpumask and cpu hotplug support
Adds cpumask attribute to be used by each nest pmu since nest units are per-chip. Only one cpu (first online cpu) from each node/chip is designated to read counters. On cpu hotplug, dying cpu is checked to see whether it is one of the designated cpus, if yes, next online cpu from the same node/chip is designated as new cpu to read counters. Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Anton Blanchard Cc: Sukadev Bhattiprolu Cc: Anshuman Khandual Cc: Stephane Eranian Cc: Preeti U Murthy Cc: Ingo Molnar Cc: Peter Zijlstra Signed-off-by: Madhavan Srinivasan --- arch/powerpc/perf/nest-pmu.c | 172 +++ 1 file changed, 172 insertions(+) diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c index 2ebd0508e9b3..d3a2fd746cf9 100644 --- a/arch/powerpc/perf/nest-pmu.c +++ b/arch/powerpc/perf/nest-pmu.c @@ -12,6 +12,7 @@ static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS]; static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS]; +static cpumask_t nest_pmu_cpu_mask; PMU_FORMAT_ATTR(event, "config:0-20"); static struct attribute *p8_nest_format_attrs[] = { @@ -24,6 +25,172 @@ static struct attribute_group p8_nest_format_group = { .attrs = p8_nest_format_attrs, }; +static ssize_t nest_pmu_cpumask_get_attr(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return cpumap_print_to_pagebuf(true, buf, &nest_pmu_cpu_mask); +} + +static DEVICE_ATTR(cpumask, S_IRUGO, nest_pmu_cpumask_get_attr, NULL); + +static struct attribute *nest_pmu_cpumask_attrs[] = { + &dev_attr_cpumask.attr, + NULL, +}; + +static struct attribute_group nest_pmu_cpumask_attr_group = { + .attrs = nest_pmu_cpumask_attrs, +}; + +static void nest_init(int *loc) +{ + int rc; + + rc = opal_nest_ima_control( + P8_NEST_MODE_PRODUCTION, P8_NEST_ENGINE_START); + if (rc) + loc[smp_processor_id()] = 1; +} + +static void nest_change_cpu_context(int old_cpu, int new_cpu) +{ + int i; + + for (i = 0; per_nest_pmu_arr[i] != NULL; i++) + 
perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu, + old_cpu, new_cpu); +} + +static void nest_exit_cpu(int cpu) +{ + int nid, target = -1; + struct cpumask *l_cpumask; + + /* +* Check in the designated list for this cpu. Dont bother +* if not one of them. +*/ + if (!cpumask_test_and_clear_cpu(cpu, &nest_pmu_cpu_mask)) + return; + + /* +* Now that this cpu is one of the designated, +* find a next cpu a) which is online and b) in same chip. +*/ + nid = cpu_to_node(cpu); + l_cpumask = cpumask_of_node(nid); + target = cpumask_next(cpu, l_cpumask); + + /* +* Update the cpumask with the target cpu and +* migrate the context if needed +*/ + if (target >= 0 && target <= nr_cpu_ids) { + cpumask_set_cpu(target, &nest_pmu_cpu_mask); + nest_change_cpu_context(cpu, target); + } +} + +static void nest_init_cpu(int cpu) +{ + int nid, fcpu, ncpu; + struct cpumask *l_cpumask, tmp_mask; + + nid = cpu_to_node(cpu); + l_cpumask = cpumask_of_node(nid); + + /* +* if empty cpumask, just add incoming cpu and move on. +*/ + if (!cpumask_and(&tmp_mask, l_cpumask, &nest_pmu_cpu_mask)) { + cpumask_set_cpu(cpu, &nest_pmu_cpu_mask); + return; + } + + /* +* Alway have the first online cpu of a chip as designated one. 
+*/ + fcpu = cpumask_first(l_cpumask); + ncpu = cpumask_next(cpu, l_cpumask); + if (cpu == fcpu) { + if (cpumask_test_and_clear_cpu(ncpu, &nest_pmu_cpu_mask)) { + cpumask_set_cpu(cpu, &nest_pmu_cpu_mask); + nest_change_cpu_context(ncpu, cpu); + } + } +} + +static int nest_pmu_cpu_notifier(struct notifier_block *self, + unsigned long action, void *hcpu) +{ + long cpu = (long)hcpu; + + switch (action & ~CPU_TASKS_FROZEN) { + case CPU_ONLINE: + nest_init_cpu(cpu); + break; + case CPU_DOWN_PREPARE: + nest_exit_cpu(cpu); + break; + default: + break; + } + + return NOTIFY_OK; +} + +static struct notifier_block nest_pmu_cpu_nb = { + .notifier_call = nest_pmu_cpu_notifier, + .priority = CPU_PRI_PERF + 1, +}; + +static int nest_pmu_cpumask_init(void) +{ + const struct cpumask *l_cpumask; + int cpu, nid; + int *cpus_opal_rc; + + cpu_notifier_register_begin(); + + /* +* Nest PMUs are per-chip counters. So designate a cpu +* from each chip for count
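The hotplug logic keeps one designated reader CPU per chip and, when that CPU goes down, hands the role to another online CPU on the same chip. The sketch below models the hand-off with a single 64-bit mask (one bit per CPU, at most 64 CPUs) instead of the kernel's `cpumask` API; `hand_off_reader()` and `mask_first()` are hypothetical names. Unlike `nest_exit_cpu()`, which only looks for CPUs numbered above the dying one, it picks any other online CPU on the chip. Note that `cpumask_next()` returns a value `>= nr_cpu_ids` when no further CPU is set, so a `< nr_cpu_ids` comparison is what rejects the "nothing found" case.

```c
#include <stdint.h>

/* Hypothetical model: up to 64 CPUs, one bit per CPU. */
typedef uint64_t cpumask64;

#define NO_CPU (-1)

/* Index of the lowest set bit, or NO_CPU if the mask is empty. */
static int mask_first(cpumask64 m)
{
	for (int i = 0; i < 64; i++)
		if (m & (UINT64_C(1) << i))
			return i;
	return NO_CPU;
}

/*
 * @cpu is going down; @chip_online holds the online CPUs of its chip
 * (still including @cpu). If @cpu was the chip's designated reader in
 * @designated, move the role to another online CPU of the same chip.
 * Returns the new reader, or NO_CPU if @cpu was not a reader or no
 * other CPU is left on the chip.
 */
static int hand_off_reader(cpumask64 *designated, cpumask64 chip_online,
			   int cpu)
{
	cpumask64 bit = UINT64_C(1) << cpu;
	int target;

	if (!(*designated & bit))
		return NO_CPU; /* not a designated CPU: nothing to do */

	*designated &= ~bit;
	target = mask_first(chip_online & ~bit);
	if (target != NO_CPU)
		*designated |= UINT64_C(1) << target;
	return target;
}
```

In the kernel version, the caller would also migrate the perf context (as `nest_change_cpu_context()` does) whenever a new reader is chosen.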
[PATCH v5 2/7] powerpc/powernv: Add OPAL support for Nest PMU
Nest Counters can be configured via the PORE Engine, and OPAL provides an interface to start/stop it. OPAL side patches are posted on the skiboot mailing list. Cc: Stewart Smith Cc: Jeremy Kerr Cc: Benjamin Herrenschmidt Cc: Michael Ellerman Cc: Paul Mackerras Cc: Anton Blanchard Cc: Sukadev Bhattiprolu Cc: Anshuman Khandual Cc: Stephane Eranian Signed-off-by: Madhavan Srinivasan --- arch/powerpc/include/asm/opal-api.h| 3 ++- arch/powerpc/include/asm/opal.h| 1 + arch/powerpc/platforms/powernv/opal-wrappers.S | 1 + 3 files changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h index e9e4c52f3685..4cd8128c6ebc 100644 --- a/arch/powerpc/include/asm/opal-api.h +++ b/arch/powerpc/include/asm/opal-api.h @@ -154,7 +154,8 @@ #define OPAL_FLASH_WRITE 111 #define OPAL_FLASH_ERASE 112 #define OPAL_PRD_MSG 113 -#define OPAL_LAST 113 +#define OPAL_NEST_IMA_CONTROL 116 +#define OPAL_LAST 116 /* Device tree flags */ diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index 958e941c0cda..7c813ed52ab4 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -202,6 +202,7 @@ int64_t opal_flash_write(uint64_t id, uint64_t offset, uint64_t buf, uint64_t size, uint64_t token); int64_t opal_flash_erase(uint64_t id, uint64_t offset, uint64_t size, uint64_t token); +int64_t opal_nest_ima_control(uint64_t mode, uint64_t value); /* Internal functions */ extern int early_init_dt_scan_opal(unsigned long node, const char *uname, diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S index d6a7b8252e4d..c475c04468fb 100644 --- a/arch/powerpc/platforms/powernv/opal-wrappers.S +++ b/arch/powerpc/platforms/powernv/opal-wrappers.S @@ -297,3 +297,4 @@ OPAL_CALL(opal_flash_read, OPAL_FLASH_READ); OPAL_CALL(opal_flash_write,OPAL_FLASH_WRITE); OPAL_CALL(opal_flash_erase,OPAL_FLASH_ERASE); OPAL_CALL(opal_prd_msg,OPAL_PRD_MSG);
+OPAL_CALL(opal_nest_ima_control, OPAL_NEST_IMA_CONTROL); -- 1.9.1
[PATCH] kprobes: Mark OPTPROBES n/a for powerpc
Kprobes uses a breakpoint instruction to trap into the execution flow, and the probed instruction is single-stepped from an alternate location.

On some architectures like x86, under certain conditions, the OPTPROBES feature enables replacing the probed instruction with a jump instead, resulting in a significant performance boost (one single-step exception is bypassed for each kprobe).

Powerpc has an in-kernel instruction emulator. Kprobes on powerpc already uses this emulator and bypasses the single-step exception, with a lot less complexity. Hence, mark OPTPROBES n/a for powerpc.

Signed-off-by: Ananth N Mavinakayanahalli
---
 .../features/debug/optprobes/arch-support.txt |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/features/debug/optprobes/arch-support.txt b/Documentation/features/debug/optprobes/arch-support.txt
index b8999d8..0a3ca33 100644
--- a/Documentation/features/debug/optprobes/arch-support.txt
+++ b/Documentation/features/debug/optprobes/arch-support.txt
@@ -27,7 +27,7 @@
 | nios2: | TODO |
 |openrisc: | TODO |
 | parisc: | TODO |
-| powerpc: | TODO |
+| powerpc: | n/a |
 |s390: | TODO |
 | score: | TODO |
 | sh: | TODO |
Re: [PATCH 6/6] cputime: Introduce cputime_to_timespec64()/timespec64_to_cputime()
On Thu, 16 Jul 2015, Baolin Wang wrote:
> On 15 July 2015 at 19:55, Thomas Gleixner wrote:
> > On Wed, 15 Jul 2015, Baolin Wang wrote:
> >
> >> On 15 July 2015 at 18:31, Thomas Gleixner wrote:
> >> > On Wed, 15 Jul 2015, Baolin Wang wrote:
> >> >
> >> >> The cputime_to_timespec() and timespec_to_cputime() functions are
> >> >> not year 2038 safe on 32bit systems due to that the struct timespec
> >> >> will overflow in 2038 year.
> >> >
> >> > And how is this relevant? cputime is not based on wall clock time at
> >> > all. So what has 2038 to do with cputime?
> >> >
> >> > We want proper explanations WHY we need such a change.
> >>
> >> When converting the posix-cpu-timers, it call the
> >> cputime_to_timespec() function. Thus it need a conversion for this
> >> function.
> >
> > There is no requirement to convert posix-cpu-timers on their own. We
> > need to adopt the posix cpu timers code because it shares syscalls
> > with the other posix timers, but that still does not explain why we
> > need these functions.
> >
>
> In posix-cpu-timers, it also defined some 'k_clock struct' variables,
> and we need to convert the callbacks of the 'k_clock struct' which are
> not year 2038 safe on 32bit systems. Some callbacks which need to
> convert call the cputime_to_timespec() function, thus we also want to
> convert the cputime_to_timespec() function to a year 2038 safe
> function to make all them ready for the year 2038 issue.

You are not getting it at all.

1) We need to change k_clock callbacks due to 2038 issues

2) posix cpu timers implement affected callbacks

3) posix cpu timers themselves and cputime are NOT affected by 2038

So we have 2 options to change the code in posix cpu timers:

 A) Do the timespec/timespec64 conversion in the posix cpu timer
    callbacks and leave the cputime functions untouched.

 B) Implement cputime/timespec64 functions to avoid #A

If you go for #B, you need to provide a reasonable explanation why it
is better than #A. And that explanation has absolutely nothing to do
with 2038 safety.

Not everything is a 2038 issue, just because the only tool you have is
a timespec64.

Thanks,

	tglx
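Option A keeps the internal CPU-time value untouched and converts only at the syscall-facing boundary. A userspace sketch of that boundary conversion, assuming the CPU time has already been reduced to a signed 64-bit nanosecond count (real `cputime_t` units vary by architecture). The kernel already offers `ns_to_timespec64()` for this; `struct ts64` and the helpers below are hypothetical stand-ins that mirror its normalisation of negative values.

```c
#include <stdint.h>

/* Userspace stand-in for struct timespec64 (64-bit seconds). */
struct ts64 {
	int64_t tv_sec;
	long tv_nsec;
};

#define NSEC_PER_SEC 1000000000LL

/* Convert a nanosecond count to a normalised ts64 (0 <= tv_nsec < 1e9). */
static struct ts64 ns_to_ts64(int64_t ns)
{
	struct ts64 ts;

	ts.tv_sec = ns / NSEC_PER_SEC;
	ts.tv_nsec = (long)(ns % NSEC_PER_SEC);
	if (ts.tv_nsec < 0) { /* C division truncates toward zero */
		ts.tv_sec--;
		ts.tv_nsec += NSEC_PER_SEC;
	}
	return ts;
}

/* Inverse conversion for the set-time direction. */
static int64_t ts64_to_ns(struct ts64 ts)
{
	return ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}
```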
RE: [PATCH][v2] powerpc/fsl-booke: Add T1040D4RDB/T1042D4RDB board support
-Original Message- From: Wood Scott-B07421 Sent: Wednesday, July 15, 2015 11:17 PM To: Jain Priyanka-B32167 Cc: linuxppc-dev@lists.ozlabs.org Subject: Re: [PATCH][v2] powerpc/fsl-booke: Add T1040D4RDB/T1042D4RDB board support On Wed, 2015-07-15 at 15:00 +0530, Priyanka Jain wrote: > T1040D4RDB/T1042D4RDB are Freescale Reference Design Board which can > support T1040/T1042 QorIQ Power Architecture™ processor respectively > > T1040D4RDB/T1042D4RDB board Overview > - > - SERDES Connections, 8 lanes supporting: > - PCI > - SGMII > - SATA 2.0 > - QSGMII(only for T1040D4RDB) > - DDR Controller > - Supports rates of up to 1600 MHz data-rate > - Supports one DDR4 UDIMM > -IFC/Local Bus > - NAND flash: 1GB 8-bit NAND flash > - NOR: 128MB 16-bit NOR Flash > - Ethernet > - Two on-board RGMII 10/100/1G ethernet ports. > - PHY #0 remains powered up during deep-sleep > - CPLD > - Clocks > - System and DDR clock (SYSCLK, “DDRCLK”) > - SERDES clocks > - Power Supplies > - USB > - Supports two USB 2.0 ports with integrated PHYs > - Two type A ports with 5V@1.5Aper port. > - SDHC > - SDHC/SDXC connector > - SPI > - On-board 64MB SPI flash > - I2C > - Devices connected: EEPROM, thermal monitor, VID controller > - Other IO > - Two Serial ports > - ProfiBus port > > Add support for T1040/T1042D4RDB board: > -add device tree > -Add entry in corenet_generic.c > > Signed-off-by: Priyanka Jain > --- > Changes for v2: > Incorporated Scott's comments on device tree You didn't respond to the comments on the CPLD node. [Priyanka] T1042D4RDB, T1040D4RDB are derivatives of same board , CPLD is same for both. So, I have moved below node having compatible and reg field together in t104xd4rdb.dtsi. Is this fine? cpld@3,0 { compatible = "fsl,t1040d4rdb-cpld"; reg = <3 0 0x300>; }; +i2c@118100{ > + mux@77{ > + compatible = "nxp,pca9546"; > + reg = <0x77>; > + #address-cells = <1>; > + #size-cells = <0>; > + }; > + }; A mux with no nodes under it (and yet it has #address-cells/#size-cells)? 
What is it multiplexing? [Priyanka]: PCA9546 is i2c mux device , to which other i2c devices (up-to 8 ) can be further connected on output channels On T104xD4RDB, channel 0, 1, 3 line are connected to PEX device, Channel 2 to hdmi interface (initialization is done in u-boot only), other channels are grounded. So, as such Linux is not using the second level I2C devices connected on this MUX device. So, I have not shown next level hierarchy. Should I replace 'mux' with some other name? . Please suggest. -Scott
Re: [1/3] powerpc/iommu: Remove dma_data union
On Wed, 2015-24-06 at 05:25:22 UTC, Benjamin Herrenschmidt wrote: > To support "hybrid" DMA ops in a subsequent patch, we will need both > a direct DMA offset and an iommu pointer. Those are currently exclusive > (a union), so change them to be separate fields. > > While there, also type iommu_table_base properly and make it exist only > on CONFIG_PPC64 since it's not referenced on 32-bit at all. > > Signed-off-by: Benjamin Herrenschmidt Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/2db4928bb559f8b43ca7 cheers
Re: [v3,1/2] cxl: Add explicit precision specifiers
On Thu, 2015-11-06 at 11:27:51 UTC, Rasmus Villemoes wrote: > C99 says that a precision given as simply '.' with no following digits > or * should be interpreted as 0. The kernel's printf implementation, > however, treats this case as if the precision was omitted. C99 also > says that if both the precision and value are 0, no digits should be > printed. Even if the kernel followed C99 to the letter, I don't think > that would be particularly useful in these cases. For consistency with > most other format strings in the file, use an explicit precision of 16 > and add a 0x prefix. > > Signed-off-by: Rasmus Villemoes Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/80c394fab89649585089 cheers
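To make the explicit-precision format concrete, here is a small sketch using the C library's snprintf (which follows C99), not the kernel's vsnprintf. A precision of 16 on a hex conversion means "at least 16 digits, zero-padded", so every value prints at a fixed width, including 0. The helper name format_reg is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a 64-bit value with an explicit "0x" prefix and an explicit
 * precision of 16, so the output is always 16 zero-padded hex digits.
 * Returns the number of characters snprintf would write. */
static int format_reg(char *buf, size_t len, unsigned long long val)
{
	return snprintf(buf, len, "0x%.16llx", val);
}
```

Because the precision is non-zero, the C99 rule that "precision 0 and value 0 print no digits" never applies here.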
Re: powerpc/powernv: Unfreeze VF PE on releasing it
On Tue, 2015-23-06 at 07:01:13 UTC, Gavin Shan wrote: > When releasing PE for SRIOV VF, the PE is forced to be frozen > wrongly. When the same PE is picked for another VF, it won't > work anyhow. The patch fixes the issue by unfreezing, not > freezing the VF PE when releasing it. > > Signed-off-by: Gavin Shan Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/f951e51003860705fc9f cheers
Re: [1/4] powerpc/powernv: Allow to reserve one PE for multiple times
On Fri, 2015-19-06 at 02:26:16 UTC, Gavin Shan wrote: > The PE numbers are reserved according to root port's M64 window, > which is aligned to M64 segment finely. So one PE shouldn't be > reserved for multiple times. We will reserve PE numbers according > to the M64 BARs of PCI device in subsequent patches, which aren't > aligned to M64 segment size finely. It means one particular PE > could be reserved for multiple times. > > The patch allows one PE to be reserved for multiple times and we > print the warning message at debugging level. > > Signed-off-by: Gavin Shan Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/e9dc4d7f72a375020ecb cheers
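The behaviour change can be sketched as a bitmap reservation where a repeated reservation is tolerated and only logged. This is plain C with hypothetical names (reserve_pe, SKETCH_MAX_PE, pe_reserved). The kernel code presumably uses its own bitmap helpers and pr_debug(); that is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define SKETCH_MAX_PE 256 /* hypothetical PE count for the sketch */

/* One bit per PE; a set bit means "reserved". */
static unsigned char pe_reserved[SKETCH_MAX_PE / 8];

/* Reserve a PE. Reserving an already-reserved PE is not an error: it
 * only emits a debug-style message (here on stderr). Returns true for
 * a fresh reservation, false if the PE was already reserved. */
static bool reserve_pe(unsigned int pe)
{
	unsigned char bit = (unsigned char)(1u << (pe % 8));

	assert(pe < SKETCH_MAX_PE);
	if (pe_reserved[pe / 8] & bit) {
		fprintf(stderr, "debug: PE#%u already reserved\n", pe);
		return false;
	}
	pe_reserved[pe / 8] |= bit;
	return true;
}
```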
Re: cxl: Destroy afu->contexts_idr on release of an afu
On Thu, 2015-09-07 at 07:39:42 UTC, Johannes Thumshirn wrote: > Destroy afu->contexts_idr on release of an afu, reclaiming the allocated > memory. > > Signed-off-by: Johannes Thumshirn > Acked-by: Ian Munsie Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/bd664f892e3e2b01c791 cheers
Re: cxl: Destroy cxl_adapter_idr on module_exit
On Wed, 2015-08-07 at 15:14:36 UTC, Johannes Thumshirn wrote: > Destroy cxl_adapter_idr on module exit, reclaiming the allocated memory. > > Signed-off-by: Johannes Thumshirn > Acked-by: Ian Munsie Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/b2a02ac65e40fb3900d1 cheers
Re: [v9] powerpc/powernv: Add poweroff (EPOW, DPO) events support for PowerNV platform
On Wed, 2015-08-07 at 11:06:01 UTC, Vipin K Parashar wrote: > This patch adds support for OPAL EPOW (Environmental and Power Warnings) > and DPO (Delayed Power Off) events for the PowerNV platform. These events > are generated on FSP (Flexible Service Processor) based systems. EPOW > events are generated due to various critical system conditions that > require system shutdown. A few examples of these conditions are high > ambient temperature or system running on UPS power with low UPS battery. > DPO event is generated in response to admin initiated system shutdown > request. Upon receipt of EPOW and DPO events the host kernel invokes > orderly_poweroff() for performing graceful system shutdown. > > Signed-off-by: Vipin K Parashar > Acked-by: Vaibhav Jain Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/3b476aadbc1409fef6be cheers
Re: powerpc/powernv: Include VF PE in PELTV of PF PE
On Mon, 2015-22-06 at 03:45:47 UTC, Gavin Shan wrote: > The PELTV of PF PE should include VF PE, which is missed by current > code, so that the VF PE is frozen automatically when freezing PF PE. > The patch fixes the PELTV of PF PE to include VF PE. > > Signed-off-by: Gavin Shan Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/283e2d8a594bc902d0c8 cheers
Re: powerpc: Remove mtmsrd(), use existing mtmsr()
On Tue, 2015-07-07 at 03:56:59 UTC, Anton Blanchard wrote: > mtmsr() does the right thing on 32bit and 64bit, so use it everywhere. > > Signed-off-by: Anton Blanchard Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/1c53973172f84fafa8ad cheers
Re: BUG: perf error on syscalls for powerpc64.
On Thu, 2015-07-16 at 13:57 +0800, Zumeng Chen wrote: > Hi All, > > 1028ccf5 did a change for sys_call_table from a pointer to an array of > unsigned long, I think it's not proper, here is my reason: > > sys_call_table defined as a label in assembler should be pointer array > rather than an array as described in 1028ccf5. If we defined it as an > array, then arch_syscall_addr will return the address of sys_call_table[], > actually the content of sys_call_table[] is demanded by arch_syscall_addr. > so 'perf list' will ignore all syscalls since find_syscall_meta will > return null > in init_ftrace_syscalls because of the wrong arch_syscall_addr. > > Did I miss something, or Gcc compiler has done something newer ? Hi Zumeng, It works for me with the code as it is in mainline. I don't quite follow your explanation, so if you're seeing a bug please send some information about what you're actually seeing. And include the disassembly of arch_syscall_addr() and your compiler version etc. cheers
Re: [PATCH v5 3/3] leds/powernv: Add driver for PowerNV platform
On Thu, 2015-07-16 at 10:27 +0200, Jacek Anaszewski wrote: > On 07/16/2015 08:54 AM, Vasant Hegde wrote: > >>> +static enum led_brightness powernv_led_get(struct led_classdev *led_cdev) > >>> +{ > >>> +char *loc_code; > >>> +int rc, led_type; > >>> +__be64 led_mask, led_value, max_led_type; > >>> + > >>> +led_type = powernv_get_led_type(led_cdev); > >>> +if (led_type == -1) > >>> +return LED_OFF; > >>> + > >>> +loc_code = powernv_get_location_code(led_cdev); > >>> +if (!loc_code) > >>> +return LED_OFF; > >>> + > >>> +/* Fetch all LED status */ > >>> +led_mask = cpu_to_be64(0); > >>> +led_value = cpu_to_be64(0); > >>> +max_led_type = cpu_to_be64(OPAL_SLOT_LED_TYPE_MAX); > >>> + > >>> +rc = opal_leds_get_ind(loc_code, &led_mask, &led_value, > >>> &max_led_type); > >>> +if (rc != OPAL_SUCCESS && rc != OPAL_PARTIAL) { > >>> +dev_err(led_cdev->dev, > >>> +"%s: OPAL get led call failed [rc=%d]\n", > >>> +__func__, rc); > >>> +goto led_fail; > >>> +} > >>> + > >>> +led_mask = be64_to_cpu(led_mask); > >>> +led_value = be64_to_cpu(led_value); > >> > >> be64_to_cpu result should be assigned to the variable of u64/s64 type. > > > > PowerNV platform is capable of running both big/little endian mode.. But > > presently our firmware is big endian. These variable contains big endian > > values. > > Hence I have created as __be64 .. (This is the convention we follow in other > > places as well). > > It is correct that the argument is of __be64 type, but be64_to_cpu > returns u64 type, whereas you assign it to __be64. Yeah that's wrong. You are using led_mask etc as __be64 when you pass them to firmware, which is correct, but then you're also using them as the lvalue of be64_to_cpu() which returns a u64. Sparse should warn you about that if you use it, please do. $ apt-get install sparse $ cd kernel $ make C=2 CF=-D__CHECK_ENDIAN__ Whether the kernel or OPAL is running big or little endian is irrelevant to all of this. 
The OPAL API defines that parameters to OPAL calls are big endian, and that's all that matters: https://github.com/open-power/skiboot/blob/master/doc/opal-spec.txt#L142 Thanks for the review Jacek. cheers
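The point about keeping big-endian and CPU-order values apart can be sketched in userspace C, assuming glibc's <endian.h> (htobe64/be64toh) in place of the kernel's cpu_to_be64/be64_to_cpu. fake_opal_leds_get_ind() and led_is_on() are hypothetical stand-ins, not the driver's functions. The buffers handed to "firmware" stay big-endian, and the converted results go into separate CPU-order variables. In the kernel the two groups would be typed __be64 and u64, and sparse would flag any mixing.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <endian.h>
#include <stdint.h>

/* Hypothetical stand-in for an OPAL call: firmware always writes its
 * results big-endian, whatever the kernel's own endianness is. */
static int fake_opal_leds_get_ind(uint64_t *mask_be, uint64_t *value_be)
{
	*mask_be = htobe64(0x3);  /* two LED types present */
	*value_be = htobe64(0x1); /* first one is on */
	return 0;
}

/* Returns 1 if the LED type is on, 0 if off, -1 if absent or on error. */
static int led_is_on(uint64_t type_bit)
{
	uint64_t mask_be = 0, value_be = 0; /* would be __be64 in the kernel */
	uint64_t mask, value;               /* would be u64 in the kernel */

	if (fake_opal_leds_get_ind(&mask_be, &value_be))
		return -1;

	/* Convert once, into separate CPU-order variables. */
	mask = be64toh(mask_be);
	value = be64toh(value_be);
	return (mask & type_bit) ? !!(value & type_bit) : -1;
}
```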
Re: [PATCH v5 3/3] leds/powernv: Add driver for PowerNV platform
Hi Vasan, On 07/16/2015 08:54 AM, Vasant Hegde wrote: On 07/14/2015 02:30 PM, Jacek Anaszewski wrote: Hi Vasant, Jacek, Thanks for the update. I think that we have still room for improvements, please look at my comments below. Thanks for the detailed review. You're welcome. .../... @@ -0,0 +1,24 @@ +Device Tree binding for LEDs on IBM Power Systems +- + Please start with: - Required properties: - compatible : Should be "ibm,opal-v3-led". Each location code of FRU/Enclosure must be expressed in the form of a sub-node. Required properties for the sub nodes: - led-types : Supported LED types (attention/identify/fault) provided in the form of string array. - or something of this flavour. The example should be at the end. Fixed. +The 'leds' node under '/ibm,opal' lists service indicators available in the +system and their capabilities. + +leds { +compatible = "ibm,opal-v3-led"; +led-mode = "lightpath"; What about led-mode property? If it is generated by firmware I think, that this should be mentioned somehow. Yes.. Its generated by firmware. Added this property to documentation file. + +U78C9.001.RST0027-P1-C1 { +led-types = "identify", "fault"; +}; +... +... +}; + +Each node under 'leds' node describes location code of FRU/Enclosure. + +compatible : should be : "ibm,opal-v3-led". Second colon was redundant here. I have added as - compatible : "ibm,opal-v3-led". Please retain "Should be :". + +The properties under each node: + + led-types : Supported LED types (attention/identify/fault). diff --git a/drivers/leds/Kconfig b/drivers/leds/Kconfig index 4191614..4f56c7a 100644 --- a/drivers/leds/Kconfig +++ b/drivers/leds/Kconfig @@ -505,6 +505,17 @@ config LEDS_BLINKM This option enables support for the BlinkM RGB LED connected through I2C. Say Y to enable support for the BlinkM LED. 
+config LEDS_POWERNV +tristate "LED support for PowerNV Platform" +depends on LEDS_CLASS +depends on PPC_POWERNV +depends on OF +help + This option enables support for the system LEDs present on + PowerNV platforms. Say 'y' to enable this support in kernel. + To compile this driver as a module, choose 'm' here: the module + will be called leds-powernv. + config LEDS_SYSCON bool "LED support for LEDs on system controllers" depends on LEDS_CLASS=y diff --git a/drivers/leds/Makefile b/drivers/leds/Makefile index bf46093..480814a 100644 --- a/drivers/leds/Makefile +++ b/drivers/leds/Makefile @@ -59,6 +59,7 @@ obj-$(CONFIG_LEDS_SYSCON)+= leds-syscon.o obj-$(CONFIG_LEDS_VERSATILE)+= leds-versatile.o obj-$(CONFIG_LEDS_MENF21BMC)+= leds-menf21bmc.o obj-$(CONFIG_LEDS_PM8941_WLED)+= leds-pm8941-wled.o +obj-$(CONFIG_LEDS_POWERNV)+= leds-powernv.o # LED SPI Drivers obj-$(CONFIG_LEDS_DAC124S085)+= leds-dac124s085.o diff --git a/drivers/leds/leds-powernv.c b/drivers/leds/leds-powernv.c new file mode 100644 index 000..b5a307c --- /dev/null +++ b/drivers/leds/leds-powernv.c @@ -0,0 +1,463 @@ +/* + * PowerNV LED Driver + * + * Copyright IBM Corp. 2015 + * + * Author: Vasant Hegde + * Author: Anshuman Khandual + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include +#include +#include +#include +#include +#include + +#include + +/* + * By default unload path resets all the LEDs. But on PowerNV platform + * we want to retain LED state across reboot as these are controlled by + * firmware. Also service processor can modify the LEDs independent of + * OS. Hence avoid resetting LEDs in unload path. + */ +static bool led_disabled; + +/* Map LED type to description. 
*/ +struct led_type_map { +const inttype; +const char*desc; +}; +static const struct led_type_map led_type_map[] = { +{OPAL_SLOT_LED_TYPE_ID,POWERNV_LED_TYPE_IDENTIFY}, +{OPAL_SLOT_LED_TYPE_FAULT,POWERNV_LED_TYPE_FAULT}, +{OPAL_SLOT_LED_TYPE_ATTN,POWERNV_LED_TYPE_ATTENTION}, +{-1,NULL}, +}; + +/* + * LED set routines have been implemented as work queue tasks scheduled + * on the global work queue. Individual task calls OPAL interface to set + * the LED state which might sleep for some time. + */ +struct powernv_led_data { +struct led_classdevcdev; +enum led_brightnessvalue; /* Brightness value */ +struct mutexlock; +struct work_structwork_led; /* LED update workqueue */ +}; + +struct powernv_leds_priv { +int num_leds; +struct powernv_led_data powernv_leds[]; +}; + + +static inline int sizeof_powernv_leds_priv(int num_leds) +{ +return sizeof(struct powernv_le
[PATCH v5 6/6] cpufreq: powernv: Restore cpu frequency to policy->cur on unthrottling
If frequency is throttled due to OCC reset then cpus will be in Psafe frequency, so restore the frequency on all cpus to policy->cur when OCCs are active again. And if frequency is throttled due to Pmax capping then restore the frequency of all the cpus in the chip on unthrottling. Signed-off-by: Shilpasri G Bhat Acked-by: Viresh Kumar --- No changes from v4 Changes from v3: - Refer to the members of 'struct opal_occ_msg' in the patch. Replace 'reason' with 'omsg.throttle_status' drivers/cpufreq/powernv-cpufreq.c | 31 +-- 1 file changed, 29 insertions(+), 2 deletions(-) diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c index 90b4293..546e056 100644 --- a/drivers/cpufreq/powernv-cpufreq.c +++ b/drivers/cpufreq/powernv-cpufreq.c @@ -48,6 +48,7 @@ static struct chip { bool throttled; cpumask_t mask; struct work_struct throttle; + bool restore; } *chips; static int nr_chips; @@ -415,9 +416,29 @@ static struct notifier_block powernv_cpufreq_reboot_nb = { void powernv_cpufreq_work_fn(struct work_struct *work) { struct chip *chip = container_of(work, struct chip, throttle); + unsigned int cpu; + cpumask_var_t mask; smp_call_function_any(&chip->mask, powernv_cpufreq_throttle_check, NULL, 0); + + if (!chip->restore) + return; + + chip->restore = false; + cpumask_copy(mask, &chip->mask); + for_each_cpu_and(cpu, mask, cpu_online_mask) { + int index, tcpu; + struct cpufreq_policy policy; + + cpufreq_get_policy(&policy, cpu); + cpufreq_frequency_table_target(&policy, policy.freq_table, + policy.cur, + CPUFREQ_RELATION_C, &index); + powernv_cpufreq_target_index(&policy, index); + for_each_cpu(tcpu, policy.cpus) + cpumask_clear_cpu(tcpu, mask); + } } static char throttle_reason[][30] = { @@ -469,8 +490,10 @@ static int powernv_cpufreq_occ_msg(struct notifier_block *nb, throttled = false; pr_info("OCC: Active\n"); - for (i = 0; i < nr_chips; i++) + for (i = 0; i < nr_chips; i++) { + chips[i].restore = true; schedule_work(&chips[i].throttle); + } 
return 0; } @@ -487,8 +510,11 @@ static int powernv_cpufreq_occ_msg(struct notifier_block *nb, return 0; for (i = 0; i < nr_chips; i++) - if (chips[i].id == omsg.chip) + if (chips[i].id == omsg.chip) { + if (!omsg.throttle_status) + chips[i].restore = true; schedule_work(&chips[i].throttle); + } } return 0; } @@ -542,6 +568,7 @@ static int init_chip_info(void) chips[i].throttled = false; cpumask_copy(&chips[i].mask, cpumask_of_node(chip[i])); INIT_WORK(&chips[i].throttle, powernv_cpufreq_work_fn); + chips[i].restore = false; } return 0; -- 1.9.3
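The restore loop in powernv_cpufreq_work_fn() visits each cpufreq policy once: it restores the policy of the first online CPU it finds, then clears all of that policy's CPUs from the working mask. Below is a userspace sketch of that idea using a 64-bit mask. NR_CPUS, the policy layout and the call counter are hypothetical.

As posted, the hunk declares `cpumask_var_t mask` and passes it to cpumask_copy() without alloc_cpumask_var()/zalloc_cpumask_var(). With CONFIG_CPUMASK_OFFSTACK=y, cpumask_var_t is a pointer, so that appears to use uninitialised memory. The sketch avoids the issue by using a plain local copy.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 8 /* hypothetical */

/* Hypothetical layout: policy_cpus[cpu] is the mask of CPUs sharing a
 * cpufreq policy with 'cpu' (for example, the threads of one core). */
static uint64_t policy_cpus[SKETCH_NR_CPUS];

static int restore_calls; /* counts simulated target_index() calls */

/* Walk the chip's online CPUs. For each one, "restore" its policy once,
 * then drop every CPU of that policy from the working mask so sibling
 * CPUs do not restore the same policy again. */
static void restore_chip(uint64_t chip_mask, uint64_t online_mask)
{
	uint64_t mask = chip_mask & online_mask; /* local working copy */

	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		if (!(mask & (1ULL << cpu)))
			continue;
		restore_calls++;           /* stand-in for target_index() */
		mask &= ~policy_cpus[cpu]; /* skip the rest of this policy */
	}
}
```

With two four-CPU policies on one chip, the loop performs two restores rather than eight.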
[PATCH v5 0/6] powernv: cpufreq: Report frequency throttle by OCC
This patchset adds a frequency-throttle reporting mechanism to the powernv-cpufreq driver for when OCC throttles the frequency. OCC is an On-Chip-Controller which takes care of the power and thermal safety of the chip. The CPU frequency can be throttled during an OCC reset or when OCC tries to limit the max allowed frequency. The patchset reports such conditions so as to keep the user informed about the reason for the drop in performance of workloads when the frequency is throttled. Changes from v4: - Addressed Joel Stanley's comment with a modification in patch 3: replace memcpy() with be64_to_cpu(); no change in the functionality of the patch Changes from v3: - Rebased on top of 4.2-rc1 - Minor changes in patches 2, 3, 4 and 6; these do not change the functionality of the code - 594fcb9ec9e powerpc/powernv: Expose OPAL APIs required by PRD interface , this patch fixes the build error due to which this series was initially dropped ERROR: ".opal_message_notifier_register" drivers/cpufreq/powernv-cpufreq.ko] undefined! Changes from v2: - Split into multiple patches - Semantic fixes Shilpasri G Bhat (6): cpufreq: powernv: Handle throttling due to Pmax capping at chip level powerpc/powernv: Add definition of OPAL_MSG_OCC message type cpufreq: powernv: Register for OCC related opal_message notification cpufreq: powernv: Call throttle_check() on receiving OCC_THROTTLE cpufreq: powernv: Report Psafe only if PMSR.psafe_mode_active bit is set cpufreq: powernv: Restore cpu frequency to policy->cur on unthrottling arch/powerpc/include/asm/opal-api.h | 12 +++ drivers/cpufreq/powernv-cpufreq.c | 198 +--- 2 files changed, 195 insertions(+), 15 deletions(-) -- 1.9.3
[PATCH v5 2/6] powerpc/powernv: Add definition of OPAL_MSG_OCC message type
Add OPAL_MSG_OCC message definition to opal_message_type to receive OCC events like reset, load and throttled. Host performance can be affected when OCC is reset or OCC throttles the max Pstate. We can register to opal_message_notifier to receive OPAL_MSG_OCC type of message and report it to the userspace so as to keep the user informed about the reason for a performance drop in workloads. The reset and load OCC events are notified to the kernel when FSP sends OCC_RESET and OCC_LOAD commands. Both reset and load messages are sent to the kernel on successful completion of reset and load operation respectively. The throttle OCC event indicates that the Pmax of the chip is reduced. The chip_id and throttle reason for reducing Pmax are also queued along with the message. CC: Stewart Smith Signed-off-by: Shilpasri G Bhat Acked-by: Viresh Kumar --- No change from v4 Changes from v3: - '0d7cd8550d3 powerpc/powernv: Add opal-prd channel' this patch adds the definition of OPAL_MSG_PRD, so remove it and update the changelog. - Move the definitions of OCC_RESET, OCC_LOAD and OCC_THROTTLE from drivers/cpufreq/powernv-cpufreq.c to arch/powerpc/include/asm/opal-api.h - Define OCC_MAX_THROTTLE_STATUS - Add a wrapper structure 'opal_occ_msg' to copy 'struct opal_msg.params[0..2]' This structure will define the parameters received from firmware to maintain compatibility for any future additions.
No change from v2 Change from v1: - Update the commit changelog arch/powerpc/include/asm/opal-api.h | 12 1 file changed, 12 insertions(+) diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h index e9e4c52..64dc9f5 100644 --- a/arch/powerpc/include/asm/opal-api.h +++ b/arch/powerpc/include/asm/opal-api.h @@ -361,6 +361,7 @@ enum opal_msg_type { OPAL_MSG_HMI_EVT, OPAL_MSG_DPO, OPAL_MSG_PRD, + OPAL_MSG_OCC, OPAL_MSG_TYPE_MAX, }; @@ -700,6 +701,17 @@ struct opal_prd_msg_header { struct opal_prd_msg; +#define OCC_RESET 0 +#define OCC_LOAD 1 +#define OCC_THROTTLE 2 +#define OCC_MAX_THROTTLE_STATUS 5 + +struct opal_occ_msg { + __be64 type; + __be64 chip; + __be64 throttle_status; +}; + /* * SG entries * -- 1.9.3
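A sketch of decoding the three message parameters, assuming glibc's <endian.h> in place of be64_to_cpu(). The OCC_* values are copied from the patch. The CPU-order struct occ_msg_cpu and the helper decode_occ_msg() are hypothetical. The patch's struct opal_occ_msg declares its fields as __be64, and later hunks store be64_to_cpu() results into them. Giving the decoded copy plain 64-bit fields keeps the byte orders distinct, which is the same type concern raised in the LED review.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <endian.h>
#include <stdint.h>

#define OCC_RESET 0
#define OCC_LOAD 1
#define OCC_THROTTLE 2
#define OCC_MAX_THROTTLE_STATUS 5

/* CPU-order view of opal_msg params[0..2] (hypothetical type). */
struct occ_msg_cpu {
	uint64_t type;
	uint64_t chip;
	uint64_t throttle_status;
};

/* Decode the big-endian parameters OPAL delivers into CPU order. */
static struct occ_msg_cpu decode_occ_msg(const uint64_t params_be[3])
{
	struct occ_msg_cpu m = {
		.type = be64toh(params_be[0]),
		.chip = be64toh(params_be[1]),
		.throttle_status = be64toh(params_be[2]),
	};
	return m;
}
```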
[PATCH v5 1/6] cpufreq: powernv: Handle throttling due to Pmax capping at chip level
The On-Chip-Controller(OCC) can throttle cpu frequency by reducing the max allowed frequency for that chip if the chip exceeds its power or temperature limits. As Pmax capping is a chip level condition report this throttling behavior at chip level and also do not set the global 'throttled' on Pmax capping instead set the per-chip throttled variable. Report unthrottling if Pmax is restored after throttling. This patch adds a structure to store chip id and throttled state of the chip. Signed-off-by: Shilpasri G Bhat Reviewed-by: Preeti U Murthy Acked-by: Viresh Kumar --- No change from v4 drivers/cpufreq/powernv-cpufreq.c | 59 --- 1 file changed, 55 insertions(+), 4 deletions(-) diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c index ebef0d8..d0c18c9 100644 --- a/drivers/cpufreq/powernv-cpufreq.c +++ b/drivers/cpufreq/powernv-cpufreq.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include @@ -42,6 +43,13 @@ static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1]; static bool rebooting, throttled; +static struct chip { + unsigned int id; + bool throttled; +} *chips; + +static int nr_chips; + /* * Note: The set of pstates consists of contiguous integers, the * smallest of which is indicated by powernv_pstate_info.min, the @@ -301,22 +309,33 @@ static inline unsigned int get_nominal_index(void) static void powernv_cpufreq_throttle_check(unsigned int cpu) { unsigned long pmsr; - int pmsr_pmax, pmsr_lp; + int pmsr_pmax, pmsr_lp, i; pmsr = get_pmspr(SPRN_PMSR); + for (i = 0; i < nr_chips; i++) + if (chips[i].id == cpu_to_chip_id(cpu)) + break; + /* Check for Pmax Capping */ pmsr_pmax = (s8)PMSR_MAX(pmsr); if (pmsr_pmax != powernv_pstate_info.max) { - throttled = true; - pr_info("CPU %d Pmax is reduced to %d\n", cpu, pmsr_pmax); - pr_info("Max allowed Pstate is capped\n"); + if (chips[i].throttled) + goto next; + chips[i].throttled = true; + pr_info("CPU %d on Chip %u has Pmax reduced to %d\n", cpu, + 
chips[i].id, pmsr_pmax); + } else if (chips[i].throttled) { + chips[i].throttled = false; + pr_info("CPU %d on Chip %u has Pmax restored to %d\n", cpu, + chips[i].id, pmsr_pmax); } /* * Check for Psafe by reading LocalPstate * or check if Psafe_mode_active is set in PMSR. */ +next: pmsr_lp = (s8)PMSR_LP(pmsr); if ((pmsr_lp < powernv_pstate_info.min) || (pmsr & PMSR_PSAFE_ENABLE)) { @@ -414,6 +433,33 @@ static struct cpufreq_driver powernv_cpufreq_driver = { .attr = powernv_cpu_freq_attr, }; +static int init_chip_info(void) +{ + unsigned int chip[256]; + unsigned int cpu, i; + unsigned int prev_chip_id = UINT_MAX; + + for_each_possible_cpu(cpu) { + unsigned int id = cpu_to_chip_id(cpu); + + if (prev_chip_id != id) { + prev_chip_id = id; + chip[nr_chips++] = id; + } + } + + chips = kmalloc_array(nr_chips, sizeof(struct chip), GFP_KERNEL); + if (!chips) + return -ENOMEM; + + for (i = 0; i < nr_chips; i++) { + chips[i].id = chip[i]; + chips[i].throttled = false; + } + + return 0; +} + static int __init powernv_cpufreq_init(void) { int rc = 0; @@ -429,6 +475,11 @@ static int __init powernv_cpufreq_init(void) return rc; } + /* Populate chip info */ + rc = init_chip_info(); + if (rc) + return rc; + register_reboot_notifier(&powernv_cpufreq_reboot_nb); return cpufreq_register_driver(&powernv_cpufreq_driver); } -- 1.9.3
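init_chip_info() builds its chip list by appending a chip id whenever it differs from the previous CPU's id. Below is a userspace sketch of that scan over a hypothetical per-CPU chip-id table. It makes two implicit assumptions visible. First, the ids are unique only if each chip's CPUs are numbered contiguously; interleaved numbering produces duplicates, as the second test case shows. Second, the fixed 256-entry array needs a bound, which the sketch enforces with an assert. The patch as posted has no such check.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_MAX_CHIPS 256 /* mirrors the patch's chip[256] array */

/* Collect chip ids by detecting changes between consecutive CPUs.
 * Returns the number of ids written to chips_out. */
static int collect_chip_ids(const unsigned int *cpu_chip, int nr_cpus,
			    unsigned int *chips_out)
{
	unsigned int prev = UINT_MAX;
	int nr_chips = 0;

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		if (cpu_chip[cpu] != prev) {
			prev = cpu_chip[cpu];
			/* bound the fixed-size output array */
			assert(nr_chips < SKETCH_MAX_CHIPS);
			chips_out[nr_chips++] = prev;
		}
	}
	return nr_chips;
}
```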
[PATCH v5 5/6] cpufreq: powernv: Report Psafe only if PMSR.psafe_mode_active bit is set
On a reset cycle of OCC, although the system retires from safe frequency state the local pstate is not restored to Pmin or last requested pstate. Now if the cpufreq governor initiates a pstate change, the local pstate will be in Psafe and we will be reporting a false positive when we are not throttled. So in powernv_cpufreq_throttle_check() remove the condition which checks if local pstate is less than Pmin while checking for Psafe frequency. If the cpus are forced to Psafe then PMSR.psafe_mode_active bit will be set. So, when OCCs become active this bit will be cleared. Let us just rely on this bit for reporting throttling. Signed-off-by: Shilpasri G Bhat Reviewed-by: Preeti U Murthy Acked-by: Viresh Kumar --- No changes from v4 drivers/cpufreq/powernv-cpufreq.c | 12 +++- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c index 22f33ff..90b4293 100644 --- a/drivers/cpufreq/powernv-cpufreq.c +++ b/drivers/cpufreq/powernv-cpufreq.c @@ -39,7 +39,6 @@ #define PMSR_PSAFE_ENABLE (1UL << 30) #define PMSR_SPR_EM_DISABLE(1UL << 31) #define PMSR_MAX(x)((x >> 32) & 0xFF) -#define PMSR_LP(x) ((x >> 48) & 0xFF) static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1]; static bool rebooting, throttled, occ_reset; @@ -313,7 +312,7 @@ static void powernv_cpufreq_throttle_check(void *data) { unsigned int cpu = smp_processor_id(); unsigned long pmsr; - int pmsr_pmax, pmsr_lp, i; + int pmsr_pmax, i; pmsr = get_pmspr(SPRN_PMSR); @@ -335,14 +334,9 @@ static void powernv_cpufreq_throttle_check(void *data) chips[i].id, pmsr_pmax); } - /* -* Check for Psafe by reading LocalPstate -* or check if Psafe_mode_active is set in PMSR. -*/ + /* Check if Psafe_mode_active is set in PMSR. 
*/ next: - pmsr_lp = (s8)PMSR_LP(pmsr); - if ((pmsr_lp < powernv_pstate_info.min) || - (pmsr & PMSR_PSAFE_ENABLE)) { + if (pmsr & PMSR_PSAFE_ENABLE) { throttled = true; pr_info("Pstate set to safe frequency\n"); } -- 1.9.3
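The PMSR checks can be sketched as two helpers built from the bit layout in the patch's macros (PMSR_PSAFE_ENABLE as bit 30, Pmax in bits 32-39). After this patch, Psafe is reported only from the psafe_mode_active bit, and the LocalPstate field (bits 48-55) is ignored. The helper names and the register values in the test are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout taken from the macros in the patch. */
#define PMSR_PSAFE_ENABLE (1UL << 30)
#define PMSR_MAX(x) (((x) >> 32) & 0xFF)

/* Psafe is reported only when psafe_mode_active is set; a low
 * LocalPstate alone no longer counts as throttling. */
static bool pmsr_in_psafe(uint64_t pmsr)
{
	return (pmsr & PMSR_PSAFE_ENABLE) != 0;
}

/* Pmax is an 8-bit signed pstate, hence the int8_t cast (s8 in the
 * kernel). */
static int pmsr_pmax(uint64_t pmsr)
{
	return (int8_t)PMSR_MAX(pmsr);
}
```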
[PATCH v5 4/6] cpufreq: powernv: Call throttle_check() on receiving OCC_THROTTLE
Re-evaluate the chip's throttled state on receiving OCC_THROTTLE notification by executing *throttle_check() on any one of the cpus on the chip. This is a sanity check to verify if we were indeed throttled/unthrottled after receiving OCC_THROTTLE notification. We cannot call *throttle_check() directly from the notification handler because we could be handling chip1's notification in chip2. So initiate an smp_call to execute *throttle_check(). We are irq-disabled in the notification handler, so use a worker thread to smp_call throttle_check() on any of the cpus in the chipmask. Signed-off-by: Shilpasri G Bhat Acked-by: Viresh Kumar --- No changes from v4 Changes from v3: - Refer to the members of 'struct opal_occ_msg' in the patch. Replace 'chip_id' with 'omsg.chip' drivers/cpufreq/powernv-cpufreq.c | 28 ++-- 1 file changed, 26 insertions(+), 2 deletions(-) diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c index a634199..22f33ff 100644 --- a/drivers/cpufreq/powernv-cpufreq.c +++ b/drivers/cpufreq/powernv-cpufreq.c @@ -47,6 +47,8 @@ static bool rebooting, throttled, occ_reset; static struct chip { unsigned int id; bool throttled; + cpumask_t mask; + struct work_struct throttle; } *chips; static int nr_chips; @@ -307,8 +309,9 @@ static inline unsigned int get_nominal_index(void) return powernv_pstate_info.max - powernv_pstate_info.nominal; } -static void powernv_cpufreq_throttle_check(unsigned int cpu) +static void powernv_cpufreq_throttle_check(void *data) { + unsigned int cpu = smp_processor_id(); unsigned long pmsr; int pmsr_pmax, pmsr_lp, i; @@ -370,7 +373,7 @@ static int powernv_cpufreq_target_index(struct cpufreq_policy *policy, return 0; if (!throttled) - powernv_cpufreq_throttle_check(smp_processor_id()); + powernv_cpufreq_throttle_check(NULL); freq_data.pstate_id = powernv_freqs[new_index].driver_data; @@ -415,6 +418,14 @@ static struct notifier_block powernv_cpufreq_reboot_nb = { .notifier_call =
powernv_cpufreq_reboot_notifier, }; +void powernv_cpufreq_work_fn(struct work_struct *work) +{ + struct chip *chip = container_of(work, struct chip, throttle); + + smp_call_function_any(&chip->mask, + powernv_cpufreq_throttle_check, NULL, 0); +} + static char throttle_reason[][30] = { @@ -429,6 +440,7 @@ static int powernv_cpufreq_occ_msg(struct notifier_block *nb, { struct opal_msg *msg = _msg; struct opal_occ_msg omsg; + int i; if (msg_type != OPAL_MSG_OCC) return 0; @@ -462,6 +474,10 @@ static int powernv_cpufreq_occ_msg(struct notifier_block *nb, occ_reset = false; throttled = false; pr_info("OCC: Active\n"); + + for (i = 0; i < nr_chips; i++) + schedule_work(&chips[i].throttle); + return 0; } @@ -473,6 +489,12 @@ static int powernv_cpufreq_occ_msg(struct notifier_block *nb, else if (!omsg.throttle_status) pr_info("OCC: Chip %u %s\n", (unsigned int)omsg.chip, throttle_reason[omsg.throttle_status]); + else + return 0; + + for (i = 0; i < nr_chips; i++) + if (chips[i].id == omsg.chip) + schedule_work(&chips[i].throttle); } return 0; } @@ -524,6 +546,8 @@ static int init_chip_info(void) for (i = 0; i < nr_chips; i++) { chips[i].id = chip[i]; chips[i].throttled = false; + cpumask_copy(&chips[i].mask, cpumask_of_node(chip[i])); + INIT_WORK(&chips[i].throttle, powernv_cpufreq_work_fn); } return 0; -- 1.9.3
[PATCH v5 3/6] cpufreq: powernv: Register for OCC related opal_message notification
OCC is an On-Chip-Controller which takes care of power and thermal safety of the chip. During runtime due to power failure or overtemperature the OCC may throttle the frequencies of the CPUs to remain within the power budget. We want the cpufreq driver to be aware of such situations to be able to report the reason to the user. We register to opal_message_notifier to receive OCC messages from opal. powernv_cpufreq_throttle_check() reports any frequency throttling and this patch will report the reason or event that caused throttling. We can be throttled if OCC is reset or OCC limits Pmax due to power or thermal reasons. We are also notified of unthrottling after an OCC reset or if OCC restores Pmax on the chip. Signed-off-by: Shilpasri G Bhat Acked-by: Viresh Kumar --- Changes from v4: - Replace memcpy() with be64_to_cpu() to copy the msg->params[] Changes from v3: - Move the macro definitions of OCC_RESET, OCC_LOAD, OCC_THROTTLE to arch/powerpc/include/asm/opal-api.h - Use 'struct opal_occ_msg' to copy the 'opal_msg->params[]' and refer the members of this structure in the code; Replace 'chip_id', 'token' and 'reason' with omsg.chip, omsg.type, omsg.throttle_status - Use OCC_MAX_THROTTLE_STATUS instead of the magic number. - Add opal_message_notifier_unregister() Changes from v2: - Patch split in to multiple patches. - This patch contains only the opal_message notification handler Changes from v1: - Add macros to define OCC_RESET, OCC_LOAD and OCC_THROTTLE - Define a structure to store chip id, chip mask which has bits set for cpus present in the chip, throttled state and a work_struct. - Modify powernv_cpufreq_throttle_check() to be called via smp_call() - On Pmax throttling/unthrottling update 'chip.throttled' and not the global 'throttled' as Pmax capping is local to the chip. - Remove the condition which checks if local pstate is less than Pmin while checking for Psafe frequency. 
When OCC becomes active after reset we update 'throttled' to false and when the cpufreq governor initiates a pstate change, the local pstate will be in Psafe and we will be reporting a false positive when we are not throttled. - Schedule a kworker on receiving throttling/unthrottling OCC message for that chip and schedule on all chips after receiving active. - After an OCC reset all the cpus will be in Psafe frequency. So call target() and restore the frequency to policy->cur after OCC_ACTIVE and Pmax unthrottling - Taken care of Viresh and Preeti's comments. drivers/cpufreq/powernv-cpufreq.c | 74 ++- 1 file changed, 73 insertions(+), 1 deletion(-) diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c index d0c18c9..a634199 100644 --- a/drivers/cpufreq/powernv-cpufreq.c +++ b/drivers/cpufreq/powernv-cpufreq.c @@ -33,6 +33,7 @@ #include #include #include /* Required for cpu_sibling_mask() in UP configs */ +#include #define POWERNV_MAX_PSTATES 256 #define PMSR_PSAFE_ENABLE (1UL << 30) @@ -41,7 +42,7 @@ #define PMSR_LP(x) ((x >> 48) & 0xFF) static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1]; -static bool rebooting, throttled; +static bool rebooting, throttled, occ_reset; static struct chip { unsigned int id; @@ -414,6 +415,74 @@ static struct notifier_block powernv_cpufreq_reboot_nb = { .notifier_call = powernv_cpufreq_reboot_notifier, }; +static char throttle_reason[][30] = { + "No throttling", + "Power Cap", + "Processor Over Temperature", + "Power Supply Failure", + "Over Current", + "OCC Reset" +}; + +static int powernv_cpufreq_occ_msg(struct notifier_block *nb, + unsigned long msg_type, void *_msg) +{ + struct opal_msg *msg = _msg; + struct opal_occ_msg omsg; + + if (msg_type != OPAL_MSG_OCC) + return 0; + + omsg.type = be64_to_cpu(msg->params[0]); + + switch (omsg.type) { + case OCC_RESET: + occ_reset = true; + /* +* powernv_cpufreq_throttle_check() is called in +* target() callback which can detect the throttle
state +* for governors like ondemand. +* But static governors will not call target() often thus +* report throttling here. +*/ + if (!throttled) { + throttled = true; + pr_crit("CPU Frequency is throttled\n"); + } + pr_info("OCC: Reset\n"); + break; + case OCC_LOAD: + pr_info("OCC: Loaded\n"); + break; + case OCC_THROTTLE: + o
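Since throttle_status arrives from firmware and indexes throttle_reason[], a bounds check against OCC_MAX_THROTTLE_STATUS keeps a bad value from reading past the table. The hunk above is cut off before the OCC_THROTTLE branch, so the check shown here is this sketch's own choice rather than the patch's code. The strings are copied from the patch's table, and occ_throttle_reason() is a hypothetical helper. The _Static_assert ties the table length to OCC_MAX_THROTTLE_STATUS.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OCC_MAX_THROTTLE_STATUS 5

/* Same strings as the patch's throttle_reason[] table. */
static const char *const throttle_reason[] = {
	"No throttling",
	"Power Cap",
	"Processor Over Temperature",
	"Power Supply Failure",
	"Over Current",
	"OCC Reset",
};

/* Keep the table and the maximum status in sync at compile time. */
_Static_assert(sizeof(throttle_reason) / sizeof(throttle_reason[0]) ==
	       OCC_MAX_THROTTLE_STATUS + 1, "throttle_reason size mismatch");

/* Map a firmware-supplied status to a message. Returns NULL for
 * out-of-range values instead of indexing past the table. */
static const char *occ_throttle_reason(uint64_t status)
{
	if (status > OCC_MAX_THROTTLE_STATUS)
		return NULL;
	return throttle_reason[status];
}
```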
BUG: perf error on syscalls for powerpc64.
Hi All, 1028ccf5 did a change for sys_call_table from a pointer to an array of unsigned long, I think it's not proper, here is my reason: sys_call_table defined as a label in assembler should be pointer array rather than an array as described in 1028ccf5. If we defined it as an array, then arch_syscall_addr will return the address of sys_call_table[], actually the content of sys_call_table[] is demanded by arch_syscall_addr. so 'perf list' will ignore all syscalls since find_syscall_meta will return null in init_ftrace_syscalls because of the wrong arch_syscall_addr. Did I miss something, or Gcc compiler has done something newer ? Cheers, Zumeng
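The question turns on a general C rule. When a table symbol is declared as an array, `table[nr]` loads the value stored in slot nr, not the slot's address. The address needs an explicit `&`. The harmful mismatch is the opposite one: a pointer declaration of a symbol that is really laid out as an array would read the first slot as if it were a pointer. This generic sketch uses hypothetical names in a single translation unit. It does not reproduce powerpc's arch_syscall_addr(), which may have platform details (such as function descriptors) not modelled here.

```c
#include <assert.h>

/* Hypothetical table standing in for sys_call_table: each slot holds a
 * handler address (here just a number). */
unsigned long sketch_call_table[] = { 0x1000, 0x2000, 0x3000 };

/* With an array declaration, indexing yields the slot's contents,
 * i.e. the stored handler address. */
static unsigned long sketch_syscall_addr(int nr)
{
	return sketch_call_table[nr];
}

/* The address of the slot itself requires an explicit '&'. */
static unsigned long *sketch_slot_addr(int nr)
{
	return &sketch_call_table[nr];
}
```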